Visual Regression Testing of Shadow DOM Components
A visual-regression test that screenshots a Web Component will eventually fail for a reason that has nothing to do with a real visual change: the capture fired while the element was still upgrading, before its slots distributed, or before the web font swapped. The diff is one row of anti-aliased text or a faint metric shift, the baseline was “correct,” and the test is now flaky. The cause is timing, and the fix is to make the capture deterministic.
This deep-dive sits under Contract & Visual Testing, within the broader Distribution, Testing & Tooling section.
The minimal reproducible example
A Playwright test snapshots a card component. It passes nine runs out of ten and fails the tenth with a tiny diff.
// FLAKY — captures whatever happens to be painted at this instant.
import { test, expect } from '@playwright/test';
test('user-card visual', async ({ page }) => {
  await page.goto('/components/user-card.html');
  await expect(page.locator('user-card')).toHaveScreenshot('user-card.png');
});
The page itself is ordinary:
<!doctype html>
<link rel="stylesheet" href="/fonts/inter.css" /> <!-- @font-face, font-display: swap -->
<user-card name="Ada Lovelace">
  <span slot="role">Mathematician</span>
</user-card>
<script type="module" src="/dist/user-card.js"></script>
// user-card.js — defined by a module the parser reaches AFTER the element.
class UserCard extends HTMLElement {
  static get observedAttributes() { return ['name']; }

  constructor() {
    super();
    this.attachShadow({ mode: 'open' });
  }

  connectedCallback() {
    this.shadowRoot.innerHTML = `
      <style>
        :host { font-family: Inter, sans-serif; display: block; }
        [part="name"] { font-weight: 700; }
        ::slotted(span) { color: #5a7bff; }
      </style>
      <strong part="name">${this.getAttribute('name')}</strong>
      <slot name="role"></slot>`;
    // slotchange fires on a microtask AFTER this synchronous block.
    this.shadowRoot.querySelector('slot')
      .addEventListener('slotchange', () => this.dataset.ready = 'true');
  }
}

customElements.define('user-card', UserCard);
The screenshot can land at any of three unsettled moments: before customElements.define() upgrades the still-undefined element, after upgrade but before slotchange has fired for the slotted role text, or after layout but before the Inter font swaps in over the fallback sans-serif. Each produces pixels that diverge from the baseline.
Root-cause analysis
Three independent asynchronous timelines converge on the rendered output, and the naive test waits for none of them.
Custom-element upgrade is deferred until the definition exists. Per the WHATWG HTML Standard, when the parser encounters <user-card> before its definition is registered, it creates a plain HTMLElement that is not yet a custom element. Nothing upgrades it until customElements.define() executes; at that point an upgrade reaction is enqueued for each matching element already in the document, and the constructor and connectedCallback run as part of that define() call. Because the defining module script is deferred and sits after the element in the markup, there is a real window where the element exists but has no shadow root and no styles.
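The slotchange event is microtask-deferred. Nodes are assigned to a slot as soon as the shadow tree is written, but slotchange is scheduled to fire at the end of the current microtask checkpoint, not synchronously. Anything that inspects the component in the synchronous tail of connectedCallback therefore sees the data-ready flag still unset, and a test that gates on that flag must wait for the event to land.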
Font swap is a separate, slower timeline. With font-display: swap, the browser paints fallback glyphs immediately and repaints with the web font once it loads. Inter and the fallback have different metrics, so text reflows by sub-pixel amounts when the swap lands — long after upgrade and slotchange.
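The ordering behind the slotchange deferral can be reproduced without a browser. The sketch below is a plain-TypeScript analogy, not DOM code: a resolved-promise callback stands in for the microtask-deferred slotchange handler, and FakeCard, connect, and demo are hypothetical names introduced only for illustration.

```typescript
// Analogy only: a promise callback stands in for the deferred slotchange handler.
interface FakeCard {
  ready: boolean;
}

function connect(card: FakeCard, log: string[]): void {
  // Stand-in for the slotchange listener that sets data-ready.
  Promise.resolve().then(() => {
    card.ready = true;
    log.push('slotchange handler ran');
  });
  // Still inside the synchronous block: the flag has not been set yet.
  log.push(`sync tail sees ready=${card.ready}`);
}

async function demo(): Promise<string[]> {
  const card: FakeCard = { ready: false };
  const log: string[] = [];
  connect(card, log);
  // Yield once so the queued microtask runs before the next line.
  await Promise.resolve();
  log.push(`after microtask checkpoint ready=${card.ready}`);
  return log;
}
```

The synchronous entry is always logged first, which is the same reason a readiness flag set in an event handler cannot be observed from the code that registered the handler.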
The naive toHaveScreenshot() retakes the screenshot until two consecutive captures match, but that heuristic has no knowledge of any of these component-specific signals: two identical frames painted in the fallback font look perfectly stable. Determinism requires explicitly awaiting each one.
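One way to keep those waits explicit and diagnosable is to run them through a small named gate, so that a timeout reports which signal never settled. This is a minimal sketch under stated assumptions: awaitSignals and withTimeout are hypothetical helpers, not Playwright APIs, and each signal is just a function returning a promise.

```typescript
type Signal = () => Promise<unknown>;

// Reject with the signal's name if it does not settle in time (hypothetical helper).
function withTimeout<T>(label: string, work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`signal "${label}" did not settle within ${timeoutMs} ms`)),
      timeoutMs,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Await each signal in declaration order and return the names that settled.
async function awaitSignals(
  signals: Record<string, Signal>,
  timeoutMs: number,
): Promise<string[]> {
  const settled: string[] = [];
  for (const [label, wait] of Object.entries(signals)) {
    await withTimeout(label, wait(), timeoutMs);
    settled.push(label);
  }
  return settled;
}
```

In a Playwright test, each entry would wrap one wait, for example a fonts entry that returns page.evaluate(() => document.fonts.ready). Awaiting sequentially keeps the order of the signals meaningful, at the cost of not overlapping the waits.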
There is a fourth, subtler contributor: self-scheduled rendering. The reflow triggered by the font swap is not the only late paint. A component that lazily renders into its shadow root on requestAnimationFrame, or that measures itself with ResizeObserver and adjusts on the next frame, introduces additional frames in which the rendered output is still converging. Any of these updates can land between Playwright deciding the page is "stable" and the actual final paint, which is why the fix below waits for a committed paint rather than trusting the runner's internal stability heuristic alone.
The production-safe fix
Gate the capture on every settling signal, freeze anything still in motion, and mask regions that are legitimately dynamic.
import { test, expect } from '@playwright/test';
test('user-card visual', async ({ page }) => {
  await page.goto('/components/user-card.html');

  // 1. Wait for the element definition to register and the placeholder to upgrade.
  await page.evaluate(() => customElements.whenDefined('user-card'));

  // 2. Wait for slot distribution to settle (component sets data-ready on slotchange).
  await page.locator('user-card[data-ready="true"]').waitFor();

  // 3. Wait for all @font-face fonts to finish loading and swapping.
  await page.evaluate(() => document.fonts.ready);

  // 4. Wait for a settled layout frame so reflow from the swap has flushed.
  await page.evaluate(() => new Promise((r) =>
    requestAnimationFrame(() => requestAnimationFrame(r))));

  const card = page.locator('user-card');
  await expect(card).toHaveScreenshot('user-card.png', {
    animations: 'disabled',   // freeze CSS animations/transitions
    caret: 'hide',            // remove blinking text caret
    maxDiffPixelRatio: 0.005, // tolerate sub-pixel AA noise across engines
    mask: [card.locator('[part="timestamp"]')], // blank genuinely dynamic regions
  });
});
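The double requestAnimationFrame is deliberate: the first callback runs before the next paint, and the second runs only after that frame has been rendered, so any reflow triggered by the font swap has flushed before capture. animations: 'disabled' stops CSS animations, CSS transitions, and Web Animations: finite animations are fast-forwarded to completion and infinite ones are reset to their initial state, which eliminates the largest source of frame-to-frame variance. toHaveScreenshot() already defaults to animations: 'disabled' and caret: 'hide', so spelling them out documents intent rather than changing behavior. The mask covers regions such as relative timestamps and random avatars with a solid box, so they never participate in the diff. The pixel-ratio threshold, rather than an absolute pixel count, scales with the size of the capture and absorbs the unavoidable anti-aliasing differences between Chromium, WebKit, and Firefox text rasterizers.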
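To see why a ratio travels better than an absolute count, it helps to work the arithmetic. The sketch below is an illustration, not Playwright's comparison code: diffRatio and withinRatio are hypothetical helpers, the capture sizes are made up, and the exact rounding Playwright applies internally is not taken from its source.

```typescript
// Fraction of pixels that differ in a width x height capture.
function diffRatio(diffPixels: number, width: number, height: number): number {
  return diffPixels / (width * height);
}

// True when the differing fraction stays at or under the configured ratio.
function withinRatio(
  diffPixels: number,
  width: number,
  height: number,
  maxDiffPixelRatio: number,
): boolean {
  return diffRatio(diffPixels, width, height) <= maxDiffPixelRatio;
}

// A hypothetical 320 x 120 card has 38,400 pixels, so 0.005 allows 192 of them.
const smallCardAtLimit = withinRatio(192, 320, 120, 0.005);
const smallCardOverLimit = withinRatio(193, 320, 120, 0.005);

// The same 193 differing pixels are a far smaller share of a 1280 x 720 capture.
const largeCaptureShare = diffRatio(193, 1280, 720);
```

A fixed budget of 192 pixels would be strict for the full-page capture and lenient for a small badge; the ratio keeps the tolerance proportional to what is being compared.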
The data-ready flag deserves a note: the component sets it on slotchange, which makes distribution observable from outside the shadow boundary. If you cannot modify the component to expose such a signal, wait on a concrete rendered consequence instead. For example, await page.locator('user-card >> [part="name"]').waitFor() confirms the shadow content has been written, and page.waitForFunction(() => (document.querySelector('user-card')?.shadowRoot?.querySelector('slot')?.assignedNodes().length ?? 0) > 0) confirms slot distribution; the second check only works against an open shadow root. Avoid fixed page.waitForTimeout() sleeps entirely: a fixed delay is either too short on a slow CI runner (reintroducing flake) or wastefully long on a fast one, and it never actually observes the signal you care about.
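The difference between observing a signal and sleeping can be shown with a small polling loop. Playwright's own waits already do this for you; the sketch below only illustrates the idea, and pollUntil with its option names is a hypothetical helper rather than a Playwright API.

```typescript
interface PollOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

// Re-check a condition until it holds, returning how long that took in ms.
async function pollUntil(
  predicate: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 50 }: PollOptions = {},
): Promise<number> {
  const start = Date.now();
  for (;;) {
    if (await predicate()) {
      // Finish as soon as the condition is observed, not after a fixed delay.
      return Date.now() - start;
    }
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

On a fast machine the loop returns after the first successful check, and on a slow one it keeps waiting up to the timeout, which is the behavior a fixed sleep cannot provide.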
For components that register their own fonts rather than relying on a document-level <link>, document.fonts.ready is still the right gate, because document.fonts is a document-level FontFaceSet: it tracks faces declared by document-level @font-face rules as well as FontFace objects that script adds with document.fonts.add(). A common mistake is to declare @font-face inside a shadow root's stylesheet or its adoptedStyleSheets. Browsers do not reliably honor @font-face rules declared inside a shadow tree, so the component can silently render in the fallback font while fonts.ready resolves with nothing left to wait for. Declare the face at the document level instead, and confirm the outcome with the document.fonts.status check shown in Verification rather than assuming it.
Verification
Confirm the fix two ways. First, prove the gate actually observes each signal by logging the page state at capture time:
const state = await page.evaluate(() => ({
  defined: !!customElements.get('user-card'),
  ready: document.querySelector('user-card')?.dataset.ready === 'true',
  fonts: document.fonts.status, // "loaded"
}));
console.log(state); // { defined: true, ready: true, fonts: 'loaded' }
Second, prove the test is no longer flaky by repeating it under load:
npx playwright test user-card.visual --repeat-each=50 --workers=4
# 150 passed (50 runs x 3 engines), 0 flaky
A run that previously surfaced one or two flakes per fifty now passes deterministically. If a real visual change is introduced, --update-snapshots regenerates the baseline, and every subsequent run must match it within the configured threshold.
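As a rough sanity check on how many repeats are worth running, the arithmetic below estimates how likely a still-flaky test is to pass every repeat by luck. It is a sketch that assumes runs are independent with a constant flake rate, which real CI only approximates; chanceAllPass and runsForConfidence are hypothetical helper names.

```typescript
// Probability that a test with the given per-run flake rate passes `runs` times in a row.
function chanceAllPass(flakeRate: number, runs: number): number {
  return Math.pow(1 - flakeRate, runs);
}

// Smallest number of runs after which at least one failure is expected with the
// given confidence, if the test still flakes at `flakeRate`.
function runsForConfidence(flakeRate: number, confidence: number): number {
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - flakeRate));
}

// One flake per fifty runs corresponds to a 2% flake rate.
const luckyCleanFifty = chanceAllPass(0.02, 50);
const runsNeededAt95 = runsForConfidence(0.02, 0.95);
```

Under these assumptions a test that still flaked once per fifty runs would pass fifty clean repeats roughly a third of the time, so a larger total repeat count gives noticeably stronger evidence than a single engine's fifty.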
When to use, when to avoid
| Situation | Approach |
|---|---|
| Static, themeable component surface (cards, buttons, badges) | Full visual snapshot with the four-signal gate; per-engine baselines. |
| Component with intrinsically dynamic regions (timestamps, live data) | Snapshot with those regions in mask; assert the dynamic value separately as a contract test. |
| Component behind a closed shadow root | Snapshot the host's painted output only. You cannot reach inner parts, so verify behavior through events instead. |
| Logic-only change with no visual surface | Skip visual entirely; use the contract test on event.detail instead. |
| Heavy animation / canvas / video | Avoid pixel snapshots; freeze to a known frame or assert computed style and DOM state. |
| Server-rendered (declarative Shadow DOM) component | Capture only after hydration adopts the root; coordinate with SSR & Hydration. |
Visual regression is a sharp tool: it catches unintended pixel drift that no assertion would, but it is worthless if flaky, because a team learns to re-run until green and stops trusting the gate. The discipline above — wait for every async signal, freeze motion, mask the genuinely dynamic, threshold for engine AA noise — is what makes the gate trustworthy enough to block a publish.
Related
- Contract & Visual Testing: the parent topic covering the full real-browser test gate.
- Distribution, Testing & Tooling: the section governing packaging, testing, and publishing.
- Contract Testing Custom Event Payloads: the non-visual half of the same gate, asserting event.detail.
- Event Composition & Bubbling: why a settled component still emits events that a wrapper can observe.