Contract & Visual Testing

A web component is a runtime contract — attributes, properties, slots, ::part surfaces, and emitted events — and the only way to prove that contract holds is to render the component in a real browser engine and inspect what it actually produces. Tests that mock the DOM cannot see a shadow root, cannot measure a painted pixel, and cannot fire a composed event through a real event path, so they validate a fiction rather than the component your consumers ship.

This topic area sits under Distribution, Testing & Tooling, the parent section that governs everything between “works on my laptop” and “works in a thousand apps I will never see.” The gate described here is the one that runs on every commit: it asserts the shape of the data a component emits, the surface it exposes for styling and composition, and the pixels it paints — across Chromium, WebKit, and Firefox.

Concept and spec grounding

A “contract” here is the externally observable behavior other code relies on, defined by several normative specifications working together. Custom element reactions and the element’s attribute/property duality come from the WHATWG HTML Standard’s custom elements section. Shadow DOM scoping and slot assignment come from the DOM Standard; the ::part pseudo-element comes from CSS Shadow Parts, and ::slotted comes from CSS Scoping. Events, including the composed and bubbles flags that determine whether a CustomEvent escapes a shadow boundary, are governed by the DOM Standard’s event dispatch algorithm. Accessibility expectations trace back to the ARIA specification and the platform’s accessibility tree.

None of these can be exercised meaningfully outside a browser engine. jsdom and similar Node-side DOM emulations implement enough of the DOM to satisfy unit-test imports. They do not run a layout engine or paint, they compute styles only in a limited way, and they have historically had partial-to-absent Shadow DOM and adoptedStyleSheets support. A snapshot taken in jsdom proves nothing about what a user sees, and an event assertion there proves nothing about real propagation through nested shadow roots. The architectural decision that defines this topic area is therefore simple: tests run in real engines. They are driven either by Playwright or by @web/test-runner, and both control an actual browser over an automation protocol (the Chrome DevTools Protocol, WebDriver, or Playwright’s own protocols for WebKit and Firefox) rather than emulating one.

Browser engine integration points

Both runners drive a real engine, but they integrate at different points in the page lifecycle, and that timing matters for component tests.

@web/test-runner (WTR) loads your test file as a module inside the page. Your assertions execute in the same realm as the component, so you call customElements.define, await customElements.whenDefined, read shadowRoot, and inspect assignedNodes() directly. It is the closest thing to a true in-browser unit test and is ideal for contract and surface assertions where you need synchronous access to the live element instance.
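As a sketch of how WTR can be pointed at all three engines, the configuration below uses the @web/test-runner-playwright launcher. The file name, the test glob, and the choice to launch every engine in one run are assumptions, not requirements of the runner.

```javascript
// web-test-runner.config.mjs (assumed file name and project layout)
import { playwrightLauncher } from '@web/test-runner-playwright';

export default {
  // Test modules that WTR loads into each browser page (assumed glob).
  files: 'test/**/*.test.js',
  // Resolve bare specifiers such as '@open-wc/testing' for the browser.
  nodeResolve: true,
  // One launcher per engine; each runs the same test modules in a real browser.
  browsers: [
    playwrightLauncher({ product: 'chromium' }),
    playwrightLauncher({ product: 'webkit' }),
    playwrightLauncher({ product: 'firefox' }),
  ],
};
```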

Playwright drives the page from the outside, over CDP (Chromium), the WebKit remote protocol, or Firefox’s Juggler/BiDi. Your test code runs in Node; you reach into the page with page.evaluate() to run code in the page realm, and you use page.locator() plus toHaveScreenshot() for visual capture. Playwright’s killer feature for Shadow DOM is that its locator engine pierces open shadow roots automatically — a CSS or text selector resolves through shadow boundaries without manual shadowRoot traversal. (Closed shadow roots remain opaque by design; see Open vs Closed Shadow DOM Tradeoffs.)

The critical timing fact for both runners is that custom-element upgrade is decoupled from parsing. Suppose the HTML parser encounters <my-widget> before its definition is registered. It creates a plain HTMLElement and upgrades it only when customElements.define() later runs for that name. With a deferred or module script, that is typically after parsing has finished. Slot assignment fires slotchange on a microtask after distribution, and fonts load and swap on their own timeline. Every assertion that reads the rendered surface must be sequenced after all of these settle, or the test reads a half-built element. This is the root of most flake and is covered in depth in Visual Regression Testing of Shadow DOM.
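One way to make every test wait for the same things is to wrap the sequencing rule in a single helper. This is a minimal sketch, not an API of either runner. `settle` is a hypothetical name, and it takes the window object as a parameter so the same code can run in a WTR test module (pass `window`) or against a stub. Because it awaits promises and then animation frames, any pending slotchange microtasks have also run by the time it resolves.

```javascript
// Hypothetical helper: resolve only after the listed custom elements are defined,
// pending web fonts have loaded, and two animation frames have passed.
// `win` is the page's window object (pass `window` inside a browser test).
async function settle(win, tagNames = []) {
  // 1. Wait for each definition; define() upgrades instances already in the DOM.
  await Promise.all(tagNames.map((tag) => win.customElements.whenDefined(tag)));
  // 2. Wait for font loading to finish so text metrics stop changing.
  await win.document.fonts.ready;
  // 3. Wait two frames: the second callback runs after the first frame has painted.
  await new Promise((resolve) => {
    win.requestAnimationFrame(() => win.requestAnimationFrame(resolve));
  });
}
```

In a WTR test this would be called as `await settle(window, ['star-rating'])` before reading the shadow root.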

Core assertion surface

A complete component test covers six distinct surfaces. The table below maps each to the API you assert against and the runner best suited to it.

Surface	What you assert	Primary API	Best runner
Event contract	`event.detail` shape, `bubbles`, `composed`, `cancelable`	`addEventListener` + JSON Schema (Ajv)	WTR or Playwright `page.evaluate`
Attribute / property sync	reflected attributes, property getters/setters, `observedAttributes`	`getAttribute` / direct property read	WTR
Slots	`slot.assignedNodes()`, `assignedElements()`, fallback content	`HTMLSlotElement` API	WTR
`::part` surface	named parts are present and externally styleable	`el.shadowRoot.querySelectorAll('[part]')`	WTR + Playwright (computed style)
Visual	painted pixels of the shadow tree	`toHaveScreenshot()`	Playwright
Accessibility	computed role/name, contrast, ARIA validity	`axe-core` injected into the page	Playwright

For the event contract specifically, treat the detail payload as a versioned schema. Validate it with Ajv so the assertion is the schema itself rather than a hand-written pile of expect calls — the full pattern lives in Contract Testing Custom Event Payloads.

// @web/test-runner test module: runs in the page realm
import { fixture, html, expect, elementUpdated } from '@open-wc/testing';
// Ajv is published as CommonJS, so the dev server typically needs a
// CommonJS-to-ESM plugin (or a prebuilt ESM bundle) to serve it here.
import Ajv from 'ajv';
import '../dist/star-rating.js';

const ajv = new Ajv({ allErrors: true });
const validateRated = ajv.compile({
  type: 'object',
  required: ['value', 'max'],
  additionalProperties: false,
  properties: {
    value: { type: 'integer', minimum: 0, maximum: 5 },
    max: { type: 'integer', const: 5 },
  },
});

it('exposes a stable rated-event contract', async () => {
  // Fill the documented "label" slot; an empty fixture would fail step 3.
  const el = await fixture(html`
    <star-rating value="3"><span slot="label">Rate this product</span></star-rating>
  `);
  await customElements.whenDefined('star-rating');

  // 1. Property/attribute sync surface
  expect(el.value).to.equal(3);
  el.value = 4;
  // Reflection may be batched into an async render; wait before reading it.
  await elementUpdated(el);
  expect(el.getAttribute('value')).to.equal('4');

  // 2. ::part surface is present for external styling
  const stars = el.shadowRoot.querySelectorAll('[part="star"]');
  expect(stars.length).to.equal(5);

  // 3. Slot surface
  const label = el.shadowRoot.querySelector('slot[name="label"]');
  expect(label.assignedNodes().length).to.be.greaterThan(0);

  // 4. Event contract surface
  const event = await oneEvent(el, 'rated', () => stars[2].click());
  expect(event.bubbles).to.be.true;
  expect(event.composed).to.be.true;
  expect(validateRated(event.detail), JSON.stringify(validateRated.errors)).to.be.true;
});

// Resolves with the first `type` event. The listener is attached before
// `trigger` runs, so a synchronously dispatched event is not missed.
function oneEvent(target, type, trigger) {
  return new Promise((resolve) => {
    target.addEventListener(type, resolve, { once: true });
    trigger();
  });
}
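Because the schema is the contract, it can also be compared between releases. That way a rename such as detail.rating to detail.value is flagged before it ships. The helper below is a hypothetical sketch, not an Ajv feature. It only inspects top-level properties and their declared `type`. It treats three kinds of change as breaking for consumers who read the payload: a removed key, a changed type, and a key that stops being required.

```javascript
// Hypothetical helper: list the changes between two versions of an event.detail
// JSON Schema that could break a consumer reading the payload.
// Only top-level `properties`, their `type`, and `required` are compared.
function breakingChanges(oldSchema, newSchema) {
  const oldProps = oldSchema.properties ?? {};
  const newProps = newSchema.properties ?? {};
  const oldRequired = new Set(oldSchema.required ?? []);
  const newRequired = new Set(newSchema.required ?? []);
  const problems = [];

  for (const [key, oldDef] of Object.entries(oldProps)) {
    const newDef = newProps[key];
    if (!newDef) {
      // A consumer reading this key would now get undefined.
      problems.push(`removed property "${key}"`);
      continue;
    }
    if (oldDef.type !== newDef.type) {
      problems.push(`property "${key}" changed type from ${oldDef.type} to ${newDef.type}`);
    }
    if (oldRequired.has(key) && !newRequired.has(key)) {
      // Consumers may have relied on the key always being present.
      problems.push(`property "${key}" is no longer required`);
    }
  }
  // Keys that exist only in the new schema are additive and are not reported.
  return problems;
}
```

Run against the schema from the last published version, an empty result means no breaking change was detected at this shallow level. It does not prove the payload is fully compatible.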

The visual and accessibility surfaces are best driven by Playwright because it captures actual paint and can inject axe-core into the live page. The following test renders a component, waits for it to be fully settled, snapshots the shadow tree, and runs an axe audit that pierces the shadow root.

// playwright.config.js
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  // Assumed: a dev server already serves the demo pages at this address, so
  // page.goto('/components/star-rating.html') can resolve a relative URL.
  use: { baseURL: 'http://localhost:8000' },
  expect: { toHaveScreenshot: { maxDiffPixelRatio: 0.01, animations: 'disabled' } },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
  ],
});

// star-rating.visual.spec.js
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test.describe('star-rating render contract', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('/components/star-rating.html');
    // Defer capture until upgrade, fonts, and layout have all settled,
    // in a single round trip that returns nothing to Node.
    await page.evaluate(async () => {
      await customElements.whenDefined('star-rating');
      await document.fonts.ready;
      // Two animation frames: the second callback runs after a paint.
      await new Promise((resolve) =>
        requestAnimationFrame(() => requestAnimationFrame(resolve)),
      );
    });
    // The locator pierces the open shadow root to find the rendered parts.
    await page.locator('star-rating [part="star"]').first().waitFor();
  });

  test('matches the visual baseline', async ({ page }) => {
    const widget = page.locator('star-rating');
    // The host's screenshot covers its bounding box, including the painted shadow tree.
    await expect(widget).toHaveScreenshot('star-rating.png');
  });

  test('has no accessibility violations through the shadow root', async ({ page }) => {
    // axe-core walks composed nodes, so shadow content is audited too.
    const results = await new AxeBuilder({ page })
      .include('star-rating')
      .withTags(['wcag2a', 'wcag2aa'])
      .analyze();
    expect(results.violations).toEqual([]);
  });
});

The @axe-core/playwright integration injects axe-core into the page and runs it against the composed tree. A missing accessible name on a button inside an open shadow root is therefore reported exactly as it would be for light-DOM markup. This is the single most valuable test for a design-system primitive, because keyboard and screen-reader semantics are precisely what unit tests never catch.

axe-core traverses the flattened tree, the same composed view from which the browser builds its accessibility tree. Slotted light-DOM content is audited in the position where it actually renders. A contrast failure between a slotted label and a shadow-root background therefore surfaces even though the two nodes live in different trees. Run the audit with the WCAG tag set your product commits to (wcag2a, wcag2aa, optionally wcag21aa) rather than the default rule set, so the gate matches the conformance level you publish.
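A bare `toEqual([])` failure prints the whole axe result object. A small formatter can make the failure message point at the offending node instead. This is a hypothetical helper built on axe-core's documented result shape. Each violation has an `id`, an `impact`, and `nodes`. Each node's `target` lists one selector per frame level. An element inside a shadow root appears as a nested array of selectors: the host first, then the selector inside that root.

```javascript
// Hypothetical helper: one readable line per failing node in axe-core results.
// `target` holds one entry per frame level; an entry that is an array of
// selectors walks from a shadow host down into its shadow root.
function formatViolations(violations) {
  return violations.flatMap((violation) =>
    violation.nodes.map((node) => {
      const path = node.target
        .map((entry) => (Array.isArray(entry) ? entry.join(' >>> ') : entry))
        .join(' | ');
      return `[${violation.impact}] ${violation.id}: ${path}`;
    }),
  );
}
```

Playwright's `expect` accepts a custom message as its second argument. The assertion can then read `expect(results.violations, formatViolations(results.violations).join('\n')).toEqual([])`.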

A second production concern is baseline storage. Each engine rasterizes text and gradients differently, so a single shared baseline image will always diff against at least one engine. Commit a per-engine baseline instead. By default Playwright appends both the project name and the host platform to each snapshot name (for example star-rating-chromium-linux.png, star-rating-webkit-linux.png, star-rating-firefox-linux.png), so you review three images on the first run. Store baselines in the repository, not in a flaky external service, and review baseline changes in the pull request exactly as you review code. An unexpected baseline diff is a design regression, and a reviewer should see it as a rendered before/after comparison, which Playwright’s HTML report provides.
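To make the naming concrete, the sketch below reproduces, as an assumption, the shape of Playwright's default snapshot path: a `<spec file>-snapshots` folder, then `{arg}{-projectName}{-snapshotSuffix}{ext}`, with the suffix defaulting to the OS platform. It is not Playwright's code. It only lists the files a reviewer should expect to see committed for the three-engine matrix.

```javascript
// Hypothetical helper mirroring the shape of Playwright's default snapshot naming:
// <spec file>-snapshots/<name>-<project>-<platform><ext>. Not Playwright's code.
function expectedBaselines(specFile, snapshotName, projects, platform) {
  const dot = snapshotName.lastIndexOf('.');
  const base = dot === -1 ? snapshotName : snapshotName.slice(0, dot);
  const ext = dot === -1 ? '' : snapshotName.slice(dot);
  return projects.map((project) => `${specFile}-snapshots/${base}-${project}-${platform}${ext}`);
}
```

The platform is part of the name. A baseline regenerated on macOS therefore lands in a `-darwin` file that a Linux CI run never reads.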

Failure modes and debugging

1. Flaky screenshots from capturing mid-upgrade. The most common failure is a baseline that passes locally and fails in CI with a one-row diff, because the snapshot was taken before the element upgraded, before slotchange distributed content, or before the web font swapped. Root cause: custom-element upgrade, slot distribution, and font swap are all asynchronous. Fix: gate every capture on customElements.whenDefined(), document.fonts.ready, and a settled layout frame, and disable animations. The full timeline analysis is in Visual Regression Testing of Shadow DOM.

2. Silent event-contract drift. A component renames detail.rating to detail.value, every existing test still passes because nothing asserted the old key, and a downstream consumer’s handler reads undefined at runtime. Root cause: events are an untyped runtime contract; nothing enforces the payload shape unless you write it down. Fix: compile a JSON Schema with Ajv and assert event.detail against it, plus explicit bubbles/composed checks. See Contract Testing Custom Event Payloads.

3. Asserting against a closed shadow root. A test calls el.shadowRoot.querySelector(...) and gets null because the element attached its root with mode: 'closed'. Root cause: a closed root returns null from the shadowRoot getter by design, and Playwright locators cannot pierce it. Fix: test the component through its public surface (events, attributes, externally visible ::part rendering) only, or expose an internal test hook on a development build. Do not weaken production encapsulation for tests.

4. axe reports zero violations but the component is inaccessible. The audit passes because the inaccessible control lives behind a closed root or is rendered into a slot that the test fixture left empty. Root cause: axe audits the composed tree as it exists at capture time, and missing slotted content or opaque roots hide nodes from it. Fix: populate every slot the component documents and audit an open-root development build. For closed-root components, assert specific roles and names against the browser’s own accessibility tree, which is built regardless of shadow-root mode (for example, through the Chrome DevTools Protocol’s Accessibility domain, which is Chromium-only), instead of relying on page-realm DOM inspection.

5. Baseline thrash between local and CI runs. Screenshots regenerated on a developer’s macOS machine fail in a Linux CI runner with whole-glyph diffs. Root cause: font rendering, sub-pixel hinting, and available system fonts differ between operating systems, so a baseline captured on one OS is not portable to another. Fix: generate and store baselines from the same container image CI uses — run npx playwright test --update-snapshots inside the CI Docker image (or via --update-snapshots in a dedicated CI job), never from a local OS, so the committed baseline matches the environment that will compare against it.
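Failure mode 4 can be guarded against mechanically by comparing the slots a component documents with the slots a fixture actually fills. The sketch below assumes the documentation comes from a Custom Elements Manifest (custom-elements.json), whose slot entries use an empty name for the default slot. `unfilledSlots` is a hypothetical helper. It only looks at element children, so default-slot text nodes are not counted.

```javascript
// Hypothetical helper: which documented slots does a test fixture leave empty?
// documentedSlots: [{ name }] as in a Custom Elements Manifest ("" = default slot).
// fixtureChildren: the host's element children, or objects shaped like { slot }.
function unfilledSlots(documentedSlots, fixtureChildren) {
  // Element.slot is "" when the child has no slot attribute (default slot).
  const filled = new Set(fixtureChildren.map((child) => child.slot ?? ''));
  return documentedSlots
    .map((slot) => slot.name)
    .filter((name) => !filled.has(name));
}
```

In a WTR test, `expect(unfilledSlots(manifestSlots, [...el.children])).to.deep.equal([])` can run before the axe audit. Here `manifestSlots` stands for the slot list read from the manifest.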

Framework interop

The component under test is framework-agnostic, but consumers wrap it, and the wrapping changes how events and properties cross the boundary — so the test suite should include at least one integration check per target framework.

React (≤18): React attaches listeners at the root and does not natively bind to custom events; it also serializes non-primitive props to attributes. A rated event and an object-valued property both need an explicit ref-based bridge, which the test should exercise. See Bridging Custom Events to React. React 19 adds first-class custom-element property and event support, so version the test accordingly.
Vue 3: Binds DOM properties and listens to custom events out of the box once the tag is treated as a custom element via compilerOptions.isCustomElement. Slot mapping is the surface most worth a dedicated test — see Mapping Named Slots in Vue.
Angular: Requires CUSTOM_ELEMENTS_SCHEMA; event and content projection behavior is covered in Projecting Angular Content into Web Components.
SSR: When the component is server-rendered via declarative Shadow DOM, the visual baseline must be captured after hydration adopts the root, or the snapshot reflects pre-hydration markup. Coordinate with Server-Side Rendering & Hydration.
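For React 18 and earlier, the ref-based event bridge can be reduced to one function. It subscribes to the component's events and returns a cleanup callback. `bindCustomEvents` is a hypothetical helper. It uses the `signal` option of `addEventListener`, so a single `abort()` removes every listener, and that is also what an integration test should assert.

```javascript
// Hypothetical helper: subscribe to several custom events on one element and
// return a cleanup function that removes all of them with a single abort().
function bindCustomEvents(target, handlers) {
  const controller = new AbortController();
  for (const [type, handler] of Object.entries(handlers)) {
    target.addEventListener(type, handler, { signal: controller.signal });
  }
  return () => controller.abort();
}
```

Inside a React component this matches the effect contract directly: `useEffect(() => bindCustomEvents(ref.current, { rated: onRated }), [onRated])`. React calls the returned function on unmount or whenever `onRated` changes.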

The events themselves cross shadow boundaries according to their composed flag, the mechanics of which are detailed in Event Composition & Bubbling — a contract test that asserts composed: true is asserting exactly the property that lets a React or Vue listener on an ancestor ever receive the event at all.

Performance and memory

Browser-driven tests are typically much slower than jsdom unit tests, so budget discipline matters. Run the three-engine matrix in parallel. Playwright runs projects concurrently across its worker pool and can split a run across CI machines with --shard. @web/test-runner runs its browser launchers concurrently. Reserve the full Chromium/WebKit/Firefox matrix for the visual and a11y layers, and run the fast contract/surface assertions on a single engine for pre-merge feedback, promoting them to the full matrix on the release branch.
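Writing the matrix policy down as code keeps CI and local runs in agreement. `enginesFor` and the `release` branch name are hypothetical. The returned names are meant to match the Playwright project names in the configuration earlier in this section, and they can be passed to the CLI as repeated `--project` flags.

```javascript
// Hypothetical CI policy: which Playwright projects run for a test layer on a branch.
const ALL_ENGINES = ['chromium', 'webkit', 'firefox'];

function enginesFor({ layer, branch }) {
  // The release branch always gets the full cross-engine matrix.
  if (branch === 'release') return [...ALL_ENGINES];
  // Paint and accessibility differ per engine, so these layers always get all three.
  if (layer === 'visual' || layer === 'a11y') return [...ALL_ENGINES];
  // Fast contract/surface checks: one engine for pre-merge feedback.
  return ['chromium'];
}

// Build CLI flags such as "--project=chromium --project=webkit".
function projectFlags(engines) {
  return engines.map((name) => `--project=${name}`).join(' ');
}
```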

Watch for leaks in the test harness itself: a fixture that is not torn down leaves the element connected, its observers live, and its adoptedStyleSheets retained across tests, inflating memory until the worker is recycled. Use AbortController-based listener teardown inside components so a disconnected element releases its listeners deterministically, and rely on the test framework’s per-test fixture cleanup. For screenshot tests, cap maxDiffPixelRatio rather than maxDiffPixels so the threshold scales with viewport and device-pixel-ratio differences between engines.
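The arithmetic behind that advice is short. A screenshot is compared in device pixels, so the same element covers four times as many pixels at a device pixel ratio of 2 as at 1. The sketch below uses a hypothetical name, `toleratedDiffPixels`, and its integer rounding is a simplification, not Playwright's comparator. It shows that a ratio keeps the tolerance proportional to the render size.

```javascript
// Hypothetical helper: how many differing pixels a ratio-based threshold allows
// for an element of the given CSS size rendered at a device pixel ratio (dpr).
function toleratedDiffPixels(cssWidth, cssHeight, dpr, maxDiffPixelRatio) {
  // Screenshots are taken in device pixels: each CSS pixel becomes dpr x dpr.
  const totalPixels = cssWidth * dpr * (cssHeight * dpr);
  return Math.floor(totalPixels * maxDiffPixelRatio);
}
```

Take a 200×40 CSS-pixel widget with a ratio of 0.01. It is allowed 80 differing pixels at dpr 1 and 320 at dpr 2. A fixed `maxDiffPixels: 80` would allow 80 in both cases, which makes it effectively four times stricter on the higher-density render.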

Browser compatibility

The features the tests exercise and the protocols the runners use both have version floors worth tracking.

Capability	Chrome / Edge	Firefox	Safari
Custom elements + Shadow DOM	67+	63+	10.1+
`::part()`	73+	72+	13.1+
Constructable / `adoptedStyleSheets`	73+	101+	16.4+
Declarative Shadow DOM (`shadowrootmode`)	111+ (90+ for the older `shadowroot` attribute)	123+	16.4+
WebDriver BiDi (runner transport)	124+	124+	preview

Playwright bundles pinned builds of Chromium, a WebKit build (the engine behind Safari), and Firefox, so the matrix does not depend on what is installed on the CI host — but those builds track current engine behavior, not the trailing-edge versions in the table. If your component supports an older floor, pair the real-engine matrix with the feature-detection and fallback strategy from Polyfills & Progressive Enhancement, and add an explicit test that asserts the degraded experience. WebKit’s headless build is the most likely to expose font-rendering and sub-pixel differences from Chromium baselines, which is why per-engine baseline images, not a single shared baseline, are the production norm.
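One way to structure the degraded-experience test is to branch on the same feature checks the component itself uses. `detectFeatures` is a hypothetical helper that takes the window object, so it can also be exercised with a stub. The checks are common feature-detection idioms. Declarative Shadow DOM is detected through `shadowRootMode` on `HTMLTemplateElement.prototype`. Constructable stylesheets are detected through `adoptedStyleSheets` on `Document.prototype` plus `replaceSync` on `CSSStyleSheet.prototype`.

```javascript
// Hypothetical helper: report which platform features a component's
// progressive-enhancement path depends on. `win` is the page's window.
function detectFeatures(win) {
  const has = (obj, key) => obj != null && key in obj;
  return {
    customElements: has(win, 'customElements'),
    // Constructable stylesheets: adoptedStyleSheets plus replaceSync().
    adoptedStyleSheets:
      has(win.Document?.prototype, 'adoptedStyleSheets') &&
      has(win.CSSStyleSheet?.prototype, 'replaceSync'),
    // Declarative Shadow DOM: templates expose a shadowRootMode property.
    declarativeShadowDom: has(win.HTMLTemplateElement?.prototype, 'shadowRootMode'),
  };
}
```

A WTR test can call `detectFeatures(window)` and, when a flag is false, assert the documented fallback rendering instead of the enhanced one.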

Related in this section

Distribution, Testing & Tooling: the parent section covering packaging, testing, and the full publish pipeline.
Visual Regression Testing of Shadow DOM: eliminating screenshot flake from async upgrade, slotchange, and font swap.
Contract Testing Custom Event Payloads: JSON Schema assertions on event.detail in a real browser.
Event Composition & Bubbling: the composed/bubbles semantics a contract test must assert.
Open vs Closed Shadow DOM Tradeoffs: why a test can or cannot reach into a shadow root.