Storybook Interaction Tests

Storybook interaction tests turn the stories you already write for documentation into executable behavioural specifications. Instead of maintaining a separate harness that re-mounts components, you attach a play function to a story, script the user journey with userEvent, and assert against the rendered DOM — all inside the same browser-accurate iframe that powers the Storybook UI. This approach belongs to the broader discipline of Component & Integration Testing: it sits between isolated unit assertions and full end-to-end flows, exercising a component with realistic events while keeping the boundary tight and the feedback loop fast. The material here targets frontend engineers and QA specialists running Storybook 8 with React 18/19, and it covers the full lifecycle: authoring play functions with @storybook/test, debugging them interactively in the Storybook UI, and executing the entire catalogue headlessly with the @storybook/test-runner so the same stories gate your pipeline.

Architectural Scope & Boundaries

Interaction testing in Storybook applies to a precise tier of the test suite, and treating it as a replacement for either unit tests or end-to-end coverage produces brittle, slow, or redundant suites. The technique mounts a single component (or a small composed tree) inside Storybook’s preview iframe, then drives it through a play function that simulates clicks, typing, and keyboard navigation before asserting on observable output.

The architectural payoff is that a story stops being a static fixture and becomes a behavioural contract. In most teams, stories already exist: they document the states a component can be in, they back visual review, and they are the artefact a designer or product owner looks at. Attaching a play function to those existing stories means the same artefact that demonstrates a behaviour also proves it, so documentation and verification can no longer drift apart. When the component changes in a way that breaks the documented behaviour, the story turns red — you cannot ship a misleading example, because a misleading example is a failing test. That convergence is the reason interaction tests have become a default layer in modern component workflows rather than a niche add-on.

What it covers well:

Component behaviour in response to real user events — focus management, form validation, disclosure widgets, conditional rendering.
Visual states that are awkward to reach in a unit renderer, because the story already encodes the props and decorators needed to get there.
Accessibility-adjacent assertions, since userEvent dispatches events the way a browser does and queries resolve against the accessibility tree.

What it explicitly does not cover:

Cross-page navigation, real routing, or multi-route flows — those stay in your end-to-end layer.
Real network round-trips. Network is mocked at the story boundary, the same way you would when mocking network in Playwright component tests.
Pure logic with no DOM — a reducer or a date formatter belongs in a plain Vitest unit test, not a story.

The isolation tier is integration-leaning component testing: more than a unit test because it renders through the real component pipeline and decorators, less than end-to-end because it never leaves the iframe. Aligning these boundaries with your overall Testing Library best practices keeps queries user-centric and prevents the suite from drifting into implementation-detail assertions.

A useful way to decide whether a behaviour belongs in a story is to ask what evidence would convince a reviewer the component works. If the answer is “I clicked this, typed that, and the right thing appeared,” it is a play function. If the answer is “given these inputs the function returns this value,” it is a unit test. If the answer is “the user logged in, navigated three pages, and completed checkout,” it is end-to-end. Interaction tests are most valuable precisely where unit tests are awkward and end-to-end tests are wasteful — the disclosure widget that toggles on a keypress, the form that validates on blur, the menu that traps focus. Because the story already encodes the exact props and decorators needed to render that state, you avoid the setup tax that makes those same scenarios painful in a bare unit renderer, and you avoid the cost and flakiness of booting a whole application to reach them.

One more boundary deserves emphasis: a story should remain a single, comprehensible scenario. When a play function grows past a dozen interactions or starts branching on conditionals, that is a signal the story is trying to be an end-to-end test. Split it into focused stories — one per meaningful state — so each failure points at a specific behaviour rather than forcing you to bisect a long script.

Prerequisites

Before writing your first interaction test, confirm the toolchain is on a supported baseline. Play functions predate Storybook 8, but the unified @storybook/test package became the default import in Storybook 8, replacing the separate @storybook/jest and @storybook/testing-library packages, so older majors require different imports.

Storybook 8.x installed and rendering your component framework (React 18/19, Vue 3, or Svelte).
The @storybook/test package available — it bundles userEvent, expect, fn, and within so you do not import them piecemeal.
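The interactions addon, @storybook/addon-interactions, registered in .storybook/main. It is a separate package from @storybook/addon-essentials, although the Storybook 8 project initializer typically adds it for you.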
Component Story Format 3 (CSF3) stories using the object syntax with a meta default export.
For CI execution: @storybook/test-runner and a Playwright browser binary installed via npx playwright install.
A static or running Storybook the runner can target (storybook build output, or storybook dev on a known port).
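
As a rough sketch of the addon registration listed above, a Storybook 8 main config might look like the following. The Vite-based framework package and the stories glob are assumptions about your project, not requirements; swap in whatever your setup already uses.

```typescript
// .storybook/main.ts
// Assumption: a React project built with Vite. Other frameworks use a
// different package name in both places where '@storybook/react-vite' appears.
import type { StorybookConfig } from '@storybook/react-vite';

const config: StorybookConfig = {
  // Assumption: stories live under src/ next to their components.
  stories: ['../src/**/*.stories.@(ts|tsx)'],
  addons: [
    '@storybook/addon-essentials',
    '@storybook/addon-interactions', // provides the Interactions panel
  ],
  framework: { name: '@storybook/react-vite', options: {} },
};

export default config;
```

Listing the interactions addon explicitly is harmless if your initializer already added it, and it makes the dependency visible to anyone reading the config.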

Step-by-Step Implementation

The following steps build a complete interaction test from an empty story to a CI-gated check. Each step produces runnable code.

Step 1: Define a CSF3 story with typed args

Start from a plain story. The meta object pins the component and its default args; the named export is the story the play function will attach to.

// LoginForm.stories.tsx
import type { Meta, StoryObj } from '@storybook/react';
import { fn } from '@storybook/test';
import { LoginForm } from './LoginForm';

const meta: Meta<typeof LoginForm> = {
  title: 'Auth/LoginForm',
  component: LoginForm,
  args: { onSubmit: fn() }, // spy injected via args — assertable later
};
export default meta;

type Story = StoryObj<typeof LoginForm>;

export const Empty: Story = {};

Step 2: Attach a play function that drives the component

The play function receives a context object. Destructure canvasElement from it and scope all queries to that element with within, so a story never accidentally asserts against the Storybook chrome.

// LoginForm.stories.tsx (continued)
import { within, userEvent, expect } from '@storybook/test';

export const SubmitsCredentials: Story = {
  play: async ({ canvasElement, args }) => {
    const canvas = within(canvasElement);
    await userEvent.type(canvas.getByLabelText('Email'), 'ada@example.com');
    await userEvent.type(canvas.getByLabelText('Password'), 'hunter2');
    await userEvent.click(canvas.getByRole('button', { name: /sign in/i }));

    await expect(args.onSubmit).toHaveBeenCalledWith({
      email: 'ada@example.com',
      password: 'hunter2',
    });
  },
};

Step 3: Assert error and edge states with await expect

Reach failure states the same way a user would. Because expect from @storybook/test is the Vitest-compatible matcher set, assertions read identically to your unit suite.

export const ShowsValidationError: Story = {
  play: async ({ canvasElement }) => {
    const canvas = within(canvasElement);
    await userEvent.click(canvas.getByRole('button', { name: /sign in/i }));
    await expect(
      await canvas.findByText('Email is required'),
    ).toBeVisible();
  },
};

Step 4: Run the catalogue headlessly with the test-runner

The runner spins up a Playwright browser, visits every story, executes its play function, and fails on any thrown assertion. This is the same mechanism detailed in running Storybook tests in CI with the test-runner.

# Build once, then point the runner at the static output
npx storybook build
npx http-server storybook-static --port 6006 --silent &
npx test-storybook --url http://127.0.0.1:6006

Step 5: Gate the pipeline on a clean run

Wire the runner into CI so a failing interaction blocks the merge, exactly like any other Testing Library suite. Keep the static-build path for determinism; never test against a hot-reloading dev server in CI.

# .github/workflows/storybook-tests.yml (excerpt)
- run: npx playwright install --with-deps chromium
- run: npx storybook build --quiet
# A block scalar (|) hands the multi-line command to the shell verbatim,
# so the backslash line continuations behave as written.
- run: |
    npx concurrently -k -s first \
      "npx http-server storybook-static --port 6006 --silent" \
      "npx wait-on tcp:6006 && npx test-storybook --url http://127.0.0.1:6006"

Configuration Reference Table

These are the knobs you reach for most often when scripting and running interaction tests.

Option / API	Where it lives	Type	Default	Effect
`play`	story object	`(ctx) => Promise<void>`	none	Async script that runs after the story mounts; throwing fails the test.
`within(canvasElement)`	`@storybook/test`	function	—	Scopes Testing Library queries to the story canvas, excluding Storybook UI.
`userEvent`	`@storybook/test`	object	—	Browser-accurate event simulation (type, click, keyboard, tab).
`fn()`	`@storybook/test`	function	—	Creates a spy for `args`, assertable with `toHaveBeenCalled`.
`expect`	`@storybook/test`	function	—	Vitest-compatible matchers, including DOM matchers like `toBeVisible`.
`--url`	`test-storybook`	string	`http://127.0.0.1:6006`	Target Storybook instance the runner visits.
`--maxWorkers`	`test-storybook`	number	cores minus one (Jest default)	Parallel Playwright workers; lower for constrained CI runners.
`--shard`	`test-storybook`	`n/total`	none	Splits stories across machines for horizontal scaling.
`--coverage`	`test-storybook`	flag	off	Collects per-story coverage via instrumented sources.

Verification & Assertions

A passing interaction test is one where every play function completes without throwing. In the Storybook UI, the Interactions panel renders each step as a timeline you can step through, pause, and rewind — invaluable when a userEvent call resolves before the component finishes updating. When you run the suite from the terminal, the runner prints a per-story result and a summary:

 PASS   Auth/LoginForm SubmitsCredentials
 PASS   Auth/LoginForm ShowsValidationError
 Test Suites: 1 passed, 1 total
 Tests:       2 passed, 2 total

Prefer findBy* queries for anything that appears after an async update; they retry until the element resolves or the timeout fires, which removes the most common source of false negatives. Reserve getBy* for elements present at mount. Asserting through the accessibility tree — getByRole, getByLabelText — keeps the test resilient to markup churn and doubles as a lightweight accessibility check.
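
To make the difference between the two query families concrete, here is a minimal sketch of the retry idea behind findBy*. The pollUntil helper is hypothetical and is not Testing Library's implementation (which also reacts to DOM mutations); it only borrows the documented defaults of a 1000 ms timeout and a 50 ms polling interval. The "component" is simulated with a timer so the example runs without a browser.

```typescript
// Hypothetical helper illustrating retry-until-found. Not Testing Library's code.
async function pollUntil<T>(
  query: () => T | null,
  { timeoutMs = 1000, intervalMs = 50 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = query();
    if (result !== null) return result; // found: resolve immediately
    if (Date.now() >= deadline) {
      throw new Error(`Not found within ${timeoutMs} ms`);
    }
    // Not there yet: wait one interval, then query again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Stand-in for a component that renders its error 120 ms after a click.
let renderedError: string | null = null;
setTimeout(() => {
  renderedError = 'Email is required';
}, 120);

// getBy*-style: a single synchronous read, taken before the update lands.
const readOnce = (): string | null => renderedError;
console.log('single read:', readOnce()); // null, the update has not happened yet

// findBy*-style: keep asking until the text shows up or the timeout fires.
pollUntil(() => renderedError).then((text) => console.log('after polling:', text));
```

The single read fails for the same reason a getBy* query fails right after an async interaction: it looks once, too early. The polling version tolerates the delay without a fixed sleep.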

Beyond the green-or-red result, the Interactions panel is itself a verification tool. Each step is recorded with the element it acted on and the matcher it evaluated, so a failure shows you not just that an assertion broke but the precise DOM state at the moment it broke. Use the step controls to rewind to the interaction before the failure and inspect the live canvas; this collapses the usual debug cycle of adding log statements and re-running into a single replay. When you need a snapshot of the rendered tree at any point, log it with Testing Library's prettyDOM helper, for example console.log(prettyDOM(canvasElement)) inside the play function. The object returned by within() exposes queries only, so it has no debug() method of its own.
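
The timeline in the panel is easier to read when related interactions are grouped under a label. A minimal sketch, assuming the same LoginForm component used in the steps earlier, uses the step function that Storybook passes in the play context; the step labels and the story name are illustrative.

```typescript
// LoginForm.stories.tsx (sketch): grouping interactions into named steps
import type { Meta, StoryObj } from '@storybook/react';
import { expect, fn, userEvent, within } from '@storybook/test';
import { LoginForm } from './LoginForm';

const meta: Meta<typeof LoginForm> = {
  component: LoginForm,
  args: { onSubmit: fn() },
};
export default meta;

type Story = StoryObj<typeof LoginForm>;

export const SubmitsCredentialsInSteps: Story = {
  play: async ({ canvasElement, args, step }) => {
    const canvas = within(canvasElement);

    // Each step shows up as a collapsible group in the Interactions panel.
    await step('Fill in credentials', async () => {
      await userEvent.type(canvas.getByLabelText('Email'), 'ada@example.com');
      await userEvent.type(canvas.getByLabelText('Password'), 'hunter2');
    });

    await step('Submit the form', async () => {
      await userEvent.click(canvas.getByRole('button', { name: /sign in/i }));
    });

    await expect(args.onSubmit).toHaveBeenCalledTimes(1);
  },
};
```

When a story fails, the label of the failing group tells you which phase of the journey broke before you open a single step.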

A second verification habit worth adopting is asserting the negative as well as the positive. A login story should confirm the error message appears on bad input and that it is absent on good input; a disclosure story should confirm content is hidden initially, not only that it becomes visible. Pairing findBy* for appearance with queryBy* returning null for absence catches a whole class of regressions where a component renders too much rather than too little.
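
The pairing of presence and absence checks could look like the sketch below, again assuming the LoginForm component and the 'Email is required' message from the earlier steps. The story name is hypothetical.

```typescript
// LoginForm.stories.tsx (sketch): asserting absence as well as presence
import type { Meta, StoryObj } from '@storybook/react';
import { expect, fn, userEvent, waitFor, within } from '@storybook/test';
import { LoginForm } from './LoginForm';

const meta: Meta<typeof LoginForm> = {
  component: LoginForm,
  args: { onSubmit: fn() },
};
export default meta;

type Story = StoryObj<typeof LoginForm>;

export const NoErrorOnValidInput: Story = {
  play: async ({ canvasElement, args }) => {
    const canvas = within(canvasElement);

    // Negative check at mount: queryBy* returns null instead of throwing.
    await expect(canvas.queryByText('Email is required')).toBeNull();

    await userEvent.type(canvas.getByLabelText('Email'), 'ada@example.com');
    await userEvent.type(canvas.getByLabelText('Password'), 'hunter2');
    await userEvent.click(canvas.getByRole('button', { name: /sign in/i }));

    // Wait for the positive outcome first, so the absence check below
    // cannot pass merely because validation has not run yet.
    await waitFor(() => expect(args.onSubmit).toHaveBeenCalled());
    await expect(canvas.queryByText('Email is required')).toBeNull();
  },
};
```

Ordering matters here: an absence assertion made immediately after the click would pass trivially, so it is anchored to an event that proves the component has finished reacting.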

Edge Cases & Failure Modes

Queries match the Storybook toolbar, not your component. If you call screen.getByRole instead of scoping to canvasElement, queries can resolve against Storybook’s own chrome. Always wrap with within(canvasElement).

Assertions fire before the DOM settles. A click that triggers an async state update will fail a synchronous getByText. Switch to await canvas.findByText(...), which polls until the node appears, mirroring the retry behaviour you would use to avoid flaky tests.

Spies leak between stories. When you reuse a fn() across stories without resetting it, call counts accumulate. Declare the spy in meta.args so each story render gets a fresh instance, or reset it at the start of the play function.

Network calls hit the real backend. Unmocked fetches make the suite nondeterministic. Mock at the story level with a loader or decorator before the play function runs, consistent with how you isolate services in Playwright component testing.
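
One way to stub the network at the story boundary is a loader that swaps out fetch before the component renders. The sketch below assumes a hypothetical UserCard component that requests its data with fetch on mount and renders the returned name; the component, the payload, and the story name are all invented for illustration.

```typescript
// UserCard.stories.tsx (sketch): UserCard is a hypothetical component
import type { Meta, StoryObj } from '@storybook/react';
import { expect, within } from '@storybook/test';
import { UserCard } from './UserCard';

const meta: Meta<typeof UserCard> = { component: UserCard };
export default meta;

type Story = StoryObj<typeof UserCard>;

export const LoadsUser: Story = {
  loaders: [
    async () => {
      // Loaders run before the story renders, so the component's first
      // request already hits the stub instead of a real backend.
      globalThis.fetch = async () =>
        new Response(JSON.stringify({ name: 'Ada Lovelace' }), {
          headers: { 'Content-Type': 'application/json' },
        });
      return {};
    },
  ],
  play: async ({ canvasElement }) => {
    const canvas = within(canvasElement);
    // findBy* waits for the stubbed response to be rendered.
    await expect(await canvas.findByText('Ada Lovelace')).toBeVisible();
  },
};
```

This stub answers every request with the same payload and is never restored, which is acceptable for a sketch but not for a large catalogue. A dedicated mocking layer that matches on URL and resets between stories is the sturdier choice.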

Performance & CI Impact

Interaction tests run in a real browser, so they are heavier than jsdom unit tests but far cheaper than full end-to-end journeys: each story mounts a single component tree rather than booting an application. The dominant cost in CI is browser startup, which the test-runner amortises by reusing one Playwright instance across all stories and parallelising with --maxWorkers. For large catalogues, --shard n/total distributes stories across runners, so wall-clock time stays roughly flat as long as you add shards in step with the catalogue.

Put numbers to it to set expectations. A typical play function that types into a field and clicks a button completes in a few hundred milliseconds once the browser is warm — far slower than a jsdom unit test measured in single-digit milliseconds, but two orders of magnitude faster than an end-to-end run that navigates a real application across several pages. The per-story cost is dominated by rendering and event dispatch, not browser launch, because the runner shares one browser process. This is why the static-build path matters so much for CI economics: compiling Storybook once and serving immutable output removes recompilation from the hot path, so the only variable cost per story is the work the play function actually does.
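
A back-of-the-envelope model helps when sizing the CI lane. The function below is a simplified sketch, not a measurement: it assumes stories are spread evenly, that startup cost is paid once per shard, and that every story costs the same. All figures in the example (300 ms per story, 10 s of startup, two workers) are illustrative assumptions.

```typescript
interface SuiteShape {
  stories: number;         // stories that have a play function
  msPerStory: number;      // average per-story cost once the browser is warm
  startupMs: number;       // browser launch and Storybook load, per shard
  shards: number;          // the `total` in --shard n/total
  workersPerShard: number; // the value passed to --maxWorkers
}

// Estimated wall-clock time of the slowest shard under the assumptions above.
function estimateWallClockMs(s: SuiteShape): number {
  const storiesPerShard = Math.ceil(s.stories / s.shards);
  const storiesPerWorker = Math.ceil(storiesPerShard / s.workersPerShard);
  return s.startupMs + storiesPerWorker * s.msPerStory;
}

const assumptions = { msPerStory: 300, startupMs: 10_000, workersPerShard: 2 };

// One machine, 600 stories.
console.log(estimateWallClockMs({ ...assumptions, stories: 600, shards: 1 })); // 100000

// Same catalogue split across four shards.
console.log(estimateWallClockMs({ ...assumptions, stories: 600, shards: 4 })); // 32500

// Catalogue doubles, shard count doubles: the estimate does not move.
console.log(estimateWallClockMs({ ...assumptions, stories: 1200, shards: 8 })); // 32500
```

The last two calls show the scaling property in miniature: wall-clock time holds steady only when shards grow with the catalogue, and the fixed startup cost sets a floor that sharding cannot remove.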

To keep the lane fast and trustworthy: build Storybook once and serve the static output rather than testing against a dev server; cache the Playwright browser binary between runs; and treat any newly flaky story as a defect to fix immediately rather than retry blindly. These practices line up with the wider test pyramid strategy — interaction tests are a thin, high-value band, not a dumping ground for everything you could not be bothered to unit test.

There is a real risk worth naming: because writing a story is cheap and the play function feels like a free assertion, teams can over-invest, accumulating hundreds of browser-driven stories that each take a few hundred milliseconds and collectively dominate the pipeline. Guard against this by keeping the band deliberately thin. Every story that earns a play function should test a behaviour that genuinely benefits from a real browser — focus, keyboard, async rendering, decorator composition. Anything that can be proven with a pure assertion stays in the unit layer, where it runs in milliseconds without a browser.

Caching deserves a specific note. The two expensive artefacts are the Playwright browser binary and the compiled Storybook itself. Cache the browser keyed on your lockfile so it only re-downloads when dependencies change, and consider caching the storybook-static build keyed on the source hash so unchanged stories skip recompilation entirely. Combined with sharding, this keeps wall-clock time roughly flat as the catalogue grows, which is the difference between a check developers trust and one they learn to ignore.

In-Depth Guides

Writing play functions for Storybook interaction tests — a deep walkthrough of userEvent, expect, args, and spies for scripting realistic component journeys.
Running Storybook tests in CI with the test-runner — headless Playwright execution, sharding, coverage, and a reproducible pipeline configuration.

Testing Library best practices — the query philosophy your play functions should follow
Playwright component testing — a complementary browser-based component layer
Flaky test mitigation — keeping interaction suites deterministic