Running Storybook Tests in CI with the Test-Runner
The @storybook/test-runner turns every story in your catalogue into an automated browser test: it visits each one in headless Playwright, executes its play function, and fails the run on any thrown assertion or render error. Locally that is convenient; in CI it is the gate that stops a broken component from merging. This guide is for QA engineers and platform teams wiring Storybook 8 interaction tests into a pipeline. It covers the headless Playwright setup, building and serving a static Storybook for determinism, sharding across runners, collecting coverage, and the caching that keeps the lane fast. If you are still authoring the stories themselves, start with writing play functions for Storybook interaction tests; this page assumes those exist and focuses on running them at scale.
Root Cause Analysis
Storybook test jobs fail in CI for reasons that rarely appear on a developer laptop, and understanding the causes prevents a slow drip of red builds.
The first is testing against a dev server. storybook dev hot-reloads and recompiles on demand; under CI load it can serve a half-built story, producing intermittent failures that look like flaky tests but are really a build race. The fix is to test the static output of storybook build, which is immutable for the duration of the run.
The second is missing browser dependencies. The runner drives Playwright, which needs both the browser binary and a set of system libraries. A bare CI image has neither, so the job fails at launch with a cryptic shared-library error. Running playwright install --with-deps resolves both in one step.
The third is a runner that races the server. If the test-runner starts before the static Storybook is being served, it cannot reach the URL at all (connections are refused), so not a single story runs. The job must wait for the port to accept connections before it begins. This is the same ordering discipline that keeps any browser suite, including Playwright component tests, deterministic.
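To show what waiting for the port means in practice, here is a minimal TypeScript sketch of the idea behind `wait-on tcp:127.0.0.1:6006`, using only Node's built-in `node:net` module. It tries a TCP connection and retries on a fixed interval until one succeeds or a deadline passes. The name `waitForPort` and the default timings are illustrative assumptions, not wait-on's actual implementation.

```typescript
import * as net from 'node:net';

// Hypothetical helper: resolves once host:port accepts a TCP connection and
// rejects if nothing accepts one before timeoutMs has elapsed.
function waitForPort(
  port: number,
  host = '127.0.0.1',
  timeoutMs = 30_000,
  intervalMs = 250,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  return new Promise<void>((resolve, reject) => {
    const attempt = (): void => {
      const socket = net.createConnection({ port, host });
      socket.once('connect', () => {
        // Something is listening: close the probe and let the runner start.
        socket.destroy();
        resolve();
      });
      socket.once('error', () => {
        // Usually ECONNREFUSED while the server is still starting up.
        socket.destroy();
        if (Date.now() >= deadline) {
          reject(new Error(`Timed out waiting for ${host}:${port}`));
        } else {
          setTimeout(attempt, intervalMs);
        }
      });
    };
    attempt();
  });
}
```

A fixed polling interval keeps the sketch simple; the important property is that the runner only starts after a connection has actually succeeded, not after a guessed sleep.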
It is worth being precise about what the runner does under the hood, because the failure modes follow directly from its architecture. The runner is built on Jest and Playwright. It launches a headless browser and enumerates every story: by default it transforms your story files into Jest tests, and against a remote Storybook (or with --index-json) it reads the built index.json instead. For each story it navigates the browser to that story’s URL in the preview iframe. Once the story has rendered, it executes the story’s play function inside the page and fails the test on any thrown error; a story without a play function still gets a smoke test that it renders without errors. You write no separate test files: the stories are the tests, and the play functions are the assertions. This is why a missing browser, an unreachable server, or a half-built static bundle each manifests as a wholesale failure rather than a single red test: the runner cannot even reach the point of executing assertions. Diagnosing CI failures therefore starts with the infrastructure layer (browser, server, ordering) before you ever look at the story logic itself.
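To make the navigation step concrete, here is a TypeScript sketch that turns a Storybook index.json into the preview-iframe URLs a runner would visit. It assumes the Storybook 7/8 index shape (an `entries` map whose values carry `id`, `title`, `name`, and `type`) and the `iframe.html?id=<story-id>&viewMode=story` URL format. `storyUrls` is a hypothetical helper, and it covers only navigation, not waiting for render or running play functions.

```typescript
// Minimal subset of the index.json shape used by Storybook 7/8.
interface IndexEntry {
  id: string;
  title: string;
  name: string;
  type: 'story' | 'docs';
}

interface StoryIndex {
  v: number;
  entries: Record<string, IndexEntry>;
}

// Returns the preview-iframe URL for every story; docs entries are skipped
// because they are pages, not stories with a play function to execute.
function storyUrls(index: StoryIndex, baseUrl: string): string[] {
  const base = baseUrl.replace(/\/+$/, '');
  return Object.values(index.entries)
    .filter((entry) => entry.type === 'story')
    .map((entry) => `${base}/iframe.html?id=${encodeURIComponent(entry.id)}&viewMode=story`);
}
```

Pointed at storybook-static/index.json, the same filter also tells you how many stories a full run should visit, which is a useful number to compare against the runner's own count.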
Reproducible Setup
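Install the runner, the helpers the scripts below rely on (http-server, concurrently, and wait-on), and a Playwright browser. The runner is framework-agnostic; it only needs a Storybook to point at.
npm install --save-dev @storybook/test-runner http-server concurrently wait-on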
npx playwright install --with-deps chromium
Add scripts that separate the build, the serve, and the test phases so each can be cached or parallelised independently.
// package.json (excerpt)
{
  "scripts": {
    "build-storybook": "storybook build --quiet",
    "serve-storybook": "http-server storybook-static --port 6006 --silent",
    "test-storybook": "test-storybook --url http://127.0.0.1:6006"
  }
}
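Optionally add a .storybook/test-runner.ts to hook into each test. It is useful for global setup, accessibility checks, or tuning timeouts; the preVisit hook runs before the runner visits each story.
// .storybook/test-runner.ts
import type { TestRunnerConfig } from '@storybook/test-runner';

const config: TestRunnerConfig = {
  // Runs before the runner visits each story.
  async preVisit(page) {
    // Default timeout for Playwright calls made on this page (for example
    // waitForSelector in a postVisit hook). The overall per-test limit is
    // set separately with the runner's --testTimeout option.
    page.setDefaultTimeout(15_000);
  },
};

export default config;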
Implementation
Assemble the pipeline in ordered, copy-pasteable steps.
1. Build a static Storybook. This compiles every story once into storybook-static, removing the dev-server race entirely.
npm run build-storybook
2. Serve the static build and wait for the port. Use concurrently to run the server and the runner together, and wait-on to block until the port is live.
npx concurrently -k -s first \
"npm run serve-storybook" \
"npx wait-on tcp:127.0.0.1:6006 && npm run test-storybook"
3. Shard across machines. For large catalogues, split stories across parallel jobs. Each job runs one slice and reports its own result, and the pipeline passes only when every slice passes.
# Job 1 of 3
npx test-storybook --url http://127.0.0.1:6006 --shard 1/3
# Job 2 of 3
npx test-storybook --url http://127.0.0.1:6006 --shard 2/3
# Job 3 of 3
npx test-storybook --url http://127.0.0.1:6006 --shard 3/3
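4. Collect coverage. With @storybook/addon-coverage registered in .storybook/main so your components are instrumented, the --coverage flag collects coverage from every story the runner visits and writes it to coverage/storybook/coverage-storybook.json. Turn that file into a report (for example with nyc report) and merge it with your unit coverage to track an overall threshold, consistent with a deliberate test pyramid strategy.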
npx test-storybook --url http://127.0.0.1:6006 --coverage
5. Wire it into GitHub Actions. The full job caches the Playwright browser and runs the build, serve, and test phases in sequence.
# .github/workflows/storybook-ci.yml
name: Storybook Tests
on: [pull_request]
jobs:
  interaction-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22', cache: 'npm' }
      - run: npm ci
      - name: Cache Playwright browsers
        uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: pw-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
      - run: npx playwright install --with-deps chromium
      - run: npm run build-storybook
      - name: Test stories (shard ${{ matrix.shard }}/3)
        run: |
          npx concurrently -k -s first \
            "npm run serve-storybook" \
            "npx wait-on tcp:127.0.0.1:6006 && npx test-storybook --url http://127.0.0.1:6006 --shard ${{ matrix.shard }}/3"
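One consequence of sharding is worth knowing before you size the matrix. Assuming the runner hands --shard to Jest, as its Jest foundation suggests, the unit that gets split is a test file (in the runner's default mode, a story file), not an individual story, so a file with forty stories lands on a single shard. The TypeScript sketch below illustrates hash-then-slice partitioning in the style of Jest's default sequencer. `shardFiles` is a hypothetical name, and the exact hashing Jest uses may differ between versions.

```typescript
import { createHash } from 'node:crypto';

// Illustrative hash-then-slice sharding: order files by a stable hash of
// their path, then cut the ordered list into shardCount contiguous slices.
// Every machine computes the same order, so the slices never overlap.
function shardFiles(files: string[], shardIndex: number, shardCount: number): string[] {
  if (shardIndex < 1 || shardIndex > shardCount) {
    throw new RangeError(`shardIndex must be in 1..${shardCount}`);
  }
  const shardSize = Math.ceil(files.length / shardCount);
  const start = shardSize * (shardIndex - 1);
  return files
    .map((file) => ({ file, hash: createHash('sha1').update(file).digest('hex') }))
    .sort((a, b) => (a.hash < b.hash ? -1 : a.hash > b.hash ? 1 : 0))
    .slice(start, start + shardSize)
    .map(({ file }) => file);
}
```

Because the split is by count of files rather than by runtime, one very large story file can make its shard the slowest in the matrix; splitting such a file is often the simplest rebalancing step.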
Verification
A healthy CI run reports each story’s pass/fail and exits non-zero on any failure, blocking the merge. The summary mirrors a Jest-style report:
PASS Auth/LoginForm SubmitsCredentials (412 ms)
PASS Marketing/NewsletterForm ShowsConfirmation (388 ms)
Test Suites: 12 passed, 12 total
Tests: 47 passed, 47 total
Time: 18.4 s
With sharding, each matrix job reports its slice; the workflow goes green only when all shards pass. If you enabled coverage, confirm a coverage/storybook report is produced and that the merged total meets your threshold. Treat a newly failing story as a real regression, not noise — and if a story fails only under parallel load, diagnose it with the same rigour you would apply to any flaky test rather than papering over it with retries.
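Once coverage has been summarised, enforcing the threshold is a small check. The TypeScript sketch below assumes the Istanbul json-summary shape (for example the coverage-summary.json written by nyc's json-summary reporter), where `total.lines.pct` and its siblings hold percentages. `belowThreshold` and the chosen metrics are illustrative, not part of the runner.

```typescript
// Subset of Istanbul's json-summary format (coverage-summary.json).
interface Metric {
  total: number;
  covered: number;
  pct: number;
}

interface CoverageSummary {
  total: { lines: Metric; statements: Metric; functions: Metric; branches: Metric };
}

type MetricName = keyof CoverageSummary['total'];

// Returns a message for every metric below its threshold; an empty list means pass.
function belowThreshold(
  summary: CoverageSummary,
  thresholds: Partial<Record<MetricName, number>>,
): string[] {
  const failures: string[] = [];
  for (const [name, min] of Object.entries(thresholds) as [MetricName, number][]) {
    const actual = summary.total[name].pct;
    if (actual < min) failures.push(`${name}: ${actual}% < ${min}%`);
  }
  return failures;
}
```

In CI you would parse the summary file, call the function, and exit non-zero when the list is not empty, so the threshold blocks the merge just like a failing story.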
Before trusting the gate, verify it actually fails when it should. A green pipeline is only meaningful if a broken component turns it red, so deliberately introduce a failing assertion in one story, push, and confirm the job exits non-zero and the merge is blocked. This one-time check guards against the most dangerous failure of all: a misconfigured job that reports success regardless of story state, for example because a step swallows the runner’s exit code or because the runner visits far fewer stories than the catalogue contains. A runner that finds and executes the expected number of stories, and that visibly fails on a planted defect, is one you can rely on as a merge gate.
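The check for a silently passing gate can also be automated. The TypeScript sketch below assumes your runner version forwards Jest's --json and --outputFile options, so that each run leaves a results file with Jest's `success`, `numTotalTests`, and `numFailedTests` fields; `gateProblems` is a hypothetical helper, and the expected story count is something you supply, for example from the story entries in storybook-static/index.json minus any you deliberately exclude.

```typescript
// Subset of the aggregated result Jest writes with --json --outputFile.
interface JestJsonResult {
  success: boolean;
  numTotalTests: number;
  numFailedTests: number;
}

// Flags a run that failed, or that executed fewer tests than expected
// (the "green because nothing ran" case).
function gateProblems(result: JestJsonResult, expectedStories: number): string[] {
  const problems: string[] = [];
  if (result.numFailedTests > 0) {
    problems.push(`${result.numFailedTests} failing test(s)`);
  } else if (!result.success) {
    problems.push('runner reported failure without failing tests (suite or setup error?)');
  }
  if (result.numTotalTests < expectedStories) {
    problems.push(`ran ${result.numTotalTests} test(s), expected at least ${expectedStories}`);
  }
  return problems;
}
```

With sharding, either sum `numTotalTests` across the shard results before comparing, or compare each shard against its own expected slice.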
For local reproduction of a CI failure, run the exact same commands the workflow uses: build the static Storybook, serve it on the same port, and invoke the runner with the identical --shard argument. Because the static build is deterministic, a failure that appears in CI will reproduce locally far more reliably than anything that depends on a live dev server, which is the single biggest reason to standardise on the static-build path everywhere.
Troubleshooting
The runner exits with “browserType.launch: Executable doesn’t exist”. Playwright’s browser is not installed in the CI image. Run npx playwright install --with-deps chromium before the test step, and cache ~/.cache/ms-playwright so subsequent runs skip the download.
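Every story 404s, or the runner cannot reach Storybook at all. Either the runner started before the static server was ready, in which case connections are simply refused, or --url points at the wrong place, in which case requests can come back 404. Gate the runner behind wait-on tcp:127.0.0.1:6006 and confirm --url matches the served host and port exactly.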
The job hangs and times out. The server process never exits on its own: http-server serves until it is killed, and storybook dev additionally keeps a file watcher running. Build with storybook build, serve storybook-static, and run the pair under concurrently -k -s first, so the server is killed once the runner finishes and the job takes the runner’s exit code.
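When every story 404s or the runner cannot reach Storybook, a one-off probe separates the two causes. The TypeScript sketch below requests `<base>/index.json`, a file a static Storybook build emits, and reports whether anything answered. `probeStorybook` is a hypothetical helper that assumes a plain http:// URL, as in the workflow above.

```typescript
import * as http from 'node:http';

// Hypothetical diagnostic: requests <baseUrl>/index.json once and reports
// whether Storybook answered, answered with an error status, or was unreachable.
// Plain http only; an https:// base URL would need node:https instead.
function probeStorybook(baseUrl: string): Promise<string> {
  const url = `${baseUrl.replace(/\/+$/, '')}/index.json`;
  return new Promise<string>((resolve) => {
    // agent: false opens a fresh connection, so no pooled socket is reused.
    const req = http.get(url, { agent: false }, (res) => {
      res.resume(); // drain the body; only the status code matters here
      const status = res.statusCode ?? 0;
      resolve(status >= 200 && status < 300 ? 'ok' : `HTTP ${status}`);
    });
    req.on('error', (err: NodeJS.ErrnoException) => {
      // ECONNREFUSED: nothing is listening yet, or the host or port is wrong.
      resolve(`unreachable (${err.code ?? err.message})`);
    });
  });
}
```

"unreachable" points at ordering or the port; an "HTTP 404" points at the path in --url or the directory being served.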
FAQ
Should I run against a static build or a live dev server in CI?
Always the static build. storybook build produces immutable output that cannot recompile mid-run, eliminating the build races that make dev-server-based jobs intermittently fail. Serve storybook-static with any static file server, point the runner at it, and you get the same artefact on every machine. The dev server is fine for local authoring but is a source of nondeterminism in CI.
How do I speed up a large Storybook test job?
Combine three levers. Use --shard n/total across a CI matrix so stories run on multiple machines in parallel. Cache the Playwright browser binary between runs so you do not re-download it each time. And tune --maxWorkers to match the runner’s available CPUs so each machine saturates without thrashing. Together these cut wall-clock time roughly in proportion to the number of shards and workers, although every shard still pays a fixed cost for checkout, install, and the Storybook build.
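A back-of-envelope model makes that trade-off concrete. The TypeScript function below is an idealised assumption, not a measurement: it treats story time as evenly divisible across shards and workers and charges every shard the same fixed setup cost. The inputs in the example are illustrative numbers.

```typescript
// Rough, idealised model: assumes stories spread evenly across shards and
// workers, and that every shard pays the same fixed setup cost.
function estimateWallClockSeconds(
  totalStorySeconds: number,
  shards: number,
  workersPerShard: number,
  perJobOverheadSeconds: number,
): number {
  return perJobOverheadSeconds + totalStorySeconds / (shards * workersPerShard);
}

// Illustrative inputs: 1,200 s of story time and 90 s of checkout/install/build per job.
console.log(estimateWallClockSeconds(1200, 1, 1, 90)); // 1290
console.log(estimateWallClockSeconds(1200, 3, 2, 90)); // 290
```

The fixed term is why adding shards eventually stops paying off: once the per-shard story time approaches the setup cost, more machines mostly add setup.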
Can the test-runner collect coverage?
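Yes, provided @storybook/addon-coverage is registered in your Storybook so the components are instrumented. Pass --coverage and the runner collects coverage as each story executes and writes it to coverage/storybook/coverage-storybook.json, which you can turn into a report (for example with nyc) and merge with your unit-test coverage. This gives a single picture of what your component layer actually exercises, which feeds into the coverage thresholds you enforce as part of an overall testing strategy.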
Does the test-runner replace my unit and end-to-end tests?
No. It occupies the component-interaction band: heavier than a jsdom unit test, lighter than a full end-to-end journey. Keep pure logic in unit tests and cross-page flows in end-to-end tests; use the runner to verify that the components captured in your stories behave correctly under real user input. Each layer earns its place in the pyramid.
Related
- Back to Storybook interaction tests
- Writing play functions for Storybook interaction tests — author the stories this pipeline runs
- Playwright component testing — the browser engine underneath the runner
- Flaky test mitigation — keeping a parallel CI suite deterministic