Component & Integration Testing
Between the pure-logic unit test and the full browser-driven end-to-end run sits the layer where most frontend defects actually live: the place where a component renders inside a real-ish DOM, reads from a state provider, fetches across a network boundary, and reconciles server markup with client interactivity. This is the layer modern JavaScript architectures stress hardest and test worst. Teams either under-cover it — pushing everything into brittle E2E suites that inflate pipeline cost and slow delivery — or they over-mock it until the tests validate their own fixtures instead of real behavior. The goal here is a disciplined middle tier: tests that exercise component contracts and cross-module data flow with deterministic execution, clear isolation boundaries, and a runner configuration that scales linearly across CI workers. This section maps the framework choices, isolation models, and pipeline tradeoffs that make that tier both fast and trustworthy.
Why This Layer Exists
A unit test proves a function returns the right value; an end-to-end test proves a user can complete a journey. Neither answers the question that dominates frontend bug reports: does this component behave correctly when wired to its real collaborators? A button that dispatches the wrong action, a form that desynchronizes from its store, a server-rendered list that throws a hydration warning on mount — these are integration defects, invisible to isolated unit tests and far too expensive to surface only through E2E.
The component and integration layer exists to close that gap economically. It renders real component trees in a virtualized or headless DOM, drives them through realistic user interactions, and verifies the observable outcome — without the cost of booting a full application against live services. Positioned correctly within a deliberate test pyramid strategy, this layer absorbs the bulk of behavioral verification that would otherwise pile onto slow, flaky E2E runs. It is where confidence is cheapest to buy per assertion, provided the boundaries are drawn with intent.
The architectural payoff is leverage. A well-structured component test catches a regression in milliseconds; the equivalent E2E run might take thirty seconds and need a retry. When this layer is healthy, the suite above it can shrink to a handful of critical-path smoke tests, and the suite below it stops trying to compensate for gaps it was never meant to fill.
There is also a feedback-loop argument that matters as much as the cost argument. Defects caught here surface during the inner development loop — while the engineer still has the change in working memory — rather than minutes later in a CI run or hours later in a flaky E2E job that someone has to triage. The further a defect travels from the keystroke that introduced it, the more expensive it becomes to diagnose, because the context that explains it has evaporated. A fast, trustworthy component and integration tier compresses that distance. It turns the question “why did the pipeline go red?” into “the test I just touched explains exactly what broke,” which is the difference between a suite engineers trust and one they learn to ignore.
Finally, this is the layer that ages best. Pure unit tests tend to ossify around internal function shapes and need rewriting whenever the implementation is refactored; E2E tests tend to rot against shifting selectors and environment drift. Tests pitched at the component contract — what the user sees and does — survive both refactors and infrastructure churn, because the thing they assert is the thing that is supposed to stay stable. That durability is what makes investment here compound rather than depreciate.
It is worth being concrete about the kinds of defect this layer is uniquely positioned to catch, because that determines what belongs here versus elsewhere. Consider a form whose submit button is meant to disable while a mutation is in flight: a unit test on the submit handler proves the function fires, but only a rendered test proves the button actually reaches the disabled state and re-enables on error. Consider a list that derives its empty state, loading state, and error state from a data hook: each of those branches is a render path, and each is invisible to a test that never mounts the component. Consider a context provider whose value change must re-render a deeply nested consumer: that propagation is precisely what an integration test verifies and what a unit test, by mocking the context, defines out of existence. These are not exotic cases; they are the daily texture of frontend work, and they are why a suite that skips this layer feels green while production keeps breaking.
Core Concepts & Taxonomy
Precise vocabulary prevents architectural drift. The terms below are used consistently throughout this section and its child topics.
Component test. Renders a single UI unit together with its immediate logic, mocking external dependencies — network calls, third-party SDKs, global stores — so the outcome is fully deterministic. It optimizes for speed and isolation. The question it answers: given these props and these interactions, does this component render and behave as specified?
Integration test. Deliberately crosses module boundaries. It mounts a component alongside its real state manager, router, and context providers, mocking only the network edge. It optimizes for behavioral correctness over raw speed. The question: do these units cooperate correctly when assembled?
Runner isolation model. Whether tests share a process (in-process: fast startup, risk of global-state leakage) or run in isolated workers (out-of-process: stronger determinism, higher memory cost). This single choice shapes flakiness, parallelism, and CI cost more than any other configuration knob.
DOM environment. The simulated browser the test runs against — jsdom/happy-dom for fast logical assertions, or a real Chromium context for layout, paint, and true browser APIs. Each trades fidelity against speed.
Hydration boundary. The moment a server-rendered DOM is adopted by client JavaScript. Mismatches here — divergent markup, non-deterministic timestamps, locale drift — are a distinct failure class that only surfaces when you test the server and client paths together.
Query strategy. How a test locates elements. User-facing queries (getByRole, getByLabelText) couple to accessible semantics; implementation queries (getByTestId, DOM traversal) couple to internals and break on refactors. The choice determines how brittle the suite becomes over time.
These concepts compose: a component test runs in a jsdom DOM environment under an in-process runner, using user-facing queries and mocking the network. An integration test of a server-rendered page might instead exercise the hydration boundary under an out-of-process worker. The taxonomy is what lets a team reason about each test’s cost and confidence before writing it.
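The leakage risk named under the runner isolation model can be made concrete with a dependency-free sketch. The module-level `featureFlags` map and the two stand-in "tests" below are illustrative assumptions, not part of any runner's API. They show why a shared process makes results order-dependent unless state is reset between tests.

```typescript
// Module-level state: shared by every test that runs in the same process.
const featureFlags = new Map<string, boolean>();

function isEnabled(flag: string): boolean {
  return featureFlags.get(flag) ?? false;
}

// Hypothetical test A: enables a flag and forgets to clean up.
function testA(): boolean {
  featureFlags.set('new-checkout', true);
  return isEnabled('new-checkout');
}

// Hypothetical test B: assumes the default (disabled) state.
function testB(): boolean {
  return isEnabled('new-checkout') === false;
}

// Runs the tests in order. Resetting between them approximates what a
// fresh worker or an afterEach reset provides.
function runInOrder(tests: Array<() => boolean>, resetBetween: boolean): boolean[] {
  featureFlags.clear();
  return tests.map((test) => {
    const passed = test();
    if (resetBetween) featureFlags.clear();
    return passed;
  });
}

const leaky = runInOrder([testA, testB], false);   // B fails only because A ran first
const isolated = runInOrder([testA, testB], true); // both pass regardless of order
```

Reversing the order in the leaky run would make both tests pass, which is exactly the order-dependence that stronger isolation removes.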
Two distinctions deserve special emphasis because they are routinely conflated, and the conflation is the source of most disagreement about what to test where.
The first is server state versus client state. Modern applications split state into server-managed data — API responses, caches, mutations, revalidation — and client-managed data — UI toggles, form inputs, transient selection. An integration test should mock the network boundary but exercise the full synchronization pipeline that sits between them: cache population, optimistic updates, error fallbacks, and revalidation under simulated latency. The bugs that hurt in production almost always live in that synchronization layer, not in either store in isolation, which is precisely why a test that mocks both stores away learns nothing useful.
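As a rough illustration of that synchronization pipeline, the sketch below models cache population, an optimistic update, and an error fallback without any framework. `TodoCache`, `PendingMutation`, and the todo shape are hypothetical names invented for this example. The caller plays the role of the mocked network by deciding whether the server confirms or rejects the change.

```typescript
type Todo = { id: string; title: string; done: boolean };

// Handle returned by an optimistic mutation. The test acts as the server
// by calling exactly one of these methods.
interface PendingMutation {
  confirm(serverCopy: Todo): void; // server accepted: adopt its version
  fail(): void;                    // server rejected: roll back
}

class TodoCache {
  private todos = new Map<string, Todo>();

  // Cache population from a (mocked) list response.
  populate(list: Todo[]): void {
    for (const todo of list) this.todos.set(todo.id, { ...todo });
  }

  get(id: string): Todo | undefined {
    return this.todos.get(id);
  }

  toggleOptimistically(id: string): PendingMutation {
    const previous = this.todos.get(id);
    if (!previous) throw new Error(`Unknown todo: ${id}`);
    // Optimistic update: the UI sees the change before the server replies.
    this.todos.set(id, { ...previous, done: !previous.done });
    return {
      confirm: (serverCopy: Todo) => { this.todos.set(id, { ...serverCopy }); },
      fail: () => { this.todos.set(id, previous); },
    };
  }
}

const cache = new TodoCache();
cache.populate([{ id: 't1', title: 'Write tests', done: false }]);

const pending = cache.toggleOptimistically('t1');
const duringFlight = cache.get('t1')?.done; // optimistic value while the request is pending
pending.fail();                             // simulated server error
const afterFailure = cache.get('t1')?.done; // rolled back to the server's last known state
```

A test that stubbed `TodoCache` itself would never reach the rollback branch, which is the point of keeping the store real and simulating only the server's reply.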
The second is deterministic versus probabilistic assertions. A deterministic assertion checks an exact, reproducible outcome — this text appears, this handler was called with these arguments. A probabilistic assertion tolerates a range — a value within a threshold, a render within a time budget. Architecture-first teams default to deterministic contracts and reach for probabilistic checks only for genuinely non-deterministic concerns such as animation timing. The danger is silent drift the other way: a waitFor with a generous timeout quietly converts a deterministic assertion into a probabilistic one, masking a real slowdown until it crosses the timeout and presents as flakiness rather than as the regression it is.
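The drift described above can be simulated without real waiting. The `pollUntil` helper below is a hypothetical stand-in for a waitFor-style retry loop running on a virtual clock; it is a sketch of the failure mode, not a reimplementation of Testing Library.

```typescript
interface PollResult {
  passed: boolean;
  elapsedMs: number;
}

// Polls `isReady` on a virtual clock (no real waiting), the way a
// waitFor-style helper retries until success or timeout.
function pollUntil(
  isReady: (nowMs: number) => boolean,
  timeoutMs: number,
  intervalMs: number = 50,
): PollResult {
  for (let now = 0; now <= timeoutMs; now += intervalMs) {
    if (isReady(now)) return { passed: true, elapsedMs: now };
  }
  return { passed: false, elapsedMs: timeoutMs };
}

// A resource that becomes ready after a fixed delay.
const readyAfter = (ms: number) => (now: number) => now >= ms;

// Generous timeout: a 40x slowdown (100 ms to 4000 ms) still passes silently.
const fastLoose = pollUntil(readyAfter(100), 5000);
const slowLoose = pollUntil(readyAfter(4000), 5000);

// Tight budget: the same slowdown surfaces as a failure.
const slowTight = pollUntil(readyAfter(4000), 1000);
```

Both loose runs report `passed: true`, so the regression is visible only in `elapsedMs`. With the tight budget it fails immediately and points at the slowdown itself.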
Architecture Diagram or Decision Matrix
The diagram below situates this layer between unit and E2E, and shows how the runner isolation model governs what each tier can safely assert.
The matrix below compares the runners and approaches you will choose between when building this layer. None is universally “best”; each occupies a defensible niche depending on the fidelity you need and the cost you can absorb.
| Approach | DOM environment | Isolation model | Best fit | Primary tradeoff |
|---|---|---|---|---|
| Vitest + Testing Library | jsdom / happy-dom | Worker forks/threads | Fast component & integration tests | No real layout/paint; browser APIs are polyfilled |
| Jest + Testing Library | jsdom | Process workers | Legacy suites, broad ecosystem | Slower transform; ESM friction vs. Vitest |
| Playwright Component Testing | Real Chromium/WebKit | Browser context per test | True browser behavior, visual fidelity | Heavier per-test cost; needs container resources |
| Storybook interaction tests | Real browser (test-runner) | Per-story isolation | Component contracts tied to documented states | Build step; story-driven, not arbitrary scenarios |
| React SSR/hydration harness | jsdom + server render | Worker isolation | Catching server/client divergence | Requires rendering both paths deterministically |
A practical rule: default the broad base of component and integration work to Vitest for its speed and ES-native module handling, escalate to a real-browser runner only where layout, true browser APIs, or visual correctness genuinely matter, and reserve the SSR harness for code that crosses the hydration boundary. The detailed setup for the default path lives in Vitest configuration and setup.
The reason fidelity is the axis that matters, rather than brand preference, is that jsdom and happy-dom are logical DOM implementations: they model the document tree and most of the DOM API faithfully but do not lay out, paint, or run a real CSS engine. That is exactly the right trade for the overwhelming majority of assertions, which concern presence, text, roles, and behavior — none of which need pixels. It becomes the wrong trade the moment a test depends on computed geometry, scroll behavior, a genuine IntersectionObserver, or a visual diff. At that point you are no longer testing logic; you are testing rendering, and only a real browser context can answer truthfully. Choosing the runner is therefore really choosing how much fidelity each specific assertion requires — and paying for no more than that.
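One way to keep that choice per-assertion rather than per-team is to write the rule down. The sketch below is only an encoding of the paragraph above; the `Need` categories and the `environmentFor` helper are invented for illustration and do not correspond to any runner option.

```typescript
// What a given assertion actually depends on (hypothetical categories).
type Need =
  | 'text'
  | 'role'
  | 'behavior'
  | 'geometry'
  | 'scroll'
  | 'intersection-observer'
  | 'visual-diff';

type Environment = 'jsdom' | 'real-browser';

// Needs that a logical DOM cannot answer truthfully.
const NEEDS_REAL_RENDERING: Need[] = [
  'geometry',
  'scroll',
  'intersection-observer',
  'visual-diff',
];

// Picks the cheapest environment that still satisfies every need.
function environmentFor(needs: Need[]): Environment {
  const needsRendering = needs.some((need) => NEEDS_REAL_RENDERING.indexOf(need) !== -1);
  return needsRendering ? 'real-browser' : 'jsdom';
}

const formTest = environmentFor(['text', 'role', 'behavior']); // logical assertions only
const stickyHeaderTest = environmentFor(['role', 'scroll']);   // depends on real scrolling
```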
Canonical Implementation
A production-grade configuration for this layer makes three decisions explicit: the DOM environment, the isolation model, and the reset discipline that keeps tests deterministic. The Vitest config below is the baseline most teams should start from: forked workers for clean isolation, a capped pool so CI containers do not OOM, and a pre-bundled Testing Library to avoid module-resolution conflicts.
```ts
// vitest.config.ts: out-of-process isolation tuned for component & integration tests
import { defineConfig } from 'vitest/config';
import react from '@vitejs/plugin-react';
import os from 'node:os';

export default defineConfig({
  plugins: [react()],
  test: {
    globals: true,
    environment: 'jsdom',
    setupFiles: ['./test/setup.ts'],
    // Out-of-process isolation: each worker gets a fresh module registry.
    pool: 'forks',
    poolOptions: {
      forks: {
        singleFork: false,
        // Leave one core free on larger machines, but never drop below two forks.
        maxForks: Math.max(2, os.cpus().length - 1),
      },
    },
    isolate: true,
    deps: {
      // Pre-bundle Testing Library; enabled explicitly so the include list takes effect.
      optimizer: { web: { enabled: true, include: ['@testing-library/react'] } },
    },
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json-summary', 'lcov'],
      thresholds: { lines: 80, branches: 75 },
    },
  },
});
```
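The CPU-based cap in the config does not account for memory, which is usually what triggers OOM kills in small containers. As a hedged variation, the sketch below caps the fork count by whichever resource runs out first. The `forkCap` helper and the per-fork memory figures are assumptions for illustration and should be measured against a real suite.

```typescript
// Returns a fork count bounded by both spare CPU cores and available memory.
// perForkMemMb is an assumed budget for one jsdom worker, not a measured value.
function forkCap(cpuCount: number, availableMemMb: number, perForkMemMb: number): number {
  const byCpu = Math.max(1, cpuCount - 1); // keep one core free when possible
  const byMemory = Math.max(1, Math.floor(availableMemMb / perForkMemMb));
  return Math.min(byCpu, byMemory);
}

const largeRunner = forkCap(8, 4096, 1024); // memory-bound: 4 forks
const smallRunner = forkCap(2, 7168, 512);  // CPU-bound: 1 fork
```

The result could be passed to `maxForks` in place of the inline expression, at the cost of having to maintain the memory estimate.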
The companion setup file is where the reset discipline lives. It registers Testing Library's matchers once, then unmounts rendered trees and resets the network layer in afterEach. That reset is what prevents one test's handlers from leaking into the next, the single most common cause of order-dependent flakiness.
```ts
// test/setup.ts: global matchers and deterministic teardown
import '@testing-library/jest-dom/vitest';
import { afterAll, afterEach, beforeAll } from 'vitest';
import { cleanup } from '@testing-library/react';
import { server } from './msw-server';

beforeAll(() => server.listen({ onUnhandledRequest: 'error' }));

afterEach(() => {
  cleanup(); // unmount React trees between tests
  server.resetHandlers(); // discard per-test request overrides
});

afterAll(() => server.close());
```
Setting onUnhandledRequest: 'error' is deliberate: any request the test forgot to mock fails loudly instead of silently hitting a real endpoint. The shared server itself is built with the MSW v2 API, the same interception layer this site recommends across the whole mocking surface.
With that substrate in place, an integration test reads as a single, legible story: render the assembled component, let the mocked network resolve, and assert on the synchronized outcome through user-facing queries. The example below exercises the server-to-client pipeline described in the taxonomy — loading state, resolved data, and role-derived UI — without mocking the store or router away.
```tsx
// UserProfileDashboard.integration.test.tsx (the .tsx extension is required for JSX)
import { render, screen } from '@testing-library/react';
import { http, HttpResponse } from 'msw';
import { describe, expect, it } from 'vitest';
import { server } from './test/msw-server';
import { UserProfileDashboard } from './UserProfileDashboard';

describe('UserProfileDashboard', () => {
  it('renders role-based UI once the profile resolves', async () => {
    server.use(
      http.get('/api/user/profile', ({ request }) => {
        // MSW v2 resolver signature: read from request, return HttpResponse.
        // Assumes the component sends an Authorization header; without one
        // the role resolves to 'guest' and the admin assertion below fails.
        const auth = request.headers.get('authorization');
        return HttpResponse.json({
          id: 'usr_123',
          name: 'Jane Doe',
          role: auth ? 'admin' : 'guest',
        });
      }),
    );

    render(<UserProfileDashboard />);

    // Deterministic loading assertion before the network resolves.
    expect(screen.getByText(/loading profile/i)).toBeInTheDocument();

    // findByText awaits the specific element instead of polling a wrapped assertion.
    expect(await screen.findByText('Jane Doe')).toBeInTheDocument();

    // Role-derived control proves the synchronization pipeline ran end to end.
    expect(
      screen.getByRole('button', { name: /admin settings/i }),
    ).toBeInTheDocument();
  });
});
```
Note the per-test server.use(...) override layered on top of the shared default handlers. Because afterEach calls resetHandlers(), that override is scoped to this test alone and cannot leak into the next test. That single pattern (shared defaults, per-test overrides, reset between tests) is the backbone of deterministic integration testing, and it works identically whether the runner is Vitest or Jest.
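The mechanics of that pattern are small enough to model directly. The `HandlerRegistry` below is a simplified, hypothetical stand-in written for this explanation, not MSW's implementation. It shows the three moves: defaults that persist, overrides that take precedence, and a reset that drops only the overrides.

```typescript
type Json = Record<string, unknown>;
type Handler = (path: string) => Json;

class HandlerRegistry {
  private defaults: Map<string, Handler>;
  private overrides: Array<{ path: string; handler: Handler }> = [];

  constructor(defaults: Map<string, Handler>) {
    this.defaults = defaults;
  }

  // Per-test override: the most recent one wins over the shared default.
  use(path: string, handler: Handler): void {
    this.overrides.unshift({ path, handler });
  }

  // Called between tests: drop overrides, keep the shared defaults.
  resetHandlers(): void {
    this.overrides = [];
  }

  resolve(path: string): Json {
    const override = this.overrides.find((entry) => entry.path === path);
    const handler = override ? override.handler : this.defaults.get(path);
    // Loud failure, in the spirit of onUnhandledRequest: 'error'.
    if (!handler) throw new Error(`Unhandled request: ${path}`);
    return handler(path);
  }
}

const registry = new HandlerRegistry(
  new Map<string, Handler>([['/api/user/profile', () => ({ role: 'guest' })]]),
);

registry.use('/api/user/profile', () => ({ role: 'admin' }));
const duringTest = registry.resolve('/api/user/profile'); // override applies
registry.resetHandlers();
const nextTest = registry.resolve('/api/user/profile');   // back to the shared default
```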
Layer Interaction Map
This layer does not stand alone. Its determinism depends on the mocking layer beneath it, and its scope is bounded by the strategy layer above it.
Below it sits Advanced Mocking & Service Isolation. Every component test that touches the network relies on a stubbed transport; every integration test that involves time, randomness, or timers relies on controlled clocks. When a component test flakes intermittently, the root cause is far more often an unfrozen clock or a leaking handler than a bug in the component — which is why the reset discipline above belongs in shared setup, not scattered per file. The mocking layer supplies the deterministic substrate this layer renders on top of.
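A minimal sketch of what a controlled clock and controlled randomness buy, assuming the code under test accepts them as injected dependencies. The `Clock` interface, `seededRandom`, and `createSession` are hypothetical names for this example, and the generator is a small linear congruential one chosen for brevity rather than statistical quality.

```typescript
interface Clock {
  now(): number;
}

// Tiny linear congruential generator: the same seed yields the same sequence.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

interface Session {
  id: string;
  issuedAt: number;
  expiresAt: number;
}

// Code under test: every source of non-determinism arrives as a parameter.
function createSession(clock: Clock, random: () => number): Session {
  const issuedAt = clock.now();
  return {
    id: Math.floor(random() * 1e9).toString(36),
    issuedAt,
    expiresAt: issuedAt + 30 * 60 * 1000, // 30 minutes
  };
}

const frozenClock: Clock = { now: () => 1700000000000 };

// Two independent runs with the same seed and clock produce identical sessions.
const firstRun = createSession(frozenClock, seededRandom(42));
const secondRun = createSession(frozenClock, seededRandom(42));
```

In a Vitest suite the clock half of this is usually handled with `vi.useFakeTimers()` and `vi.setSystemTime()` rather than manual injection; the sketch only shows why the result becomes reproducible.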
Above it sits Test Strategy & Pyramid Design. Strategy decides how much of each test type to write and where the boundary between integration and E2E should fall. A component test that quietly grows into an integration test, then into a near-E2E run against many real collaborators, is a strategy failure that shows up here as slow, fragile tests. The strategy layer also governs how this layer’s cost is measured and capped.
Within this layer, the topics interlock. Runner configuration determines the isolation every other topic inherits. Query and assertion conventions from Testing Library practice apply equally to component tests, integration tests, and the play functions of Storybook interaction tests. Hydration testing is integration testing with a specialized failure mode. Treating these as one coordinated system — rather than five disconnected tools — is what keeps the suite coherent as it grows.
The interaction also runs sideways into accessibility. Because the recommended query strategy locates elements by role and accessible name, a component that is inaccessible is, by construction, hard to test — a missing label or an unlabelled control makes getByRole fail. This is a deliberate forcing function: it folds an accessibility check into the ordinary cost of writing a test, rather than deferring it to a separate audit that teams perpetually postpone. Automated a11y assertions belong inside the same component suites that already render the tree, catching contrast failures, absent ARIA attributes, and broken focus management before they reach staging, where they are far more expensive to find.
There is a boundary worth policing in the other direction too. It is tempting to let an integration test creep upward — adding one more real collaborator, then a real backend stub that is really a server, then a login flow — until it has quietly become an E2E test running in the wrong harness, with the wrong cost and the wrong flakiness profile. The strategy layer above sets the rule for where that line falls; this layer’s job is to hold it. A good smell test: if a single failing test can fail for half a dozen unrelated reasons, it has outgrown this layer and should either be split into focused integration tests or promoted, deliberately, into the E2E suite where its cost is acknowledged rather than hidden.
CI/CD Integration
The economics of this layer are decided in the pipeline. Run suites sequentially and even a fast runner becomes the bottleneck on every pull request; shard intelligently and the same suite finishes in a fraction of the wall-clock time across parallel workers.
The dominant technique is file-level sharding: split the test files across N runners rather than splitting individual tests, which avoids per-test coordination overhead. Group by execution profile so fast component files and slow integration files do not starve each other and create straggler workers.
```yaml
# .github/workflows/test.yml: sharded execution across parallel runners
name: Test
on:
  pull_request:
    branches: [main]
jobs:
  vitest:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - name: Run shard ${{ matrix.shard }}
        run: >
          npx vitest run
          --shard=${{ matrix.shard }}/4
          --reporter=junit --outputFile=results-${{ matrix.shard }}.xml
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: results-${{ matrix.shard }}
          path: results-*.xml
```
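The advice to group by execution profile can be sketched as a balancing problem: given per-file durations from a previous run, assign the slowest files first to whichever shard is currently lightest. The `balanceShards` helper and the timings below are hypothetical. The runner's own `--shard` flag splits files by itself, so a plan like this would only matter for a custom orchestration script.

```typescript
interface TestFile {
  path: string;
  durationMs: number;
}

// Greedy longest-first assignment: place each file on the lightest shard so far.
function balanceShards(files: TestFile[], shardCount: number): TestFile[][] {
  const shards = Array.from({ length: shardCount }, () => ({
    totalMs: 0,
    files: [] as TestFile[],
  }));
  const slowestFirst = files.slice().sort((a, b) => b.durationMs - a.durationMs);
  for (const file of slowestFirst) {
    const lightest = shards.reduce((min, shard) => (shard.totalMs < min.totalMs ? shard : min));
    lightest.files.push(file);
    lightest.totalMs += file.durationMs;
  }
  return shards.map((shard) => shard.files);
}

const plan = balanceShards(
  [
    { path: 'checkout.integration.test.tsx', durationMs: 9000 },
    { path: 'profile.integration.test.tsx', durationMs: 8000 },
    { path: 'Button.test.tsx', durationMs: 1000 },
    { path: 'Input.test.tsx', durationMs: 1000 },
    { path: 'Badge.test.tsx', durationMs: 500 },
    { path: 'Icon.test.tsx', durationMs: 500 },
  ],
  2,
);

// Wall-clock time per shard; the slowest shard decides when the job finishes.
const shardTotals = plan.map((shard) => shard.reduce((sum, file) => sum + file.durationMs, 0));
```

With these sample timings both shards total 10 seconds, so neither becomes a straggler.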
Two refinements turn sharding from a blunt instrument into a cost lever. First, impact-based selection: use the dependency graph to run only the suites affected by a change set, which on large monorepos cuts pipeline time dramatically without reducing coverage on the code that actually moved — the mechanics are covered in balancing speed and coverage in monorepo testing. Second, resource capping: real-browser runners need explicit CPU and memory limits per worker so a parallel matrix does not trigger OOM kills, and lightweight browser contexts should be preferred over full browser instances wherever the assertion does not require them.
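Impact-based selection reduces to a graph walk: invert the import graph, start from the changed files, and collect every test file reachable through importers. The sketch below uses a small hand-written graph. The `DependencyGraph` shape, the `affectedTests` helper, and the file names are assumptions for illustration, since real tools derive the graph from the bundler or the workspace manifest.

```typescript
// Maps each file to the files it imports (hypothetical, hand-written).
type DependencyGraph = Record<string, string[]>;

function affectedTests(graph: DependencyGraph, changed: string[]): string[] {
  // Invert the edges: file -> files that import it.
  const importers = new Map<string, string[]>();
  for (const file of Object.keys(graph)) {
    for (const dep of graph[file]) {
      const list = importers.get(dep) ?? [];
      list.push(file);
      importers.set(dep, list);
    }
  }

  // Breadth-first walk from the changed files through their importers.
  const seen = new Set<string>(changed);
  const queue = changed.slice();
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const importer of importers.get(current) ?? []) {
      if (!seen.has(importer)) {
        seen.add(importer);
        queue.push(importer);
      }
    }
  }

  return Array.from(seen)
    .filter((file) => /\.test\.tsx?$/.test(file))
    .sort();
}

const graph: DependencyGraph = {
  'src/Button.tsx': [],
  'src/Form.tsx': ['src/Button.tsx'],
  'src/Form.test.tsx': ['src/Form.tsx'],
  'src/Button.test.tsx': ['src/Button.tsx'],
  'src/Footer.tsx': [],
  'src/Footer.test.tsx': ['src/Footer.tsx'],
};

// Changing Button affects its own test and, transitively, the Form test.
const toRun = affectedTests(graph, ['src/Button.tsx']);
```

`src/Footer.test.tsx` is skipped because nothing on its import path changed, which is where the pipeline time is saved.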
Reporting closes the loop. Emitting JUnit XML per shard lets the CI platform aggregate results, surface slow files, and — critically — flag the same test failing intermittently across runs, which is the signal that distinguishes a real regression from flakiness. A test that passes on retry but fails on first attempt is not “fixed” by the retry; it is a latent defect in the test’s determinism that the reset discipline and clock control discussed above are meant to eliminate at the root. Retries belong in the pipeline as a diagnostic that counts flakes, not as a blanket mechanism that hides them.
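Counting flakes rather than hiding them can be as simple as classifying each test by its attempt history. The sketch below assumes the per-shard reports have already been parsed into `TestRun` records. That record shape and the `classify` and `flakeCounts` helpers are hypothetical, not a format any reporter emits directly.

```typescript
type Outcome = 'pass' | 'fail';

interface TestRun {
  test: string;
  attempts: Outcome[]; // in order, first attempt first
}

type Verdict = 'stable' | 'flaky' | 'failing';

// A pass on the first attempt is stable; a pass only after a retry is a flake.
function classify(run: TestRun): Verdict {
  if (run.attempts[0] === 'pass') return 'stable';
  const last = run.attempts[run.attempts.length - 1];
  return last === 'pass' ? 'flaky' : 'failing';
}

// Tallies how often each test flaked across many pipeline runs.
function flakeCounts(runs: TestRun[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const run of runs) {
    if (classify(run) === 'flaky') {
      counts.set(run.test, (counts.get(run.test) ?? 0) + 1);
    }
  }
  return counts;
}

const history: TestRun[] = [
  { test: 'profile loads', attempts: ['pass'] },
  { test: 'cart updates', attempts: ['fail', 'pass'] },
  { test: 'cart updates', attempts: ['fail', 'fail', 'pass'] },
  { test: 'checkout submits', attempts: ['fail', 'fail', 'fail'] },
];

const flakes = flakeCounts(history);
```

Here `cart updates` is reported as having flaked twice even though every pipeline eventually went green, which is the signal a blanket retry would have erased.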
A useful gating heuristic: keep the fast component and integration shards in the required, blocking path on every pull request, and push anything that needs a heavyweight real-browser matrix into a separate job that can run in parallel without holding up merge feedback. The faster the blocking signal, the more often engineers run it locally before pushing, which is where the cheapest defects are caught in the first place.
Common Pitfalls & Anti-Patterns
- Over-mocking until tests validate fixtures, not behavior. Mocking the router, the store, and the network all at once leaves a test that only proves the mocks were configured. Mock the network edge; keep state and routing real for integration tests.
- Coupling assertions to implementation details. Querying by data-testid or reaching into component internals makes tests break on harmless refactors. Prefer accessible, user-facing queries so a passing test also proves the UI is usable.
- Ignoring the hydration boundary. Components that mount cleanly in a pure client render can still throw mismatches when adopted from server markup. Untested hydration ships layout shifts and console errors to production.
- Snapshot sprawl. Snapshots are cheap to add and expensive to own. Applied to dynamic or interactive UI they become noise that reviewers rubber-stamp. Limit them to small, static fragments behind a strict review gate.
- Sequential integration suites. Running heavy suites without sharding scales linearly with file count and wastes CI minutes. Shard by file and group by execution profile from the start.
- Non-deterministic seed data and clocks. Random IDs, Date.now(), and live timers destroy parallelization guarantees. Freeze them via the mocking layer so a failure always maps to a real change, never a race.
- Treating waitFor as a substitute for awaiting the right thing. Wrapping an assertion in waitFor with a long timeout papers over a missing await or an unresolved promise. The test eventually passes, but it has become timing-dependent and slow. Await the specific signal (a resolved query, a settled mutation) instead of polling blindly.
- Mocking the framework instead of the network. Replacing the router, the query client, or the state provider with a stub turns an integration test back into a unit test wearing the wrong label. Keep framework internals real and intercept only at the transport boundary, so the test exercises the wiring that production actually runs.
- One enormous setup shared across unrelated tests. A monolithic fixture that every test imports couples them invisibly: a change for one test silently alters the inputs of fifty others. Prefer small, explicit per-test arrangement (the server.use override pattern) over a god-fixture that no one dares touch.
Each of these has the same shape: a shortcut that trades a property of the test — determinism, honesty about cost, independence — for momentary convenience, and then collects interest as the suite grows. The antidote is consistent: draw the mock boundary at the network edge, assert through user-facing queries, freeze every source of non-determinism in shared setup, and keep each test independent enough that its failure has exactly one explanation.
Topics in This Section
- Playwright component testing: Mount components in a real browser context for true layout, paint, and browser-API fidelity when jsdom is not enough.
- React state hydration testing: Verify that server-rendered markup is adopted by the client without mismatches, lost state, or console warnings.
- Testing Library best practices: Standardize accessible, user-centric query and assertion patterns so tests survive refactors instead of breaking on them.
- Vitest configuration and setup: Tune the ES-native runner (environments, worker isolation, and coverage) as the fast default for this layer.
- Storybook interaction tests: Drive components through documented states with play functions so component contracts and living documentation stay in sync.
Related
- Up to Test Strategy & Pyramid Design — decide how much of each test layer to write and where integration ends.
- Across to Advanced Mocking & Service Isolation — the deterministic substrate every test in this layer renders on.
- Deeper into configuring Vitest for the Next.js App Router — a concrete runner setup for server-component-heavy apps.
- Reference HTTP request stubbing techniques — how the network edge gets mocked beneath component and integration tests.