Why 100% Coverage Is the Wrong Target

Mandating 100% code coverage feels rigorous, but it is one of the most expensive mistakes a testing strategy can make. This guide is for tech leads and engineering managers deciding where to set a coverage gate, and for developers being asked to chase a number that no longer improves quality. The argument applies to any JavaScript test runner; the examples use Vitest with the v8 provider, alongside Stryker for mutation testing. The core claim is simple: coverage measures which lines executed, never whether your tests verified anything. Beyond a sensible floor, pushing the number higher buys diminishing returns and actively rewards bad tests. For the mechanics of setting that floor sensibly, see defining coverage thresholds.

Root Cause Analysis

The flaw is definitional. Line and statement coverage record execution: an instrumentation provider marks a line covered the instant control flow reaches it, regardless of whether any assertion observed the result. A test can import a module, call every function, assert nothing, and still report 100% line coverage. This is the executed-versus-verified gap, and it is why a coverage percentage is an upper bound on what is tested, never a measure of what is correct: verified code is always a subset of executed code.

Three forces make 100% specifically harmful rather than merely unhelpful. First, diminishing returns: the first 70–80% of coverage typically tracks the code paths that carry real risk, while the last 10–20% is error handling for impossible states, defensive branches, and glue code, where a test costs far more to write than the bugs it could catch are worth. Second, Goodhart’s law: once 100% is the target, it stops being a useful measure, because engineers reach it by writing assertion-free tests, adding coverage-ignore comments (/* v8 ignore next */ with the v8 provider, /* istanbul ignore next */ with Istanbul), or testing trivial getters instead of hard logic. Third, false confidence: a team that sees 100% believes it is safe and stops investing in the assertion quality, mutation resistance, and integration coverage that actually prevent regressions.

The symptom is a suite that is green, large, slow, and yet still ships bugs — because the number certified execution, not behaviour.

Reproducible Setup

The gap is easy to demonstrate. Here is a function and a test that achieves 100% coverage without ever checking the value the function returns.

// src/discount.ts
export function applyDiscount(price: number, pct: number): number {
  if (pct < 0 || pct > 100) {
    throw new RangeError('pct out of range');
  }
  return price - (price * pct) / 100;
}

// src/discount.test.ts
import { expect, test } from 'vitest';
import { applyDiscount } from './discount';

test('runs the function', () => {
  applyDiscount(100, 10);          // executes the happy path
  expect(() => applyDiscount(100, -5)).toThrow(); // executes the guard
  // No assertion on the returned value — yet coverage is 100%.
});

npx vitest run --coverage
# discount.ts | 100% | 100% | 100% | 100%

Every line, branch, and function is green. Yet applyDiscount(100, 10) could return 0, NaN, or 90 — the suite would never notice, because no assertion inspected the result. The number is a perfect score for a test that proves almost nothing.

Implementation

The fix is to stop optimizing the percentage and start measuring whether tests detect defects. Mutation testing does exactly that: it injects small faults (mutants) into the source and checks whether any test fails. A mutant that survives is a line your tests execute but do not verify — the executed-versus-verified gap made visible.
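To make the idea concrete before introducing a tool, here is a minimal hand-rolled sketch of what a mutation run does. It is not how Stryker is implemented: the two mutants are written by hand, and the names Discount, Suite, weakSuite, strongSuite, and killed are hypothetical, chosen only for this illustration.

```typescript
// Hand-rolled illustration of mutation testing. Real tools generate mutants
// automatically; every name in this block is hypothetical.
type Discount = (price: number, pct: number) => number;

const original: Discount = (price, pct) => {
  if (pct < 0 || pct > 100) throw new RangeError('pct out of range');
  return price - (price * pct) / 100;
};

// Two hand-written mutants of the return expression; the guard is unchanged.
const mutants: Record<string, Discount> = {
  'minus -> plus': (price, pct) => {
    if (pct < 0 || pct > 100) throw new RangeError('pct out of range');
    return price + (price * pct) / 100;
  },
  'times -> divide': (price, pct) => {
    if (pct < 0 || pct > 100) throw new RangeError('pct out of range');
    return price - price / pct / 100;
  },
};

// A "suite" here is a function that returns true when every check passes.
type Suite = (fn: Discount) => boolean;

// Mirrors the assertion-free test: runs the happy path, checks only the guard.
const weakSuite: Suite = (fn) => {
  fn(100, 10); // executes the return line but inspects nothing
  try {
    fn(100, -5);
    return false; // the guard did not throw
  } catch {
    return true;
  }
};

// Same checks plus one assertion on the returned value.
const strongSuite: Suite = (fn) => weakSuite(fn) && fn(100, 10) === 90;

// A mutant is killed when the suite fails against it.
function killed(suite: Suite): string[] {
  return Object.entries(mutants)
    .filter(([, mutant]) => !suite(mutant))
    .map(([name]) => name);
}

console.log(killed(weakSuite));   // []
console.log(killed(strongSuite)); // [ 'minus -> plus', 'times -> divide' ]
```

Both suites execute exactly the same lines, so a coverage report cannot tell them apart; only the kill list does.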

Step 1: Add a risk-appropriate coverage floor, not a ceiling

Set the line floor where risk lives — commonly 80% global with higher tiers on critical code — and never above the measured baseline. This is the floor, not the goal.

// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json-summary'],
      thresholds: { lines: 80, branches: 75, functions: 80, statements: 80 },
    },
  },
});
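
Step 1 mentions higher tiers on critical code. One way to express that, sketched here under assumptions, is Vitest's support for glob keys inside thresholds. The src/domain/** path and the 90-level numbers are illustrative, not recommendations, and should be set at or below your measured baseline.

```typescript
// vitest.config.ts (sketch: the glob and the tier values are assumptions)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json-summary'],
      thresholds: {
        // Global floor for everything.
        lines: 80,
        branches: 75,
        functions: 80,
        statements: 80,
        // Higher tier for files matching this glob (hypothetical path).
        'src/domain/**': { lines: 90, branches: 85, functions: 90, statements: 90 },
      },
    },
  },
});
```

Keeping the critical tier in the same config file makes the floor and its exceptions reviewable in one place.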

Step 2: Introduce mutation testing on critical modules

Run Stryker against the highest-risk paths first; it is far slower than coverage, so scope it deliberately rather than running it repo-wide.

npm install -D @stryker-mutator/core @stryker-mutator/vitest-runner

// stryker.config.json
{
  "$schema": "./node_modules/@stryker-mutator/core/schema/stryker-schema.json",
  "testRunner": "vitest",
  "coverageAnalysis": "perTest",
  "mutate": ["src/domain/**/*.ts"],
  "thresholds": { "high": 80, "low": 60, "break": 55 }
}

npx stryker run
# Ran 142 mutants. 31 survived. Mutation score: 78.2%

Step 3: Gate on mutation score for critical code

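The break threshold fails the build when the mutation score drops too low. That is a far stronger signal than line coverage, because each surviving mutant is a concrete code change that no test noticed (barring the occasional equivalent mutant, which behaves identically to the original and cannot be killed). Apply it only to critical modules so the cost stays bounded.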
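
The arithmetic behind the gate can be sketched as follows. This is an illustration of the scoring idea (detected mutants divided by all valid mutants), not Stryker's source code. The type and function names are hypothetical, and the split of the 111 detected mutants into killed versus timed out is an assumption, since the sample run above reports only the total and the survivors.

```typescript
// Sketch of a mutation-score gate; all names here are hypothetical.
interface MutantCounts {
  killed: number;
  timedOut: number;
  survived: number;
  noCoverage: number;
}

interface Thresholds {
  high: number;
  low: number;
  break: number;
}

type Verdict = 'pass' | 'warn' | 'danger' | 'break';

// Score = detected / valid, as a percentage.
function mutationScore(c: MutantCounts): number {
  const detected = c.killed + c.timedOut;
  const valid = detected + c.survived + c.noCoverage;
  // With no valid mutants there is nothing to detect; returning 100 is a
  // choice made for this sketch only.
  return valid === 0 ? 100 : (detected / valid) * 100;
}

function gate(score: number, t: Thresholds): Verdict {
  if (score < t.break) return 'break'; // build fails
  if (score < t.low) return 'danger';
  if (score < t.high) return 'warn';
  return 'pass';
}

// Numbers from the sample run: 142 mutants, 31 survived.
const counts: MutantCounts = { killed: 111, timedOut: 0, survived: 31, noCoverage: 0 };
const score = mutationScore(counts);
const verdict = gate(score, { high: 80, low: 60, break: 55 });

console.log(score.toFixed(1), verdict); // 78.2 warn
```

With the thresholds from the config above, the sample run lands between low and high: the build passes, but the report flags the score as needing attention.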

Step 4: Measure the things coverage ignores

Track assertion density (assertions per test), branch coverage rather than line coverage on conditional-heavy code, and integration-layer coverage that exercises real seams. Branch coverage is harder to game than line coverage, and the layer-mapping guidance in the Cost-Benefit Analysis of Test Layers helps decide which seams deserve the investment that chasing the last coverage points would have wasted.
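
Assertion density has no standard tool behind it, so the following is a deliberately crude sketch: it counts expect( calls and test(/it( calls in a test file's source text with regular expressions. The function name assertionDensity is hypothetical, and a real implementation would more plausibly walk the AST, since text matching miscounts commented-out code, test.each, and custom assertion helpers.

```typescript
// Crude heuristic: assertions per test, counted by pattern matching on source
// text. Hypothetical helper for illustration only.
function assertionDensity(source: string): number {
  const tests = source.match(/\b(?:test|it)\s*\(/g)?.length ?? 0;
  const assertions = source.match(/\bexpect\s*\(/g)?.length ?? 0;
  return tests === 0 ? 0 : assertions / tests;
}

// The assertion-free test from earlier in this guide.
const weakTestSource = `
import { expect, test } from 'vitest';
import { applyDiscount } from './discount';

test('runs the function', () => {
  applyDiscount(100, 10);
  expect(() => applyDiscount(100, -5)).toThrow();
});
`;

console.log(assertionDensity(weakTestSource)); // 1
```

A density of 1 on a test that exercises two behaviours is the kind of signal worth a second look; the number is a prompt for review, not a gate.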

Verification

You can show that the assertion-free test is weak by mutating the discount function. Among other mutations, Stryker will flip - to + and * to / in the return expression; with the no-assertion test, those arithmetic mutants survive. The output below is abbreviated and illustrative:

npx stryker run
# src/discount.ts
#   Survived: Arithmetic operator (-) -> (+)   line 5
#   Survived: Arithmetic operator (*) -> (/)   line 5
# No arithmetic mutant on the return line is killed (despite 100% line coverage)

Mutants in the range guard are a different story: the toThrow() assertion does check that the guard fires, so some of them die and the overall score is not literally zero. Now add a real assertion (expect(applyDiscount(100, 10)).toBe(90)), rerun, and watch the arithmetic mutants die and the score climb. The contrast is the proof: line coverage stayed at 100% throughout, while the mutation score rose; only the second metric reflected the change in test quality. That divergence is the entire argument for treating coverage as a floor and mutation score as the real signal, exactly as the parent coverage threshold methodology recommends.

Troubleshooting

Mutation testing is too slow to run in CI. Stryker runs tests against every mutant, so a full repo run can take hours. Fix: scope mutate to critical directories, keep per-test coverage analysis on so only the relevant tests run for each mutant (coverageAnalysis: "perTest"; Stryker's Vitest runner is documented as always working this way), and schedule the full run nightly rather than per PR.

The team reverts to demanding 100% anyway. This usually signals that “coverage” is being used as a proxy for trust the team lacks elsewhere. Fix: replace the single number with a small dashboard — floor coverage, mutation score on critical paths, and flaky-test rate — so confidence rests on signals that cannot be gamed by assertion-free tests.

Mutation score is low but coverage is high on flaky code. Surviving mutants on non-deterministic tests are noise, not signal, because the test’s pass/fail does not depend on the code under test. Fix: stabilize those tests first via flaky test mitigation, then re-measure.

FAQ

Is any coverage target useful at all?

Yes — a floor is genuinely useful. A threshold around 80% on most code, with higher tiers on critical paths, reliably catches the regression where someone ships an entirely untested module. The argument here is not against measuring coverage; it is against treating 100% as a goal, because the gap between a sensible floor and a perfect score is filled almost entirely by low-value tests and gamed metrics rather than real defect detection.

What is the difference between coverage and mutation testing?

Coverage measures whether a line executed during the test run; mutation testing measures whether your tests detect a defect in that line by injecting faults and checking that a test fails. A line can be 100% covered yet have a 0% mutation score, which means it ran but nothing verified its behaviour. Mutation score is therefore the stronger quality signal, at the cost of being far slower to compute.

Won’t lowering the target let quality slip?

No, because you replace the single inflated number with stronger signals — mutation score on critical modules, branch coverage on conditional logic, and assertion density — that are much harder to satisfy with empty tests. Quality is more protected, not less, because the team optimizes metrics that correlate with real defect detection instead of one that correlates only with execution.

How do I convince a team attached to 100%?

Run the demonstration from this guide on your own codebase: find a file at 100% coverage, run Stryker against it, and show the surviving mutants in a real review. Seeing concrete unverified logic inside a “fully covered” file is more persuasive than any abstract argument, and it reframes the conversation from chasing a percentage to closing the executed-versus-verified gap.