Automated Test Generation
Using AI and tooling to generate, maintain, and expand test coverage.
Overview
Writing tests is time-consuming, mechanical, and often deferred under deadline pressure — which is precisely when it matters most. AI-assisted test generation addresses the mechanical part: scaffolding test files, generating cases for a given function signature, covering the happy path and common error conditions. This reduces the friction of starting tests enough that teams actually write them.
What AI cannot do is decide what to test. A generated test suite that covers the obvious paths and nothing else is not a reliable safety net. The engineer is still responsible for identifying the edge cases that matter, verifying that generated tests would actually fail on a broken implementation, and ensuring that tests are testing behaviour, not just recording what the current implementation happens to do.
AI-generated tests follow the same standards as hand-written tests. For test quality standards more broadly, see Code Review Best Practices.
Why It Matters
Starting a test file is the highest-friction moment. Once a test file exists with a describe block and one passing test, adding more tests is straightforward. AI eliminates the blank-page problem: it generates the scaffold, the imports, and the first few cases. Engineers then add the cases that require domain knowledge.
Coverage gaps accumulate where tests are hardest to write. The functions that are most complex and most important tend to have the least test coverage — not because engineers don't intend to test them, but because writing comprehensive tests for them takes time that is always in short supply. AI accelerates the first pass on exactly these functions.
Refactoring is safer with better coverage. The value of a test suite is not just catching bugs on the day of the refactor — it is accumulated over months and years as the codebase changes. AI-assisted test generation, done well, produces a safety net that pays dividends long after the session that created it.
Generated tests are cheaper to write than bugs are to fix. Even an imperfect AI-generated test suite that catches 60% of regressions prevents incidents. The cost of generating and reviewing tests is lower than the cost of debugging production issues they could have caught.
Standards & Best Practices
Verify that generated tests would fail on a broken implementation
This is the most important standard for AI-generated tests. Before committing any generated test, introduce a deliberate bug in the implementation — rename a variable, invert a condition, return the wrong value. The tests must fail. If they pass with a broken implementation, they are not testing anything.
A test that records what the current implementation does, rather than what it should do, provides no safety net. It will pass after a regression just as it passes today.
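A minimal sketch of this verification loop, using a hypothetical `applyDiscount` helper (the function, the rule, and the numbers are illustrative, not from any particular codebase):

```ts
// Hypothetical helper: discounts are capped at 80% of the order total.
export function applyDiscount(total: number, discount: number): number {
  const capped = Math.min(discount, total * 0.8);
  return total - capped;
}

// A test worth keeping: the expected number is stated independently of the code.
// Temporarily breaking the implementation (e.g. swapping Math.min for Math.max,
// or removing the cap entirely) makes it fail, which is exactly what we want.
it('caps the discount at 80% of the order total', () => {
  expect(applyDiscount(100, 95)).toBe(20);
});
```

If a generated test still passes after a change like that, it is recording the implementation rather than checking it, and should be rewritten or dropped.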
Test behaviour, not implementation
AI tools tend to write tests that are tightly coupled to the current implementation: they assert on internal state, on specific function calls, on the structure of returned objects. These tests break on any refactor even when the behaviour is correct — which is the opposite of what a test suite should do.
Tests should assert on observable outputs and side effects, not on internal mechanics.
```ts
// Weak: tests implementation detail
expect(processOrder).toHaveBeenCalledWith(orderId, { status: 'pending' });

// Strong: tests observable outcome
const result = await createOrder(payload);
expect(result.status).toBe('confirmed');
expect(result.confirmationId).toMatch(/^ORD-\d{8}$/);
```

Review generated tests specifically for this pattern. Rewrite any assertion that is coupled to internal mechanics.
One prompt per function, with explicit edge cases
Do not prompt "write tests for this file." Prompt "write tests for this function, covering these cases." AI assistants produce better tests when the scope is narrow and the cases are enumerated.
Always add the edge cases that require domain knowledge after reviewing the generated output. AI generates the obvious cases; engineers add the cases that reflect real usage patterns and known failure modes.
Generated tests go through the same review as generated code
Apply the AI-generated code review checklist from AI Code Assistant Adoption. A generated test file is code — it has the same bugs, security considerations, and correctness requirements as any other file.
Prefer generating tests before modifying existing code
AI is most useful for test generation when the implementation is stable. Using it to generate tests before a refactor gives you a before-state safety net. Using it after a refactor risks generating tests that match the new implementation, not the intended behaviour.
How to Implement
Step 1 — Generate a test scaffold
Use your AI tool to generate the initial structure. Provide the function signature, types, and a brief description of what it does.
Claude Code example:
claude "Write Jest unit tests for this function.
Cover: empty array input, single item, duplicate items, and items with missing fields.
Use describe/it blocks. Mock nothing unless the function has external I/O.
Return only the test file content.
$(cat src/lib/deduplicateUsers.ts)"Cursor: Open the implementation file, then open a new test file and use Cmd+K: "Write unit tests for deduplicateUsers covering empty array, single item, and duplicate detection. Use Jest with describe/it."
OpenAI Codex (ChatGPT): Paste the function and prompt: "Write Jest unit tests for this function. Cover: empty array, single item, duplicates by id, and entries where id is undefined. Show only the test file. Use describe/it blocks."
Step 2 — Review and verify
Before adding the generated tests to the codebase:
- Read every test case. Does the assertion match what the function should do?
- Introduce a deliberate bug in the implementation. Do the tests fail?
- Remove the bug. Do the tests pass again?
- Identify edge cases the AI missed. Add them manually.
The review step is not optional — it is the point where generated tests become trustworthy tests.
Step 3 — Add domain-specific edge cases
AI generates the obvious cases. You add the ones that require knowing your system:
- Cases driven by business rules ("discount cannot exceed 80% of order total")
- Cases based on known past bugs ("this function previously returned undefined when the user had no address")
- Cases based on data from real usage ("order payloads can have up to 500 line items")
- Boundary conditions that are specific to your domain ("free tier users have a 10-item limit")
Write these by hand. They encode knowledge the AI does not have.
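A sketch of what these hand-written cases can look like, assuming a hypothetical `validateOrder` module and the illustrative rules above (the module, limits, and result shape are placeholders for your own):

```ts
import { validateOrder } from './validateOrder'; // hypothetical module

describe('validateOrder: domain edge cases', () => {
  // Business rule: free-tier users are limited to 10 line items per order.
  it('rejects the 11th line item for a free-tier user', () => {
    const order = { tier: 'free', items: Array.from({ length: 11 }, (_, i) => ({ id: i })) };
    expect(validateOrder(order)).toEqual(expect.objectContaining({ valid: false }));
  });

  // Regression guard for a past bug: orders with no shipping address
  // used to slip through validation instead of being rejected.
  it('flags an order with no shipping address as invalid', () => {
    const order = { tier: 'paid', items: [{ id: 1 }], shippingAddress: undefined };
    expect(validateOrder(order)).toEqual(expect.objectContaining({ valid: false }));
  });
});
```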
Step 4 — Integrate coverage reporting in CI
Generated tests should measurably improve coverage. Track coverage in CI to see whether the test suite is actually growing:
package.json:

```json
{
  "scripts": { "test": "jest", "test:coverage": "jest --coverage" },
  "jest": { "coverageThreshold": { "global": { "branches": 70, "functions": 80, "lines": 80 } } }
}
```

CI step:

```yaml
- name: Test with coverage
  run: pnpm test:coverage
```

Set thresholds that reflect current coverage and raise them deliberately as test generation improves the suite. Don't set thresholds so high they block progress; don't set them so low they're meaningless.
Step 5 — Use mutation testing to validate suite quality
Coverage percentage measures which lines were executed, not whether the tests would catch a bug. Mutation testing tools (Stryker for JavaScript/TypeScript) introduce deliberate bugs and check whether tests catch them.
```bash
pnpm add -D @stryker-mutator/core @stryker-mutator/jest-runner
```

stryker.config.json:

```json
{
  "testRunner": "jest",
  "mutate": ["src/**/*.ts", "!src/**/*.test.ts"],
  "thresholds": { "high": 80, "low": 60, "break": 50 }
}
```

A test suite with 90% line coverage but a 40% mutation score has many tests that don't catch bugs. Run mutation testing on critical modules after AI-assisted test generation to verify quality.
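To run it against the config above, an invocation along these lines works (assuming the config file sits in the project root; the exact command can vary by Stryker version):

```bash
# Run mutation testing; narrow the "mutate" globs in the config first
# if you only want to check a critical module.
npx stryker run
```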
Tools & Templates
Prompt template: unit tests
```
Write [test framework] unit tests for the `[function name]` function below.

Cover these cases:
1. [Case 1 — describe the input and expected output]
2. [Case 2]
3. [Case 3 — error/edge case]
4. [Case 4 — boundary condition]

Rules:
- Use describe/it blocks
- Assert on return values and observable side effects — not on internal calls
- Mock external I/O (HTTP calls, database queries) if present; mock nothing else
- Each test should fail if the implementation has a bug in the case it covers

Return only the test file. Do not include the implementation.

[paste function here]
```
Prompt template: integration tests

```
Write integration tests for the [endpoint / workflow] below.

Context:
- Framework: [Express / Next.js / etc.]
- Database: [Postgres via Prisma / etc.]
- Auth: [how auth is handled]

Cover:
- Happy path with valid input
- Missing required field (expect 400)
- Unauthorised request (expect 401)
- Resource not found (expect 404)
- [Any domain-specific edge cases]

Use [Supertest / msw / etc.] for HTTP. Use a real test database, not mocks.
```
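For reference, the kind of test this template is aiming at looks roughly like the sketch below. It assumes a hypothetical Express app exported from `src/app` and Supertest for HTTP; the route, payload, and auth fixture are placeholders.

```ts
import request from 'supertest';
import { app } from '../src/app'; // hypothetical Express app export

const testToken = 'test-token'; // stand-in for your real auth fixture

describe('POST /orders', () => {
  it('creates an order with valid input', async () => {
    const res = await request(app)
      .post('/orders')
      .set('Authorization', `Bearer ${testToken}`)
      .send({ items: [{ sku: 'ABC-1', qty: 2 }] });
    expect(res.status).toBe(201);
    expect(res.body.orderId).toBeDefined();
  });

  it('returns 400 when a required field is missing', async () => {
    const res = await request(app)
      .post('/orders')
      .set('Authorization', `Bearer ${testToken}`)
      .send({}); // items missing
    expect(res.status).toBe(400);
  });

  it('returns 401 without credentials', async () => {
    const res = await request(app).post('/orders').send({ items: [] });
    expect(res.status).toBe(401);
  });
});
```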
AI test generation by tool

| Tool | How to use for test generation | Codebase awareness |
|---|---|---|
| Claude Code | claude "Write tests for..." < file.ts or via VS Code extension | Reads file on demand; understands imports |
| Cursor | Cmd+K in test file: describe what to generate | Full project index — understands related types |
| OpenAI Codex | Paste function + prompt in ChatGPT | Only what you paste |
JavaScript test generation checklist
```markdown
## AI-Generated Test Review Checklist

- [ ] I have read every test case and understand what it asserts
- [ ] I introduced a deliberate bug — tests failed as expected
- [ ] I removed the bug — tests pass again
- [ ] Assertions are on outputs/behaviour, not on internal implementation calls
- [ ] Edge cases from domain knowledge are added manually
- [ ] The test file follows our naming conventions (`*.test.ts` in the same directory)
- [ ] Coverage for this module improved (run `pnpm test:coverage`)
```

Common Pitfalls
Trusting green tests as proof of correctness. A test suite generated by the same system that wrote the implementation is likely to encode the same assumptions and miss the same bugs. Always verify tests by introducing a deliberate failure.
Tautological tests. AI often generates tests that assert `expect(fn(x)).toBe(fn(x))` in spirit — recording what the implementation does rather than what it should do. Review every assertion for this pattern. If a test would still pass after any implementation change, it is not testing anything.
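A concrete instance of the pattern, using a hypothetical `formatPrice` helper:

```ts
// Tautological: the expected value is produced by the code under test,
// so this passes no matter what formatPrice does.
expect(formatPrice(1999)).toBe(formatPrice(1999));

// Meaningful: the expected value is stated independently of the implementation.
expect(formatPrice(1999)).toBe('$19.99');
```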
Prompting at file scope, not function scope. "Write tests for this file" produces a wide, shallow test suite. "Write tests for calculateRefund, covering these four cases" produces focused, useful tests. Narrow the scope.
Skipping domain-specific edge cases. AI generates the obvious path and the obvious error case. The edge cases that come from business rules, past bugs, and real usage patterns require human knowledge. If the AI-generated suite is the complete test suite, it is not complete.
Treating generated tests as documentation. Tests generated from an implementation describe what the implementation does, not what it should do. If the implementation is wrong to begin with, the tests document the wrong behaviour. Write intent-driven tests first where possible.
Coverage percentage as the only quality signal. 90% coverage with poor-quality tests is worse than 60% coverage with high-quality tests — because the high-coverage suite gives false confidence. Use mutation testing to validate the suite's ability to catch bugs, not just its line execution rate.