AI-Generated Testing: Why Most Approaches Fail
AI-generated tests frequently fail in production because they lack systematic quality standards. This document explains the problem and presents a solution combining utility standards, TEA workflows, and automation interfaces across both UI, API and contract testing.
The Problem with AI-Generated Tests
Section titled âThe Problem with AI-Generated TestsâWhen teams use AI to generate tests without structure, they often produce what can be called âslop factoryâ outputs:
| Issue | Description |
|---|---|
| Redundant coverage | Multiple tests covering the same functionality |
| Incorrect assertions | Tests that pass but donât actually verify behavior |
| Flaky tests | Non-deterministic tests that randomly pass or fail |
| Unreviewable diffs | Generated code too verbose or inconsistent to review |
The core problem is that prompt-driven testing paths lean into nondeterminism, which is the exact opposite of what testing exists to protect.
The Solution: A Three-Part Stack
Section titled âThe Solution: A Three-Part StackâThe solution combines three components that work together to enforce quality:
1. Utilities: Playwright-Utils + Pact.js Utils
Section titled â1. Utilities: Playwright-Utils + Pact.js Utilsâ@seontechnologies/playwright-utils standardizes commonly reinvented testing primitives across UI, API, web, and non-web flows. @seontechnologies/pactjs-utils standardizes Pact.js contract-testing primitives for provider state setup, request filtering, and provider/message verifier configuration.
| Track | Utility Layer | Purpose |
|---|---|---|
| UI/API/Web/Non-web | @seontechnologies/playwright-utils | Reusable testing primitives and fixtures |
| Contract | @seontechnologies/pactjs-utils | Reusable Pact consumer/provider helpers and verification |
Playwright-Utils examples: api-request, auth-session, intercept-network-call, recurse, log, network-recorder, burn-in, network-error-monitor, file-utils.
pactjs-utils examples: createProviderState, toJsonMap, setJsonBody, setJsonContent, createRequestFilter, noOpRequestFilter, buildVerifierOptions, buildMessageVerifierOptions.
Together, these utility libraries eliminate the need to reinvent core testing primitives across UI, API, web, non-web, and contract testing.
2. Process: TEA (Test Architect Agent)
Section titled â2. Process: TEA (Test Architect Agent)âA quality operating model packaged as eight executable workflows spanning test design, CI/CD gates, and release readiness. TEA encodes test architecture expertise into repeatable processes.
| Workflow | Purpose |
|---|---|
test-design | Risk-based test planning per epic |
framework | Scaffold production-ready test infrastructure |
ci | CI pipeline with selective testing |
atdd | Acceptance test-driven development |
automate | Prioritized test automation |
test-review | Test quality audits (0-100 score) |
nfr-assess | Non-functional requirements assessment |
trace | Coverage traceability and gate decisions |
3. Automation Interfaces: Playwright CLI + MCPs
Section titled â3. Automation Interfaces: Playwright CLI + MCPsâAutomation interfaces enable real-time verification during test generation and review across browser and contract tracks:
- Playwright CLI: token-efficient browser automation for stateless execution and fast checks in workflows.
- Playwright MCP: stateful browser automation with richer context for interactive exploration and DOM validation.
- Pact MCP: broker-aware contract automation for verification matrix queries, provider-state discovery, compatibility analysis, and
can-i-deploydeployment decisions.
Instead of inferring behavior from documentation alone, these interfaces allow agents to:
- Run browser flows and confirm the DOM against the accessibility tree
- Validate UI/API network behavior in real-time
- Query Pact verification matrix results across consumer/provider versions
- Check provider states and contract compatibility before release
- Execute
can-i-deploychecks against target environments
How They Work Together
Section titled âHow They Work TogetherâThe three components form a quality pipeline:
| Stage | Component | Action |
|---|---|---|
| Standards | Playwright-Utils + pactjs-utils | Provides production-ready patterns for UI and contract tests |
| Process | TEA Workflows | Enforces systematic test planning and review |
| Verification | Playwright CLI + Playwright MCP + Pact MCP | Validates tests and contracts against live systems |
Before (AI-only): 20 tests with redundant coverage, incorrect assertions, and flaky behavior.
After (Full Stack): Risk-based selection, verified selectors, validated behavior, contract compatibility checks, reviewable code.
Why This Matters
Section titled âWhy This MattersâTraditional AI testing approaches fail because they:
- Lack quality standards â No consistent patterns or utilities
- Skip planning â Jump straight to test generation without risk assessment
- Canât verify â Generate tests without validating against actual behavior
- Donât review â No systematic audit of generated test quality
The three-part stack addresses each gap:
| Gap | Solution |
|---|---|
| No standards | Playwright-Utils + pactjs-utils provide production-ready patterns |
| No planning | TEA test-design creates risk-based test plans |
| No verification | Playwright CLI + Playwright MCP + Pact MCP validate against live systems |
| No review | TEA test-review audits quality with scoring |
This approach is sometimes called context engineeringâloading domain-specific standards into AI context automatically rather than relying on prompts alone. TEAâs tea-index.csv manifest loads relevant knowledge fragments so the AI doesnât relearn testing patterns each session.