
AI-Generated Testing: Why Most Approaches Fail

AI-generated tests frequently fail in production because they lack systematic quality standards. This document explains the problem and presents a solution combining utility standards, TEA workflows, and automation interfaces across UI, API, and contract testing.

When teams use AI to generate tests without structure, they often produce what can be called “slop factory” outputs:

| Issue | Description |
| --- | --- |
| Redundant coverage | Multiple tests covering the same functionality |
| Incorrect assertions | Tests that pass but don't actually verify behavior |
| Flaky tests | Non-deterministic tests that randomly pass or fail |
| Unreviewable diffs | Generated code too verbose or inconsistent to review |

The core problem is that prompt-driven test generation leans into nondeterminism, the exact opposite of the repeatability that testing exists to protect.

The solution combines three components that work together to enforce quality:

@seontechnologies/playwright-utils standardizes commonly reinvented testing primitives across UI, API, web, and non-web flows. @seontechnologies/pactjs-utils standardizes Pact.js contract-testing primitives for provider state setup, request filtering, and provider/message verifier configuration.

| Track | Utility Layer | Purpose |
| --- | --- | --- |
| UI/API/Web/Non-web | @seontechnologies/playwright-utils | Reusable testing primitives and fixtures |
| Contract | @seontechnologies/pactjs-utils | Reusable Pact consumer/provider helpers and verification |

Playwright-Utils examples: api-request, auth-session, intercept-network-call, recurse, log, network-recorder, burn-in, network-error-monitor, file-utils.
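To make the idea of a shared primitive concrete, here is a minimal sketch of what a `recurse`-style polling helper might look like. This is an illustrative implementation, not the actual @seontechnologies/playwright-utils API: the `recurse` signature, option names, and defaults below are assumptions.

```typescript
// Hypothetical sketch of a `recurse`-style polling primitive (NOT the real
// playwright-utils API): re-run an async probe until a predicate passes or a
// timeout expires, instead of sprinkling hard-coded sleeps through tests.
async function recurse<T>(
  probe: () => Promise<T>,
  predicate: (value: T) => boolean,
  { timeoutMs = 5000, intervalMs = 100 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();        // re-run the eventually consistent operation
    if (predicate(value)) return value; // success: stable result observed
    if (Date.now() >= deadline) {
      throw new Error(`recurse: predicate not satisfied within ${timeoutMs}ms`);
    }
    await new Promise((r) => setTimeout(r, intervalMs)); // back off, then retry
  }
}

// Usage: poll a slow endpoint until it reports "ready" rather than sleeping blindly.
let calls = 0;
const fakeStatus = async () => (++calls >= 3 ? "ready" : "pending");
recurse(fakeStatus, (s) => s === "ready", { timeoutMs: 1000, intervalMs: 10 })
  .then((s) => console.log(s)); // prints "ready"
```

Centralizing a primitive like this is what keeps AI-generated tests from each reinventing (and subtly breaking) their own retry loops.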

pactjs-utils examples: createProviderState, toJsonMap, setJsonBody, setJsonContent, createRequestFilter, noOpRequestFilter, buildVerifierOptions, buildMessageVerifierOptions.
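As an illustration of why a `buildVerifierOptions`-style helper is worth standardizing, here is a self-contained sketch. The interface, field names, and environment variables below are assumptions for illustration, not the actual @seontechnologies/pactjs-utils API.

```typescript
// Hypothetical sketch of a `buildVerifierOptions`-style helper (NOT the real
// pactjs-utils API): centralize Pact provider-verification options so every
// provider test configures the verifier identically.
interface VerifierOptions {
  provider: string;
  providerBaseUrl: string;
  pactBrokerUrl?: string;
  publishVerificationResult: boolean;
  providerVersion: string;
  stateHandlers: Record<string, () => Promise<void>>;
}

function buildVerifierOptions(opts: {
  provider: string;
  port: number;
  stateHandlers?: Record<string, () => Promise<void>>;
}): VerifierOptions {
  return {
    provider: opts.provider,
    providerBaseUrl: `http://localhost:${opts.port}`,
    pactBrokerUrl: process.env.PACT_BROKER_URL,           // resolved from the environment
    publishVerificationResult: process.env.CI === "true", // only publish from CI
    providerVersion: process.env.GIT_SHA ?? "dev",
    stateHandlers: opts.stateHandlers ?? {},
  };
}

// Usage: pass the result to a Pact provider verifier.
const options = buildVerifierOptions({
  provider: "user-service",
  port: 8081,
  stateHandlers: { "a user exists": async () => { /* seed test data */ } },
});
console.log(options.providerBaseUrl); // prints "http://localhost:8081"
```

The design point is that broker URLs, publish flags, and version tags are decided once, in one place, rather than copy-pasted into every provider test.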

Together, these utility libraries eliminate the need to reinvent core testing primitives across UI, API, web, non-web, and contract testing.

TEA is a quality operating model packaged as eight executable workflows spanning test design, CI/CD gates, and release readiness. TEA encodes test architecture expertise into repeatable processes.

| Workflow | Purpose |
| --- | --- |
| test-design | Risk-based test planning per epic |
| framework | Scaffold production-ready test infrastructure |
| ci | CI pipeline with selective testing |
| atdd | Acceptance test-driven development |
| automate | Prioritized test automation |
| test-review | Test quality audits (0-100 score) |
| nfr-assess | Non-functional requirements assessment |
| trace | Coverage traceability and gate decisions |

Automation interfaces enable real-time verification during test generation and review across browser and contract tracks:

  • Playwright CLI: token-efficient browser automation for stateless execution and fast checks in workflows.
  • Playwright MCP: stateful browser automation with richer context for interactive exploration and DOM validation.
  • Pact MCP: broker-aware contract automation for verification matrix queries, provider-state discovery, compatibility analysis, and can-i-deploy deployment decisions.

Instead of inferring behavior from documentation alone, these interfaces allow agents to:

  • Run browser flows and confirm the DOM against the accessibility tree
  • Validate UI/API network behavior in real time
  • Query Pact verification matrix results across consumer/provider versions
  • Check provider states and contract compatibility before release
  • Execute can-i-deploy checks against target environments
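A can-i-deploy check of the kind listed above is conventionally run with the standard Pact Broker client CLI. The pacticipant name and environment below are placeholders; the broker URL and token come from the environment.

```shell
# Ask the Pact Broker whether this provider version is compatible with
# everything deployed to production. Exit code 0 means it is safe to deploy.
pact-broker can-i-deploy \
  --pacticipant user-service \
  --version "$GIT_SHA" \
  --to-environment production \
  --broker-base-url "$PACT_BROKER_URL" \
  --broker-token "$PACT_BROKER_TOKEN"
```

Running this as a CI gate is what turns the verification matrix from a report into an actual release decision.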

The three components form a quality pipeline:

| Stage | Component | Action |
| --- | --- | --- |
| Standards | Playwright-Utils + pactjs-utils | Provides production-ready patterns for UI and contract tests |
| Process | TEA Workflows | Enforces systematic test planning and review |
| Verification | Playwright CLI + Playwright MCP + Pact MCP | Validates tests and contracts against live systems |

Before (AI-only): 20 tests with redundant coverage, incorrect assertions, and flaky behavior.

After (Full Stack): Risk-based selection, verified selectors, validated behavior, contract compatibility checks, reviewable code.

Traditional AI testing approaches fail because they:

  • Lack quality standards — No consistent patterns or utilities
  • Skip planning — Jump straight to test generation without risk assessment
  • Can’t verify — Generate tests without validating against actual behavior
  • Don’t review — No systematic audit of generated test quality

The three-part stack addresses each gap:

| Gap | Solution |
| --- | --- |
| No standards | Playwright-Utils + pactjs-utils provide production-ready patterns |
| No planning | TEA test-design creates risk-based test plans |
| No verification | Playwright CLI + Playwright MCP + Pact MCP validate against live systems |
| No review | TEA test-review audits quality with scoring |

This approach is sometimes called context engineering—loading domain-specific standards into AI context automatically rather than relying on prompts alone. TEA’s tea-index.csv manifest loads relevant knowledge fragments so the AI doesn’t relearn testing patterns each session.
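To make the manifest idea concrete, a knowledge-fragment index of this kind might look like the sketch below. The column names and rows are purely illustrative assumptions; the actual tea-index.csv schema is not documented here.

```csv
# Hypothetical shape of a knowledge-fragment manifest (illustrative only)
fragment_id,topic,path,load_when
selectors-001,resilient-selectors,fragments/selectors.md,ui-tests
fixtures-002,fixture-patterns,fragments/fixtures.md,framework
contract-003,provider-states,fragments/provider-states.md,contract-tests
```

The manifest's job is routing: only the fragments relevant to the current workflow are loaded into context, so the AI applies established patterns instead of relearning them each session.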