
Using AI agents to QA your own build

Three sessions with no memory of each other: one wrote test cases from the spec, one built the app, one ran the tests. The seven failures were real gaps. That's the trick.

The Cadenly Team · Updated June 30, 2026

A PM shared a clever setup for QAing a side project: three separate AI sessions, none aware of the others. The first read the PRD and generated 42 test cases. The second built the app from the same PRD in a different week. The third picked up the 42 cases, ran each in a real browser, and posted results back. Of the 42, 35 passed and 7 were blocked, and all 7 were real gaps between what the PM had specced and what had been built. One example: a cross-midnight offer that should split into two rows, where the helper existed but was never wired up.
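To make the cross-midnight example concrete, here is a minimal sketch of what a helper like that might look like. The PM's actual code isn't shown, so the name split_at_midnight and the (start, end) row shape are assumptions made purely for illustration.

```python
from datetime import datetime, time, timedelta


def split_at_midnight(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split the interval [start, end) into one row per calendar day it touches.

    Hypothetical helper: the name and the row shape are illustrative only.
    """
    if end <= start:
        raise ValueError("end must be after start")

    rows = []
    cursor = start
    # Cut the interval at every midnight that falls strictly inside it.
    while cursor.date() < end.date():
        next_midnight = datetime.combine(
            cursor.date() + timedelta(days=1), time.min, tzinfo=cursor.tzinfo
        )
        rows.append((cursor, next_midnight))
        cursor = next_midnight
    # Keep the remainder, unless the interval ended exactly at midnight.
    if cursor < end:
        rows.append((cursor, end))
    return rows
```

A helper like this can be correct and still do nothing for the user if the code path that saves an offer never calls it. Reading the function won't reveal that; creating a cross-midnight offer in the running app and counting the rows will.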

The PM's own line is the insight: these were the kinds of gaps they'd never catch testing their own code, because they'd interpret the ambiguous spec the same way while testing as they did while building. That sentence is the whole reason this works.

Why separation catches what self-review can't

When you write the spec, build from it, and test it, you carry one interpretation through all three. Every ambiguity resolves the same way in your head each time, so the test never disagrees with the build — they're the same guess wearing two hats. The bug isn't in the code; it's in the shared interpretation, and a single mind can't see it.

Separating the roles breaks the shared interpretation. The session that writes test cases from the spec interprets the ambiguity its own way. When its expectation collides with what the build session produced, the collision is the ambiguity in the spec, made visible. You're not finding code bugs. You're finding the places your spec was unclear enough that two readers diverged.

The pattern, generalized

  • Generate test cases from the spec, independently of the build. The cases should come from the requirements, not from the code, or they just re-encode the build's assumptions.
  • Run them against the real product, not a description of it. A test that "reads the code and reasons about it" inherits the code's interpretation. A test that clicks the actual button doesn't.
  • Treat every failure as a spec question first. A blocked test usually means the spec was ambiguous, not just that the code is wrong. Resolve the ambiguity, then fix the build.
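The three steps above can be sketched as a small runner. This is a minimal sketch under stated assumptions, not the PM's setup: SpecCase, Result, run_cases, and the observe callable are all hypothetical names. The observe callable stands in for whatever drives the real product (for example, a browser automation session) and returns what actually appeared on screen.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Status(Enum):
    PASSED = "passed"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class SpecCase:
    case_id: str
    requirement: str  # the spec text this case was derived from
    action: str       # what to do in the running product
    expected: str     # what the spec reader expects to observe


@dataclass(frozen=True)
class Result:
    case: SpecCase
    status: Status
    observed: str
    next_step: str


def run_cases(cases: Iterable[SpecCase], observe: Callable[[str], str]) -> list[Result]:
    """Run spec-derived cases against the real product via `observe`."""
    results = []
    for case in cases:
        # The observation comes from the product, never from reading its code.
        observed = observe(case.action)
        if observed == case.expected:
            results.append(Result(case, Status.PASSED, observed, "none"))
        else:
            # A mismatch is routed back to the spec before anyone touches the build.
            question = (
                f"Spec question first: does '{case.requirement}' "
                f"clearly require '{case.expected}'?"
            )
            results.append(Result(case, Status.BLOCKED, observed, question))
    return results
```

The design choice worth noting is that SpecCase carries the requirement text along with the expectation, so every blocked result points back at the sentence in the spec that two readers may have understood differently.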

This is the same principle as having a fresh reviewer, automated and made systematic. The reviewer's value was never extra eyeballs — it was a different interpretation that surfaces your blind spots. Separated agent sessions manufacture that on demand.

Key takeaways
  • You interpret an ambiguous spec the same way when building and when testing, so self-QA misses the gap.
  • Separating spec, build, and test breaks the shared interpretation.
  • Generate test cases from the requirements, not the code, and run them against the real product.
  • A blocked test is usually a spec ambiguity, not just a code bug.

Generate test cases from your spec in Cadenly

Cadenly derives test cases directly from your requirements — independent of the build — so the gaps between intent and implementation surface before users find them.
