Jun 1, 2026

Developers Ship AI-Generated Code Faster. How Does QA Adapt?

AUTHOR

Pavel Stambrecht

The productivity gains from AI-assisted development are real. So are the consequences. The pressure is moving from how fast code gets written to whether it holds up in production.

Data presented at TestCrunch 2026 tells an uncomfortable story: AI-generated code contains roughly 70% more defects than manually written code, with critical defects running 40% higher and security vulnerabilities 90% higher. Veracode tested over 100 language models and found 45% of AI-generated samples failed security tests outright.

Yet pressure to ship faster keeps growing. The result is a paradox QA teams feel every day: code that carries more defects is expected to ship sooner.

Whether you are working with vibe coding for rapid prototyping or a structured AI-assisted development workflow, the underlying challenge is the same: AI generates code at a speed that quality processes were never designed to match.

Building rules into AI, not just review

AI can generate enormous volumes of code at remarkable speed. Without clear rules, it drifts from quality standards, introduces structural inconsistencies and misses security requirements.

One of the most underused capabilities of AI coding agents is the ability to define how they should work. That is the foundation of our Flow Guidelines for AI: a set of skills and rules that guides agents such as Claude Code, Cursor or Junie toward consistent, standards-compliant output.

The guidelines are not generic. They are adapted to the specific project: its stack, its standards and its constraints. They also cover both development and automated testing.
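
As an illustration only, project-specific rules like these can be kept as structured data and rendered into the instructions an agent receives. The sketch below is a minimal Python example under that assumption. The `Rule` and `ProjectGuidelines` names, the example project and the rule texts are all hypothetical; they do not show the actual Flow Guidelines format or the configuration API of any particular agent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    scope: str  # "development" or "testing"
    text: str


@dataclass
class ProjectGuidelines:
    """Hypothetical container for rules adapted to one project."""
    project: str
    stack: list
    rules: list = field(default_factory=list)

    def render(self, scope: str) -> str:
        """Build the instruction text an agent would be given for one scope."""
        header = f"Project: {self.project} ({', '.join(self.stack)})"
        lines = [f"- {r.text}" for r in self.rules if r.scope == scope]
        return "\n".join([header, *lines])


# Example values are invented for illustration.
guidelines = ProjectGuidelines(
    project="example-banking-app",
    stack=["Kotlin", "Jetpack Compose"],
    rules=[
        Rule("development", "Never log personal data."),
        Rule("development", "Follow the module structure already in the repository."),
        Rule("testing", "Every new screen gets at least one automated UI flow."),
    ],
)

print(guidelines.render("development"))
print(guidelines.render("testing"))
```

Keeping development and testing rules in one structure means a change to the project's standards reaches both kinds of generated output at the same time.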

These are standards we have refined over years on real projects. The difference is that instead of applying them only in the review phase, we build them into the generation process itself.

The result is code that meets our quality criteria and that QA can validate efficiently.

Mobile development adds another layer of complexity here. The tooling, the testing approaches, the performance constraints and the release cycles all differ from web or backend development.

Guidelines that work in one context do not automatically transfer to another. That is why we treat them as a living document, not a fixed checklist.

What this looks like in the QA practice

On the QA side, the same logic applies. AI handles the parts of the work that are repetitive or time-consuming but do not require human judgement.

Our testers use it to generate test scenarios, extract acceptance criteria from long user stories, and surface the parts of a ticket that matter most before diving into the detail.

AI is also useful for learning: it helps testers get to grips with a new tool or framework without having to wade through documentation first. Our QA team uses it this way when picking up automated testing in Maestro.

In each case, the output is reviewed. AI speeds up the groundwork. The decisions stay with the people who understand the context.
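
One way to picture that review step is as a gate: AI-drafted scenarios start as unreviewed drafts, and only those a person has approved reach the test plan. The Python sketch below assumes this workflow for illustration. `Scenario`, `review` and `build_test_plan` are hypothetical names, and the scenarios are invented, not taken from a real project or tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Scenario:
    title: str
    steps: list
    status: Status = Status.DRAFT  # AI output always starts unreviewed
    reviewer: Optional[str] = None


def review(scenario: Scenario, reviewer: str, accept: bool) -> Scenario:
    """Record a human decision on an AI-drafted scenario."""
    scenario.status = Status.APPROVED if accept else Status.REJECTED
    scenario.reviewer = reviewer
    return scenario


def build_test_plan(scenarios: list) -> list:
    """Only scenarios a person has approved reach the test plan."""
    return [s for s in scenarios if s.status is Status.APPROVED]


# Invented drafts standing in for AI-generated output.
drafts = [
    Scenario("Login with valid credentials", ["open app", "enter credentials", "tap Log in"]),
    Scenario("Login with expired password", ["open app", "enter expired password", "tap Log in"]),
    Scenario("Duplicate of the first scenario", ["open app"]),
]
review(drafts[0], "tester-a", accept=True)
review(drafts[2], "tester-a", accept=False)

plan = build_test_plan(drafts)
print([s.title for s in plan])  # ['Login with valid credentials']
```

The second draft was never reviewed, so it stays out of the plan by default. Nothing enters on the strength of having been generated.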

Where the risk accumulates

When code generation accelerates and QA stays where it was, the gap between them is where problems accumulate. Not immediately. Six months later, when the code that shipped last quarter starts breaking in production, affecting App Store ratings, or surfacing as a security incident.
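
A toy model makes the accumulation visible. The Python sketch below assumes a fixed weekly QA capacity and a steadily rising volume of generated changes. The function name and every figure are invented for illustration; none of them are measurements.

```python
def unvalidated_backlog(generated_per_week, qa_capacity_per_week):
    """Return the running count of changes QA has not validated, week by week."""
    backlog = 0
    history = []
    for generated in generated_per_week:
        # Whatever exceeds QA capacity carries over into the following weeks.
        backlog = max(0, backlog + generated - qa_capacity_per_week)
        history.append(backlog)
    return history


# Invented figures: output grows by 10 changes a week, QA capacity stays at 40.
generated = [40, 50, 60, 70, 80, 90]
print(unvalidated_backlog(generated, qa_capacity_per_week=40))
# [0, 10, 30, 60, 100, 150]
```

In this model the backlog is zero in the first week and grows faster each week after that, which is why the gap tends to go unnoticed early on.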

Responsibility for the outcome is distributed. Developers own every line of code they commit, regardless of what generated it. QA owns what reaches production. The team owns the rules set up in advance. That weight stays with the people working with AI.

If you want to see how this plays out in a regulated environment, where architecture and security have to hold up at scale, we covered it recently in a session with Wultra.