Reviewer Economy: The Hidden Tax of AI-Generated Code

Why are 92% of engineering leaders deploying AI in delivery pipelines, yet only 8% reporting a significant improvement in software quality? 

Engineering 2028: Leading Human + AI Teams Responsibly, produced in partnership with Damilah, explores the gap between AI adoption and AI outcomes, and what it reveals about where the real work of software delivery is actually happening.

The AI productivity story is an easy one to sell: developers generate code faster, features ship sooner, and the backlog shrinks. For some teams, in some contexts, that’s genuinely what’s happening. For many others, though, the time saved in generation is being quietly consumed somewhere else entirely.

Engineering 2028 data puts a number on the disconnect: 92% of respondents are already deploying AI in their delivery pipelines, yet only 8% report a significant improvement in software quality. That gap deserves more attention than it gets, because it points to something structural, not something better tooling will simply fix over time.

The Review Tax

When AI generates code, someone has to check it. Not just for syntax or obvious errors, but for whether it actually does what it’s supposed to do, whether it fits the architecture it’s being dropped into, and whether it introduces problems that would otherwise fly under the radar until later.

That manual review takes time, and for many teams it’s accumulating into something that could be called a review tax.

“The mundane work of writing the code itself goes away, but at the same time, you spend more time in this later stage where you keep reviewing and thinking ‘Does it really do what I need it to do?’”

– Filip Berlikowski, CTO, Payall

Unfortunately, this means the bottleneck hasn’t been removed; it has moved. Generation is faster, but curation is slower and more demanding. The mental load of verifying AI output at scale is something most teams are still figuring out how to handle, and for organisations that went into AI adoption expecting a straightforward productivity gain, the reality of review costs is coming as an unwelcome surprise.

The risk that’s emerged alongside this is what the industry has started calling ‘vibe coding’: shipping AI-generated code that feels right, without the rigorous verification needed to know whether it actually is. It’s an easy pattern to fall into when speed is being measured and review time isn’t, and it’s one of the more direct routes to a technical debt problem.

Amplification Works Both Ways

The review tax is a symptom of something deeper: AI cannot evaluate the environment it’s working in. It takes what’s there and builds on it, which means the quality of what comes out is shaped by the quality of what went in.

“If you really have a good system, AI will amplify everything. If you really have a bad system, AI will amplify all the bad parts as well.”

– Giorgos Ampavis, Technology Leader & Advisor

That amplification dynamic is what makes governance a prerequisite rather than an afterthought. Teams with clear requirements and well-documented systems find that AI tooling delivers closer to what was promised, while teams without those foundations are finding that AI compounds their issues as much as it does their gains.

Engineering 2028 data reflects how seriously leaders are taking this. 51% of respondents cite governance and regulatory complexity as the primary barrier to AI adoption, yet only 44% are confident they’ll have appropriate governance frameworks in place by 2028. That’s a significant gap, and it sits directly underneath the quality numbers: without the structures to govern what AI produces, the review tax keeps rising, and the 8% quality improvement figure starts to make a lot more sense.

From Gatekeeping to Facilitating

For most teams, the current state is human oversight functioning as a checkpoint: a gatekeeper role where engineers review and approve AI output before it moves forward. It’s a necessary stage given where trust in AI-generated code currently sits, but it’s also an inherently limited one, because the gatekeeper role inevitably adds friction to the process.

“Right now, the human is almost a gatekeeper. By 2028, we will get past that… towards humans as facilitators.”

– Dewi Rees, Senior Engineering Manager, Flagstone

The facilitator model is a more productive destination. Rather than standing at the end of the pipeline checking what came out, the engineer is shaping what goes in: setting the conditions, the constraints, and the quality standards that AI works within. That shift reduces the review burden because more of the quality work happens upstream, and the nature of the human contribution moves from reactive to proactive.

Getting there requires investment in the foundations that make it possible. Requirements need to be clearer and more precise than they’ve historically been, because AI agents interpret them literally. Documentation needs to be treated as a live asset rather than an afterthought, because it’s increasingly what AI is drawing on to understand system intent. And technical debt needs to be addressed rather than accumulated, because every layer of it is another surface for AI to amplify in the wrong direction.
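To make that first foundation concrete, here’s a minimal sketch of what ‘requirements precise enough for an AI agent’ can look like in practice: the requirement is pinned down as executable acceptance checks before any implementation exists. The refund policy, the prorate_refund function, and its signature are all hypothetical, invented purely for illustration.

```python
# Hypothetical example: a refund requirement pinned down as executable
# acceptance checks before any implementation exists. The policy, the
# function name, and its signature are all invented for illustration.

def prorate_refund(amount: float, days_used: int, period_days: int,
                   cooling_off_days: int = 14) -> float:
    """Full refund inside the cooling-off window; otherwise refund the
    unused share of the billing period."""
    if days_used <= cooling_off_days:
        return amount
    return amount * (period_days - days_used) / period_days


# Acceptance checks that any implementation, AI-generated or not, must pass.
# The review question shifts from "does this feel right?" to "does it pass?".
assert prorate_refund(120.0, days_used=10, period_days=30) == 120.0  # cooling-off
assert prorate_refund(120.0, days_used=15, period_days=30) == 60.0   # prorated
assert prorate_refund(120.0, days_used=30, period_days=30) == 0.0    # fully used
```

Checks like these don’t eliminate review, but they move the hardest question, ‘does it really do what I need it to do?’, upstream to where it’s cheapest to answer.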

What the Quality Gap Tells You

The figures we’ve seen aren’t telling a story about AI failing to deliver, but rather about the conditions most teams are deploying AI into, and what happens when the tooling moves faster than everything else around it can.

The teams closing that gap aren’t necessarily using better AI, but they are being deliberate about what they’re asking it to work with. They’ve invested in cleaner systems, documentation, and governance frameworks that shape AI output before it ever reaches review, rather than catching problems after the fact. The review tax is real, but it is also manageable.

For engineering leaders planning toward 2028, the question worth asking isn’t how much AI your team is using, but whether the systems your AI is working within are good enough to make that usage productive, and whether the humans overseeing it are positioned to orchestrate rather than just gatekeep.

The full Engineering 2028: Leading Human + AI Teams Responsibly report goes further into governance frameworks, the trust gap in AI-generated code, and how leaders are thinking about quality at scale. You can download it here.

Want to go deeper? We have two upcoming Bytes sessions diving even further into the findings of Engineering 2028: Leading Human + AI Teams Responsibly.

Our online Bytes session, The Maturity Roadmap: From Early Adoption to AI-Enabled Leadership, takes place on 23rd April and tackles the messy gap between AI experimentation and genuinely AI-enabled leadership. Our in-person Bytes, Engineering 2028: A Leadership Masterclass, takes place on 7th May in London and gets concrete about what mature human and AI orchestration actually looks like when teams move beyond the hype.

Both are free to attend and offer the chance to explore the findings in more depth and connect with peers who are navigating the same challenges.
