What the first real data on agentic software development tells us — and what it means for how teams should be working in 2026.

1. The most quoted law in software is still standing

“Good, fast, cheap. Pick two.”

It is the most cited constraint in project management, attributed without much evidence to the Victorian critic John Ruskin and formalised in 1969 by Dr. Martin Barnes in his PhD thesis at UMIST. Since then it has appeared in every project management textbook, every agency pitch deck, and every founder’s first conversation with their first engineering hire. It has survived waterfall, agile, lean, no-code, offshore, nearshore, and three full hype cycles of cloud.

The promise of AI-assisted development was that this law would finally crack. That code generation, with a sufficiently capable model, would collapse the trade-off and let teams ship faster, better, and cheaper at the same time. For two years that promise has been the dominant narrative in every founder forum, vendor demo, and LinkedIn feed.

The first real data on whether the promise held arrived in 2025. It did not say what the industry expected it to say.

This essay walks through that data, explains why it lands the way it does, and argues that the iron triangle is doing exactly what it has always done — punishing the teams who operate without discipline, and rewarding the teams who do. The technology changed. The law didn’t.

2. The data on naive AI adoption is brutal

Three independent bodies of evidence have now landed, and they tell a remarkably consistent story.

The METR randomised controlled trial. In July 2025, the Model Evaluation and Threat Research group published the first proper RCT on AI coding tools. Sixteen experienced open-source developers worked on 246 tasks across their own mature codebases, each task randomly assigned to “AI allowed” or “AI disallowed.” The tools available were the frontier of the moment: Cursor Pro with Claude 3.5 and 3.7 Sonnet. Before the study began, the developers forecast that AI would cut their task time by 24 percent. After the study, they reported it had cut their time by 20 percent. The actual measurement showed they were 19 percent slower when allowed to use AI. That is a 39-point gap between perception and reality, measured in experienced developers working on codebases they knew intimately.

METR later revised aspects of the study design after selection effects emerged in a follow-up cohort. The direction softened — their February 2026 update concluded AI “likely provides productivity benefits in early 2026” — but the perception gap, and the core observation that naive AI use is far less productive than its users believe, has not been overturned.

The Faros AI telemetry studies. Faros AI subsequently analysed real engineering telemetry from over 10,000 developers across 1,255 teams, and later expanded to 22,000 developers across two years. This is not survey data. It is commit logs, pull request records, deployment events, and incident reports — the actual evidence of work being done. The pattern is consistent:

Teams with high AI adoption complete 21 percent more tasks
They merge 98 percent more pull requests
Their average PR size grows by 154 percent
PR review time grows by 91 percent
Bugs per developer rise by 9 percent
Organisation-level DORA metrics — deployment frequency, lead time, change failure rate, mean time to recovery — show no significant correlation with AI adoption

More code is shipped. More tasks are checked off. More PRs are merged. The system as a whole does not move.

Stack Overflow’s 2025 developer survey. Among developers using AI, 66 percent report their biggest frustration is that AI solutions are “almost right, but not quite.” And 45 percent report that debugging AI-generated code takes longer than writing it themselves would have. Favourable sentiment toward AI tools dropped from over 70 percent in 2023 and 2024 to 60 percent in 2025, even as adoption climbed to over 80 percent. Developers are using these tools more, and trusting them less.

Read those three datasets next to each other and a coherent story emerges. The coding step accelerated. The system did not. Something is absorbing the gains before they reach delivery.

That something is the part of the argument that matters.

3. The bottleneck didn’t disappear. It moved.

For sixty years the binding constraint in software was execution capacity. Writing the code. Every methodology we developed — sprints, story points, capacity planning, the whole industrial apparatus of agile — was an attempt to manage scarcity at that one point in the pipeline. We built a discipline around the assumption that the slowest step would be a human typing.

When agents arrived, that step stopped being scarce.

But the law of bottlenecks does not say a system has no constraint. It says a system has one constraint at a time, and removing it surfaces the next one. Amdahl’s Law makes the maths of this unforgiving: if coding takes roughly a quarter to two-fifths of total cycle time, doubling its speed (a 100 percent improvement in that one stage) yields only a 15–25 percent improvement in the system as a whole, unless the other stages move with it.
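The arithmetic behind that range is Amdahl’s Law. If a stage takes fraction p of total cycle time and is sped up by a factor s, the whole system speeds up by 1 / ((1 - p) + p / s). The sketch below assumes, purely for illustration, that coding is 30 or 40 percent of cycle time; those fractions are not taken from the studies cited here.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a stage taking fraction p of total time runs s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical share of cycle time spent writing code (an assumption, not measured data).
for p in (0.30, 0.40):
    doubled = amdahl_speedup(p, 2.0)  # a 100 percent improvement: coding twice as fast
    ceiling = 1.0 / (1.0 - p)         # the limit as coding time approaches zero
    print(f"coding = {p:.0%} of cycle: 2x coding -> {doubled - 1:.1%} faster overall; "
          f"infinitely fast coding caps out at {ceiling - 1:.1%}")
```

With these assumed shares, doubling coding speed yields about 17.6 and 25.0 percent overall. Even infinitely fast code generation cannot push the system past 1 / (1 - p); the untouched stages set the ceiling.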

The Faros data shows where the constraint went. Review time grew by 91 percent. Bug rates grew by 9 percent. PR size grew by 154 percent. The bottleneck moved downstream — from writing code to proving the code works.
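A serial pipeline can deliver no faster than its slowest stage, which is why relieving one constraint simply hands the job to the next. The stage names and weekly capacities below are hypothetical, chosen only to show the constraint moving from coding to review; they are not figures from the Faros data.

```python
# Hypothetical weekly capacities (work items per week) for a serial delivery pipeline.
BEFORE_AI = {"spec": 14.0, "code": 8.0, "review": 10.0, "deploy": 20.0}


def bottleneck(stages: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its capacity, which caps end-to-end throughput."""
    name = min(stages, key=lambda stage: stages[stage])
    return name, stages[name]


# Agents double coding capacity; nothing else in the pipeline changes.
after_ai = {**BEFORE_AI, "code": BEFORE_AI["code"] * 2}

for label, stages in (("before", BEFORE_AI), ("after", after_ai)):
    stage, throughput = bottleneck(stages)
    print(f"{label}: constraint = {stage}, throughput = {throughput:g} items/week")
```

Coding capacity doubles, yet delivery rises only from 8 to 10 items a week, and the constraint is now review.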

That is the surface story. The deeper story is that the binding constraint now sits upstream of the code entirely, at a point most teams aren’t even looking at.

When a human developer reads a vague spec, they fill the gaps from context. They know the codebase. They know what the PM probably meant. They know what the customer is actually going to do with the feature. They make a reasonable interpretation, write reasonable code, and the result is usually close enough to ship.

When an AI agent reads a vague spec, it fills the gaps from probability. It generates the most statistically likely interpretation of the words on the page. Not the most contextually correct one. Not the one the customer wanted. The most likely one, given a training distribution that has no idea what your business actually does.

This is why review times exploded by 91 percent. The PRs aren’t wrong. They’re plausible. Plausible code is harder to review than obviously wrong code, because plausible code requires the reviewer to reconstruct the original intent and check the implementation against it — line by line, decision by decision. That reconstruction work used to live in the developer’s head, where it was free. Now it lives in the reviewer’s head, where it is expensive.

The bottleneck moved from execution capacity to intent fidelity — the precision with which a human can transmit what they want into a form an agent can execute.

This is not a tooling problem. No model upgrade fixes it. It is a methodology problem, and it is the most underpriced lever in software development right now.

4. Why the iron triangle is still standing — and what makes it bend

If you accept that the bottleneck has moved to intent fidelity, the iron triangle stops being mysterious.

Teams that adopted AI tools and changed nothing else in their methodology are getting exactly what the law predicts. They are buying speed at one stage of the pipeline by paying for it at another stage. The trade-off didn’t dissolve. It moved.

They got faster code generation, and paid for it in slower review.
They got cheaper execution, and paid for it in more bugs reaching production.
They got more output, and paid for it in flat delivery metrics.

Good, fast, cheap. Pick two. The law is doing its job.

What makes the triangle bend is not the technology. It is the precondition for using the technology well. Teams that produce sharp specifications before letting an agent execute don’t just get faster code — they get code that doesn’t require 91 percent more review, because the spec already constrained what the agent could produce. They don’t get 9 percent more bugs, because the spec already defined what correct behaviour looks like. They don’t ship 154 percent larger PRs, because the spec already decomposed the work into scoped, reviewable units.

Empirical work supports this. A peer-reviewed study presented at ICSE 2026 found that incorporating architectural documentation substantially improved LLM-assisted code generation across functional correctness, architectural conformance, and code modularity. Smaller controlled studies have shown error reductions of up to 50 percent when AI agents work from human-refined specifications versus ad hoc prompts.

When the spec is sharp enough, each side of the triangle responds:

Good, because the agent has no room to drift from defined behaviour
Fast, because review collapses when the artefact matches the spec
Cheap, because a sharp spec heads off the most expensive bug of all: the one written into the spec and discovered in production

This is what every team breaking through Faros’ baseline is actually doing, whether they have a name for it or not. Some call it spec-driven development. Some call it contract-first engineering. Some call it specification-led AI. The labels matter less than the underlying move: shifting the centre of gravity of the development cycle from execution to definition.

5. Five laws that hold when you build this way

Spec-driven development is not a tool, a framework, or a vendor. It is a disciplined response to the bottleneck having moved. The teams who operate well in this new equilibrium tend to converge on the same handful of principles, regardless of stack, domain, or team size.

Law 1: The spec is the product. Everything else is execution.

Once an agent can build anything you can describe precisely, the artefact of value stops being the code and starts being the description. The spec is what carries the business intent, the constraints, the success criteria, the non-goals. The code is downstream of it. A team that treats the spec as documentation will continue to treat its agents as faster typists. A team that treats the spec as the product will reorganise around making it as sharp as possible, because the spec is now the lever that pulls all three sides of the triangle at once.
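One way to make “the spec is the product” concrete is to treat the spec as a structured artefact with required sections rather than free-form prose. This is a minimal sketch under stated assumptions: the `Spec` class and its field names are hypothetical, not a standard format, and simply mirror the elements named above (business intent, constraints, success criteria, non-goals).

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical structured spec: the artefact of value, with the code downstream of it."""
    intent: str                                   # the business outcome this work serves
    constraints: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Name every required section that is still empty."""
        missing = []
        if not self.intent.strip():
            missing.append("intent")
        for name in ("constraints", "success_criteria", "non_goals"):
            if not getattr(self, name):
                missing.append(name)
        return missing

    def ready_for_agent(self) -> bool:
        """Only a spec with no empty sections is handed to an agent."""
        return not self.gaps()


draft = Spec(
    intent="Let customers export their invoices as CSV",
    constraints=["Exports complete within 10 seconds for up to 5,000 invoices"],
    success_criteria=["CSV columns match the invoice table, in the same order"],
)
print(draft.gaps())             # ['non_goals']: nothing says what is out of scope
print(draft.ready_for_agent())  # False
```

The design choice is that an empty non-goals section blocks handoff just as a missing success criterion does, because an agent will fill either gap with its most probable guess.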

Law 2: Ambiguity is no longer a cultural quirk. It is a production defect.

When humans wrote the code, vague requirements produced friction: questions, rework, missed deadlines. The team absorbed the loss and shipped something reasonable. When agents write the code, vague requirements produce defects that ship. The agent does not stop to ask what you meant. It produces the most probable interpretation and presents it as correct. The cost of ambiguity used to be measured in sprints. It is now measured in production incidents. Teams that have internalised this stop tolerating loose specs the way mature teams stopped tolerating untested code.
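Treating ambiguity as a defect suggests checking for it the way a linter checks code. This is a minimal sketch, assuming a hand-maintained word list: `VAGUE_TERMS` and `lint_requirement` are hypothetical, and a real team would tune the list to its own domain.

```python
import re

# Hypothetical list of terms that leave an agent free to pick "the most probable interpretation".
VAGUE_TERMS = ("fast", "user-friendly", "appropriate", "as needed", "etc", "robust")


def lint_requirement(text: str) -> list[str]:
    """Return each vague term found in a requirement (whole-word, case-insensitive match)."""
    return [
        term
        for term in VAGUE_TERMS
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
    ]


requirements = [
    "Search results should load fast and handle errors as needed.",
    "Search returns the first 20 results within 300 ms at p95.",
]
for req in requirements:
    issues = lint_requirement(req)
    # Any hit is treated as a defect, the same way a failing test would be.
    print("DEFECT" if issues else "ok", req, issues)
```

The first requirement is flagged for "fast" and "as needed"; the second passes because it states a number an agent cannot reinterpret.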

Law 3: The bottleneck is imagination, not execution.

Engineering capacity used to be the scarce resource. It isn’t anymore. What is scarce is the ability to imagine, with precision, what should be built — to articulate intent in a form that survives the journey from human idea to machine execution without losing fidelity. Senior product thinking, domain expertise, architectural taste, and the discipline to define edge cases before they become incidents — these are now the binding constraints. Teams that are still hiring purely for execution capacity are optimising the wrong scarce resource.

Law 4: Most software fails before a line of code is written.

The traditional view is that software fails in implementation — bugs, missed deadlines, technical debt. The data from the last two years suggests this was always wrong, and is now obviously wrong. Software fails in the specification, or in the absence of one. The implementation phase only surfaces failures that were already encoded upstream. Agentic development makes this visible in a way it never was before, because the agent’s literal execution exposes every gap in the spec as a defect, where a human would have closed the gap silently from context.
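One way to surface those gaps before production is to write acceptance criteria as concrete examples that any implementation, human- or agent-written, must pass. The pricing rule, the 100.00 threshold, and the function names below are hypothetical. The sketch shows how a plausible reading of a loose phrase (“orders over 100 get 10 percent off”) diverges from the intended rule at exactly the edge case the prose never pinned down.

```python
from typing import Callable

# Hypothetical spec written as examples: orders of 100.00 or more get 10 percent off.
SPEC_EXAMPLES = [
    (50.00, 50.00),
    (150.00, 135.00),
    (100.00, 90.00),  # the edge case: exactly at the threshold
]


def plausible_discount(total: float) -> float:
    """A plausible reading of 'orders over 100 get 10% off' that excludes 100 itself."""
    return round(total * 0.9, 2) if total > 100 else total


def spec_discount(total: float) -> float:
    """An implementation that matches every spec example."""
    return round(total * 0.9, 2) if total >= 100 else total


def failing_examples(impl: Callable[[float], float]) -> list[tuple[float, float, float]]:
    """Run every spec example; return (input, expected, actual) for each mismatch."""
    return [(x, want, impl(x)) for x, want in SPEC_EXAMPLES if impl(x) != want]


print(failing_examples(plausible_discount))  # [(100.0, 90.0, 100.0)]
print(failing_examples(spec_discount))       # []
```

Without the third example, both implementations satisfy the spec and the disagreement would only appear in production.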

Law 5: Speed comes from precision upstream, not parallelism downstream.

The first instinct of teams seeing the productivity gains from AI is to do more in parallel — more PRs, more workstreams, more concurrent features. Faros’ data shows exactly where that road ends: 47 percent more PRs touched per developer per day, 91 percent longer review times, no improvement in delivery throughput. The speed those teams were chasing existed, but it was being absorbed by the downstream coordination cost of more work in flight. The teams who actually move faster are the ones who slow down at the spec stage, get the definition right, and then let the execution layer move at the speed it is now capable of moving. Precision upstream is the only durable form of speed.
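The arithmetic behind “more work in flight” is Little’s Law: in a stable system, average lead time equals average work in progress divided by average throughput. The team figures below are hypothetical; the sketch only shows that if delivered throughput stays flat while work in flight rises, every item waits longer.

```python
def lead_time_days(wip_items: float, throughput_per_day: float) -> float:
    """Little's Law rearranged: average lead time W = L / lambda (long-run averages)."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_day


# Hypothetical team: 30 items in flight, 3 items delivered per day.
baseline = lead_time_days(wip_items=30, throughput_per_day=3)

# More parallel work, same delivered throughput (flat system-level output).
parallel = lead_time_days(wip_items=30 * 1.5, throughput_per_day=3)

# Fewer, sharper items in flight at the same throughput.
focused = lead_time_days(wip_items=15, throughput_per_day=3)

print(f"baseline {baseline:.1f} days, parallel {parallel:.1f} days, focused {focused:.1f} days")
```

Nothing in the parallel case delivers more per day; it only lengthens the queue each item sits in, from 10 to 15 days. Cutting work in flight to 15 items halves lead time to 5 days at the same throughput.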

These five laws are not exhaustive, and they will evolve. They are the working principles that have emerged from operating in this environment, not received wisdom from a framework. Read them as a starting point for how to think about the shift, not as a checklist for adopting it.

6. What this means for how you build now

The teams that will compound advantage over the next three years are not the ones with the best agent stack. The agent stack is converging — frontier capabilities will be roughly evenly distributed, the tools will look similar across vendors, and the cost of execution will continue to fall toward zero. None of that will be a moat.

The moat is the discipline to operate well in the new equilibrium. Specifically:

The discipline to define what you want before letting machines build it at speed
The discipline to invest in spec quality the way mature teams once invested in test coverage
The discipline to recognise that the bottleneck has moved, and to redesign the team, the process, and the artefacts around the new constraint rather than the old one
The discipline to measure system-level outcomes — delivery throughput, defect rates, lead time — rather than the activity metrics that AI tooling makes look impressive while hiding the regression
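Measuring system-level outcomes can start from records most teams already keep. This is a minimal sketch, assuming a hypothetical `Deployment` record with commit and deploy timestamps, a failure flag, and a restore time. The definitions are simplified versions of the four DORA metrics, and real pipelines will need their own data mapping.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; field names are illustrative assumptions."""
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Simplified DORA metrics; restore time is measured from the failing deploy."""
    failures = [d for d in deploys if d.failed]
    lead_times = [(d.deployed_at - d.committed_at) / timedelta(hours=1) for d in deploys]
    restore_times = [
        (d.restored_at - d.deployed_at) / timedelta(hours=1)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_times) if lead_times else 0.0,
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "median_time_to_restore_hours": median(restore_times) if restore_times else 0.0,
    }


t0 = datetime(2026, 1, 5, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=20)),
    Deployment(t0, t0 + timedelta(hours=30), failed=True, restored_at=t0 + timedelta(hours=32)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=10)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=4)),
]
print(dora_summary(deploys, window_days=7))
```

The value lies in the comparison over time. Tracked before and after AI adoption, these four numbers show whether the system moved, which PR and task counts alone cannot.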

What this looks like in practice varies by team. For some, it means a single product manager who writes specifications at a level of detail no PM was writing them a year ago, because the spec is now executable by agents. For others, it means a small senior team that ships at the pace of a much larger team because the spec layer has absorbed the coordination work that used to require headcount. For others still, it means rethinking the entire build cycle so that documentation, onboarding, GTM assets, and customer success guides are produced from the same spec that produced the code — because if the spec is sharp enough to build from, it is sharp enough to write everything else from too.

The throughline is the same in every case. The teams who win the next phase of software development are not the ones who got the agents to work. They are the ones who got the humans around the agents to work — who realised that the scarce resource changed, and changed their methodology to match.

The iron triangle is still standing. It has stood for sixty years. It is not going to fall to a tool. It bends, as it has always bent, to the teams who do the unglamorous upstream work that the law has always rewarded.

That work has a new name now. The work itself is older than software.

This essay is part of an ongoing series on spec-driven development and the operational shifts agentic software is forcing in product and engineering teams. Sources for all data cited: METR (2025, 2026); Faros AI (2025, 2026); Stack Overflow Developer Survey (2025); ICSE 2026 proceedings on architectural documentation in LLM-assisted code generation.
