7 levels of AI-assisted development
In 2025, we saw the phrase “coding agent” become an almost ubiquitous way to talk about AI-assisted software development, driven largely by marketing and hype. The term has been used to describe everything from IDE copilots to fully autonomous systems capable of modifying, testing, and releasing products with minimal human input.
For practitioners navigating their way through this fast-moving technology, the term “coding agent” or “agent” more broadly is not especially helpful.
It collapses a wide range of fundamentally different ways to build products with AI into a single phrase, while obscuring the decisions that actually matter when teams are adopting AI.
The challenge is not usually deciding whether or not to adopt AI tooling or agentic workflows within a team; rather, it is deciding how AI should participate in the development workflow, how much autonomy is appropriate, and which responsibilities remain human-owned.
Those decisions are shaped by context, including risk appetite, team maturity, regulatory exposure, and whether the organisation still treats software development as a craft or has consciously accepted code as a commodity (to be clear, we are not suggesting an industry-wide shift has made code a wholesale commodity, but in some operating contexts it can be).
Seen through that lens, AI-assisted development is not a binary choice or a single category of tooling. It is a spectrum of operating modes, each trading increased leverage against maintainability, code quality, human knowledge work, determinism, and our ability to reason about the systems we build.
This post explores what those levels look like in practice and how teams can use them to make deliberate adoption choices.
The spectrum of adoption
We have found it useful to think in terms of adoption levels for AI-assisted development, ranging from lightweight, human-driven assistance to fully spec-driven systems where implementation is almost entirely abstracted away.
Level 1: Manual prompting (no agency)
Copying snippets into ChatGPT (or similar tools) and pasting the output back into your codebase. This is entirely human-driven, with low automation and low risk.
Level 2: IDE-level assistance (no agency)
Inline completions, refactors, or larger snippet generation directly inside the editor. Productivity improves, but intent, validation, and integration remain firmly human-owned.
Level 3: Autonomous codebase access (limited agency)
Granting a model access to the repository (and often a shell), allowing it to decide what changes to make within a broader workflow that includes tests, type checks, and sometimes commits.
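As a sketch of what this looks like in practice: an agentic CLI is pointed at a checkout and given a goal, and it decides which files to edit, runs checks, and (depending on configuration) commits the result. The tool and prompt below are illustrative of the pattern, not a recommendation.

```sh
cd my-service
# aider is one example of such a tool: given the repository and a goal,
# it chooses which files to change and, by default, commits its edits.
aider --message "Fix the flaky retry logic in the HTTP client"
```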
Level 4: Autonomy with deterministic tooling (capable agency)
Extending level three by connecting the model to deterministic tools and systems: documentation sources, build pipelines, issue trackers, or MCP servers. Parts of the workflow become predictable and repeatable, even if decision-making is not.
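To make this concrete, the snippet below sketches one common wiring pattern: a Model Context Protocol (MCP) client configuration exposing a GitHub server to the model. The exact file location and schema vary by client, and the server package and token shown are illustrative placeholders.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>" }
    }
  }
}
```

The point is less the specific server and more that the integration surface is deterministic: the same tool call against the same issue tracker or build pipeline behaves the same way, even when the model's decisions do not.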
Level 5: Autonomy grounded in engineering context (grounded agency)
Building on level four by explicitly anchoring the model in your team’s standards, architectural principles, and operating constraints. Artefacts such as AGENTS.md, documented conventions, and workflow guardrails become first-class inputs rather than implicit knowledge.
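As a hypothetical excerpt, an AGENTS.md carrying this kind of grounding might look like the following; the directory names, commands, and rules are illustrative, not a standard.

```markdown
## Architecture
- Domain logic lives in `core/`; adapters live in `adapters/`.
  Never import adapters from `core/`.

## Conventions
- Every new module ships with unit tests; run `npm test` before
  proposing a commit.
- Prefer small, reviewable changes over large rewrites.

## Boundaries
- Do not modify anything under `infra/` or files that touch secrets.
- Changes to public API contracts require human review.
```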
Level 6: Specialised, parallelised LLM workflows (scaled agency)
Extending level five with vendor-specific capabilities: sub-agents focused on security, accessibility, or QA; git worktrees enabling parallel execution; and higher degrees of standardisation across large codebases.
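On the parallel-execution point, git worktrees allow several working copies of the same repository to coexist, so separate agent sessions can run on their own branches without interfering with one another. A minimal sketch, with illustrative branch names:

```sh
# Create isolated working trees, one per agent session.
git worktree add ../repo-security-review -b agent/security-review
git worktree add ../repo-a11y-fixes -b agent/a11y-fixes

# Inspect active worktrees; remove one when its session is finished.
git worktree list
git worktree remove ../repo-a11y-fixes
```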
Level 7: Spec-driven development (full agency)
Treating code as a commodity and elevating specifications and prompts to the primary artefact. Implementation becomes an interchangeable output rather than a durable asset.
Each step up this spectrum increases leverage. It also increases the surface area for error, misalignment, and accidental complexity.
Why most teams should be aiming for level five
Describing a software system as an “agent” with “agency” suggests independence, judgment, and delegated responsibility. In most product engineering organisations, these attributes carry significant weight and consequence; we rarely assign them lightly, even to people. Software, however, cannot own outcomes, be held accountable, or meaningfully participate in governance. It executes instructions; it does not make commitments.
Within this context, level five represents a pragmatic balance. It accelerates the velocity of individual contributors and increases throughput, while still preserving the organisation's ability to reason about, validate, and safely evolve toward intended outcomes. Crucially, it avoids a frequent failure mode: optimising for the production of code rather than the delivery of value.
Pushing beyond this point often introduces disproportionate risk. As autonomy increases, systems begin to generate work independently and at scale. Throughput rises, but the organisation's ability to understand, govern, validate, and integrate what has been produced does not scale at the same rate. The result is compounding product, architectural, and technical risk.
At level five:
- Automation materially improves developer experience by reducing boilerplate, minimising context switching, and shortening feedback loops.
- Engineering standards, architectural intent, and team conventions remain explicit, visible, and governed.
- Accountability for product outcomes remains with people.
- Validation, review, and integration can scale proportionately with output.
The autonomy paradox
Level seven, spec-driven development, might feel counterintuitive sitting so far along a maturity ladder. By the time teams fully embrace spec-driven development, they have explicitly accepted that code itself is no longer the primary artefact. Instead, specifications, constraints, and prompts become the durable artefacts, with implementation treated as transient and easily regenerated.
For those unfamiliar with spec-driven development, it is an approach where the specification becomes the primary, durable artefact and the source of truth, while code is treated as an interchangeable output generated from it. Tooling such as Spec Kit by GitHub is at the frontier of this approach.
Instead of engineers focusing on line-by-line implementation, teams invest in creating rich, precise specifications that capture intent, constraints, architecture, quality expectations, and acceptance criteria. AI systems interpret these specifications to produce and continuously regenerate working software, while governance, validation, and accountability shift “upstream” to the specification layer rather than the codebase.
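For illustration only, a fragment of such a specification might look like the sketch below; this is a generic example of the kind of intent, constraints, and acceptance criteria involved, not Spec Kit's actual format.

```markdown
## Intent
Allow a user to reset a forgotten password via a time-limited email link.

## Constraints
- Reset tokens expire after 30 minutes and are single-use.
- No password material is ever written to logs.

## Acceptance criteria
- Requesting a reset for an unknown email returns the same response as a
  known one (no account enumeration).
- A used or expired token shows an explicit "link expired" page.
```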
As a result, many of the intermediate layers that mattered at earlier levels (workflow tooling, guardrails, and even some forms of review) collapse, because the unit of value is no longer the implementation itself.
In that sense, level seven may actually be closer to level 0, a reset rather than a linear progression. Tools such as v0 or Lovable arguably sit at level -1, removing many traditional engineering concerns entirely in favour of rapid, disposable output.
Treating code as a commodity is not inherently wrong, but it represents a fundamentally different operating model, with distinct economics, risks, and organisational implications. Framing it as a natural extension of “coding agents” obscures the fact that teams are no longer optimising the same system.
Extracting lasting value
As this new technology permeates every part of the product engineering toolchain, the risk for teams is adopting AI-assisted development without being explicit about where autonomy stops, how accountability is retained, and what outcomes the system is optimised for.
For most teams, durable value does not come from maximum automation. It comes from intentional constraint, explicit intent, and clear ownership.
AI-assisted development is not magic. It is a powerful set of tools operating inside a broader socio-technical system of people, processes, and incentives.
The organisations that extract lasting value will be those that treat AI-assisted development not as a race for throughput, but as an operating capability to be deliberately shaped, bounded, and aligned with long-term product responsibility.
