#engineering #design - 10 min read

Building design systems at scale with AI

I recently gave the talk AI Native Engineering: Building Design Systems at Scale with AI at the Practically AI meetup in Manchester. The discussion that followed made it clear that many large enterprise organisations are asking the same structural question.

How do large technology organisations use AI to reduce the cost and complexity of managing vast, fragmented web estates while increasing consistency, accessibility, and user experience?

In 2026, producing high-quality visual output underpinned by code from AI at a local, individual level is largely a solved problem. What remains difficult is replicating that quality across dozens or hundreds of applications without compounding structural inefficiency.

Enterprises are increasingly turning to AI to address inefficiency, duplication, and inconsistency across their frontend estates, seeking to create genuine economies of scale while reducing structural complexity and long-term cost.

The solution is no different from how organisations achieve economies of scale at the infrastructure or functional layers of the stack. It requires deliberate investment in platform capability. At the frontend layer, that investment typically manifests as a governed, reusable design system that standardises patterns, reduces duplication, and lowers the marginal cost of change.

Design systems feature almost as prominently in industry conversation as AI itself. Yet many organisations continue to struggle with adoption and sustained rollout, and some operate without a design system altogether.

This gap has led design and engineering leaders to view AI as a substitute for the foundational platform work they never fully realised.

This article draws on our experience to outline how AI can accelerate an organisation’s adoption of design systems through AI-native enablement. It also explains why open-ended AI generation fails at enterprise scale, and how a constrained, AI-native stack provides durable, compounding leverage.

Why open-ended generation breaks at scale

Frontier models are now highly capable of generating zero-shot, design-intent-aligned React applications. As a result, when organisations begin experimenting with AI-driven UI-to-code generation, many default to expansive, highly detailed prompts and markdown specifications as their primary mechanism for control and consistency.

This approach appears effective at the local product level; however, it often degrades rapidly at scale.

As prompts expand to accommodate additional rules, edge cases, and design nuance, output variability increases. Consistency declines as the component surface area grows. Natural language, even when structured, remains an imprecise way of expressing design intent.

Several realities of how AI works in practice contribute towards this:

Context degradation
Long contexts degrade in the middle. Research such as Lost in the Middle (Nelson F. Liu et al.) demonstrates that models struggle to retrieve information embedded in the middle of long prompts.

Diminishing returns
Adding more context yields marginal gains. Empirical research, including work from ETH Zurich, suggests quality improvements plateau, and in some cases degrade significantly, once context becomes overloaded.

Cost scaling
Inference cost scales linearly with token volume. At enterprise usage levels, expanding prompt size becomes commercially material.

Probabilistic outputs
Every LLM response is probabilistic. Prompts and specifications are inherently incomplete, meaning output fidelity can never be fully deterministic.

Engineering an AI-native design system

If natural language is inherently lossy, the solution is to constrain the system.

AI must operate inside defined architectural and design guardrails rather than as an autonomous agent. In a mature AI-native design system, the agent cannot rely primarily on descriptive English guidance; it must instead interact directly with platform-level tooling and structured artefacts.

Transitioning from a “prompt-driven” approach to a “system-driven” approach requires deliberate structural investment in a number of platform-level capabilities:

1. Standardise the composable UI baseline

Ambiguity must be reduced at the component contract layer before AI generation begins.

Standardising on a robust, composable baseline establishes rigid patterns for composition. Bootstrapping the system with human-crafted, production-grade "golden baseline" components creates explicit boundaries for the AI to follow.

This might manifest as a handful of design system components, implemented in your UI framework of choice, by a human engineer.
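As a sketch of what such a golden-baseline contract might look like (the component, variant names, and class values here are illustrative, not from a real system), a closed union of variants gives the AI explicit, type-checkable boundaries rather than prose guidance:

```typescript
// Hypothetical golden-baseline Button contract. The closed unions mean an
// agent cannot invent a new variant without failing the type-checker.
type ButtonVariant = "primary" | "secondary" | "ghost";
type ButtonSize = "sm" | "md" | "lg";

interface ButtonProps {
  variant: ButtonVariant;
  size: ButtonSize;
  disabled?: boolean;
}

// Deterministic class resolution: the contract, not natural language,
// defines what a valid button looks like.
const VARIANT_CLASSES: Record<ButtonVariant, string> = {
  primary: "bg-brand-600 text-white",
  secondary: "bg-neutral-100 text-neutral-900",
  ghost: "bg-transparent text-brand-600",
};

const SIZE_CLASSES: Record<ButtonSize, string> = {
  sm: "px-2 py-1 text-sm",
  md: "px-3 py-2 text-base",
  lg: "px-4 py-3 text-lg",
};

function buttonClasses({ variant, size, disabled }: ButtonProps): string {
  const classes = [VARIANT_CLASSES[variant], SIZE_CLASSES[size]];
  if (disabled) classes.push("opacity-50 pointer-events-none");
  return classes.join(" ");
}
```

The value is less in the component itself than in the shape of the contract: any generated usage that strays outside the union is rejected deterministically, before review.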

2. Programmatic design integration

Instead of describing visual intent in plain English, design tooling should be integrated directly into the development workflow.

By exposing design tokens, variables, and canonical component definitions programmatically, AI consumes structured data rather than English instructions. This significantly reduces reliance on often ambiguous language as the primary interface between design and implementation.

In this model, the design system becomes executable and verifiable context.
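A minimal sketch of what executable, verifiable context can mean in practice (the token names, values, and export shape are invented for illustration, not a real Figma export format): structured tokens flatten deterministically into CSS custom properties that design tooling and generated code both agree on.

```typescript
// Hypothetical structured token export: data the AI consumes directly,
// instead of prose descriptions of colours and spacing.
interface TokenGroup {
  [name: string]: string | TokenGroup;
}

const tokens: TokenGroup = {
  color: {
    brand: { "600": "#4f46e5" },
    surface: "#ffffff",
  },
  spacing: { sm: "4px", md: "8px" },
};

// Deterministic flattening: nested token groups become namespaced
// CSS custom properties, e.g. "--color-brand-600: #4f46e5;".
function toCssVariables(group: TokenGroup, prefix = "-"): string[] {
  return Object.entries(group).flatMap(([name, value]) =>
    typeof value === "string"
      ? [`${prefix}-${name}: ${value};`]
      : toCssVariables(value, `${prefix}-${name}`)
  );
}
```

Because the transform is pure data-to-data, the same pipeline can be validated in CI, which is what makes the context verifiable rather than merely descriptive.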

3. Treat context as infrastructure

Manual context files such as AGENT.md only marginally increase output quality, and they carry a maintenance overhead that renders them insufficient at enterprise scale.

AI requires structured access to live context. In the context of a design system, this means giving your AI access to production component implementations. By exposing canonical component catalogues, prop definitions, state permutations, and usage examples directly from the codebase, generation is anchored to production-aligned constructs.
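One way to picture this live context, as a sketch (the entry shape is an assumption; in practice the entries would be generated from production code and served to the agent, for example over an MCP interface, rather than hand-written):

```typescript
// Hypothetical machine-readable catalogue entry, derived from production
// component code. Field names and example values are illustrative.
interface PropDef {
  name: string;
  type: string;
  required: boolean;
}

interface CatalogueEntry {
  component: string;
  props: PropDef[];
  states: string[];
  example: string;
}

const catalogue: CatalogueEntry[] = [
  {
    component: "Button",
    props: [
      { name: "variant", type: '"primary" | "secondary" | "ghost"', required: true },
      { name: "size", type: '"sm" | "md" | "lg"', required: true },
    ],
    states: ["default", "hover", "focus-visible", "disabled"],
    example: '<Button variant="primary" size="md">Save</Button>',
  },
];

// The agent queries live, versioned context instead of a static AGENT.md.
function lookupComponent(name: string): CatalogueEntry | undefined {
  return catalogue.find((entry) => entry.component === name);
}
```

Because the catalogue is derived from the codebase, it cannot drift from production the way a hand-maintained context file does.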

4. Automated backpressure and validation

An AI-native stack must include backpressure.

Simply put, backpressure is automated validation (linting, strict type constraints, structural tests, accessibility checks, and behavioural assertions) that sits directly inside the generation loop, giving the AI deterministic feedback on its changes and steering it in the right direction. Invalid outputs are rejected before they reach engineers or designers.
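A minimal sketch of such a loop, under stated assumptions: the `generate` callback stands in for a real model call, and the two validators are deliberately naive string checks standing in for a real linter and accessibility suite.

```typescript
// Each validator returns null on pass, or an error message the model can act on.
type Validator = (code: string) => string | null;

const validators: Validator[] = [
  // Naive stand-in for a strict-types lint rule.
  (code) => (code.includes(": any") ? "strict types: `any` is not allowed" : null),
  // Naive stand-in for an accessibility check.
  (code) =>
    code.includes("<img ") && !code.includes("alt=")
      ? "accessibility: images require alt text"
      : null,
];

function validate(code: string): string[] {
  return validators.map((v) => v(code)).filter((e): e is string => e !== null);
}

// Generation with backpressure: failed checks are fed back as corrections,
// and outputs that never pass are rejected before reaching engineers.
function generateWithBackpressure(
  generate: (feedback: string[]) => string,
  maxAttempts = 3
): string | null {
  let feedback: string[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    const candidate = generate(feedback);
    feedback = validate(candidate);
    if (feedback.length === 0) return candidate; // accepted
  }
  return null; // rejected deterministically
}
```

The loop is the point: the model never argues with the tooling, it simply iterates until the deterministic gate opens or the attempt budget is spent.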

Architecting for determinism

Embedding AI within this architecture significantly increases the probability of generating usable, zero-shot scaffolding that engineers can safely extend.

The objective is not maximal autonomy or unbounded output generation, but rather structured acceleration.

AI should produce deterministic boilerplate and constrained implementation scaffolding, enabling specialist engineers to focus on complex state management, interaction logic, and product differentiation.

Putting this into practice

A constrained, AI-native design system operates as an integrated stack with AI embedded inside deterministic infrastructure.

A representative enterprise configuration might include:

Composable UI baseline
A standardised component foundation (for example, shadcn-style primitives layered on top of a utility-first styling system such as Tailwind). This provides consistent APIs, predictable composition patterns, and CLI-based scaffolding.

Design token infrastructure
Figma as the canonical design environment, exposing structured tokens, variables, and component metadata via Model Context Protocol (MCP) and code binding mechanisms.

Executable component catalogue
Storybook as the living, versioned source of truth for components. An MCP interface exposes prop definitions, state permutations, accessibility behaviour, and usage examples directly from production code.

Typed application layer
Strict TypeScript configurations, schema validation, and API contracts that constrain generation to known, enforceable interfaces.
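As an illustrative sketch of this layer (the `UserCard` shape is invented, and the validation is hand-rolled here; in practice a schema library such as Zod typically plays this role), a runtime parser pins generated code to a known contract at the API boundary:

```typescript
// Hypothetical API contract: the only shape generated UI code may consume.
interface UserCard {
  id: string;
  displayName: string;
  avatarUrl?: string;
}

// Runtime validation mirrors the compile-time type, so data that drifts
// from the contract fails loudly instead of rendering silently wrong.
function parseUserCard(input: unknown): UserCard {
  if (typeof input !== "object" || input === null) throw new Error("expected an object");
  const record = input as Record<string, unknown>;
  if (typeof record.id !== "string") throw new Error("id must be a string");
  if (typeof record.displayName !== "string") throw new Error("displayName must be a string");
  if (record.avatarUrl !== undefined && typeof record.avatarUrl !== "string")
    throw new Error("avatarUrl must be a string when present");
  return {
    id: record.id,
    displayName: record.displayName,
    avatarUrl: record.avatarUrl as string | undefined,
  };
}
```

Paired with strict compiler settings, this gives generation two enforceable walls: the type-checker at build time and the schema at runtime.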

Automated enforcement layer
Linting, accessibility testing (e.g., axe), structural and behavioural tests, and build-time validation integrated directly into the AI generation loop.

Context orchestration layer
Lightweight skill files that reference these structured artefacts rather than embedding extensive English prose. Skills function as connectors to living system context.
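A skill file in this model stays thin, pointing at the structured artefacts rather than restating them. The sketch below is purely illustrative (the file name, frontmatter fields, and tool references are assumptions, not a real specification):

```markdown
---
name: build-ui-component
description: Scaffold a new component from the design system baseline
---

1. Query the component catalogue (Storybook MCP) for the canonical
   component, its props, and its usage examples.
2. Pull the relevant tokens and variables from the design tool
   integration; never hard-code colour or spacing values.
3. Compose from existing baseline primitives only.
4. Run lint, type-check, and accessibility checks; repair until all pass.
```

Because every step points at a living artefact, the skill file itself rarely changes, even as the design system evolves underneath it.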

In this stack, AI composes UI architecture from deterministic primitives inside a bounded environment. Because the model reads live artefacts, the context evolves automatically alongside the platform, reducing maintenance overhead to near zero.

Takeaways

Put AI on rails
When fidelity and architectural integrity matter, constrained workflows systematically outperform open-ended generation.

Invest in the baseline
Production-grade primitives, stable APIs, and structured design tokens are non-negotiable. AI amplifies what already exists.

Ground generation in living artefacts
Replace long English specifications with programmatic access to executable catalogues and typed interfaces.

Default to determinism
Use AI where it provides leverage, but do not anchor core platform stability to probabilistic systems.

Investing in your design system as both a platform capability and a context infrastructure materially accelerates adoption across the enterprise. It democratises the ability to produce production-grade UI while providing high-quality, reusable artefacts that can be consumed with minimal incremental investment from product teams.