The quiet shift toward smaller, specialised agents

Introduction

Twelve months ago, almost every enterprise deployment we reviewed was a single assistant with a sprawling system prompt and ambitions to do everything from procurement to legal review. Today, the rooms we sit in have grown quieter, and the diagrams on the whiteboards have grown smaller.

By the numbers

64p

Median cost per answer, March 2026

8 → 80

Agents per deployment, twelve-month span

4x

Faster time-to-production for narrow scopes

One assistant, one job

The shift is not driven by fashion. It is driven by the three things that keep enterprise platform leads awake at night — evaluation, cost, and the awkward question of who is accountable when the model gets it wrong. Narrow agents help with all three at once, and the compromises are easier to articulate to a board.

A team running eight focused agents can tell you, honestly, which two of them are saving money and which six are merely interesting. The same team running one general-purpose assistant usually cannot.

Scope	Agents	Median latency	Pass rate
Intake triage	3	410 ms	96.4%
Knowledge retrieval	2	720 ms	92.1%
Compliance classify	1	290 ms	98.7%
Escalation route	2	180 ms	99.2%

Production traces from a UK retail bank, March 2026 rollout cohort.
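
Figures like those in the table above can be derived from raw traces with very little machinery. The following is a minimal sketch, assuming a hypothetical `Trace` record with `scope`, `latencyMs` and `passed` fields; the note does not describe the bank's actual trace schema, so every name here is illustrative.

```typescript
// Hypothetical trace shape; the real schema is not given in this note.
type Trace = { scope: string; latencyMs: number; passed: boolean };
type ScopeSummary = { scope: string; medianLatencyMs: number; passRate: number };

// Median of a non-empty list of numbers (mean of the two middle values when even).
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Group traces by scope, then report median latency and pass rate per scope.
function summarise(traces: Trace[]): ScopeSummary[] {
  const byScope = new Map<string, Trace[]>();
  for (const t of traces) {
    const bucket = byScope.get(t.scope) ?? [];
    bucket.push(t);
    byScope.set(t.scope, bucket);
  }
  return Array.from(byScope.entries()).map(([scope, group]) => ({
    scope,
    medianLatencyMs: median(group.map((t) => t.latencyMs)),
    passRate: group.filter((t) => t.passed).length / group.length,
  }));
}
```

Because each scope is summarised separately, a weak scope cannot hide behind a strong one, which is the practical point of keeping agents narrow.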

What the orchestrator actually looks like

We hand out a tiny scheduler, not a framework. Four callables, a typed registry, and a policy that can say no. Most of the interesting questions then happen at the edges, which is where you want them.

orchestrator.ts

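// `policy` and `telemetry` are assumed to be module-level collaborators; their definitions are not shown here.
export async function route(
	req: AgentRequest,
	registry: AgentRegistry,
): Promise<AgentResult> {
	// Ask the policy first; it can refuse before any agent is touched.
	const plan = await policy.decide(req);
	if (plan.kind === "refuse") return { ok: false, reason: plan.reason };

	// Look up the single agent the plan names and run it.
	const agent = registry.resolve(plan.agentId);
	const result = await agent.run(req);
	// Record the request, the decision and the outcome. Refusals return above and are not recorded here.
	await telemetry.record({ req, plan, result });
	return result;
}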
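
The listing above leaves its types and collaborators undefined. What follows is one way they might be filled in so the function can be run end to end. It is a sketch under stated assumptions, not the scheduler we ship: the shapes of `AgentRequest`, `AgentResult` and `Plan`, the allow-list policy, and the in-memory `telemetryLog` are all hypothetical stand-ins.

```typescript
// Hypothetical shapes for the names used in orchestrator.ts.
type AgentRequest = { scope: string; payload: string };
type AgentResult =
  | { ok: true; agentId: string; output: string }
  | { ok: false; reason: string };
type Plan =
  | { kind: "run"; agentId: string }
  | { kind: "refuse"; reason: string };

interface Agent {
  run(req: AgentRequest): Promise<AgentResult>;
}

// A typed registry: resolve() throws on an unknown id instead of returning undefined.
class AgentRegistry {
  private agents = new Map<string, Agent>();
  register(id: string, agent: Agent): void {
    this.agents.set(id, agent);
  }
  resolve(id: string): Agent {
    const agent = this.agents.get(id);
    if (!agent) throw new Error(`no agent registered for "${id}"`);
    return agent;
  }
}

// A policy that can say no: only allow-listed scopes are routed (assumed rule).
const allowedScopes = new Set(["intake-triage", "escalation-route"]);
const policy = {
  async decide(req: AgentRequest): Promise<Plan> {
    return allowedScopes.has(req.scope)
      ? { kind: "run", agentId: req.scope }
      : { kind: "refuse", reason: `scope "${req.scope}" is not allowed` };
  },
};

// In-memory telemetry sink standing in for a real one.
type TelemetryEntry = { req: AgentRequest; plan: Plan; result: AgentResult };
const telemetryLog: TelemetryEntry[] = [];
const telemetry = {
  async record(entry: TelemetryEntry): Promise<void> {
    telemetryLog.push(entry);
  },
};

// route() repeated from the listing above so the sketch runs on its own.
async function route(req: AgentRequest, registry: AgentRegistry): Promise<AgentResult> {
  const plan = await policy.decide(req);
  if (plan.kind === "refuse") return { ok: false, reason: plan.reason };
  const agent = registry.resolve(plan.agentId);
  const result = await agent.run(req);
  await telemetry.record({ req, plan, result });
  return result;
}
```

Modelling `Plan` and `AgentResult` as discriminated unions lets the compiler check that `plan.agentId` is only read after the refusal branch has returned. One thing the sketch makes visible: as written, refusals never reach telemetry, so a team that wants to audit what the policy turns away would need to record before the early return.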

What we are watching next

The interesting question for 2026 is not whether specialised agents work (they do¹) but what the connective tissue between them looks like when the list grows from eight to eighty². That is the problem we are currently sitting with, and the answer has already stopped looking like a product and started looking like an operating model³.

What the data shows

Figure: Cost per useful answer, twelve-month trailing. Indexed to December 2024, normalised across four client workloads.
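
For readers who want to reproduce an indexed series of this kind, here is a minimal sketch of the arithmetic. It assumes each workload's cost is divided by its own base-month value and scaled to 100, and that workloads are then combined with an unweighted mean over aligned months. The note does not state the exact normalisation used, so both the method and the names (`MonthlyCost`, `indexToBase`, `averageIndices`) are assumptions.

```typescript
// Hypothetical input: cost per useful answer for one workload, one point per month.
type MonthlyCost = { month: string; cost: number };
type IndexPoint = { month: string; index: number };

// Rescale a series so that the base month equals 100.
function indexToBase(series: MonthlyCost[], baseMonth: string): IndexPoint[] {
  const base = series.find((p) => p.month === baseMonth);
  if (!base || base.cost <= 0) throw new Error(`no usable base value for ${baseMonth}`);
  return series.map((p) => ({ month: p.month, index: (p.cost / base.cost) * 100 }));
}

// Assumed normalisation: unweighted mean of the per-workload indices, month by month.
// All workloads must cover the same months in the same order.
function averageIndices(workloads: IndexPoint[][]): IndexPoint[] {
  const first = workloads[0] ?? [];
  return first.map((p, i) => ({
    month: p.month,
    index: workloads.reduce((sum, w) => sum + w[i].index, 0) / workloads.length,
  }));
}
```

Indexing each workload before averaging keeps a single expensive workload from dominating the combined line, which is one plausible reason to normalise this way.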

Figure: Agent time to production, by scope. Mean weeks from first internal demo to a published SLA, six clients, 2025.

"We used to have a model problem. Now we have an orchestration problem, which turns out to be a better problem to have."

Head of AI platform — UK retail bank

Where we land

We will keep writing these as we find them. If any of this lands close to a problem you are working on, the team is always happy to talk it through.
