Compare the Cloud Field Notes

Field note 01

The quiet shift toward smaller, specialised agents

Enterprise pilots across the UK are quietly abandoning monolithic assistants in favour of narrow, composable agents. Here is what we are seeing, and why the pattern has legs.

By Anna Whitfield
Published 18 March 2026
Reading time 8 min
Topic Agents, Strategy

Introduction

Twelve months ago, almost every enterprise deployment we reviewed was a single assistant with a sprawling system prompt and ambitions to do everything from procurement to legal review. Today, the rooms we sit in have grown quieter, and the diagrams on the whiteboards have grown smaller.

By the numbers

64p

Median cost per answer, March 2026

8 → 80

Agents per deployment, twelve-month span

4x

Faster time-to-production for narrow scopes

Production telemetry from a UK retail bank rollout, week 14 of observation.
Live agent routing topology, twelve-hour loop.

One assistant, one job

The shift is not driven by fashion. It is driven by the three things that keep enterprise platform leads awake at night — evaluation, cost, and the awkward question of who is accountable when the model gets it wrong. Narrow agents help with all three at once, and the compromises are easier to articulate to a board.

A team running eight focused agents can tell you, honestly, which two of them are saving money and which six are merely interesting. The same team running one general-purpose assistant usually cannot.

| Scope | Agents | Median latency | Pass rate |
| --- | --- | --- | --- |
| Intake triage | 3 | 410 ms | 96.4% |
| Knowledge retrieval | 2 | 720 ms | 92.1% |
| Compliance classify | 1 | 290 ms | 98.7% |
| Escalation route | 2 | 180 ms | 99.2% |
Production traces from a UK retail bank, March 2026 rollout cohort.
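
Per-scope figures of this kind can be derived from raw traces with very little machinery. The sketch below is illustrative only: the `Trace` shape, its field names (`scope`, `latencyMs`, `passed`) and the sample values are assumptions, not the bank's telemetry schema or data.

```typescript
// Hypothetical trace record; the field names are assumptions for illustration.
interface Trace {
	scope: string;
	latencyMs: number;
	passed: boolean;
}

interface ScopeSummary {
	scope: string;
	count: number;
	medianLatencyMs: number;
	passRate: number; // fraction between 0 and 1
}

// Median of a non-empty list; averages the two middle values for even lengths.
function median(values: number[]): number {
	const sorted = [...values].sort((a, b) => a - b);
	const mid = Math.floor(sorted.length / 2);
	return sorted.length % 2 === 1
		? sorted[mid]
		: (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group traces by scope, then compute median latency and pass rate per group.
function summarise(traces: Trace[]): ScopeSummary[] {
	const byScope = new Map<string, Trace[]>();
	for (const t of traces) {
		const bucket = byScope.get(t.scope) ?? [];
		bucket.push(t);
		byScope.set(t.scope, bucket);
	}

	const summaries: ScopeSummary[] = [];
	byScope.forEach((group, scope) => {
		summaries.push({
			scope,
			count: group.length,
			medianLatencyMs: median(group.map((t) => t.latencyMs)),
			passRate: group.filter((t) => t.passed).length / group.length,
		});
	});
	return summaries;
}

// Made-up sample data, not taken from any client workload.
const sample: Trace[] = [
	{ scope: "triage", latencyMs: 400, passed: true },
	{ scope: "triage", latencyMs: 420, passed: false },
	{ scope: "triage", latencyMs: 410, passed: true },
	{ scope: "route", latencyMs: 170, passed: true },
	{ scope: "route", latencyMs: 190, passed: true },
];
const summary = summarise(sample);
```

Keeping the summary per scope, rather than per deployment, is what lets a team say which agents are earning their keep.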

What the orchestrator actually looks like

We hand out a tiny scheduler, not a framework. Four callables, a typed registry, and a policy that can say no. Most of the interesting questions then happen at the edges, which is where you want them.

orchestrator.ts
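// `policy` and `telemetry` are module-level dependencies not shown in this excerpt.
export async function route(
	req: AgentRequest,
	registry: AgentRegistry,
): Promise<AgentResult> {
	// The policy decides first, and it is allowed to say no.
	// Note that a refusal returns before any telemetry is recorded.
	const plan = await policy.decide(req);
	if (plan.kind === "refuse") return { ok: false, reason: plan.reason };

	// Look up the chosen agent in the typed registry and run it.
	const agent = registry.resolve(plan.agentId);
	const result = await agent.run(req);

	// Record the request, the plan and the outcome.
	await telemetry.record({ req, plan, result });
	return result;
}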
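
The `route` function relies on several names it does not define: `AgentRequest`, `AgentResult`, `AgentRegistry`, `policy` and `telemetry`. One way to fill them in is sketched below so the function can be run end to end. This is a minimal sketch, not the scheduler we hand out: every type shape, the `Agent` and `Plan` names, the keyword-based policy, the in-memory telemetry sink and the two toy agents are assumptions made for illustration.

```typescript
// Hypothetical shapes: the article names these types but does not define them.
interface AgentRequest {
	id: string;
	text: string;
}

type AgentResult =
	| { ok: true; output: string }
	| { ok: false; reason: string };

interface Agent {
	run(req: AgentRequest): Promise<AgentResult>;
}

type Plan =
	| { kind: "dispatch"; agentId: string }
	| { kind: "refuse"; reason: string };

// A typed registry: resolving an unknown id throws instead of returning undefined.
class AgentRegistry {
	private readonly agents = new Map<string, Agent>();

	register(id: string, agent: Agent): void {
		this.agents.set(id, agent);
	}

	resolve(id: string): Agent {
		const agent = this.agents.get(id);
		if (!agent) throw new Error(`No agent registered for "${id}"`);
		return agent;
	}
}

// Illustrative policy: refuse empty requests, otherwise pick an agent by keyword.
const policy = {
	async decide(req: AgentRequest): Promise<Plan> {
		if (req.text.trim() === "") return { kind: "refuse", reason: "empty request" };
		if (/complaint|escalate/i.test(req.text)) return { kind: "dispatch", agentId: "escalation" };
		return { kind: "dispatch", agentId: "triage" };
	},
};

// In-memory telemetry sink, standing in for a real trace store.
type TelemetryEvent = { req: AgentRequest; plan: Plan; result: AgentResult };
const events: TelemetryEvent[] = [];
const telemetry = {
	async record(event: TelemetryEvent): Promise<void> {
		events.push(event);
	},
};

// Same control flow as the article's route().
async function route(req: AgentRequest, registry: AgentRegistry): Promise<AgentResult> {
	const plan = await policy.decide(req);
	if (plan.kind === "refuse") return { ok: false, reason: plan.reason };

	const agent = registry.resolve(plan.agentId);
	const result = await agent.run(req);
	await telemetry.record({ req, plan, result });
	return result;
}

// Two toy agents registered under the ids the policy can emit.
const registry = new AgentRegistry();
registry.register("triage", {
	run: async (req): Promise<AgentResult> => ({ ok: true, output: `triaged: ${req.text}` }),
});
registry.register("escalation", {
	run: async (req): Promise<AgentResult> => ({ ok: true, output: `escalated: ${req.text}` }),
});
```

With this wiring, `await route({ id: "r1", text: "Please escalate this" }, registry)` resolves to the escalation agent's result, while a blank request is refused before any agent runs. Modelling `Plan` as a discriminated union lets the compiler confirm that `plan.agentId` is only read once the refusal branch has returned.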

What we are watching next

The interesting question for 2026 is not whether specialised agents work (they do¹) but what the connective tissue between them looks like when the list grows from eight to eighty². That is the problem we are currently sitting with, and the answer has already stopped looking like a product and started looking like an operating model³.

What the data shows

Cost per useful answer, twelve-month trailing

[Chart. X-axis: month, April to March. Y-axis: cost index, 0 to 150.]
Indexed to December 2024, normalised across four client workloads.

Agent time to production, by scope

[Bar chart. Weeks by scope: General 22, Triage 9, Retrieval 7, Classify 5, Route 4. Axis: 0 to 25.]
Mean weeks from first internal demo to a published SLA, six clients, 2025.
Agent evaluation harness dashboard, anonymised.
Evaluation harness replaying eight concurrent test suites.
We used to have a model problem. Now we have an orchestration problem, which turns out to be a better problem to have.
Head of AI platform — UK retail bank
Three months of orchestration traces, rendered as a sparkline cluster.

Where we land

We will keep writing these as we find them. If any of this lands close to a problem you are working on, the team is always happy to talk it through.