Compare the Cloud Field Notes

Infrastructure 03

Inference at the edge of what we can afford

A year of cost telemetry across four UK deployments, and the pattern that changed how we size clusters.

By Priya Banerjee
Published 14 January 2026
Reading time 9 min
Topic Infrastructure

Introduction

We started the year with a simple question: is our fleet sized right? We ended it with a rather more awkward one: what does 'right' even mean when the workload changes shape every two weeks?

By the numbers

14k hrs

Observed inference telemetry, 2025

70%

Cost reduction not from a better model

P95 → P99

We now size to a budgeted tail, not peak

A hyperscaler hall we walked during capacity planning week.
Eight hours of rack telemetry compressed into a fifteen-second loop.

What the telemetry told us

Across four clients and roughly fourteen thousand hours of inference, the utilisation graphs told a consistent story. The cost-per-useful-answer curve was dominated not by model choice but by how aggressively we batched, cached and routed around slow paths. The biggest wins came from boring, unglamorous infrastructure work.
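To make the cost-per-useful-answer idea concrete, here is a toy sketch in Python. It is not the model behind our telemetry: the function, its parameters and every figure in it are hypothetical, chosen only to show how batching and caching can move the metric while the model stays the same.

```python
import math


def cost_per_useful_answer(requests, cache_hits, useful_answers,
                           batch_size, pence_per_batch):
    """Toy model (all inputs hypothetical).

    Only cache misses reach the GPU. Misses are grouped into batches of
    `batch_size`, and each batch is billed at a flat `pence_per_batch`.
    The result is total spend divided by answers judged useful.
    """
    if useful_answers <= 0:
        raise ValueError("useful_answers must be positive")
    misses = requests - cache_hits
    batches = math.ceil(misses / batch_size)
    return batches * pence_per_batch / useful_answers


# Same model and same 10,000 requests in all three cases; 8,000 answers are useful.
baseline = cost_per_useful_answer(10_000, 0, 8_000, batch_size=1, pence_per_batch=4)
batched = cost_per_useful_answer(10_000, 0, 8_000, batch_size=8, pence_per_batch=8)
batched_and_cached = cost_per_useful_answer(10_000, 4_000, 8_000, batch_size=8, pence_per_batch=8)

print(baseline, batched, batched_and_cached)
```

In this made-up case the cost per useful answer falls from 5p to 1.25p with batching and to 0.75p once 40% of requests are served from cache, without touching the model.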

How we size now

We have stopped sizing to peak and started sizing to a budgeted tail. It is less tidy on a capacity chart and more honest about what the business is actually buying. The finance team, for what it is worth, prefers it.
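A minimal sketch of the difference between the two approaches, assuming demand is recorded as hourly requests per second and that each GPU serves a fixed rate. The sample data, the per-GPU figure and the helper names are all hypothetical, and the nearest-rank percentile is one common definition rather than necessarily the one our tooling uses.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    the samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def gpus_needed(demand_rps, per_gpu_rps):
    """Whole GPUs required to serve the given demand."""
    return math.ceil(demand_rps / per_gpu_rps)


# Hypothetical hourly demand (requests per second) with one large spike.
hourly_rps = [40, 42, 45, 50, 55, 60, 62, 65, 70, 72,
              75, 78, 80, 82, 85, 88, 90, 95, 100, 400]
PER_GPU_RPS = 25  # assumed throughput of a single GPU

peak_size = gpus_needed(max(hourly_rps), PER_GPU_RPS)
p95_size = gpus_needed(percentile(hourly_rps, 95), PER_GPU_RPS)

# Hours in which demand exceeds the P95-sized fleet: the tail we budget for.
hours_over = sum(1 for d in hourly_rps if d > p95_size * PER_GPU_RPS)

print(peak_size, p95_size, hours_over)
```

With this sample, sizing to peak asks for 16 GPUs, while sizing to P95 asks for 4 and leaves one hour in twenty over capacity. That hour is the part that has to be budgeted explicitly, for example through queueing, fallback or burst capacity.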

What the data shows

GPU utilisation, tail versus peak

[Chart: utilisation by hour of day. X-axis: hour of day, 00 to 23. Y-axis: 0 to 100.]
Mean utilisation curve across four clients, weekday workloads, March 2025.

Cost contribution by control, not by model

[Chart: bar chart by lever. Batching 34%, Caching 27%, Routing 18%, Fallback 12%, Model swap 9%.]
Share of the total cost reduction attributed to each infrastructure lever, 2025.
Fibre trunk routing through the hot aisle of a UK datacentre.
Inference routing map, taken the week we rebuilt the scheduler.
Our cost curve flattened the week we stopped trying to make one agent do everything.
Principal engineer — Public sector body
One of the boards we retired this quarter after a budgeted-tail rewrite.

Where we land

We will keep writing these as we find them. If any of this lands close to a problem you are working on, the team is always happy to talk it through.