A year after pushing their teams to embrace generative AI, many companies are pulling in the opposite direction, auditing usage, capping prompts, and penalising employees who spend heavily on AI tools. The practice has acquired a name: token-shaming.
Average enterprise AI spend hit $7 million in 2026, nearly three times initial projections, according to industry research cited by nexos.ai, a Vilnius-based AI platform backed by Index Ventures and Creandum. For small and medium-sized businesses operating on tighter margins, that trajectory has turned the innovation push into a budget problem without an obvious fix.
The response at many organisations has been blunt: usage caps, monthly token audits, and uncomfortable questions during performance reviews about why a project consumed more than expected. The behaviour companies encouraged, namely exploring AI boundaries, is now the behaviour they penalise.
At some companies, a related phenomenon runs in the opposite direction. Internal leaderboards tracking AI usage have appeared, with high consumers celebrated and, in some cases, minimum token quotas attached to performance expectations. Analysis cited in the nexos.ai report suggests high-token workflows tend to generate more code but also more rework, higher costs, and diminishing returns on actual output.
Žilvinas Girėnas, head of product at nexos.ai, said: "The innovation paradox shows up when companies cut costs without a clear plan. Most teams react to high AI bills by adding restrictions instead of building smarter systems, which forces employees to choose between innovating and staying within budget. The real problem is the absence of infrastructure that optimizes token use in real time without taking autonomy away from the team. Once teams can route tasks to the right model, sharpen prompts, and track costs automatically, heavy-handed restrictions stop being necessary."
The proposed remedy is intelligent model routing: automatically directing each request to the most cost-effective model capable of handling it. Using a premium frontier model to summarise meeting notes can cost 25 times as much per task as a lighter model, with no measurable difference in output quality, according to nexos.ai's own figures. Gartner projects that by 2028, half of all GenAI projects will exceed their budgets due to poor architecture and insufficient cost optimisation.
For organisations that already deploy three or more model families in testing or production, a group Gartner estimates at 81 percent of enterprises, routing takes the model choice out of employees' hands and builds it into the system itself.
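To make the routing idea concrete, here is a minimal sketch in Python of how such a system might pick the cheapest model that can handle a given task. It is an illustration only, not nexos.ai's implementation: the model names, capability tiers, task types, and relative costs are all hypothetical, with the costs chosen simply to mirror the 25-to-1 gap cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical model identifier
    capability: int       # hypothetical tier: higher handles harder tasks
    cost_per_task: float  # illustrative relative cost, not real pricing


# Hypothetical catalogue; the 1.0 vs 25.0 costs echo the cited 25x gap.
MODELS = [
    Model("light-model", capability=1, cost_per_task=1.0),
    Model("mid-model", capability=2, cost_per_task=6.0),
    Model("frontier-model", capability=3, cost_per_task=25.0),
]

# Hypothetical mapping from task type to the minimum tier it needs.
REQUIRED_TIER = {
    "summarise_meeting_notes": 1,
    "draft_email": 1,
    "code_review": 2,
    "multi_step_reasoning": 3,
}


def route(task_type: str) -> Model:
    """Return the cheapest model whose tier meets the task's requirement."""
    # Unknown task types fall back to the most capable tier, to be safe.
    required = REQUIRED_TIER.get(task_type, 3)
    capable = [m for m in MODELS if m.capability >= required]
    return min(capable, key=lambda m: m.cost_per_task)


if __name__ == "__main__":
    for task in ("summarise_meeting_notes", "code_review", "multi_step_reasoning"):
        chosen = route(task)
        print(f"{task}: {chosen.name} (relative cost {chosen.cost_per_task})")
```

In this sketch the employee never selects a model; the lookup table does. A production router would presumably classify requests dynamically and track actual spend, but the principle of matching each task to the cheapest adequate model is the same.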