Across nearly 37,000 software dependency upgrade recommendations, newer AI models produced fewer hallucinations than their predecessors. But they also introduced a different problem: excessive caution.
Sonatype’s latest research tested frontier models including Claude Sonnet 3.7 and 4.5, Claude Opus 4.6, Gemini 2.5 Pro and 3 Pro, GPT-5 and GPT-5.2, along with smaller models. Hallucination rates have fallen, but hallucinations still appear in roughly 1 in 16 recommendations, enough to force development teams to validate every suggested fix and clean up unreliable guidance.
More striking was the finding that newer models increasingly recommend “no change” to a software component rather than suggesting an upgrade path. This restraint reduces hallucinations, but it leaves vulnerabilities in place: the most cautious models left approximately 800 to 900 critical and high-severity vulnerabilities unaddressed across the test set.
The standout result: a smaller model paired with real-time software intelligence produced 19 fewer critical and 38 fewer high-severity vulnerabilities than Opus 4.6, a model whose per-token inference cost is 71 times higher. Sonatype’s conclusion is that live context about actual package versions, known vulnerabilities and compatibility matters more than raw model scale.
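The validation step this implies can be sketched in a few lines. The example below is a hypothetical illustration, not Sonatype’s tooling: the `PUBLISHED` and `VULNERABLE` tables are inline stand-ins for what would, in practice, be live lookups against a package registry and a vulnerability feed.

```python
# Hypothetical sketch of validating an AI-suggested dependency upgrade
# against real-world package intelligence before applying it.

# Stand-in for a registry lookup: versions actually published per package.
PUBLISHED = {
    "left-pad": ["1.2.0", "1.3.0"],
    "log4j-core": ["2.14.1", "2.17.2"],
}

# Stand-in for a vulnerability feed: (package, version) pairs with known CVEs.
VULNERABLE = {
    ("log4j-core", "2.14.1"): ["CVE-2021-44228"],
}

def validate_upgrade(package: str, suggested_version: str) -> str:
    """Classify an AI-suggested upgrade before a team applies it."""
    if suggested_version not in PUBLISHED.get(package, []):
        return "hallucinated"      # version does not exist in the registry
    if (package, suggested_version) in VULNERABLE:
        return "still-vulnerable"  # real version, but carries known CVEs
    return "ok"

print(validate_upgrade("left-pad", "9.9.9"))        # hallucinated
print(validate_upgrade("log4j-core", "2.14.1"))     # still-vulnerable
print(validate_upgrade("log4j-core", "2.17.2"))     # ok
```

The point of the sketch is that neither check depends on the model at all: both are lookups against current external data, which is why a small model with this context can outperform a far more expensive one without it.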