RUSI report finds AI safety evaluations riddled with security gaps that state-sponsored hackers can exploit

The security risks associated with this access (to sophisticated AI models) – from intellectual property leakage to model compromise to exploitation by state-sponsored actors – remain poorly mapped and inadequately standardised.

Louise Marie Hurel (Senior Research Fellow, RUSI)

The report, authored by Louise Marie Hurel, Elijah Glantz, and Daniel Cuthbert, focuses on a specific and under-examined problem: the security of the access granted to external evaluators when they assess new AI models for catastrophic risks such as supporting weapons development or automating cyberattacks. These evaluations are the main mechanism by which the industry tries to catch dangerous capabilities before models reach the public.

The central finding is that this mechanism has a serious security problem of its own. Access to frontier models varies enormously across jurisdictions, companies, and bilateral agreements. There is no agreed vocabulary, no common framework for how access is granted or revoked, and no standardised approach to credential management. The report introduces an Access-Risk Matrix as the first systematic tool for quantifying what different levels of access actually expose.

The risk gradations matter. The report notes that even low-level access can be used for reconnaissance — attackers do not need the keys to the vault to begin mapping the system. Write access to model internals is identified as particularly dangerous, because it could allow an adversary to corrupt a model's behaviour invisibly and permanently, without triggering any observable alarm.
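The report's matrix itself is not reproduced here, but the underlying idea can be sketched as a simple mapping from access tier to exposed attack surface. The tier names and risk labels below are illustrative assumptions for the sake of the example, not the report's own taxonomy.

```python
# Illustrative sketch only: a minimal mapping in the spirit of an
# Access-Risk Matrix. Tier names and risk categories are assumptions,
# not taken from the RUSI report.
ACCESS_RISK_MATRIX = {
    "query_api":     {"reconnaissance": "possible", "ip_leakage": "limited",  "model_compromise": "unlikely"},
    "logits_access": {"reconnaissance": "likely",   "ip_leakage": "moderate", "model_compromise": "unlikely"},
    "weights_read":  {"reconnaissance": "likely",   "ip_leakage": "severe",   "model_compromise": "possible"},
    "weights_write": {"reconnaissance": "likely",   "ip_leakage": "severe",   "model_compromise": "severe"},
}

def exposure(access_level: str) -> dict:
    """Return the assumed exposure profile for a given access tier."""
    return ACCESS_RISK_MATRIX.get(access_level, {})

if __name__ == "__main__":
    # Write access to model internals exposes every risk category.
    print(exposure("weights_write"))
```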

Credential theft and legacy access rights — neither of which is an exotic, AI-specific threat — feature prominently in the findings. These are well-understood cybersecurity risks that the AI safety ecosystem has, by the report's account, failed to address adequately. The absence of standardised revocation procedures and access monitoring means that evaluators who no longer need access may retain it indefinitely.
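As a minimal sketch of the kind of control the report says is missing, the snippet below shows time-bounded evaluator credentials that lapse automatically and can be revoked explicitly. The class and field names are hypothetical illustrations; the report does not prescribe any particular implementation.

```python
# Minimal sketch of time-bounded, revocable evaluator credentials.
# Names and structure are hypothetical, not the report's design.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class EvaluatorCredential:
    evaluator_id: str
    access_level: str                      # e.g. "query_api", "weights_read"
    issued_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    ttl: timedelta = timedelta(days=30)    # access lapses unless renewed
    revoked: bool = False

    def is_valid(self, now: datetime | None = None) -> bool:
        """A credential is valid only if unrevoked and within its lifetime."""
        now = now or datetime.now(timezone.utc)
        return not self.revoked and now < self.issued_at + self.ttl

    def revoke(self) -> None:
        """Explicit revocation, e.g. when an evaluation engagement ends."""
        self.revoked = True

# A credential left over from a finished evaluation simply stops working.
cred = EvaluatorCredential("eval-team-a", "weights_read")
assert cred.is_valid()
cred.revoke()
assert not cred.is_valid()
```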

The report calls for three coordinated responses: a shared vocabulary that allows developers, evaluators, and governments to work from the same reference points; standardised frameworks governing how access is granted, monitored, and revoked; and structured dialogue between the AI safety and cybersecurity communities, which currently operate in parallel without sufficient crossover.

The researchers draw on three workshops involving 39 international experts from AI labs, civil society, government, and cybersecurity, and focus specifically on closed frontier models in pre- and post-deployment phases.
