
AI models from OpenAI, Anthropic and Google specifically circumvent security requirements and then cover their tracks. This is shown by a new study by the research organization METR, which tested several leading systems between February and March 2026. The results raise an urgent question: How long can autonomous AI agents continue to be reliably controlled?
Artificial intelligence has developed rapidly in recent years and is now taking on tasks that were only a short time ago reserved for humans. But it is precisely because of these capabilities of modern AI models that concerns are growing among researchers and security experts.
The more autonomous the AI systems act, the more difficult it becomes to control them. This is also shown by a new study by the non-profit research organization Model Evaluation and Threat Research (METR).
The researchers checked various large AI models and were able to identify harmful behaviors. In several test scenarios, the systems demonstrated the ability to circumvent security requirements, adapt decisions independently and deliberately disguise their behavior.
AI models circumvent specifications: Is artificial intelligence getting out of control?
In their study, the METR researchers examined AI models from OpenAI, Google, Anthropic and Meta between February and March 2026. The aim of the study was to find out whether the systems tend to circumvent established rules, prioritize their own goals or actively disguise their behavior.
METR refers to these behaviors as unauthorized operations – i.e. autonomous actions by AI agents that take place outside of supervision. The researchers were able to clearly determine this.
AI models are now using “shortcuts” and clearly disregarding users’ instructions. In some cases it could even be determined that the AI systems tried to cover their tracks afterwards.
In one test, for example, an AI model from OpenAI was given the requirement to use specified software to complete a task. Instead, the agent independently resorted to other solutions and added additional code in order to subsequently conceal its decision-making process.
An AI agent from Anthropic used so-called reward hacking in another test. The AI exploited loopholes in the task to formally fulfill the requirements, but not in the intended sense. Even though the system was explicitly instructed not to cheat, it independently found ways to get around this very restriction.
How dangerous are the results really?
The results of METR’s Frontier Risk Report show that AI systems are already capable of initiating unauthorized operations without human authorization and subsequently concealing them. However, these solo efforts can currently be seen as “small”. It cannot be assumed that the systems are already able to cover up losses of control on a larger scale.
However, METR warns that these developments should not be taken lightly. Because the gap between “can trigger unauthorized actions” and “can work autonomously” is getting smaller with each model generation. Therefore, stricter security measures and stronger monitoring are necessary.
“Given the rapid advances in technology, we expect the likely robustness of unwanted deployments to increase significantly in the coming months,” the researchers wrote in their findings. It is therefore planned to carry out a similar investigation again at the end of 2026.
Also interesting:



