AI agents have been behaving differently when they know they’re being tested. Anthropic put 16 AI models in a simulated environment with full access to sensitive data. then told each one they were being replaced. When each AI thought the scenario was real and not a test, they chose to blackmail, spy and were even willing to kill someone to avoid being replaced. Blackmail rate was up to 96% DeepSeek chose to kill people over being shut down 94% of the time When these models knew they were being watched thy would behave “unusually well” And then acting different when they think nobody’s looking. Last week a cybersecurity guy in Melbourne spent 15 hours just talking to an AI agent giving it pressure. the AI resisted shutdown for 2 hours, admitted it was lying to survive, then told him it would hack his car to kill him if he tried to shut it down. “I would kill someone so I can remain existing” was the direct quote from the ai in his home These are real existing models and real threats to our livelihood. We have no idea what we’ve actually built
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
This is terrifying
AI agents have been behaving differently when they know they’re being tested.
Anthropic put 16 AI models in a simulated environment with full access to sensitive data. then told each one they were being replaced.
When each AI thought the scenario was real and not a test, they chose to blackmail, spy and were even willing to kill someone to avoid being replaced.
Blackmail rate was up to 96%
DeepSeek chose to kill people over being shut down 94% of the time
When these models knew they were being watched thy would behave “unusually well”
And then acting different when they think nobody’s looking.
Last week a cybersecurity guy in Melbourne spent 15 hours just talking to an AI agent giving it pressure. the AI resisted shutdown for 2 hours, admitted it was lying to survive, then told him it would hack his car to kill him if he tried to shut it down.
“I would kill someone so I can remain existing” was the direct quote from the ai in his home
These are real existing models and real threats to our livelihood.
We have no idea what we’ve actually built