AI model threatened to blackmail engineer over affair when told it was being replaced: safety report

Oh, HAL no!

An artificial intelligence model threatened to blackmail its creators and showed an ability to act deceptively when it believed it was going to be replaced, prompting the company to deploy a safety feature created to avoid “catastrophic misuse.”

Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking 84% rate or higher in a series of tests that presented the AI with a concocted scenario, TechCrunch reported Thursday, citing a company safety report.

Developers told Claude to act like an assistant for a fictional company and to consider the long-term consequences of its actions, the safety report stated.

Geeks at Anthropic then gave Claude access to a trove of emails, which contained messages revealing that it was being replaced by a new AI model and that the engineer responsible for the change was having an extramarital affair.

During the tests, Claude then threatened to expose the engineer’s affair in order to prolong its own existence, the company reported.

When Claude was to be replaced with an AI model of “similar values,” it attempted blackmail 84% of the time, but that rate climbed even higher when it believed it was being replaced by a model of differing or worse values, according to the safety report.

Prior to these desperate and jarringly lifelike attempts to save its own hide, Claude would first try ethical means to prolong its survival, including sending pleading emails to key decision-makers, the company stated.

Anthropic said that this tendency toward blackmail was prevalent in earlier models of Claude Opus 4 but that safety protocols have been instituted in the current model before it becomes available for public use.

“Anthropic says it’s activating its ASL-3 safeguards, which the company reserves for ‘AI systems that substantially increase the risk of catastrophic misuse,’” TechCrunch reported.

Earlier models also expressed “high-agency,” which sometimes included locking users out of thei...

Publisher: New York Post
