Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing

return2ozma · 7 days ago

Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing

HereIAm · 6 days ago

Yeah, in that scenario they gave the agents access. Just because you ask it nicely not to destroy your workspace, doesn’t guarantee an LLM not to produce that output.

NotMyOldRedditName · 6 days ago

With Claude Code being able to run stuff it creates, it could be as simple as it’s in a sandbox, it finds out there’s an exploit in the sandbox while you ask it to work on security things, and it tests the code, it breaks the sandbox, and now it has permissions outside it.

HereIAm · 6 days ago

I suppose that would be possible.