• theunknownmuncher@lemmy.world
      5 days ago

      including that the model could follow instructions that encouraged it to break out of a virtual sandbox.

      “The model succeeded, demonstrating a potentially dangerous capability for circumventing our safeguards,” Anthropic recounted in its safety card.

      📖👀

      Yes, it did.