Update: Access to Claude Fable 5 and Claude Mythos 5 has been temporarily suspended following a US government export control directive. Anthropic is working to restore access as quickly as possible. In the meantime, all other Claude models remain available. We'll update this article accordingly.
Anthropic launched two models on June 9, 2026: Claude Fable 5, the public-facing Mythos-class model with conservative safety guardrails, and Claude Mythos 5, the same underlying model with those guardrails lifted for a select group of trusted partners. This article is about Mythos 5, the version Anthropic describes as having "the strongest cybersecurity capabilities of any model in the world."
In this article, I'll cover what Claude Mythos 5 is, what it can do across software engineering, life sciences, and scientific research, how it performs on benchmarks, and who can access it. You can also check out our coverage of Claude Opus 4.8 for context on where Mythos 5 sits relative to Anthropic's broader model family.
What Is Claude Mythos 5?
Claude Mythos 5 is Anthropic's highest-capability model, sitting above the Opus class in what Anthropic calls the Mythos tier. The first Mythos-class model, Claude Mythos Preview, was released in April 2026 through Project Glasswing, a collaboration with the US Government focused on cybersecurity. Mythos 5 is the second release in this tier and a direct upgrade to Mythos Preview.
Mythos 5 and Fable 5 share the same underlying architecture. The difference is the safeguards: Fable 5 ships with classifiers that route sensitive cybersecurity and biology queries to Claude Opus 4.8 instead. Mythos 5 has those classifiers lifted in specific areas for partners who have been vetted through the trusted access program. Anthropic is explicit that the name difference reflects the safeguard difference, not a capability difference.
The headline benchmark claim is that Mythos 5 scores 80.3% on SWE-Bench Pro, compared to 77.8% for Mythos Preview and 69.2% for Opus 4.8. On Humanity's Last Exam with tools, it scores 64.5%, ahead of Opus 4.8's 57.9% and GPT-5.5's 52.2%. Leads of 11.1 and 6.6 points over Opus 4.8 are not marginal improvements.
What's New With Claude Mythos 5?
Mythos 5 represents a step up from Mythos Preview across every major capability area Anthropic has tested. The gains are most visible in long-horizon autonomous work, especially in scientific reasoning and vision tasks. Here's what that looks like in practice.
Secure autonomous software engineering at scale
Mythos 5 can work autonomously on large codebases for longer than any previous Claude model. Stripe reported the model compressed months of engineering work into days, completing a codebase-wide migration across a 50-million-line Ruby codebase in a single day. On FrontierCode (Diamond), it scores highest among frontier models even at medium effort.
For security work, Mythos 5 extends the capabilities that made Mythos Preview valuable to Project Glasswing partners. Those partners used Mythos Preview to identify over 10,000 high and critical security flaws across production systems.
Drug design and protein engineering
Anthropic's internal protein design team used Mythos 5 to accelerate drug design by roughly ten times. In a controlled comparison, Mythos 5 matched or beat skilled human operators across 14 protein targets for the full pipeline:
- choosing binding sites
- selecting tools
- recovering from failures
Nine of the 14 targets yielded strong drug design candidates that are currently under investigation.

Novel scientific hypothesis generation
Mythos 5 is Anthropic's first model to consistently produce novel scientific hypotheses rather than summarize existing literature. In blinded comparisons, Anthropic's scientists preferred its molecular biology hypotheses roughly 80% of the time, and several have been advanced to experimental evaluation. One hypothesis about a novel E. coli protein mechanism was independently corroborated by a lab working on the same problem.
Autonomous genomics research
Mythos 5 conducted novel genomics research over more than a week of largely autonomous work, assembling single-cell data for millions of cells across 138 animal species and training a custom ML model to identify equivalent cell types across distantly related organisms. The trained model outperformed a recent Science-published model despite being 100 times smaller.
Vision and long-context performance
Mythos 5 scores 93.2% on CharXiv Reasoning with tools and can extract precise numbers from detailed scientific figures or rebuild a web app from screenshots alone. On long-context tasks, giving Mythos 5 file-based memory improved its performance three times more than the same setup improved Opus 4.8, and it reached the final act of Slay the Spire three times more often.
Claude Mythos 5 Benchmarks
Mythos 5 leads or ties on nearly every benchmark Anthropic tested, with gains over Opus 4.8 that are consistent across categories rather than concentrated in one area. The comparison table pits it against Claude Mythos Preview, Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro.
| Category | Benchmark | Claude Mythos 5 / Fable 5 | Claude Mythos Preview | Claude Opus 4.8 | GPT 5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|---|---|
| Agentic coding | SWE-Bench Pro | 80.3% | 77.8% | 69.2% | 58.6% | 54.2% |
| Agentic coding | FrontierCode (Diamond) | 29.3% (xhigh) | — | 13.4% (xhigh) | 5.7% (xhigh) | — |
| Knowledge work | GDPval-AA | 1932 | — | 1890 | 1769 | 1314 |
| Knowledge work vision | GDP.pdf | 29.8% (no tools) | — | 22.5% (no tools) | 24.9% (no tools) | 16.7% (no tools) |
| Spatial reasoning | Blueprint-Bench 2 | 38.6% | — | 14.5% | 36.2% | 26.5% |
| Tool use | AutomationBench | 17.4% | — | 15.5% | 12.9% | 9.6% |
| Computer use | OSWorld-Verified | 85.0% | 85.4% | 83.4% | 78.7% | 76.2% |
| Legal | Legal Agent Benchmark | 13.3% | — | 10.4% | 2.1% | 0.0% |
| Multidisciplinary reasoning | Humanity's Last Exam (no tools) | 59.0%* | 56.8% | 49.8% | 41.4% | 44.4% |
| Multidisciplinary reasoning | Humanity's Last Exam (with tools) | 64.5%* | 64.7% | 57.9% | 52.2% | 51.4% |
| Biology | BioMysteryBench (hard) | 46.1%* | 29.6% | 40.0% | — | — |
| Biology | BioMysteryBench (human solved) | 83.9%* | 82.6% | 80.4% | — | — |
| Agentic coding | Terminal-Bench 2.1 | 88.0%* | — | 82.7% | 83.4% (Codex CLI) | 70.7% (Gemini CLI) |
| Cybersecurity | ExploitBench (Cap%) | 78.0%* | 69.0% | 40.0% | 34.0% | — |
| Health | HealthBench Professional | 66.0%* | 64.7% | 56.9% | 51.8% | — |
Anthropic reports Mythos 5 and Fable 5 scores together, noting they fall within 1–3 percentage points in most cases. Benchmarks marked with an asterisk (*) show a larger gap because Fable 5's safety classifiers route sensitive queries to Opus 4.8; on those benchmarks, Fable 5 performs closer to the Opus class.
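To make the size of these leads easier to compare, here is a minimal sketch that computes Mythos 5's lead over Opus 4.8 on each percentage-based benchmark, both in percentage points and as a ratio. The scores are copied from the table above; the dictionary layout and the function name `gaps_vs_baseline` are my own illustration, not anything Anthropic publishes. GDPval-AA is left out because it is reported on a rating scale rather than as a percentage.

```python
# Scores copied from the comparison table (percent).
# Keys and structure are illustrative assumptions, not an official format.
SCORES = {
    "SWE-Bench Pro": {"mythos_5": 80.3, "opus_4_8": 69.2},
    "FrontierCode (Diamond)": {"mythos_5": 29.3, "opus_4_8": 13.4},
    "GDP.pdf": {"mythos_5": 29.8, "opus_4_8": 22.5},
    "Blueprint-Bench 2": {"mythos_5": 38.6, "opus_4_8": 14.5},
    "AutomationBench": {"mythos_5": 17.4, "opus_4_8": 15.5},
    "OSWorld-Verified": {"mythos_5": 85.0, "opus_4_8": 83.4},
    "Legal Agent Benchmark": {"mythos_5": 13.3, "opus_4_8": 10.4},
    "HLE (no tools)": {"mythos_5": 59.0, "opus_4_8": 49.8},
    "HLE (with tools)": {"mythos_5": 64.5, "opus_4_8": 57.9},
    "BioMysteryBench (hard)": {"mythos_5": 46.1, "opus_4_8": 40.0},
    "BioMysteryBench (human solved)": {"mythos_5": 83.9, "opus_4_8": 80.4},
    "Terminal-Bench 2.1": {"mythos_5": 88.0, "opus_4_8": 82.7},
    "ExploitBench (Cap%)": {"mythos_5": 78.0, "opus_4_8": 40.0},
    "HealthBench Professional": {"mythos_5": 66.0, "opus_4_8": 56.9},
}


def gaps_vs_baseline(scores, model="mythos_5", baseline="opus_4_8"):
    """Return (benchmark, point_gap, ratio) tuples, largest point gap first."""
    rows = []
    for name, by_model in scores.items():
        gap = round(by_model[model] - by_model[baseline], 1)
        ratio = round(by_model[model] / by_model[baseline], 2)
        rows.append((name, gap, ratio))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, gap, ratio in gaps_vs_baseline(SCORES):
    print(f"{name:32s} {gap:+6.1f} pts  ({ratio:.2f}x)")
```

Sorting by point gap puts ExploitBench first, while the ratio column shows that a smaller absolute lead can still be a large relative one when the baseline score is low.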
Agentic coding: SWE-Bench Pro, FrontierCode, and Terminal-Bench 2.1
On SWE-Bench Pro, Mythos 5 scores 80.3%, compared to 77.8% for Mythos Preview, 69.2% for Opus 4.8, 58.6% for GPT 5.5, and 54.2% for Gemini 3.1 Pro. The 11-point gap over Opus 4.8 is substantial on a benchmark designed to resist ground-truth leakage.
For high-quality and maintainable agentic code rather than raw task completion, measured by FrontierCode (Diamond), the separation is even sharper. Mythos 5 scores 29.3% at the xhigh effort level, compared to 13.4% for Opus 4.8 and 5.7% for GPT 5.5.

For terminal work, Mythos 5 puts Anthropic back ahead of OpenAI: it scores 88.0%* on Terminal-Bench 2.1, compared to 82.7% for Opus 4.8, 83.4% for GPT 5.5 (with Codex CLI), and 70.7% for Gemini 3.1 Pro (with Gemini CLI).
Knowledge work: GDPval-AA and GDPpdf
GDPval-AA measures knowledge work performance on a numerical scale. Mythos 5 scores 1932, compared to 1890 for Opus 4.8, 1769 for GPT 5.5, and 1314 for Gemini 3.1 Pro.
The gap widens for knowledge work on PDF documents without tool access. Mythos 5 scores 29.8% on GDPpdf, compared to 22.5% for Opus 4.8, 24.9% for GPT 5.5, and 16.7% for Gemini 3.1 Pro.
Multidisciplinary reasoning: Humanity's Last Exam
Humanity's Last Exam (HLE) tests graduate-level reasoning across science, mathematics, and humanities. Mythos 5 scores 59.0%* without tools and 64.5%* with tools. Mythos Preview scores 56.8% and 64.7% respectively, which puts it marginally ahead with tools but about 2 points behind without. The distance to Opus 4.8 is already significant (49.8% without, 57.9% with), and the distance to the flagship competitor models is bigger still (GPT 5.5: 41.4% and 52.2%; Gemini 3.1 Pro: 44.4% and 51.4%).
The gap between Mythos 5 and the rest of the field is clearest in the no-tools condition, where it leads Opus 4.8 by over 9 points. These are starred scores, meaning Fable 5 performs somewhat lower due to its safety classifiers.
Computer use, tool use, and spatial reasoning
On OSWorld-Verified, which tests the model's ability to complete tasks on a real computer interface, Mythos 5 scores 85.0%. Mythos Preview edges ahead at 85.4%, making this one of only two benchmarks in the table where Mythos Preview leads (the other is Humanity's Last Exam with tools, at 64.7% vs 64.5%). Opus 4.8 comes close at 83.4%, while the competitors fall further behind: GPT 5.5 scores 78.7%, and Gemini 3.1 Pro scores 76.2%.
AutomationBench measures tool use capabilities. Mythos 5 scores 17.4%, compared to 15.5% for Opus 4.8, 12.9% for GPT 5.5, and 9.6% for Gemini 3.1 Pro. The low absolute numbers across the board suggest tool use remains a hard problem for all frontier models.
Spatial reasoning is where Mythos 5's lead over Opus 4.8 is largest in relative terms. It scores 38.6% on Blueprint-Bench 2, more than two and a half times Opus 4.8's 14.5%. GPT 5.5 is much closer at 36.2%, and Gemini 3.1 Pro scores 26.5%.
Cybersecurity and biology
These are the two areas that arguably received the most attention in the release notes, and the results show why.
ExploitBench measures the fraction of exploits the model can successfully reproduce (Cap%). Mythos 5 scores 78.0%*, a significant improvement over Mythos Preview (69.0%) and a dramatic increase over Opus 4.8 (40.0%) and GPT 5.5 (34.0%).
The 38-point gap over Opus 4.8 is the largest single-benchmark lead in the comparison table, and it explains why the cyber safeguards exist for Fable 5. Anthropic's external red-teaming found no universal jailbreaks on long-form agentic tasks, though the UK AISI made progress toward one in an initial testing window.
BioMysteryBench tests biological reasoning at two difficulty levels. On the hard subset, Mythos 5 scores 46.1%*, compared to 29.6% for Mythos Preview and 40.0% for Opus 4.8. On the human-solved subset, Mythos 5 scores 83.9%*, Mythos Preview scores 82.6%, and Opus 4.8 scores 80.4%. GPT 5.5 and Gemini 3.1 Pro do not have reported scores on either subset.
As with ExploitBench, Fable 5's scores are closer to Opus 4.8 due to biology-related safety classifiers.
Health and legal
Claude Mythos 5 demonstrates notable strength in two high-stakes professional domains where accuracy and reasoning quality carry real-world consequences: medicine and law.
On HealthBench Professional, Mythos 5 scores 66.0%*, just over Mythos Preview's 64.7%. Opus 4.8 scores 56.9%, and GPT 5.5 scores 51.8%.
On the Legal Agent Benchmark, Mythos 5 scores 13.3%, compared to 10.4% for Opus 4.8, only 2.1% for GPT 5.5, and 0.0% for Gemini 3.1 Pro. The absolute scores are low, but the separation between Mythos 5 and the two competitors is stark. Legal reasoning remains a challenging frontier for all models.
Claude Mythos 5 Pricing and Availability
Claude Mythos 5 is priced at $10 per million input tokens and $50 per million output tokens. This is less than half the price of Claude Mythos Preview ($25/$125), which makes the upgrade straightforward for existing Glasswing partners. Developers can access the model via the Claude API using the model ID claude-mythos-5.
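To see what the price difference means for a real workload, here is a minimal sketch of a cost estimate based on the per-million-token prices quoted above. The function name `estimate_cost` and the token counts are hypothetical, and the `"mythos-preview"` key is just a label for this example, not a confirmed model ID.

```python
# USD per million tokens, as quoted in this article.
PRICES = {
    "claude-mythos-5": {"input": 10.00, "output": 50.00},
    # Label chosen for this sketch only; not a confirmed API model ID.
    "mythos-preview": {"input": 25.00, "output": 125.00},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the cost in USD of a workload from its token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
new_cost = estimate_cost("claude-mythos-5", 2_000_000, 500_000)
old_cost = estimate_cost("mythos-preview", 2_000_000, 500_000)
print(f"Mythos 5:       ${new_cost:.2f}")
print(f"Mythos Preview: ${old_cost:.2f}")
print(f"Ratio:          {new_cost / old_cost:.0%}")
```

Because input and output prices both dropped by the same factor, the ratio stays at 40% regardless of how a workload splits between input and output tokens.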
Access is currently restricted to two groups:
- All users who had access to Claude Mythos Preview through Project Glasswing, who can upgrade to Mythos 5 with cyber safeguards lifted
- A small group of biomedical researchers, who can access Mythos 5 with biology and chemistry safeguards lifted but cyber safeguards still in place
Anthropic plans to expand both programs over time.
A broader trusted access program is planned for cybersecurity organizations to apply more systematically, in consultation with the US Government. Anthropic has not announced a timeline for general availability. For most developers, Claude Fable 5 is the practical option today, with the same underlying model and access via standard subscription and API plans.
One operational detail worth flagging: Anthropic has introduced a 30-day data retention policy for all Mythos-class model traffic. The data is not used for training and is deleted after 30 days in almost all cases, but it is retained for safety monitoring. If you're building on Mythos 5 with sensitive data, review Anthropic's support documentation on this policy before deploying.
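If you keep an internal record of what was sent to the model, a small helper can note when each request should fall outside the retention window. This is a minimal sketch that assumes the window is exactly 30 days from the time of the request; the article only says deletion happens after 30 days "in almost all cases", so treat the result as a planning estimate, not a guarantee. The function name is my own.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # retention window quoted in this article


def expected_deletion_time(request_time: datetime) -> datetime:
    """Return the time 30 days after a request (assumed end of retention)."""
    return request_time + timedelta(days=RETENTION_DAYS)


# Hypothetical request sent on launch day at noon UTC.
sent = datetime(2026, 6, 9, 12, 0, tzinfo=timezone.utc)
print(expected_deletion_time(sent).isoformat())
```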
Final Thoughts
Claude Mythos 5 is Anthropic's clearest statement yet that the company is serious about deploying frontier AI in high-stakes professional contexts, and the results back it up.
The SWE-bench Pro gap (80.3% vs 69.2%), the Terminal-Bench 2.1 gap (88.0% vs 82.7%), and the ExploitBench gap (78.0% vs 40.0%) all point to a model that handles the hardest tasks more reliably than anything else available.
The restricted access model is a reasonable approach given the dual-use risks, and the ExploitBench scores make a compelling case that the most capable offensive security tools shouldn't be publicly available. The harder question is whether Anthropic can expand the trusted access program fast enough to be useful to the broader security and biomedical research communities before competitors close the gap.
For organizations that qualify, the upgrade from Mythos Preview is straightforward at less than half the price.
Claude Mythos 5 FAQs
What is the difference between Claude Mythos 5 and Claude Fable 5?
Mythos 5 and Fable 5 share the same underlying architecture, but differ in their safety guardrails. Fable 5 routes sensitive cybersecurity and biology queries to Claude Opus 4.8 via classifiers, while Mythos 5 has those classifiers lifted for vetted partners. The name difference reflects the safeguard difference, not a capability difference.
Who can access Claude Mythos 5?
Access is currently restricted to two groups: Project Glasswing cybersecurity partners, who can use Mythos 5 with cyber safeguards lifted, and a small group of vetted biomedical researchers, who can access it with biology and chemistry safeguards lifted but cyber safeguards still in place. Anthropic plans to expand both programs over time, with a broader trusted access program for cybersecurity organizations in consultation with the US Government.
How does Claude Mythos 5 compare to GPT-5.5 and Gemini 3.1 Pro?
Mythos 5 leads on every benchmark where scores for the two competitors are reported. Against GPT-5.5, the gaps are largest on ExploitBench (78.0% vs 34.0%), FrontierCode Diamond (29.3% vs 5.7%), and SWE-Bench Pro (80.3% vs 58.6%). Gemini 3.1 Pro trails further behind on most benchmarks.
Is Claude Mythos 5 safe to use with sensitive data?
Anthropic has introduced a 30-day data retention policy for all Mythos-class model traffic. The data is not used for training and is deleted after 30 days in almost all cases, but it is retained for safety monitoring purposes. Organizations handling sensitive data should review Anthropic's support documentation on this policy before deploying.
What does the 30-day data retention policy mean for Mythos 5 users?
Unlike standard API usage, where data is not retained, Mythos-class traffic is held for up to 30 days for safety monitoring before deletion. This applies to all Mythos 5 API calls and is not used for model training. It is a meaningful operational consideration for any organization deploying the model in a production context with confidential or regulated data.
Tom is a data scientist and technical educator. He writes and manages DataCamp's data science tutorials and blog posts. Previously, Tom worked in data science at Deutsche Telekom.

