Model Release

Claude Fable 5: Benchmarks, Images, Prompting, and API Notes

Claude Fable 5 is the first generally available Mythos-class Claude model. This guide pulls only from official Claude and Anthropic sources: the launch thread, launch post, model docs, pricing docs, fallback docs, prompting guidance, and the Fable/Mythos system card.

Figure: Official Anthropic benchmark comparison for Claude Fable 5 and Claude Mythos 5 (official Anthropic launch visual). The tables below use text extracted from the official system card where possible.

What Launched

The official Claude account announced Claude Fable 5 on June 9, 2026 as a Mythos-class model made safe for general use. Anthropic's launch article says Fable 5 exceeds every model the company had previously made generally available, with the lead widening as tasks get longer and more complex.

There are two names to keep straight. Claude Fable 5 is the broadly available model, with safety classifiers. Claude Mythos 5 shares the same underlying capabilities but has safeguards lifted in some areas and is limited to approved Project Glasswing and trusted-access customers. When the system card reports both, this article keeps the columns separate.

Official model IDs:
Claude Fable 5  -> claude-fable-5
Claude Mythos 5 -> claude-mythos-5

Context window: 1M tokens
Max output:     128k tokens per request
Pricing:        $10 / MTok input, $50 / MTok output
Batch pricing:  $5 / MTok input, $25 / MTok output
Launch date:    June 9, 2026

Benchmark Snapshot

The official system card is the most useful benchmark source because it separates Fable 5, Mythos 5, Mythos Preview, Opus 4.8, and external model results. Fable's scores reflect production safeguards, including fallback behavior, so small differences between Fable and Mythos do not always mean a capability gap in the underlying model.

| Evaluation | Fable 5 | Mythos 5 | Opus 4.8 | What it measures |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | 95.0% | 95.5% | 88.6% | 500 human-verified software issues, averaged over five trials. |
| SWE-bench Pro | 80.0% | 80.3% | 69.2% | Harder active-repository tasks with larger diffs and less public ground truth. |
| Terminal-Bench 2.1 | 84.3% | 88.0% | 82.7% | Terminal tasks in a mini-SWE-agent harness; Fable had safety fallback in 20.9% of trials. |
| OSWorld-Verified | 85.0% | 85.0% | 83.4% | Live Ubuntu computer-use tasks, pass@1 averaged over five runs. |
| GDP.pdf | 29.8% | not listed | 22.5% | Dense professional PDF reasoning; Fable also led GPT-5.5 and Gemini 3.1 Pro in the system card table. |
| OfficeQA Pro | 57.9% | not listed | 48.1% | Databricks vision-based evaluation over U.S. Treasury Bulletin documents. |
| Toolathlon | 61.7% Pass@1 | 61.7% Pass@1 | 59.9% Pass@1 | 108 real-world tool-use tasks across 32 applications. |
| MCP Atlas | 83.3% | not listed | 82.2% | Multi-step MCP tool-use workflows over production-like server environments. |

The benchmark story is not one giant number. It is a pattern: Fable 5 is strongest where the task is long, tool-heavy, multimodal, ambiguous, or closer to real work than a single prompt-answer exchange. That is why simple smoke tests can undersell it.

Coding Benchmarks

Software engineering is the loudest launch signal. Anthropic reports that Fable 5 reaches 95.0% on SWE-bench Verified and 80.0% on SWE-bench Pro, while the system card places Opus 4.8 at 88.6% and 69.2% respectively. The bigger jump shows up on long-horizon agentic coding benchmarks where a model must investigate, patch, test, and recover over many steps.

| Benchmark | Fable 5 result | Official comparison |
| --- | --- | --- |
| FrontierCode Diamond | Fable 5: 29.3 score / 30.2 pass rate | Opus 4.8: 13.4 / 14.5; GPT-5.5: 5.7 / 6.4 |
| FrontierCode Main | Fable 5: 46.3 score / 48.8 pass rate | Opus 4.8: 34.3 / 37.3; GPT-5.5: 25.5 / 28.2 |
| FrontierSWE | Fable 5 ranked #1 at 2.12 mean@5 | Opus 4.8 ranked #2 at 3.26; GPT-5.5 ranked #3 at 3.94 |
| CursorBench | Fable 5 scored 72.9% at max effort | The system card says it led GPT-5.5 by 8.6 points at that model's highest published effort. |

The practical read: do not evaluate Fable 5 only on short snippets, code formatting, or a handful of easy GitHub issues. The official docs say the teams seeing the best outcomes are giving Fable 5 harder, previously unsolved problems. That matches the benchmark pattern: Fable separates most clearly when the work requires persistence.

Long Context and Agentic Search

Fable 5 and Mythos 5 support a 1M token context window by default. The long-context results in the system card are mostly reported for Mythos 5, but they are still useful for understanding what the underlying model class is good at. On GraphWalks, Mythos 5 scored 91.1 F1 on the BFS 256K subset and 79.4 F1 on the BFS 1M subset, ahead of Opus 4.8 at 85.9 and 68.1. On the Parents 1M subset, Mythos 5 scored 97.5 F1 versus Opus 4.8 at 83.3.

On BrowseComp, Anthropic reports that multi-agent Mythos 5 reached 93.3% and that async subagents set the highest score among the tested harnesses. The important developer lesson is not just "use more agents." It is that multi-agent structure helped most on the hard tail: the system card says the largest latency gains came from problems that were already difficult for prior Claude runs.

Vision and Documents

Anthropic calls Fable 5 the new state-of-the-art model for vision tasks. The benchmark details are more grounded than that headline: Fable 5 scored 29.8% on GDP.pdf, a dense professional document benchmark, compared with Opus 4.8 at 22.5%, GPT-5.5 at 24.9%, and Gemini 3.1 Pro at 16.7%. On OfficeQA Pro, the Databricks vision-based evaluation put Fable 5 at 57.9%, ahead of Opus 4.8 at 48.1%.

The system card also reports strong Mythos 5 results on ChartMuseum, LAB-Bench FigQA, and CharXiv Reasoning. For Fable 5 specifically, biology-heavy image tasks can trigger safeguards, so the right conclusion is narrower: Fable 5 is excellent at practical vision/document workflows, but some scientific visual workflows may route through the safeguard path.

Professional Work

The most interesting benchmark category is professional work, because it looks less like a leaderboard and more like what paying users actually do. Anthropic reports Fable/Mythos 5 was preferred over Opus 4.8 in 74% of Real-World Finance v2 pairwise comparisons, with an Elo of 1,374 versus 1,222 for Opus 4.8. Vals AI's Finance Agent v2 evaluation put Fable at 56.31%, above Opus 4.8 at 53.92% and GPT-5.5 at 51.76%.

The legal and tool-use numbers are also useful. On Harvey's Legal Agent Benchmark, the system card reports 16.91% all-pass and 92.0% mean criterion-pass on the full public set in Anthropic's internal harness, plus 13.3% all-pass on Harvey's held-out set. On Toolathlon, Fable 5 scored 61.7% Pass@1 and used 19.8 average turns, while Opus 4.8 scored 59.9% Pass@1 and used 24.5 turns.

There is at least one official counterexample worth keeping: on Vending-Bench, Fable 5's best final balance was $5,680.26, slightly below Opus 4.8's $5,787.43. That is exactly why the system card matters. Fable 5 is not "strictly better on every possible task." It is a much stronger default for hard, long, agentic work, with workload-specific exceptions.

Science Caveat

The launch post and system card describe very strong Mythos 5 life-sciences results: drug-design acceleration, novel molecular-biology hypotheses, genomics research, and benchmark gains on BioMysteryBench, LatchBio Bioinformatics, structural biology, ProteinGym Hard, organic chemistry, protocol troubleshooting, and LABBench2.

For public Fable 5 users, the caveat is central. Fable 5's safeguards are deliberately broad around biology and chemistry, and Anthropic says some beneficial life-sciences tasks may trigger classifiers. If your product is biomedical, computational biology, chemistry, or cyber-adjacent, build the fallback path first and treat raw Fable 5 benchmark expectations carefully.

Official Images and Chart Data

Anthropic shipped several visuals with the launch article. The images below are the official hosted assets that matter most for a benchmark-based article. I am not re-hosting them here; the page references Anthropic's original URLs and links the source section at the end.

Official benchmark comparison table: Anthropic's launch-page table comparing Fable 5 and Mythos 5 with other leading models. The typed tables in this article use the system card where possible.
FrontierCode Diamond chart: Official launch visual for Fable 5 on Cognition's FrontierCode Diamond benchmark.
FrontierCode Main chart: Official launch visual for Fable 5 on the FrontierCode Main subset.
Alignment assessment chart: Anthropic's automated alignment assessment chart for Mythos 5, with Fable 5 expected to be similar because the underlying model is shared.

API, Availability, and Pricing

Claude Fable 5 is generally available on the Claude API, Claude Platform on AWS, Amazon Bedrock, Vertex AI, and Microsoft Foundry. Claude Mythos 5 is not generally available; access is limited to approved customers through Project Glasswing and related trusted-access channels.

The official pricing table lists Fable 5 and Mythos 5 at $10 per million input tokens and $50 per million output tokens. Prompt-cache writes are $12.50 per MTok for a 5-minute cache and $20 per MTok for a 1-hour cache, while cache hits and refreshes are $1 per MTok. Batch usage is discounted to $5 input and $25 output per MTok.
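To make the rates above concrete, here is a minimal cost-estimate sketch. The helper name `estimate_cost` and its parameters are hypothetical, and it assumes `input_tokens` counts only uncached input. The article does not say how batch and cache discounts combine, so the helper applies the cache rates unchanged.

```python
# Per-million-token prices in USD, copied from the pricing paragraph above.
PRICES = {
    "input": 10.00,
    "output": 50.00,
    "batch_input": 5.00,
    "batch_output": 25.00,
    "cache_write_5m": 12.50,
    "cache_write_1h": 20.00,
    "cache_read": 1.00,
}

MTOK = 1_000_000


def estimate_cost(input_tokens, output_tokens, *, batch=False,
                  cache_write_5m_tokens=0, cache_write_1h_tokens=0,
                  cache_read_tokens=0):
    """Return an estimated USD cost for one request (hypothetical helper)."""
    in_rate = PRICES["batch_input"] if batch else PRICES["input"]
    out_rate = PRICES["batch_output"] if batch else PRICES["output"]
    total = (
        input_tokens * in_rate
        + output_tokens * out_rate
        + cache_write_5m_tokens * PRICES["cache_write_5m"]
        + cache_write_1h_tokens * PRICES["cache_write_1h"]
        + cache_read_tokens * PRICES["cache_read"]
    ) / MTOK
    return round(total, 4)


# 200k input tokens and 20k output tokens at standard and batch rates.
print(estimate_cost(200_000, 20_000))              # 3.0
print(estimate_cost(200_000, 20_000, batch=True))  # 1.5
```

At these rates output tokens cost five times as much as input tokens, so long agentic runs at high effort are likely to be dominated by output cost.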

Prompting Fable 5

The Fable-specific prompting guide says the model is strongest on problems that were previously too complex, too long-running, or too ambiguous for earlier models. It also warns that prompts and skills written for prior Claude models can be too prescriptive. The migration work is therefore not "add more instructions." It is often "remove old scaffolding and let the stronger model work."

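import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-fable-5",
    max_tokens=64000,  # half of the 128k per-request output cap
    output_config={"effort": "high"},  # effort is the main steering knob
    messages=[
        {
            "role": "user",
            "content": "Analyze this migration plan, implement the safe parts, and verify with tests."
        }
    ],
)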

Effort is now the main steering knob. Use high as a default for most hard work, xhigh for capability-sensitive jobs, and medium or low for routine work where latency and cost matter more. On hard tasks, individual turns can run for minutes, and autonomous runs can continue for hours. That means your product needs streaming, async job handling, progress indicators, and timeout settings that match the model you are actually using.
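One way to keep effort and timeouts in step is to derive both from a single workload label. The sketch below is illustrative only: the workload names, the timeout values, and the `build_request` helper are assumptions, not part of the official guidance. Only the effort levels and the `output_config` shape come from the text above.

```python
# Effort levels named in the prompting guidance above.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh")

# Hypothetical workload labels and timeouts; tune both for your own product.
PROFILES = {
    "routine": {"effort": "medium", "timeout_s": 120},
    "hard": {"effort": "high", "timeout_s": 900},
    "capability_sensitive": {"effort": "xhigh", "timeout_s": 3600},
}


def build_request(prompt, workload="hard", max_tokens=64000):
    """Return (request_kwargs, timeout_seconds) for a Messages API call."""
    if workload not in PROFILES:
        raise ValueError(f"unknown workload: {workload!r}")
    profile = PROFILES[workload]
    kwargs = {
        "model": "claude-fable-5",
        "max_tokens": max_tokens,
        "output_config": {"effort": profile["effort"]},
        "messages": [{"role": "user", "content": prompt}],
    }
    return kwargs, profile["timeout_s"]


kwargs, timeout_s = build_request("Rename this helper.", workload="routine")
print(kwargs["output_config"], timeout_s)
```

The returned `kwargs` can be passed to `client.messages.create(**kwargs)`, and the timeout can be set on the client, for example `anthropic.Anthropic(timeout=timeout_s)`. For runs that last minutes or longer, streaming or an async job queue is probably a better fit than one blocking call.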

Three prompt changes matter most. First, ground progress claims in actual tool results so long runs do not drift into optimistic status updates. Second, state boundaries: what the model may edit, when it should ask, and what actions are out of scope. Third, stop asking it to reproduce internal reasoning. The docs warn that prompts asking for hidden reasoning can trigger a refusal category; if you need reasoning visibility, use summarized adaptive thinking and a send-to-user tool for progress updates.
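The first two changes can be captured in a small system-prompt template. This is a sketch under stated assumptions: the wording of each line and the `build_system_prompt` helper are invented for illustration and are not taken from Anthropic's prompting guide.

```python
def build_system_prompt(editable_paths, ask_first, out_of_scope):
    """Assemble a boundary-focused system prompt (hypothetical wording)."""
    lines = [
        # Change 1: tie progress claims to tool results.
        "Report progress only when a tool result confirms it, and cite that result.",
        # Change 2: state what may be edited, when to ask, and what is off limits.
        "You may edit: " + ", ".join(editable_paths) + ".",
        "Ask before: " + ", ".join(ask_first) + ".",
        "Out of scope: " + ", ".join(out_of_scope) + ".",
    ]
    return "\n".join(lines)


print(build_system_prompt(
    editable_paths=["src/", "tests/"],
    ask_first=["deleting files", "changing public APIs"],
    out_of_scope=["deploying", "editing CI configuration"],
))
```

The template deliberately contains no request to show reasoning, in line with the third change.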

Safeguards and Fallback

Fable 5 includes classifiers around cyber, biology and chemistry, distillation, and reasoning extraction. The API-level refusal docs say a refusal is a successful HTTP 200 response with stop_reason: "refusal", not a thrown error. The documented stop_details.category values include cyber, bio, and reasoning_extraction.
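Because a refusal arrives as a normal response rather than an exception, the calling code has to check for it explicitly. The sketch below assumes the response body has already been parsed into a dict and that it uses the field names described above (`stop_reason`, `stop_details.category`). The `classify_response` helper and its return shape are hypothetical.

```python
def classify_response(response):
    """Return ("ok", None) or ("refused", category) for a parsed response body.

    Assumes `response` is the JSON body as a dict, with the field names the
    refusal docs are described as using: stop_reason and stop_details.category.
    """
    if response.get("stop_reason") != "refusal":
        return ("ok", None)
    # stop_details may be absent or null; fall back to "unknown" in that case.
    details = response.get("stop_details") or {}
    return ("refused", details.get("category", "unknown"))


print(classify_response({"stop_reason": "end_turn"}))
print(classify_response(
    {"stop_reason": "refusal", "stop_details": {"category": "cyber"}}
))
```

Logging the category next to the request ID makes it easier to see which classifier a workload is hitting.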

The safest production pattern is to configure fallback to Claude Opus 4.8. Server-side fallback is available in beta on the Claude API and Claude Platform on AWS using the server-side-fallback-2026-06-01 beta header; SDK middleware can handle client-side fallback for TypeScript, Python, Go, Java, and C#.
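For readers who want to see the shape of client-side fallback, here is a hand-rolled sketch. It is not the SDK middleware or the server-side beta feature mentioned above. `send` is a hypothetical callable that takes request kwargs and returns the parsed response as a dict, and the fallback model ID is left to the caller because the sources quoted here do not give one for Opus 4.8.

```python
def create_with_fallback(send, request, fallback_model):
    """Call `send(request)`; on a refusal, retry once with `fallback_model`.

    `send` is a hypothetical callable that takes request kwargs and returns
    the parsed response body as a dict. Returns (response, model_used).
    """
    response = send(request)
    if response.get("stop_reason") != "refusal":
        return response, request["model"]
    # Copy the request so the caller's dict is not mutated.
    retry = dict(request, model=fallback_model)
    return send(retry), fallback_model


def fake_send(request):
    """Stub transport for illustration: the primary model always refuses."""
    if request["model"] == "claude-fable-5":
        return {"stop_reason": "refusal", "stop_details": {"category": "bio"}}
    return {"stop_reason": "end_turn"}


response, used = create_with_fallback(
    fake_send,
    {"model": "claude-fable-5", "messages": []},
    fallback_model="fallback-model-id",  # placeholder, not a real model ID
)
print(used, response["stop_reason"])
```

The sketch retries only once and does not inspect the fallback response, so production code would also need to handle a second refusal.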

Migration Checklist

1. Change the model ID to claude-fable-5.
2. Set output_config.effort explicitly.
3. Remove old show-your-chain-of-thought instructions.
4. Increase client timeouts and support streaming/async runs.
5. Add progress reporting grounded in tool results.
6. Add explicit scope and permission boundaries.
7. Add memory or notes for long-running tasks.
8. Configure Opus 4.8 fallback and monitor refusal events.
9. Re-run your evals on hard tasks, not only smoke tests.
10. Check the 30-day data-retention requirement before production use.
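The first three checklist items lend themselves to a mechanical pre-flight check. The sketch below is an assumption-heavy illustration: `audit_request` is a hypothetical helper, the phrase list is an example rather than an official one, and it only handles prompts supplied as plain strings.

```python
import re

# Phrases that suggest an old show-your-reasoning instruction (illustrative list).
REASONING_PATTERNS = re.compile(
    r"chain[- ]of[- ]thought|show your reasoning|think step by step",
    re.IGNORECASE,
)


def audit_request(request):
    """Return a list of checklist warnings for a request-kwargs dict."""
    warnings = []
    # Item 1: model ID.
    if request.get("model") != "claude-fable-5":
        warnings.append("model is not claude-fable-5")
    # Item 2: explicit effort.
    if "effort" not in request.get("output_config", {}):
        warnings.append("output_config.effort is not set explicitly")
    # Item 3: leftover show-your-reasoning instructions in any prompt text.
    texts = [str(request.get("system", ""))]
    texts += [str(m.get("content", "")) for m in request.get("messages", [])]
    if any(REASONING_PATTERNS.search(t) for t in texts):
        warnings.append("prompt asks the model to show its reasoning")
    return warnings


print(audit_request({
    "model": "claude-fable-5",
    "output_config": {"effort": "high"},
    "messages": [{"role": "user", "content": "Fix the failing test."}],
}))
```

The remaining items (timeouts, progress reporting, boundaries, memory, fallback, evals, data retention) depend on your own infrastructure and cannot be checked from the request alone.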

Fable 5 is a model to evaluate on your hardest workflow, not just your cheapest benchmark. The official benchmark pattern says the advantage grows with long-horizon autonomy, professional deliverables, visual reasoning, tool use, and task ambiguity. That is also where the operational surface grows: cost controls, fallback handling, memory, and observability matter more than they did for short-turn chat.

FAQ

What is Claude Fable 5?

Claude Fable 5 is Anthropic's most capable widely released model, announced on June 9, 2026. It is a Mythos-class model with production safeguards for general use.

What is the Claude Fable 5 API model ID?

The Claude API model ID is claude-fable-5. The restricted sibling model is claude-mythos-5.

Is Claude Fable 5 the same as Claude Mythos 5?

They share the same underlying capabilities, but Claude Fable 5 includes safety classifiers. Claude Mythos 5 has safeguards lifted in some areas and is limited to approved Project Glasswing and trusted-access users.

How much does Claude Fable 5 cost?

Official pricing is $10 per million input tokens and $50 per million output tokens. Batch pricing is $5 per million input tokens and $25 per million output tokens.

What are the biggest Fable 5 benchmark wins?

The strongest official signals are in long-horizon coding, agentic terminal work, document reasoning, computer use, long-context reasoning, and professional workflows. Fable 5 scored 95.0% on SWE-bench Verified, 80.0% on SWE-bench Pro, 72.9% on CursorBench at max effort, and led FrontierCode in both Diamond and Main subsets.

What changes should developers make when prompting Fable 5?

Use effort as the main quality-latency-cost control, expect longer turns on hard tasks, remove old show-your-reasoning instructions, add explicit boundaries, use memory for long-running work, and configure fallback to Opus 4.8 for refused requests.

Official Sources

This article intentionally excludes community posts, press coverage, and unofficial benchmark commentary. All claims above are grounded in these official sources:
