Claude Opus 4.8 API Migration and Prompting Guide

Migration Summary

The official Claude migration guide says code already running on Claude Opus 4.7 should continue to work on Opus 4.8 without breaking API changes. That does not mean you should blindly change the model ID in production. Opus 4.8 recalibrates effort, defaults to high effort everywhere, lowers the prompt-cache minimum, and adds mid-conversation system messages.

Minimal migration:
1. Replace claude-opus-4-7 with claude-opus-4-8.
2. Keep adaptive thinking, not manual budget_tokens.
3. Keep sampling params omitted.
4. Re-baseline effort, latency, and cost.
5. Run your eval suite before production rollout.
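
Steps 4 and 5 can be sketched as a small comparison between a 4.7 baseline run and a 4.8 candidate run. This is a minimal illustration, not an official tool: the RunMetrics fields, the rollout_report and safe_to_roll_out helpers, and the 25% thresholds are all hypothetical, and the numbers would come from your own eval harness.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Aggregate results from one eval run (hypothetical shape)."""
    eval_score: float         # fraction of eval cases passed, 0.0 to 1.0
    p50_latency_s: float      # median latency per task, in seconds
    cost_per_task_usd: float  # average spend per task


def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new, e.g. 0.10 means +10%."""
    return (new - old) / old


def rollout_report(old: RunMetrics, new: RunMetrics) -> dict:
    """Compare a candidate run against the baseline run."""
    return {
        "eval_score_delta": new.eval_score - old.eval_score,
        "latency_change": relative_change(old.p50_latency_s, new.p50_latency_s),
        "cost_change": relative_change(old.cost_per_task_usd, new.cost_per_task_usd),
    }


def safe_to_roll_out(report: dict,
                     max_latency_increase: float = 0.25,
                     max_cost_increase: float = 0.25) -> bool:
    """Approve only if quality did not regress and latency and cost stay in budget.

    The thresholds are placeholders; pick values that match your own budgets.
    """
    return (
        report["eval_score_delta"] >= 0.0
        and report["latency_change"] <= max_latency_increase
        and report["cost_change"] <= max_cost_increase
    )


baseline = RunMetrics(eval_score=0.80, p50_latency_s=10.0, cost_per_task_usd=0.40)
candidate = RunMetrics(eval_score=0.84, p50_latency_s=11.0, cost_per_task_usd=0.44)
report = rollout_report(baseline, candidate)
print(report, safe_to_roll_out(report))
```

Run the same eval suite against both model IDs with identical prompts so the deltas reflect the model change and nothing else.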

Model ID

Use claude-opus-4-8 on the Claude API. The models overview lists Opus 4.8 as the top choice for complex reasoning, long-horizon agentic coding, and high-autonomy work. It also documents the 1M context window on the Claude API, Bedrock, and Vertex AI, with 200k on Microsoft Foundry.

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[
        {"role": "user", "content": "Review this architecture migration plan."}
    ],
)

Inherited Constraints

Opus 4.8 inherits two constraints from Opus 4.7. First, non-default sampling parameters are not supported. If you send temperature, top_p, or top_k with non-default values, the API returns a 400 error. Second, manual extended thinking budgets are not supported. Do not send thinking: {type: "enabled", budget_tokens: N}.
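
One way to guard against both constraints is to clean request parameters before they reach the client. The sketch below is an illustration only: sanitize_request is a hypothetical helper, and it takes the conservative route of dropping every sampling key instead of checking whether a value happens to equal the default.

```python
# Sampling parameters that this guide says must stay at their defaults.
SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def sanitize_request(params: dict) -> dict:
    """Return a copy of params that avoids the two inherited 400-error cases.

    - Drops temperature, top_p, and top_k entirely (conservative choice).
    - Replaces a manual thinking budget with adaptive thinking.
    The input dict is not modified.
    """
    cleaned = {k: v for k, v in params.items() if k not in SAMPLING_KEYS}
    thinking = cleaned.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        # Manual budget_tokens is unsupported; switch to adaptive thinking.
        cleaned["thinking"] = {"type": "adaptive"}
    return cleaned


legacy = {
    "model": "claude-opus-4-8",
    "max_tokens": 4096,
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
}
print(sanitize_request(legacy))
```

A helper like this is a stopgap for shared request builders. The cleaner fix is to remove the parameters at their source.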

If you are migrating from Opus 4.6 or older, apply the Opus 4.7 migration first. That is where the breaking changes live: sampling removal, manual thinking removal, tokenizer changes, and prefill migration.

Effort and Thinking

Opus 4.8 uses adaptive thinking. Thinking is off unless you explicitly set thinking: {type: "adaptive"}. Once enabled, effort becomes the main control for thinking depth and tool-use appetite.

Effort	Use it for	Risk
`low`	Short, scoped, latency-sensitive tasks	Can under-think moderately complex work
`medium`	Balanced cost-sensitive agent tasks	Needs evals before broad rollout
`high`	Default for quality-sensitive tasks	Higher token usage than lower levels
`xhigh`	Coding and long-horizon agentic work	Meaningfully higher token usage
`max`	Frontier problems with eval-backed gains	Can overthink and spend too much

Anthropic's practical recommendation is conservative: start with xhigh for coding and agentic use cases, use high for most other intelligence-sensitive workloads, and step down only when measurements show quality holds.
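
That recommendation can be encoded as a starting point plus a measured step-down rule. The following is a rough sketch under stated assumptions: the workload labels, the function names, and the 0.01 tolerance are hypothetical, and the scores are whatever your own eval suite produces.

```python
# Effort levels from the table above, ordered from lowest to highest.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def starting_effort(workload: str) -> str:
    """Pick an initial effort: xhigh for coding and agentic work, high otherwise."""
    return "xhigh" if workload in {"coding", "agentic"} else "high"


def next_lower(effort: str) -> str:
    """Return the next lower effort level, or the same level if already lowest."""
    idx = EFFORT_LEVELS.index(effort)
    return EFFORT_LEVELS[max(idx - 1, 0)]


def choose_effort(current: str, current_score: float, lower_score: float,
                  tolerance: float = 0.01) -> str:
    """Step down one level only if the lower level's eval score holds.

    current_score is the eval score at the current effort; lower_score is the
    score measured at the next lower effort. tolerance is a placeholder.
    """
    if lower_score >= current_score - tolerance:
        return next_lower(current)
    return current


effort = starting_effort("coding")
print(effort, choose_effort(effort, current_score=0.90, lower_score=0.895))
```

The chosen value would then be passed as output_config={"effort": ...} in the request.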

Prompting 4.8

Opus 4.8 is more literal than older Opus models, especially at lower effort levels. If an instruction applies to every item, say that. If the model should implement rather than suggest, say that. If a tool must be used, explain when and why. The prompting guide is clear that vague prompts are less reliable than explicit scope, output format, and examples.

Weak:
Review these files and be conservative.

Better:
Find every issue that could cause incorrect behavior, a test failure,
or a misleading result. Include low-confidence findings. Do not filter
for severity in this pass; a later verification step will rank them.

For frontend work, the official prompting guide calls out a persistent default style: warm off-white backgrounds, serif display type, and amber or terracotta accents. If that style does not fit the product, specify a concrete visual direction before generating UI.

System Messages

Opus 4.8 accepts role: "system" entries immediately after a user turn in the messages array, subject to placement rules. The use case is agentic loops where permissions, token budgets, environment state, or instructions change mid-task. You can append a new instruction without rebuilding the whole prompt and breaking prompt-cache hits.

Keep the top-level system field for instructions that apply from the start. Use mid-conversation system messages for updates that happen after the user has already started a long task.
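
A mid-task update might be appended like this. Treat it as a sketch: append_system_update is a hypothetical helper, the plain-string content shape is an assumption, and the only placement rule it checks is the one stated above (the entry must directly follow a user turn). Consult the official docs for the full placement rules.

```python
def append_system_update(messages: list[dict], text: str) -> list[dict]:
    """Return a new messages list with a system entry after the last user turn.

    Raises ValueError if the most recent message is not a user turn, since
    this guide only describes system entries placed right after one.
    """
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("a mid-conversation system message must follow a user turn")
    # Build a new list so the earlier prefix stays byte-identical for caching.
    return messages + [{"role": "system", "content": text}]


history = [
    {"role": "user", "content": "Refactor the billing module."},
    {"role": "assistant", "content": "Starting with the invoice service."},
    {"role": "user", "content": "Continue."},
]
updated = append_system_update(
    history, "Write access to the payments directory has been revoked."
)
print(updated[-1])
```

Because the earlier messages are untouched, the existing prefix stays intact, which is what lets prompt-cache hits survive the update.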

Prompt Cache

Opus 4.8 lowers the minimum cacheable prompt length to 1,024 tokens. This is a quiet but useful platform change for agents with medium-sized system prompts or harness state. Prompts that were too short to cache on Opus 4.7 can now create cache entries without code changes.
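
If you mark cache breakpoints explicitly, a harness could decide whether a system prompt is worth marking. This is a minimal sketch: system_blocks is a hypothetical helper, and the token count is assumed to come from your own measurement (for example, a token-counting call) and is not computed here.

```python
# Minimum cacheable prompt length for Opus 4.8, as stated in this guide.
MIN_CACHEABLE_TOKENS = 1024


def system_blocks(text: str, token_count: int) -> list[dict]:
    """Build the system field as content blocks, adding a cache marker if eligible.

    token_count is the measured length of the prompt prefix in tokens.
    """
    block = {"type": "text", "text": text}
    if token_count >= MIN_CACHEABLE_TOKENS:
        # Mark the end of the cacheable prefix.
        block["cache_control"] = {"type": "ephemeral"}
    return [block]


print(system_blocks("You are a code review agent...", token_count=1500))
```

The returned list would be passed as the system argument of the request. Prompts below the minimum are sent unmarked, since a marker on them would not create a cache entry.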

According to the Claude Code docs, fast and standard speed do not share cached prefixes. If you switch into fast mode mid-conversation, expect a cache miss and higher uncached input cost.

Fast Mode API

The Opus 4.8 whats-new page says fast mode is available for Claude Opus 4.8 as a research preview on the Claude API using speed: "fast". The launch post lists Opus 4.8 fast mode at 2.5x output speed and $10 input / $50 output per million tokens.

Treat fast mode as a latency product, not an intelligence product. Claude's docs describe it as the same model with faster inference configuration. Use it when response time is the bottleneck and the premium is justified.
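
To judge whether the premium is justified, it helps to price a typical request at the listed fast-mode rates. The sketch below uses only the $10 / $50 per-million-token figures quoted above; fast_mode_cost is a hypothetical helper, and it ignores caching discounts and any other billing adjustments.

```python
# Fast-mode rates quoted in this guide, in USD per million tokens.
FAST_INPUT_PER_MTOK = 10.00
FAST_OUTPUT_PER_MTOK = 50.00


def fast_mode_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one fast-mode request with uncached input."""
    input_cost = input_tokens / 1_000_000 * FAST_INPUT_PER_MTOK
    output_cost = output_tokens / 1_000_000 * FAST_OUTPUT_PER_MTOK
    return input_cost + output_cost


# 200k input tokens and 20k output tokens: $2.00 + $1.00.
print(f"${fast_mode_cost(200_000, 20_000):.2f}")
```

Compare the result with what the same request costs at standard speed on your account, then weigh the difference against the latency you gain.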

Code Review Harnesses

Opus 4.8 is better at finding bugs, but prompt wording can make a harness appear to lose recall. The prompting guide explains why: if your review prompt says "only report high severity issues" or "be conservative," a more literal model may find issues and then filter them out. For first-pass review, ask for coverage first and ranking second.

Recommended first-pass review prompt:
Report every issue you find, including uncertain or low-severity issues.
Do not filter for importance or confidence at this stage.
For each finding, include confidence and estimated severity.
A downstream step will verify, deduplicate, and rank findings.
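
The downstream step mentioned in the prompt could look roughly like the following. It is a sketch only: the Finding fields, the 1 to 5 severity scale, and the deduplication key are hypothetical, and it covers deduplication and ranking only, not verification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One reported issue from the first-pass review (hypothetical shape)."""
    file: str
    line: int
    summary: str
    severity: int      # 1 (low) to 5 (critical)
    confidence: float  # 0.0 to 1.0


def dedupe_and_rank(findings: list[Finding]) -> list[Finding]:
    """Collapse duplicate findings, then sort by severity and confidence.

    Two findings count as duplicates when file, line, and normalized summary
    match; the copy with the higher confidence is kept.
    """
    best: dict[tuple, Finding] = {}
    for f in findings:
        key = (f.file, f.line, f.summary.strip().lower())
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    return sorted(best.values(),
                  key=lambda f: (f.severity, f.confidence),
                  reverse=True)


raw = [
    Finding("api.py", 42, "Unchecked None return", 3, 0.6),
    Finding("api.py", 42, "unchecked none return ", 3, 0.8),
    Finding("db.py", 10, "SQL built by string concatenation", 5, 0.7),
]
for finding in dedupe_and_rank(raw):
    print(finding.severity, finding.confidence, finding.file, finding.summary)
```

Keeping the filter in code, outside the prompt, means the model reports everything it finds and the harness decides what to surface.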

Checklist

Replace claude-opus-4-7 with claude-opus-4-8.
Remove non-default temperature, top_p, and top_k.
Use thinking: {type: "adaptive"}, not manual budget_tokens.
Set output_config.effort explicitly for production workloads.
Start coding agents at xhigh and compare against high.
Remove old 1M-context beta headers where Opus 4.8 makes them unnecessary.
Test mid-conversation system messages in long-running harnesses.
Re-baseline prompt caching because the minimum is now 1,024 tokens.
Re-run evals for code review recall, tool use, verbosity, and latency.
Use /claude-api migrate in Claude Code when you want the bundled skill to inspect a codebase.

FAQ

Is migrating from Opus 4.7 to Opus 4.8 breaking?

No, not if your code already runs cleanly on Opus 4.7. Claude's migration guide says there are no breaking API changes for code already running on Claude Opus 4.7.

Do temperature, top_p, or top_k work on Opus 4.8?

No. Non-default sampling parameters return a 400 error on Opus 4.8, the same as Opus 4.7. Omit them and use prompting plus effort to steer behavior.

Does Opus 4.8 support manual thinking budgets?

No. Opus 4.8 does not support thinking: {type: "enabled", budget_tokens: N}. Use adaptive thinking and the effort parameter instead.

What effort should I use for coding?

Anthropic recommends xhigh for coding and agentic use cases, high for most other intelligence-sensitive workloads, and lower levels only after measuring quality on your own evals.

What changed for prompt caching?

The minimum cacheable prompt length on Opus 4.8 is 1,024 tokens, lower than Opus 4.7. Some prompts that were too short to cache on 4.7 can now cache without code changes.