
At Google I/O 2026 on May 19, Google AI Studio dropped the official developer guide for Gemini 3.5 Flash — now generally available, stable, and cleared for production. It's the same Flash family you already know, except it has eaten 3.1 Pro's benchmark numbers on coding, runs roughly 4x faster than other frontier models, and ships with a new default thinking effort. If you are building anything agentic inside Antigravity, your defaults probably need to change today.
Gemini 3.5 Flash Developer Guide is live
The official @GoogleAIStudio announcement that ships the full developer guide for Gemini 3.5 Flash GA. 28K views, 487 likes within hours of the I/O 2026 keynote.
Get the latest on AI, LLMs & developer tools
New MCP servers, model updates, and guides like this one — delivered weekly.
1. The Announcement
The tweet above is the canonical pointer Google AI Studio used to publish the Gemini 3.5 Flash developer guide. It went out on Tuesday, May 19, 2026roughly two minutes after Sundar Pichai walked off the I/O stage. By the end of the day it had been bookmarked 223 times by developers — an unusual ratio of bookmarks to likes that tells you exactly who the audience was.
Sundar's parallel post from the same morning made the positioning explicit: “Gemini 3.5 Flash is available today for everyone in Antigravity and across our products and APIs. Compared to 3.1 Pro, 3.5 Flash is better across almost all benchmarks with huge progress in coding.” That is Google publicly saying its cheap, fast Flash model now beats its previous flagship Pro model on most evaluations. That repositions the entire Gemini lineup.
3.5 Flash is in a league of its own
Sundar's post-keynote thread: 3.5 Flash beats 3.1 Pro on almost all benchmarks with huge coding progress, runs 4x faster than other frontier models, and sits alone in the top-right of the intelligence-vs-speed chart.
Logan Kilpatrick, who leads Google AI Studio, made the framing even starker: “Gemini 3.5 Flash, our most powerful model to date. It pushes the frontier of intelligence, speed, and cost putting 3.5 Flash in a class of its own.” Note the word most powerful. That is Google saying its Flash model is its flagship now.
3.5 Flash: most powerful model to date
Logan explicitly calls 3.5 Flash Google's most powerful model — pushing the frontier on intelligence, speed, and cost simultaneously.
2. TL;DR
- Model ID:
gemini-3.5-flash - Status: Generally Available (GA), stable for production
- Context window: 1,000,000 input tokens
- Max output: 65,536 tokens
- Thinking: supported with three effort levels (low / medium / high)
- New default effort:
medium(washighin 3 Flash) - Speed: ~4x faster tokens per second vs other frontier models
- Coding: beats Gemini 3.1 Pro on almost all benchmarks
- Recommended API: Interactions API (new standard primitive)
- Still not supported: Computer Use
- Inside Antigravity: live today across all tiers
For two months, Gemini 3.1 Pro was the model you reached for inside Antigravity when you needed deep reasoning, and Flash was the model you reached for when you needed speed or quota relief. 3.5 Flash collapses that trade-off on coding workloads. Many of the heuristics in our 3.1 Pro vs Opus comparison need to be reread with Flash in the middle column.
3. What's New in 3.5 Flash
The developer guide bullets the changes in plain language. Here they are with the implication for an Antigravity user spelled out:
- Sustained frontier performance. Google's framing is that this is its most intelligent Flash model, optimized for agentic and coding tasks at scale. Read: long sessions stop falling apart on token 200k+.
- Agentic execution. Sub-agent deployment, problem solving, and rapid agentic loops at scale. Read: it survives Antigravity's orchestration mode where you spawn child agents for planning, coding, and review (see our multi-agent orchestration guide).
- Coding loops. Iterative coding cycles, rapid exploration, prototyping to test alternate paths and dynamically explore solutions. Read: the “try, fail, fix, retry” loop costs less context per attempt.
- Long horizon. Multi-step workflows and tool use at scale. Read: chains of 30+ tool calls stop degrading.
- Thought preservation. Intermediate reasoning is now maintained across multi-turn conversations automatically — no API changes needed. More on this in section 12.
- New default effort.
mediumreplaceshighas the default thinking level. More on this in section 5. - Improved low thinking. The
lowtier is now “significantly improved” for code and agentic tasks that require fewer steps — strong quality at lower latency and cost. More in section 6. - GA release. Stable. No more “preview” SLA gotchas. Production traffic is the supported path now.
4. Model Specs & Capabilities
Context window: 1,000,000 input tokens
Max output: 65,536 tokens
Thinking: supported (low / medium / high)
Default effort: medium
Tools: same as Gemini 3 Flash
Multimodal: text + image + audio + video in
Computer Use: not supported (yet)
Pricing: see official pricing page
Status: Generally Available (GA), stable
Tool surface is unchanged from 3 Flash, so any of your existing grounding-with-google-search, code-execution, url-context, and function-calling pipelines keep working without edits. If you were running into the 1% Claude Opus thinking-budget cap in Antigravity, the new effort tiers on 3.5 Flash give you a credible Gemini-side alternative for deep reasoning workloads.
5. Default Effort: high → medium
This is the single change most likely to surprise you and the easiest one to miss in the changelog. In 3 Flash, when you called the API without setting an effort level, the model defaulted to high. In 3.5 Flash, the unset default is now medium.
For most workloads this is a free win — medium effort on 3.5 Flash is roughly equivalent to high effort on 3 Flash, at lower latency and cost. But if you were relying on implicit high to get reliable agentic behavior on a hard long-horizon task, your traffic just silently degraded. The pattern is similar to the silent model downgrade behaviour Antigravity already exhibits under quota pressure. Two options:
- Audit and explicit-set. Grep your codebase for calls that omit
thinking_configand decide per call-site whether you wantmedium(cheaper, faster, GA-blessed) orhigh(the old implicit behavior). - Set high once, globally. If you have one shared client wrapper, set
effort: "high"as the global default there and revisit per-call overrides later.
6. The 'low' Mode Got Smart
The other under-marketed change is that the low tier was rewritten. Google's phrasing: “low is now significantly improved for code and agentic tasks that require fewer steps, offering strong quality at lower latency and cost.”
Translation: workloads you previously had to send to medium to get a usable result will now finish on low. For Antigravity users that means a lot of the cleanup, rename, and small-refactor work that was burning credits at medium effort can drop to low. Try it on:
- Variable / file renames across a small set of files
- JSDoc / docstring generation
- Single-function unit-test stubs
- Code formatting and lint-rule application
- One-step tool calls (read file, edit file, run test)
See our token-saving guide for a deeper playbook on routing work to the cheapest effort that still works.
7. Migrating to the Interactions API
The developer guide tells you to install the latest Google Gen AI SDK and notes that all the examples use the new Interactions API, framed as “the new standard primitive for building with Gemini, recommended for all new projects.” The older GenerateContent API is still supported and the same configuration options apply.
Practically, if you are starting a new agent, use Interactions. If you have an existing GenerateContent pipeline, you do not need to rewrite it today — but the API surface is being optimized around agentic workflows, server-side state management, and complex multi-modal multi-turn conversations. That is exactly the shape of an Antigravity sub-agent. Migration is going to age well.
8. Quickstart Code
A minimal Python call against 3.5 Flash via the Interactions API:
Three things to notice. First, the model ID is gemini-3.5-flash — no -preview or -latest suffix because GA. Second, the effort is set explicitly even though medium is the default; this protects you if Google ever changes the default again. Third, there is no manual thread bookkeeping — Interactions handles server-side state.
9. 3.5 Flash vs 3.1 Pro
Google's own framing in Sundar's post is that 3.5 Flash “is better across almost all benchmarks with huge progress in coding” relative to 3.1 Pro, and that on the intelligence-versus-output-speed plot it sits alone in the top-right quadrant. Here is that exact chart from the I/O 2026 keynote slide:

| Dimension | Gemini 3.1 Pro | Gemini 3.5 Flash |
|---|---|---|
| Positioning | Frontier Pro tier | Most intelligent Flash model |
| Context window | 1M input | 1M input |
| Max output | 65k tokens | 65k tokens |
| Coding benchmarks | Strong | Better — “huge progress” per Sundar |
| Output speed | Pro-tier latency | ~4x faster than frontier peers |
| Default effort | (per-call) | medium (changed from high) |
| Thought preservation | Limited across turns | Automatic, no API changes |
| Computer Use | Supported | Not yet |
The Computer Use gap is the one reason you keep 3.1 Pro in your toolbox — for anything that needs to drive a browser or operate a UI, Flash is not the answer today. For everything else in a coding workflow, the cheaper, faster, GA model is now the model with the higher benchmark scores. That is unusual.
10. 3.5 Flash Inside Antigravity
Sundar called this out by name: “Gemini 3.5 Flash is available today for everyone in Antigravity and across our products and APIs.” Logan Kilpatrick followed up with the full distribution list:
Try it in every Google surface
3.5 Flash shipped simultaneously to the Gemini API, Google AI Studio, Antigravity, AI Mode, the Gemini App, and every other Gemini surface on day one — no waitlist.
It is live in the model picker (Settings → Models) on Pro and Ultra tiers as of the May 19, 2026 keynote. A few practical notes:
- The picker may show two Flash entries during the rollout window — 3 Flash and 3.5 Flash. Pick 3.5 unless you have a specific reason. If you see only one, your client probably needs a restart.
- Effort tier controls still live in the same place — the three-tier low/medium/high selector under Settings → Models. Default is now medium.
- Credit consumption should drop for most workloads because medium is cheaper than high, and many tasks that needed medium can now run on low. Track your usage with the Cockpit monitoring guide.
- Browser sub-agent integration works on 3.5 Flash for read / analyze tasks, but full Computer Use control still requires 3.1 Pro.
11. Spark, Antigravity 2.0, and Why Flash Matters
The 3.5 Flash GA announcement did not ship in isolation. Two other launches from the same I/O morning explain why Google needed Flash to be both smart and fast.
- Antigravity 2.0 — a rebuilt standalone desktop app with multi-agent teams, scheduled tasks, native voice, and one-click integration with other Google products. Scheduled tasks and multi-agent teams mean Google wanted a model that can carry sustained agentic work without bleeding cost. The launch post has the full surface-by-surface breakdown.
- Antigravity CLI — the new Go-based terminal agent that replaces Gemini CLI as the supported terminal surface. Defaults to 3.5 Flash out of the box. If you live in the terminal, this is the surface 3.5 Flash was tuned for.
- Gemini Spark — a 24/7 personal AI agent inside the Gemini app, “built on Antigravity”, running on dedicated VMs in Google Cloud, and explicitly powered by Gemini 3.5. Spark is the consumer-facing reason 3.5 Flash had to ship GA today: every Spark user's background task is a 3.5 Flash call.
Antigravity 2.0 launches alongside 3.5 Flash
The official @antigravity announcement of the 2.0 standalone desktop app — multi-agent teams, scheduled tasks, native voice, one-click Google integration. The platform that 3.5 Flash was built to power.
Logan's sign-off captured the through-line: “The model is the product.” 3.5 Flash is not a standalone release — it's the engine Google needs for Spark to be cheap, Antigravity 2.0 to be agentic, and AI Mode to be fast, all at once.
Reading the three announcements together, 3.5 Flash is the workhorse model Google intends every long-running agentic loop — Antigravity sub-agents, Spark background jobs, scheduled tasks — to run on. Pro and Ultra Pro stay reserved for cases where you specifically need extra reasoning depth or Computer Use.
12. Thought Preservation Across Turns
The most quietly important capability change is thought preservation. Per the guide: “The model maintains intermediate reasoning across multi-turn conversations automatically. No API changes needed.”
In 3 Flash, every turn started with a fresh thinking pass. If turn 1 had carefully reasoned about your data model and produced an answer, turn 2 would re-derive whatever it needed from scratch. In 3.5 Flash, those intermediate reasoning traces are carried forward server-side. The model picks up where it left off.
Implications for Antigravity workflows:
- Long planning sessions stop losing the plot on turn 8.
- Sub-agent “handoffs” where one agent passes a task to another preserve more of the original chain of thought.
- You can prompt with “OK now do the same for the other module” and actually get the same approach, not a re-derived parallel attempt.
- The downside: a wrong assumption on turn 1 can poison turns 2–N. If a session goes sideways, start a new chat rather than trying to argue the agent out of its preserved reasoning.
13. What Flash Still Can't Do
The developer guide is explicit: Computer Use is not supported on 3.5 Flash at this moment. Everything else from the 3 Flash tool surface is on the table.
If your agent needs to control a browser, fill forms, navigate a UI, or take screenshots and click on them — the kind of work 3.1 Pro's Computer Use mode handles — you have to either keep 3.1 Pro in your routing logic for those calls, or wait for 3.5 Pro / a 3.5-tier Computer Use to ship.
A clean way to handle this in Antigravity sub-agents is to default the coder and planner roles to 3.5 Flash, and route only the browser-driver role to 3.1 Pro. The browser-driver call is usually the smallest fraction of tokens in a session, so this gives you 3.5 Flash's cost profile on the bulk of the work without losing Computer Use entirely.
14. Pricing & Quota Implications
Google did not publish a new pricing card with the announcement — the guide links to the existing pricing page. The practical-impact directions in Antigravity follow from three facts:
- Default effort dropped one tier (high → medium). At the same call count, that is cheaper per call.
- Low got smarter. More calls that previously needed medium can run on low. Even more savings.
- Thought preservation reduces redundant thinking. Turn N stops paying for what turns 1..N-1 already figured out.
Net: typical Antigravity sessions on 3.5 Flash should consume noticeably less of your weekly quota than the same session would have on 3 Flash. If you were running close to the cap, this announcement effectively gives you headroom. For the full quota mechanics, see credits and pricing explained and weekly quota cooldowns.
15. Migration Checklist
If you have an Antigravity workflow or a direct Gemini API integration, do these in order this week:
- Switch the model picker to
gemini-3.5-flashfor your default coding model in Antigravity. Restart the client if you do not see it. - Decide your effort policy. Pick a global default (medium or high) and document it. Set it explicitly in your client wrapper so a future default change does not surprise you.
- Drop one tier where you can. Try cleanup, rename, format, and simple-tool-call work at
lowfirst. - Keep 3.1 Pro for Computer Use. Route any browser-driving sub-agent to 3.1 Pro explicitly; 3.5 Flash will not do it.
- Start new chats more aggressively. Thought preservation makes stale assumptions costlier — biased reasoning carries forward across turns.
- Migrate new agents to the Interactions API. Do not rewrite existing GenerateContent code yet. Just stop adding new code on the old API.
- If you use Gemini CLI, plan your CLI migration. Google is sunsetting Gemini CLI for individual Pro / Ultra / free Code Assist users on June 18, 2026. Antigravity CLI is the replacement and it defaults to 3.5 Flash. See the Gemini CLI → Antigravity CLI migration guide for the step-by-step.
- Re-baseline your benchmarks. Any internal eval suite that assumed Pro > Flash needs to be rerun. The ranking changed.
16. Verdict
Gemini 3.5 Flash is the first Flash release where “use Flash” is no longer a compromise for coding workloads. It is faster, cheaper, smarter on almost every benchmark Google chose to put on the I/O stage, GA stable, and already wired into Antigravity. The only legitimate reason to keep 3.1 Pro in your default routing is Computer Use; for everything else, 3.5 Flash is the better daily driver.
If you only do one thing today: open Antigravity, switch your default model to Gemini 3.5 Flash, and re-run yesterday's hardest coding session. The win is unusually obvious.
Related Guides
Other I/O 2026 launches
- → Antigravity 2.0 Launch: Everything Google Shipped
- → Antigravity CLI Deep Dive
- → Gemini CLI → Antigravity CLI Migration (June 18 deadline)
Where 3.5 Flash fits in the stack
- → Gemini 3.1 Pro vs Claude Opus 4.6 in Antigravity
- → Multi-Agent Orchestration in Antigravity
- → When Opus Runs Out: Fallback Workflow
- → Fix: Claude Thinking Budget Capped at 1%