Gemini 3.5 Flash Developer Guide: GA, 1M Context, New Defaults (May 2026)

Editorial illustration of Gemini 3.5 Flash: layered glowing geometric shapes representing a Flash model thinking across a 1M-token context, with sub-agent silhouettes branching off into parallel coding tasks.

At Google I/O 2026 on May 19, Google AI Studio dropped the official developer guide for Gemini 3.5 Flash — now generally available, stable, and cleared for production. It's the same Flash family you already know, except it has eaten 3.1 Pro's benchmark numbers on coding, runs roughly 4x faster than other frontier models, and ships with a new default thinking effort. If you are building anything agentic inside Antigravity, your defaults probably need to change today.

Google AI Studio

@GoogleAIStudio

·Follow

x.com/i/article/2056…

6:01 PM · May 19, 2026

770

Read 13 replies

Google AI Studio · May 19, 2026

Gemini 3.5 Flash Developer Guide is live

The official @GoogleAIStudio announcement that ships the full developer guide for Gemini 3.5 Flash GA. 28K views, 487 likes within hours of the I/O 2026 keynote.

Get the latest on AI, LLMs & developer tools

New MCP servers, model updates, and guides like this one — delivered weekly.

1. The Announcement

The tweet above is the canonical pointer Google AI Studio used to publish the Gemini 3.5 Flash developer guide. It went out on Tuesday, May 19, 2026roughly two minutes after Sundar Pichai walked off the I/O stage. By the end of the day it had been bookmarked 223 times by developers — an unusual ratio of bookmarks to likes that tells you exactly who the audience was.

Sundar's parallel post from the same morning made the positioning explicit: “Gemini 3.5 Flash is available today for everyone in Antigravity and across our products and APIs. Compared to 3.1 Pro, 3.5 Flash is better across almost all benchmarks with huge progress in coding.” That is Google publicly saying its cheap, fast Flash model now beats its previous flagship Pro model on most evaluations. That repositions the entire Gemini lineup.

Sundar Pichai

@sundarpichai

·Follow

Just off stage at #GoogleIO, some highlights from this morning 🧵 Gemini 3.5 Flash is available today for everyone in @antigravity and across our products and APIs. Compared to 3.1 Pro, 3.5 Flash is better across almost all benchmarks with huge progress in coding. It’s also

5:59 PM · May 19, 2026

2.8K

Read 133 replies

Sundar Pichai · CEO, Google

3.5 Flash is in a league of its own

Sundar's post-keynote thread: 3.5 Flash beats 3.1 Pro on almost all benchmarks with huge coding progress, runs 4x faster than other frontier models, and sits alone in the top-right of the intelligence-vs-speed chart.

Logan Kilpatrick, who leads Google AI Studio, made the framing even starker: “Gemini 3.5 Flash, our most powerful model to date. It pushes the frontier of intelligence, speed, and cost putting 3.5 Flash in a class of its own.” Note the word most powerful. That is Google saying its Flash model is its flagship now.

Logan Kilpatrick

@OfficialLoganK

·Follow

Welcome to Gemini 3.5 Flash, our most powerful model to date. It pushes the frontier of intelligence, speed, and cost putting 3.5 Flash in a class of its own. We spent the last 6 months making sure Flash is great for real world use cases. It's available everywhere now!

5:41 PM · May 19, 2026

4.9K

Read 298 replies

Logan Kilpatrick · Google AI Studio Lead

3.5 Flash: most powerful model to date

Logan explicitly calls 3.5 Flash Google's most powerful model — pushing the frontier on intelligence, speed, and cost simultaneously.

2. TL;DR

Model ID: gemini-3.5-flash
Status: Generally Available (GA), stable for production
Context window: 1,000,000 input tokens
Max output: 65,536 tokens
Thinking: supported with three effort levels (low / medium / high)
New default effort: medium (was high in 3 Flash)
Speed: ~4x faster tokens per second vs other frontier models
Coding: beats Gemini 3.1 Pro on almost all benchmarks
Recommended API: Interactions API (new standard primitive)
Still not supported: Computer Use
Inside Antigravity: live today across all tiers

Why this matters

For two months, Gemini 3.1 Pro was the model you reached for inside Antigravity when you needed deep reasoning, and Flash was the model you reached for when you needed speed or quota relief. 3.5 Flash collapses that trade-off on coding workloads. Many of the heuristics in our 3.1 Pro vs Opus comparison need to be reread with Flash in the middle column.

3. What's New in 3.5 Flash

The developer guide bullets the changes in plain language. Here they are with the implication for an Antigravity user spelled out:

Sustained frontier performance. Google's framing is that this is its most intelligent Flash model, optimized for agentic and coding tasks at scale. Read: long sessions stop falling apart on token 200k+.
Agentic execution. Sub-agent deployment, problem solving, and rapid agentic loops at scale. Read: it survives Antigravity's orchestration mode where you spawn child agents for planning, coding, and review (see our multi-agent orchestration guide).
Coding loops. Iterative coding cycles, rapid exploration, prototyping to test alternate paths and dynamically explore solutions. Read: the “try, fail, fix, retry” loop costs less context per attempt.
Long horizon. Multi-step workflows and tool use at scale. Read: chains of 30+ tool calls stop degrading.
Thought preservation. Intermediate reasoning is now maintained across multi-turn conversations automatically — no API changes needed. More on this in section 12.
New default effort. medium replaces high as the default thinking level. More on this in section 5.
Improved low thinking. The low tier is now “significantly improved” for code and agentic tasks that require fewer steps — strong quality at lower latency and cost. More in section 6.
GA release. Stable. No more “preview” SLA gotchas. Production traffic is the supported path now.

4. Model Specs & Capabilities

Model ID:          gemini-3.5-flash
Context window:   1,000,000 input tokens
Max output:       65,536 tokens
Thinking:         supported (low / medium / high)
Default effort:   medium
Tools:             same as Gemini 3 Flash
Multimodal:       text + image + audio + video in
Computer Use:    not supported (yet)
Pricing:         see official pricing page
Status:           Generally Available (GA), stable

Tool surface is unchanged from 3 Flash, so any of your existing grounding-with-google-search, code-execution, url-context, and function-calling pipelines keep working without edits. If you were running into the 1% Claude Opus thinking-budget cap in Antigravity, the new effort tiers on 3.5 Flash give you a credible Gemini-side alternative for deep reasoning workloads.

5. Default Effort: high → medium

This is the single change most likely to surprise you and the easiest one to miss in the changelog. In 3 Flash, when you called the API without setting an effort level, the model defaulted to high. In 3.5 Flash, the unset default is now medium.

For most workloads this is a free win — medium effort on 3.5 Flash is roughly equivalent to high effort on 3 Flash, at lower latency and cost. But if you were relying on implicit high to get reliable agentic behavior on a hard long-horizon task, your traffic just silently degraded. The pattern is similar to the silent model downgrade behaviour Antigravity already exhibits under quota pressure. Two options:

Audit and explicit-set. Grep your codebase for calls that omit thinking_config and decide per call-site whether you wantmedium (cheaper, faster, GA-blessed) or high (the old implicit behavior).
Set high once, globally. If you have one shared client wrapper, set effort: "high" as the global default there and revisit per-call overrides later.

6. The 'low' Mode Got Smart

The other under-marketed change is that the low tier was rewritten. Google's phrasing: “low is now significantly improved for code and agentic tasks that require fewer steps, offering strong quality at lower latency and cost.”

Translation: workloads you previously had to send to medium to get a usable result will now finish on low. For Antigravity users that means a lot of the cleanup, rename, and small-refactor work that was burning credits at medium effort can drop to low. Try it on:

Variable / file renames across a small set of files
JSDoc / docstring generation
Single-function unit-test stubs
Code formatting and lint-rule application
One-step tool calls (read file, edit file, run test)

See our token-saving guide for a deeper playbook on routing work to the cheapest effort that still works.

7. Migrating to the Interactions API

The developer guide tells you to install the latest Google Gen AI SDK and notes that all the examples use the new Interactions API, framed as “the new standard primitive for building with Gemini, recommended for all new projects.” The older GenerateContent API is still supported and the same configuration options apply.

Practically, if you are starting a new agent, use Interactions. If you have an existing GenerateContent pipeline, you do not need to rewrite it today — but the API surface is being optimized around agentic workflows, server-side state management, and complex multi-modal multi-turn conversations. That is exactly the shape of an Antigravity sub-agent. Migration is going to age well.

8. Quickstart Code

A minimal Python call against 3.5 Flash via the Interactions API:

from google import genai

client = genai.Client(api_key="...")

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    instructions="You are a careful coding assistant.",
    input="Refactor this function to async/await...",
    thinking_config={"effort": "medium"},  # explicit default
)

print(interaction.output_text)

Three things to notice. First, the model ID is gemini-3.5-flash — no -preview or -latest suffix because GA. Second, the effort is set explicitly even though medium is the default; this protects you if Google ever changes the default again. Third, there is no manual thread bookkeeping — Interactions handles server-side state.

9. 3.5 Flash vs 3.1 Pro

Google's own framing in Sundar's post is that 3.5 Flash “is better across almost all benchmarks with huge progress in coding” relative to 3.1 Pro, and that on the intelligence-versus-output-speed plot it sits alone in the top-right quadrant. Here is that exact chart from the I/O 2026 keynote slide:

Sundar Pichai on stage at Google I/O 2026 showing the Gemini 3.5 Flash intelligence-vs-output-speed benchmark chart. 3.5 Flash sits alone in the top-right quadrant, separated from the cluster of competing frontier models. — From Sundar Pichai's I/O 2026 keynote: intelligence vs. output speed. 3.5 Flash is alone in the top-right quadrant. Source.

Dimension	Gemini 3.1 Pro	Gemini 3.5 Flash
Positioning	Frontier Pro tier	Most intelligent Flash model
Context window	1M input	1M input
Max output	65k tokens	65k tokens
Coding benchmarks	Strong	Better — “huge progress” per Sundar
Output speed	Pro-tier latency	~4x faster than frontier peers
Default effort	(per-call)	medium (changed from high)
Thought preservation	Limited across turns	Automatic, no API changes
Computer Use	Supported	Not yet

The Computer Use gap is the one reason you keep 3.1 Pro in your toolbox — for anything that needs to drive a browser or operate a UI, Flash is not the answer today. For everything else in a coding workflow, the cheaper, faster, GA model is now the model with the higher benchmark scores. That is unusual.

10. 3.5 Flash Inside Antigravity

Sundar called this out by name: “Gemini 3.5 Flash is available today for everyone in Antigravity and across our products and APIs.” Logan Kilpatrick followed up with the full distribution list:

Logan Kilpatrick

@OfficialLoganK

·Follow

Replying to @OfficialLoganK

Try it in the Gemini API, Google AI Studio, Antigravity, AI Mode, Gemini App, and wherever else you use Gemini!

5:45 PM · May 19, 2026

231

Read 34 replies

Logan Kilpatrick · Google AI Studio Lead

Try it in every Google surface

3.5 Flash shipped simultaneously to the Gemini API, Google AI Studio, Antigravity, AI Mode, the Gemini App, and every other Gemini surface on day one — no waitlist.

It is live in the model picker (Settings → Models) on Pro and Ultra tiers as of the May 19, 2026 keynote. A few practical notes:

The picker may show two Flash entries during the rollout window — 3 Flash and 3.5 Flash. Pick 3.5 unless you have a specific reason. If you see only one, your client probably needs a restart.
Effort tier controls still live in the same place — the three-tier low/medium/high selector under Settings → Models. Default is now medium.
Credit consumption should drop for most workloads because medium is cheaper than high, and many tasks that needed medium can now run on low. Track your usage with the Cockpit monitoring guide.
Browser sub-agent integration works on 3.5 Flash for read / analyze tasks, but full Computer Use control still requires 3.1 Pro.

11. Spark, Antigravity 2.0, and Why Flash Matters

The 3.5 Flash GA announcement did not ship in isolation. Two other launches from the same I/O morning explain why Google needed Flash to be both smart and fast.

Antigravity 2.0 — a rebuilt standalone desktop app with multi-agent teams, scheduled tasks, native voice, and one-click integration with other Google products. Scheduled tasks and multi-agent teams mean Google wanted a model that can carry sustained agentic work without bleeding cost. The launch post has the full surface-by-surface breakdown.
Antigravity CLI — the new Go-based terminal agent that replaces Gemini CLI as the supported terminal surface. Defaults to 3.5 Flash out of the box. If you live in the terminal, this is the surface 3.5 Flash was tuned for.
Gemini Spark — a 24/7 personal AI agent inside the Gemini app, “built on Antigravity”, running on dedicated VMs in Google Cloud, and explicitly powered by Gemini 3.5. Spark is the consumer-facing reason 3.5 Flash had to ship GA today: every Spark user's background task is a 3.5 Flash call.

Google Antigravity

@antigravity

·Follow

Introducing Antigravity 2.0, a new standalone desktop application that delivers fully on that original glimpse of a truly agent-optimized experience. Rebuilt from the ground up with multi-agent teams, scheduled tasks, native voice and one-click integration with other Google

Watch on X

5:52 PM · May 19, 2026

4.6K

Read 704 replies

Google Antigravity · May 19, 2026

Antigravity 2.0 launches alongside 3.5 Flash

The official @antigravity announcement of the 2.0 standalone desktop app — multi-agent teams, scheduled tasks, native voice, one-click Google integration. The platform that 3.5 Flash was built to power.

Logan's sign-off captured the through-line: “The model is the product.” 3.5 Flash is not a standalone release — it's the engine Google needs for Spark to be cheap, Antigravity 2.0 to be agentic, and AI Mode to be fast, all at once.

Reading the three announcements together, 3.5 Flash is the workhorse model Google intends every long-running agentic loop — Antigravity sub-agents, Spark background jobs, scheduled tasks — to run on. Pro and Ultra Pro stay reserved for cases where you specifically need extra reasoning depth or Computer Use.

12. Thought Preservation Across Turns

The most quietly important capability change is thought preservation. Per the guide: “The model maintains intermediate reasoning across multi-turn conversations automatically. No API changes needed.”

In 3 Flash, every turn started with a fresh thinking pass. If turn 1 had carefully reasoned about your data model and produced an answer, turn 2 would re-derive whatever it needed from scratch. In 3.5 Flash, those intermediate reasoning traces are carried forward server-side. The model picks up where it left off.

Implications for Antigravity workflows:

Long planning sessions stop losing the plot on turn 8.
Sub-agent “handoffs” where one agent passes a task to another preserve more of the original chain of thought.
You can prompt with “OK now do the same for the other module” and actually get the same approach, not a re-derived parallel attempt.
The downside: a wrong assumption on turn 1 can poison turns 2–N. If a session goes sideways, start a new chat rather than trying to argue the agent out of its preserved reasoning.

13. What Flash Still Can't Do

The developer guide is explicit: Computer Use is not supported on 3.5 Flash at this moment. Everything else from the 3 Flash tool surface is on the table.

If your agent needs to control a browser, fill forms, navigate a UI, or take screenshots and click on them — the kind of work 3.1 Pro's Computer Use mode handles — you have to either keep 3.1 Pro in your routing logic for those calls, or wait for 3.5 Pro / a 3.5-tier Computer Use to ship.

Routing pattern

A clean way to handle this in Antigravity sub-agents is to default the coder and planner roles to 3.5 Flash, and route only the browser-driver role to 3.1 Pro. The browser-driver call is usually the smallest fraction of tokens in a session, so this gives you 3.5 Flash's cost profile on the bulk of the work without losing Computer Use entirely.

14. Pricing & Quota Implications

Google did not publish a new pricing card with the announcement — the guide links to the existing pricing page. The practical-impact directions in Antigravity follow from three facts:

Default effort dropped one tier (high → medium). At the same call count, that is cheaper per call.
Low got smarter. More calls that previously needed medium can run on low. Even more savings.
Thought preservation reduces redundant thinking. Turn N stops paying for what turns 1..N-1 already figured out.

Net: typical Antigravity sessions on 3.5 Flash should consume noticeably less of your weekly quota than the same session would have on 3 Flash. If you were running close to the cap, this announcement effectively gives you headroom. For the full quota mechanics, see credits and pricing explained and weekly quota cooldowns.

15. Migration Checklist

If you have an Antigravity workflow or a direct Gemini API integration, do these in order this week:

Switch the model picker to gemini-3.5-flash for your default coding model in Antigravity. Restart the client if you do not see it.
Decide your effort policy. Pick a global default (medium or high) and document it. Set it explicitly in your client wrapper so a future default change does not surprise you.
Drop one tier where you can. Try cleanup, rename, format, and simple-tool-call work at low first.
Keep 3.1 Pro for Computer Use. Route any browser-driving sub-agent to 3.1 Pro explicitly; 3.5 Flash will not do it.
Start new chats more aggressively. Thought preservation makes stale assumptions costlier — biased reasoning carries forward across turns.
Migrate new agents to the Interactions API. Do not rewrite existing GenerateContent code yet. Just stop adding new code on the old API.
If you use Gemini CLI, plan your CLI migration. Google is sunsetting Gemini CLI for individual Pro / Ultra / free Code Assist users on June 18, 2026. Antigravity CLI is the replacement and it defaults to 3.5 Flash. See the Gemini CLI → Antigravity CLI migration guide for the step-by-step.
Re-baseline your benchmarks. Any internal eval suite that assumed Pro > Flash needs to be rerun. The ranking changed.

16. Verdict

Gemini 3.5 Flash is the first Flash release where “use Flash” is no longer a compromise for coding workloads. It is faster, cheaper, smarter on almost every benchmark Google chose to put on the I/O stage, GA stable, and already wired into Antigravity. The only legitimate reason to keep 3.1 Pro in your default routing is Computer Use; for everything else, 3.5 Flash is the better daily driver.

If you only do one thing today: open Antigravity, switch your default model to Gemini 3.5 Flash, and re-run yesterday's hardest coding session. The win is unusually obvious.