image-gen
Install this skill
npx skills add jezweb/claude-skills
Works across Claude Code, Cursor, Codex, Copilot & Antigravity
The image-gen skill provides a programmatic interface for creating and modifying visual assets using Gemini's latest generative models. It enables developers to generate high-fidelity images from text, apply complex edits, and control output dimensions directly within their codebase. The tool supports advanced features such as style transfer, element manipulation, and reliable text rendering within generated frames. Because it requires the current Google GenAI SDK, it replaces legacy implementations to ensure compliance with Google's updated infrastructure. Developers can switch between models tailored for rapid prototyping or high-resolution 4K output, including specific configurations for aspect ratios and resolution settings. This skill is optimized for environments where custom brand imagery or dynamic visual content is required, moving beyond standard stock photography by allowing precise, model-driven image composition.
When to Use This Skill
- Creating custom website hero images that match local business contexts
- Designing branded infographics and diagrams requiring legible text
- Performing style transfers to maintain consistency across a site gallery
- Modifying existing images by adding or removing specific elements
How to Invoke This Skill
Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:
- "generate a hero image of a plumber in an Australian kitchen"
- "create a 16:9 infographic for building construction"
- "add a laptop to this existing office photo"
- "change the wall color in this interior shot to blue"
- "generate a 4K image showing a solar panel installation"
Pro Tips
- Leverage multi-turn editing to refine generated images iteratively, guiding the AI to your precise vision.
- Specify regional aesthetics, like 'Australian imagery patterns,' in your prompts for locally relevant visuals.
- For complex text in images, break down your prompt into smaller, more focused requests to ensure legibility.
What this skill does
- Generate original images from descriptive text prompts
- Edit, extend, or widen existing image assets
- Support for high-precision text rendering within images
- Multi-reference image analysis for style consistency
- Detailed aspect ratio and resolution configuration
When not to use it
- When authentic product shots or real-world team photography is available
- When projects involve strict legal or compliance requirements regarding AI-generated media
Example workflow
- Initialize the GoogleGenAI client with your API key
- Define the model ID based on speed or resolution requirements
- Construct the prompt including visual details and target aspect ratio
- Set responseModalities to include both TEXT and IMAGE
- Execute the API call and convert the base64 output into a PNG file
Prerequisites
- @google/genai SDK installed
- Valid Google Gemini API Key
- Environment variable configuration for API credentials
Pitfalls & limitations
- Failing to include both TEXT and IMAGE in responseModalities causes request failure
- Using lowercase 'k' in resolution parameters breaks the API call
- Exceeding five human reference images leads to unstable character consistency
- Deprecated models will trigger errors as they are retired from service
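The first two pitfalls can be caught before a request is sent. The sketch below is one possible pre-flight check; `normalizeResolution` and `hasRequiredModalities` are hypothetical helper names for illustration and are not part of the SDK.

```typescript
// Hypothetical pre-flight helpers; not part of @google/genai.
type Resolution = "1K" | "2K" | "4K";

const SUPPORTED_RESOLUTIONS: Resolution[] = ["1K", "2K", "4K"];

// Upper-cases the input so "4k" becomes "4K"; throws on anything unsupported.
function normalizeResolution(value: string): Resolution {
  const upper = value.trim().toUpperCase();
  for (const candidate of SUPPORTED_RESOLUTIONS) {
    if (candidate === upper) {
      return candidate;
    }
  }
  throw new Error(`Unsupported resolution: ${value}`);
}

// True only when both TEXT and IMAGE are requested.
function hasRequiredModalities(modalities: string[]): boolean {
  return modalities.indexOf("TEXT") !== -1 && modalities.indexOf("IMAGE") !== -1;
}
```

Running user-supplied values through `normalizeResolution` means a stray lowercase "4k" is corrected rather than rejected by the API.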
FAQ
How it compares
Unlike manual editing in software like Photoshop, this tool automates composition via natural language, allowing for scalable, dynamic updates that retain consistent branding across hundreds of assets.
Full skill instructions (original source: jezweb/claude-skills)
Generate and edit website images using Gemini Native Image Generation.
## ⚠️ Critical: SDK Migration Required
**IMPORTANT**: The `@google/generative-ai` package is deprecated as of November 30, 2025. All new projects must use `@google/genai`.

**Migration Required**:

    // ❌ OLD (deprecated, support ended Nov 30, 2025)
    import { GoogleGenerativeAI } from "@google/generative-ai";
    const genAI = new GoogleGenerativeAI(API_KEY);

    // ✅ NEW (required)
    import { GoogleGenAI } from "@google/genai";
    const ai = new GoogleGenAI({ apiKey: API_KEY });

**Source**: [GitHub Repository Migration Notice](https://github.com/google-gemini/deprecated-generative-ai-js)
## Models
| Model | ID | Status | Best For |
|-------|-----|--------|----------|
| **Gemini 3 Pro Image** | `gemini-3-pro-image-preview` | Preview (Nov 20, 2025) | 4K, complex prompts, text |
| **Gemini 2.5 Flash Image** | `gemini-2.5-flash-image` | GA (Oct 2, 2025) | Fast iteration, general use |
| **Imagen 4.0** | `imagen-4.0-generate-001` | GA (Aug 14, 2025) | Alternative platform |

**Deprecated Models** (do not use):
- `gemini-2.0-flash-exp-image-generation` - Shut down Nov 11, 2025
- `gemini-2.0-flash-preview-image-generation` - Shut down Nov 11, 2025
- `gemini-2.5-flash-image-preview` - Scheduled shutdown Jan 15, 2026

**Source**: [Google AI Changelog](https://ai.google.dev/gemini-api/docs/changelog)
## Capabilities
| Feature | Supported |
|---------|-----------|
| Generate from text | ✓ |
| Edit existing images | ✓ |
| Change aspect ratio | ✓ |
| Widen/extend images | ✓ |
| Style transfer | ✓ |
| Change colours | ✓ |
| Add/remove elements | ✓ |
| Text in images | ✓ (legible!) |
| Multiple reference images | ✓ (up to 14: max 5 humans, 9 objects) |
| 4K resolution | ✓ (Pro only) |

**Note**: Exceeding 5 human reference images causes unpredictable character consistency. Keep human images ≤ 5 for reliable results.
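The reference-image limits in the Capabilities table can be checked before building a request. The following is a minimal sketch that assumes the caller labels each image as a human or an object; the `ReferenceImage` shape and `checkReferenceImages` name are hypothetical and not part of the SDK.

```typescript
// Limits taken from the Capabilities table: 14 references total, at most 5 humans.
const MAX_TOTAL_REFERENCES = 14;
const MAX_HUMAN_REFERENCES = 5;

// Hypothetical shape: the caller is responsible for labelling each image.
interface ReferenceImage {
  kind: "human" | "object";
  data: string; // base64-encoded image bytes
}

// Returns a list of problems; an empty list means the set is within both limits.
function checkReferenceImages(images: ReferenceImage[]): string[] {
  const problems: string[] = [];
  const humans = images.filter((img) => img.kind === "human").length;
  if (humans > MAX_HUMAN_REFERENCES) {
    problems.push(`${humans} human images exceeds the limit of ${MAX_HUMAN_REFERENCES}`);
  }
  if (images.length > MAX_TOTAL_REFERENCES) {
    problems.push(`${images.length} images exceeds the limit of ${MAX_TOTAL_REFERENCES}`);
  }
  return problems;
}
```

Returning a list rather than throwing lets a caller report both violations at once.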
## Aspect Ratios
    1:1 | 2:3 | 3:2 | 3:4 | 4:3
    4:5 | 5:4 | 9:16 | 16:9 | 21:9

## Resolutions (Pro only)
| Size | 1:1 | 16:9 | 4:3 |
|------|-----|------|-----|
| 1K | 1024x1024 | 1376x768 | 1184x880 |
| 2K | 2048x2048 | 2752x1536 | 2368x1760 |
| 4K | 4096x4096 | 5504x3072 | 4736x3520 |
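In the table above, the 2K and 4K rows are exactly 2x and 4x the 1K dimensions. The sketch below reproduces the table from that observation; it covers only the three aspect ratios listed, and `dimensionsFor` is a hypothetical helper name.

```typescript
type Size = "1K" | "2K" | "4K";
type TableRatio = "1:1" | "16:9" | "4:3";

// 1K base dimensions copied from the table above.
const BASE_1K: Record<TableRatio, [number, number]> = {
  "1:1": [1024, 1024],
  "16:9": [1376, 768],
  "4:3": [1184, 880],
};

// Each size tier is an integer multiple of the 1K dimensions.
const SCALE: Record<Size, number> = { "1K": 1, "2K": 2, "4K": 4 };

function dimensionsFor(size: Size, ratio: TableRatio): { width: number; height: number } {
  const [baseWidth, baseHeight] = BASE_1K[ratio];
  const factor = SCALE[size];
  return { width: baseWidth * factor, height: baseHeight * factor };
}
```

This can be handy for sizing layout placeholders before the generated image arrives.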
## Quick Start
    import * as fs from "node:fs";
    import { GoogleGenAI } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    // Generate new image
    const response = await ai.models.generateContent({
      model: "gemini-2.5-flash-image",
      contents: "A professional plumber in hi-vis working in modern Australian home",
      config: {
        responseModalities: ["TEXT", "IMAGE"], // BOTH required - cannot use ["IMAGE"] alone
        imageGenerationConfig: {
          aspectRatio: "16:9",
        },
      },
    });

    // Extract image: decode each inline base64 part and write it to disk
    for (const part of response.candidates[0].content.parts) {
      if (part.inlineData) {
        const buffer = Buffer.from(part.inlineData.data, "base64");
        fs.writeFileSync("hero.png", buffer);
      }
    }

**Important**: `responseModalities` must include both `["TEXT", "IMAGE"]`. Using `["IMAGE"]` alone may fail or produce unexpected results.

## Model Selection
| Requirement | Use |
|-------------|-----|
| Fast iteration | Gemini 2.5 Flash Image |
| 4K resolution | Gemini 3 Pro Image Preview |
| Text in images | Gemini 3 Pro (94% legibility at 4K) |
| Simple edits | Gemini 2.5 Flash Image |
| Complex compositions | Gemini 3 Pro Image Preview |
| Infographics/diagrams | Gemini 3 Pro Image Preview |
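The selection table can be condensed into a small function. This is a sketch under the assumption that any one of the "Pro" requirements is enough to pick the Pro model; the `ImageRequirements` flags are hypothetical names, while the model IDs come from the Models table.

```typescript
// Model IDs as listed in the Models table.
const FLASH_MODEL = "gemini-2.5-flash-image";
const PRO_MODEL = "gemini-3-pro-image-preview";

// Hypothetical requirement flags, one per "Pro" row in the table above.
interface ImageRequirements {
  needs4K?: boolean;
  needsLegibleText?: boolean; // text in images, infographics, diagrams
  complexComposition?: boolean;
}

// Picks Pro when any Pro-only requirement is set; otherwise the faster Flash model.
function chooseModel(req: ImageRequirements): string {
  const needsPro = Boolean(req.needs4K || req.needsLegibleText || req.complexComposition);
  return needsPro ? PRO_MODEL : FLASH_MODEL;
}
```

For example, `chooseModel({})` selects the Flash model for fast iteration, and `chooseModel({ needsLegibleText: true })` selects the Pro model for an infographic.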
**Text Rendering Benchmarks** (4K resolution):
- Gemini 3 Pro Image: 94% legible text
- DALL-E 3: 78% legible text
- Midjourney: Decorative pseudo-text only
## When to Use
**Use Gemini Image Gen when:**
- Stock photos don't fit brand/context
- Need Australian-specific imagery
- Need text in images (infographics, diagrams)
- Need consistent style across multiple images
- Need to edit/modify existing images
- Client has no photos of their work
**Don't use when:**
- Client has good photos of actual work
- Real team photos needed (discuss first)
- Product shots (use real products)
- Legal/compliance concerns
## Known Issues Prevention
This skill prevents **5** documented issues:
### Issue #1: Resolution Parameter Case Sensitivity
**Error**: Request fails with invalid parameter error
**Source**: [Google AI Image Generation Docs](https://ai.google.dev/gemini-api/docs/image-generation)
**Why It Happens**: Resolution values are case-sensitive and must use uppercase 'K'.
**Prevention**: Always use `"4K"`, `"2K"`, `"1K"` - never lowercase `"4k"`.

    // ❌ WRONG - causes request failure
    config: { imageGenerationConfig: { resolution: "4k" } }

    // ✅ CORRECT - uppercase required
    config: { imageGenerationConfig: { resolution: "4K" } }

### Issue #2: Aspect Ratio May Be Ignored (Sept 2025+)
**Error**: Returns 1:1 square image despite requesting 16:9 or other ratios
**Source**: [Google Support Thread](https://support.google.com/gemini/thread/371311134/)
**Why It Happens**: Backend update in September 2025 affected Gemini 2.5 Flash Image model's aspect ratio handling.
**Prevention**: Use Gemini 3 Pro Image Preview for reliable aspect ratio control, or generate 1:1 and use multi-turn editing to extend.

    // May ignore aspectRatio on Gemini 2.5 Flash Image
    model: "gemini-2.5-flash-image",
    config: { imageGenerationConfig: { aspectRatio: "16:9" } }

    // More reliable for aspect ratio control
    model: "gemini-3-pro-image-preview",
    config: { imageGenerationConfig: { aspectRatio: "16:9" } }

**Status**: Google confirmed working on fix (Sept 2025).
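Because the requested ratio may be silently ignored, it can help to verify the returned image before using it. The sketch below assumes the caller has already read the image's pixel width and height (for example with an image library, which is outside this example); `matchesAspectRatio` and its 2% default tolerance are illustrative choices, not documented values.

```typescript
// Returns true when width/height is within `tolerance` (relative error)
// of the requested "W:H" ratio.
function matchesAspectRatio(
  width: number,
  height: number,
  requested: string,
  tolerance = 0.02
): boolean {
  const parts = requested.split(":");
  const ratioWidth = Number(parts[0]);
  const ratioHeight = Number(parts[1]);
  // Reject malformed ratios and non-positive dimensions (NaN fails every > 0 test).
  if (parts.length !== 2 || !(ratioWidth > 0) || !(ratioHeight > 0) || !(width > 0) || !(height > 0)) {
    throw new Error(`Invalid input: ${width}x${height} against ${requested}`);
  }
  const expected = ratioWidth / ratioHeight;
  const actual = width / height;
  return Math.abs(actual - expected) / expected <= tolerance;
}
```

A tolerance is needed because the documented output sizes are not exact ratios: 1376x768 is roughly 1.79, slightly wider than 16:9 (about 1.78). A 1024x1024 result for a 16:9 request fails the check and can be routed to the multi-turn "extend" workaround.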
### Issue #3: Exceeding 5 Human Reference Images
**Error**: Unpredictable character consistency in generated images
**Source**: [Google AI Image Generation Docs](https://ai.google.dev/gemini-api/docs/image-generation)
**Why It Happens**: Gemini 3 Pro Image supports up to 14 reference images total, but only 5 can be human images for character consistency.
**Prevention**: Limit human images to 5 or fewer. Use remaining slots (up to 14 total) for objects/scenes.

    // ❌ WRONG - 7 human images exceeds limit
    const tooManyHumans = [img1, img2, img3, img4, img5, img6, img7];
    const badPrompt = [
      { text: "Generate consistent characters" },
      ...tooManyHumans.map(img => ({ inlineData: { data: img, mimeType: "image/png" }})),
    ];

    // ✅ CORRECT - max 5 human images
    // Assumes `images` lists the human references first, then objects.
    const humanImages = images.slice(0, 5); // Limit to 5
    const objectImages = images.slice(5, 14); // Up to 9 more for objects
    const prompt = [
      { text: "Generate consistent characters" },
      ...humanImages.map(img => ({ inlineData: { data: img, mimeType: "image/png" }})),
      ...objectImages.map(img => ({ inlineData: { data: img, mimeType: "image/png" }})),
    ];

### Issue #4: SynthID Watermark Cannot Be Disabled
**Error**: N/A (documented limitation)
**Source**: [Google AI Image Generation Docs](https://ai.google.dev/gemini-api/docs/image-generation)
**Why It Happens**: All generated images automatically include a SynthID watermark for content authenticity tracking.
**Prevention**: Be aware of this limitation for commercial use cases. Watermark cannot be disabled by developers.
### Issue #5: Google Search Grounding Excludes Image Results
**Error**: Generated images don't reflect visual search results, only text
**Source**: [Google AI Image Generation Docs](https://ai.google.dev/gemini-api/docs/image-generation)
**Why It Happens**: When using Google Search tool with image generation, "image-based search results are not passed to the generation model."
**Prevention**: Only text-based search results inform the visual output. Don't expect the model to reference images from search results.

    // Google Search tool enabled
    const response = await ai.models.generateContent({
      model: "gemini-3-pro-image-preview",
      contents: "Generate image of latest iPhone design",
      config: {
        tools: [{ googleSearch: {} }], // in @google/genai, tools are passed inside config
        responseModalities: ["TEXT", "IMAGE"],
      },
    });
    // Result: Only text search results used, not image results from web search

## Pricing
**Current Pricing** (as of November 2025):
- **Gemini 2.5 Flash Image**: ~$0.039 per image
- Input: 258 tokens per image
- Output: 1290 tokens per image
- Rate: $30.00 per 1M output tokens
**Note**: The `generateImages` API (Imagen models) does not return `usageMetadata` in responses. Track costs manually based on pricing above.

**Source**: [Google Developers Blog - Gemini 2.5 Flash Image](https://developers.googleblog.com/introducing-gemini-2-5-flash-image/)
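For manual cost tracking, the per-image output cost follows from the figures above: 1290 output tokens at $30.00 per 1M tokens is about $0.039 per image. A minimal sketch of that arithmetic follows; `estimateOutputCostUsd` is a hypothetical helper, and it ignores input tokens because no input rate is listed here.

```typescript
// Figures from the pricing list above (Gemini 2.5 Flash Image).
const OUTPUT_TOKENS_PER_IMAGE = 1290;
const USD_PER_MILLION_OUTPUT_TOKENS = 30;

// Estimated output-token cost in USD for a number of generated images.
// Input tokens are not included in this estimate.
function estimateOutputCostUsd(imageCount: number): number {
  return (imageCount * OUTPUT_TOKENS_PER_IMAGE * USD_PER_MILLION_OUTPUT_TOKENS) / 1e6;
}
```

At this rate a batch of 100 images comes to roughly $3.87 in output tokens.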
## Reference Files
- `references/prompting.md` - Effective prompt patterns
- `references/website-images.md` - Hero, service, background templates
- `references/editing.md` - Multi-turn editing patterns
- `references/local-imagery.md` - Australian-specific details
- `references/integration.md` - API code examples

---
**Last verified**: 2026-01-21 | **Skill version**: 2.0.0 | **Changes**: Added SDK migration notice (critical), updated to current model names (gemini-3-pro-image-preview, gemini-2.5-flash-image), added 5 Known Issues (resolution case sensitivity, aspect ratio bug, reference image limits, SynthID watermark, Google Search grounding), added pricing section, added text rendering benchmarks.
---
# Image Generation Rules
Correction rules for Gemini Native Image Generation.
## Model Selection
| If Claude suggests... | Use instead... |
|----------------------|----------------|
| DALL-E for website images | Gemini 3 Image Generation (better text, editing) |
| Midjourney | Gemini 3 Image Generation (API access, editing) |
| `gemini-pro-vision` for generation | `gemini-2.5-flash-image` |
| Generic model for 4K | `gemini-3-pro-image-preview` |

## API Configuration
| If Claude suggests... | Use instead... |
|----------------------|----------------|
| `generateImage()` method | `generateContent()` with `responseModalities: ["TEXT", "IMAGE"]` |
| Missing `responseModalities` | Always include `responseModalities: ["TEXT", "IMAGE"]` |
| `imageConfig` | `imageGenerationConfig` |
| `size: "1024x1024"` | `aspectRatio: "1:1"` (use aspect ratio, not pixel size) |

Correct config structure:

    const response = await ai.models.generateContent({
      model: "gemini-2.5-flash-image",
      contents: prompt,
      config: {
        responseModalities: ["TEXT", "IMAGE"],
        imageGenerationConfig: {
          aspectRatio: "16:9",
          // imageSize: "2K", // Pro only
        },
      },
    });

## Prompting
| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Keyword lists | Descriptive narrative paragraph |
| "high quality, 4k, detailed" | Specific scene description with lighting |
| Generic "professional photo" | Specify: camera, lens, lighting, setting |
| Missing context | Include environment, time of day, atmosphere |
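One way to follow these rules consistently is to assemble the prompt from named scene details instead of free-form keywords. The sketch below is illustrative only: the `SceneSpec` fields and `buildPrompt` helper are hypothetical, and the sentence template is just one reasonable phrasing.

```typescript
// Hypothetical structure covering the details the table asks for.
interface SceneSpec {
  subject: string;
  setting: string;
  timeOfDay: string;
  lighting: string;
  camera: string; // camera and lens description
}

// Joins the details into a short narrative paragraph rather than a keyword list.
function buildPrompt(scene: SceneSpec): string {
  return [
    `${scene.subject} in ${scene.setting}, ${scene.timeOfDay}.`,
    `The scene is lit by ${scene.lighting}.`,
    `Shot on ${scene.camera}.`,
  ].join(" ");
}

const examplePrompt = buildPrompt({
  subject: "A plumber in hi-vis fitting a tap",
  setting: "a modern Australian kitchen",
  timeOfDay: "mid-morning",
  lighting: "soft daylight from a window on the left",
  camera: "a full-frame camera with a 35mm lens",
});
```

Making every field required forces the caller to supply environment, time of day, and lighting, which are the details the table flags as commonly missing.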
## Australian Imagery
| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Generic power outlet | Australian Type I (angled prongs) |
| Yellow safety vest | Hi-vis orange/yellow Australian standard |
| Left-hand drive vehicle | Right-hand drive (Australian) |
| Generic architecture | Queenslander, Federation, or modern Australian |
| Imperial measurements | Metric signage |
## Text in Images
| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Gemini 2.5 Flash Image for infographics | Gemini 3 Pro (better text legibility) |
| Long paragraphs in image | Short labels, headlines only |
| Small text | Large, bold text (minimum 24pt equivalent) |
| Decorative fonts | Clear sans-serif for legibility |
## Multi-Turn Editing
| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Generate new image for each edit | Use chat/multi-turn for refinement |
| Restate full prompt on edit | Reference previous + state change only |
| Single-shot complex edits | Break into multiple turns |
Correct pattern:

    const chat = ai.chats.create({ model: "...", config: { ... } });
    // Turn 1: Generate
    const response1 = await chat.sendMessage({ message: "Create a hero image..." });
    // Turn 2: Edit
    const response2 = await chat.sendMessage({ message: "Change the vest color to green" });
    // Turn 3: Refine
    const response3 = await chat.sendMessage({ message: "Widen the image to 21:9" });

## Output Handling
| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Assume single image output | Loop through all parts for images |
| Text-only response check | Check for `inlineData` in parts |
| Direct base64 to file | Decode: `Buffer.from(data, "base64")` |

## Limitations to Mention
- All images include SynthID watermark (invisible, for AI detection)
- Exact number of output images not guaranteed
- Flash model: up to 3 input images
- Pro model: up to 14 input images (5 high-fidelity objects, 5 humans)
- Best language support: EN, DE, ES, FR, JA, KO, PT, ZH
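Since the number of output images is not guaranteed, response handling should collect every image part instead of assuming exactly one. The sketch below uses minimal hand-written types covering only the fields it reads (`ResponseLike` is not the SDK's own type), and the `image/png` fallback for a missing MIME type is an assumption.

```typescript
// Minimal structural types for the response fields this sketch reads.
interface InlineData { data?: string; mimeType?: string; }
interface Part { text?: string; inlineData?: InlineData; }
interface ResponseLike { candidates?: { content?: { parts?: Part[] } }[]; }

interface ImagePayload { base64: string; mimeType: string; }

// Walks every candidate and part, returning all inline images in order.
function collectImages(response: ResponseLike): ImagePayload[] {
  const images: ImagePayload[] = [];
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      const data = part.inlineData?.data;
      if (data) {
        // Assumption: fall back to PNG when the part carries no MIME type.
        images.push({ base64: data, mimeType: part.inlineData?.mimeType ?? "image/png" });
      }
    }
  }
  return images;
}
```

Each returned payload can then be decoded with `Buffer.from(base64, "base64")` and written to its own file name, so a second image does not overwrite the first. An empty result indicates a text-only response.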
How to Use This Skill Unit
Option A: Project-Specific (Recommended)
- Click "Download" above
- In your project, create the directory: `.agent/skills/image-gen/`
- Save the file as `SKILL.md`
- The agent will automatically discover the skill based on its description.
Option B: Global Installation (All Agents)
Save the file to these locations to make it available across all projects:
- Claude Code: `~/.claude/skills/jezweb/claude-skills/image-gen/SKILL.md`
- Cursor: `~/.cursor/skills/jezweb/claude-skills/image-gen/SKILL.md`
- Antigravity: `~/.gemini/antigravity/skills/jezweb/claude-skills/image-gen/SKILL.md`