image-gen

image generation, ai image, gemini, website design, visual content, hero images, infographics, ai tools
⭐ 860 stars · MIT license · Updated 2026-06-11

Install this skill

npx skills add jezweb/claude-skills

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

The image-gen skill gives coding agents a programmatic way to create and edit visual assets with Gemini's native image generation models. It covers text-to-image generation, edits to existing images (style transfer, adding or removing elements, widening), legible text rendering, and control over aspect ratio and resolution. It requires the current @google/genai SDK, because the older @google/generative-ai package is deprecated. Two models are covered: Gemini 2.5 Flash Image for fast iteration and Gemini 3 Pro Image Preview for complex prompts and output up to 4K. The skill is aimed at projects that need custom brand imagery or dynamic visual content where stock photography does not fit.

When to Use This Skill

  • Creating custom website hero images that match local business contexts
  • Designing branded infographics and diagrams requiring legible text
  • Performing style transfers to maintain consistency across a site gallery
  • Modifying existing images by adding or removing specific elements

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • "generate a hero image of a plumber in an Australian kitchen"
  • "create a 16:9 infographic for building construction"
  • "add a laptop to this existing office photo"
  • "change the wall color in this interior shot to blue"
  • "generate a 4K image showing a solar panel installation"

Pro Tips

  • Use multi-turn editing to refine generated images iteratively, guiding the model toward the result you want.
  • Specify regional aesthetics, like 'Australian imagery patterns,' in your prompts for locally relevant visuals.
  • For complex text in images, break down your prompt into smaller, more focused requests to ensure legibility.

What this skill does

  • Generate original images from descriptive text prompts
  • Edit, extend, or widen existing image assets
  • Support for high-precision text rendering within images
  • Multi-reference image analysis for style consistency
  • Detailed aspect ratio and resolution configuration

When not to use it

  • When authentic product shots or real-world team photography is available
  • When projects involve strict legal or compliance requirements regarding AI-generated media

Example workflow

  1. Initialize the GoogleGenAI client with your API key
  2. Define the model ID based on speed or resolution requirements
  3. Construct the prompt including visual details and target aspect ratio
  4. Set responseModalities to include both TEXT and IMAGE
  5. Execute the API call and convert the base64 output into a PNG file
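
Steps 2 to 4 of this workflow can be gathered into one small helper. The sketch below is illustrative only: `buildImageRequest` is a hypothetical name, and the config keys (`responseModalities`, `imageGenerationConfig`) are taken from the Quick Start further down this page, so check them against the SDK version you have installed.

```javascript
// Hypothetical helper (not part of the SDK): assembles the argument object
// that the Quick Start passes to ai.models.generateContent().
// Key names follow this document and are an assumption about the SDK.
function buildImageRequest(prompt, { model = "gemini-2.5-flash-image", aspectRatio = "16:9" } = {}) {
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt must be a non-empty string");
  }
  return {
    model,
    contents: prompt.trim(),
    config: {
      // Both modalities are listed, as this skill requires.
      responseModalities: ["TEXT", "IMAGE"],
      imageGenerationConfig: { aspectRatio },
    },
  };
}

const request = buildImageRequest("A plumber in hi-vis working in a modern Australian kitchen");
console.log(JSON.stringify(request, null, 2));
```

The returned object can then be passed to the client call shown in the Quick Start, which keeps the required modalities in one place instead of repeating them at every call site.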

Prerequisites

  • @google/genai SDK installed
  • Valid Google Gemini API key
  • Environment variable configuration for API credentials

Pitfalls & limitations

  • Omitting TEXT from responseModalities (using IMAGE alone) may fail or produce unexpected results
  • Using a lowercase 'k' in resolution values ("4k" instead of "4K") causes the request to fail
  • Exceeding five human reference images leads to unpredictable character consistency
  • Deprecated models return errors once they are retired from service

FAQ

Why is my 16:9 request returning a square image?
A backend update in September 2025 affected aspect ratio handling in Gemini 2.5 Flash Image, so it may return a 1:1 image regardless of the requested ratio. Use Gemini 3 Pro Image Preview for more reliable aspect ratio control, or generate at 1:1 and widen the image with multi-turn editing.

Can I use the old @google/generative-ai package?
No, that package is deprecated. You must migrate to @google/genai to maintain compatibility with current image generation endpoints.

Which model is best for diagrams with text?
Gemini 3 Pro Image Preview is the recommended choice; the skill reports 94% text legibility for it at 4K resolution.

How it compares

Unlike manual editing in software like Photoshop, this tool automates composition via natural language, allowing for scalable, dynamic updates that retain consistent branding across hundreds of assets.

Source & trust

Full skill instructions (original source: jezweb/claude-skills)
# Image Generation Skill

Generate and edit website images using Gemini Native Image Generation.

## ⚠️ Critical: SDK Migration Required

**IMPORTANT**: The @google/generative-ai package is deprecated as of November 30, 2025. All new projects must use @google/genai.

**Migration Required**:
```javascript
// ❌ OLD (deprecated, support ended Nov 30, 2025)
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(API_KEY);

// ✅ NEW (required)
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: API_KEY });
```


**Source**: [GitHub Repository Migration Notice](https://github.com/google-gemini/deprecated-generative-ai-js)

## Models

| Model | ID | Status | Best For |
|-------|-----|--------|----------|
| **Gemini 3 Pro Image** | gemini-3-pro-image-preview | Preview (Nov 20, 2025) | 4K, complex prompts, text |
| **Gemini 2.5 Flash Image** | gemini-2.5-flash-image | GA (Oct 2, 2025) | Fast iteration, general use |
| **Imagen 4.0** | imagen-4.0-generate-001 | GA (Aug 14, 2025) | Alternative platform |

**Deprecated Models** (do not use):
- gemini-2.0-flash-exp-image-generation - Shut down Nov 11, 2025
- gemini-2.0-flash-preview-image-generation - Shut down Nov 11, 2025
- gemini-2.5-flash-image-preview - Scheduled shutdown Jan 15, 2026

**Source**: [Google AI Changelog](https://ai.google.dev/gemini-api/docs/changelog)

## Capabilities

| Feature | Supported |
|---------|-----------|
| Generate from text | ✅ |
| Edit existing images | ✅ |
| Change aspect ratio | ✅ |
| Widen/extend images | ✅ |
| Style transfer | ✅ |
| Change colours | ✅ |
| Add/remove elements | ✅ |
| Text in images | ✅ (legible!) |
| Multiple reference images | ✅ (up to 14: max 5 humans, 9 objects) |
| 4K resolution | ✅ (Pro only) |

**Note**: Exceeding 5 human reference images causes unpredictable character consistency. Keep human images ≤ 5 for reliable results.

## Aspect Ratios

```
1:1 | 2:3 | 3:2  | 3:4  | 4:3
4:5 | 5:4 | 9:16 | 16:9 | 21:9
```
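
When a layout slot has arbitrary pixel dimensions, one way to pick a value from the list above is to choose the supported ratio closest to the slot's shape. The sketch below is an illustration, not part of the skill: `closestAspectRatio` is a hypothetical helper, and it compares ratios in log space so that tall and wide shapes are treated symmetrically.

```javascript
// Aspect ratios listed in this document.
const SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"];

// Hypothetical helper: returns the supported ratio nearest to width/height.
function closestAspectRatio(width, height) {
  if (!(width > 0) || !(height > 0)) {
    throw new Error("width and height must be positive numbers");
  }
  const target = Math.log(width / height);
  let best = SUPPORTED_RATIOS[0];
  let bestDistance = Infinity;
  for (const ratio of SUPPORTED_RATIOS) {
    const [w, h] = ratio.split(":").map(Number);
    const distance = Math.abs(Math.log(w / h) - target);
    if (distance < bestDistance) {
      best = ratio;
      bestDistance = distance;
    }
  }
  return best;
}

console.log(closestAspectRatio(1920, 1080)); // "16:9"
console.log(closestAspectRatio(1080, 1350)); // "4:5"
```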


## Resolutions (Pro only)

| Size | 1:1 | 16:9 | 4:3 |
|------|-----|------|-----|
| 1K | 1024x1024 | 1376x768 | 1184x880 |
| 2K | 2048x2048 | 2752x1536 | 2368x1760 |
| 4K | 4096x4096 | 5504x3072 | 4736x3520 |
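
The table follows a simple pattern: the 2K and 4K rows are the 1K row multiplied by 2 and 4. A minimal sketch of a lookup built on that observation is shown below. `pixelDimensions` is a hypothetical helper, and it only knows the three ratios that appear in the table.

```javascript
// 1K dimensions copied from the table above.
const BASE_1K = new Map([
  ["1:1", [1024, 1024]],
  ["16:9", [1376, 768]],
  ["4:3", [1184, 880]],
]);

// Scale factors implied by the table; keys use the uppercase "K" the API expects.
const SCALE = new Map([["1K", 1], ["2K", 2], ["4K", 4]]);

// Hypothetical helper: expected output size for a size/ratio pair.
function pixelDimensions(size, ratio) {
  const scale = SCALE.get(size);
  const base = BASE_1K.get(ratio);
  if (scale === undefined) throw new Error(`unknown size: ${size}`);
  if (base === undefined) throw new Error(`ratio not in table: ${ratio}`);
  return { width: base[0] * scale, height: base[1] * scale };
}

console.log(pixelDimensions("4K", "16:9")); // { width: 5504, height: 3072 }
```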

## Quick Start

```javascript
import fs from "node:fs";
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Generate new image
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: "A professional plumber in hi-vis working in modern Australian home",
  config: {
    responseModalities: ["TEXT", "IMAGE"], // BOTH required - cannot use ["IMAGE"] alone
    imageGenerationConfig: {
      aspectRatio: "16:9",
    },
  },
});

// Extract image: image parts carry base64 data in inlineData
for (const part of response.candidates[0].content.parts) {
  if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("hero.png", buffer);
  }
}
```


**Important**: responseModalities must include both ["TEXT", "IMAGE"]. Using ["IMAGE"] alone may fail or produce unexpected results.

## Model Selection

| Requirement | Use |
|-------------|-----|
| Fast iteration | Gemini 2.5 Flash Image |
| 4K resolution | Gemini 3 Pro Image Preview |
| Text in images | Gemini 3 Pro (94% legibility at 4K) |
| Simple edits | Gemini 2.5 Flash Image |
| Complex compositions | Gemini 3 Pro Image Preview |
| Infographics/diagrams | Gemini 3 Pro Image Preview |
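
The table above reduces to a single rule: any requirement in the Pro rows selects Gemini 3 Pro Image Preview, and everything else stays on Gemini 2.5 Flash Image. A minimal sketch of that rule follows. `pickModel` and its option names are hypothetical, and the model IDs are the ones listed in the Models section.

```javascript
// Hypothetical helper encoding the selection table.
function pickModel({ needs4K = false, textHeavy = false, complexComposition = false } = {}) {
  const needsPro = needs4K || textHeavy || complexComposition;
  return needsPro ? "gemini-3-pro-image-preview" : "gemini-2.5-flash-image";
}

console.log(pickModel());                    // "gemini-2.5-flash-image"
console.log(pickModel({ textHeavy: true })); // "gemini-3-pro-image-preview"
```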

**Text Rendering Benchmarks** (4K resolution):
- Gemini 3 Pro Image: 94% legible text
- DALL-E 3: 78% legible text
- Midjourney: Decorative pseudo-text only

## When to Use

**Use Gemini Image Gen when:**
- Stock photos don't fit brand/context
- Need Australian-specific imagery
- Need text in images (infographics, diagrams)
- Need consistent style across multiple images
- Need to edit/modify existing images
- Client has no photos of their work

**Don't use when:**
- Client has good photos of actual work
- Real team photos needed (discuss first)
- Product shots (use real products)
- Legal/compliance concerns

## Known Issues Prevention

This skill prevents **5** documented issues:

### Issue #1: Resolution Parameter Case Sensitivity
**Error**: Request fails with invalid parameter error
**Source**: [Google AI Image Generation Docs](https://ai.google.dev/gemini-api/docs/image-generation)
**Why It Happens**: Resolution values are case-sensitive and must use uppercase 'K'.
**Prevention**: Always use "4K", "2K", "1K" - never lowercase "4k".

```javascript
// ❌ WRONG - causes request failure
config: { imageGenerationConfig: { resolution: "4k" } }

// ✅ CORRECT - uppercase required
config: { imageGenerationConfig: { resolution: "4K" } }
```
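
One way to keep a lowercase value from ever reaching the API is to normalize user-supplied resolution strings before building the request. The sketch below is an illustration under that assumption. `normalizeResolution` is a hypothetical helper and it accepts only the three values named above.

```javascript
const VALID_RESOLUTIONS = ["1K", "2K", "4K"];

// Hypothetical helper: uppercases the value and rejects anything unexpected.
function normalizeResolution(value) {
  const normalized = String(value).trim().toUpperCase();
  if (!VALID_RESOLUTIONS.includes(normalized)) {
    throw new Error(`unsupported resolution: ${value}`);
  }
  return normalized;
}

console.log(normalizeResolution("4k")); // "4K"
```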


### Issue #2: Aspect Ratio May Be Ignored (Sept 2025+)
**Error**: Returns 1:1 square image despite requesting 16:9 or other ratios
**Source**: [Google Support Thread](https://support.google.com/gemini/thread/371311134/)
**Why It Happens**: Backend update in September 2025 affected Gemini 2.5 Flash Image model's aspect ratio handling.
**Prevention**: Use Gemini 3 Pro Image Preview for reliable aspect ratio control, or generate 1:1 and use multi-turn editing to extend.

```javascript
// May ignore aspectRatio on Gemini 2.5 Flash Image
model: "gemini-2.5-flash-image",
config: { imageGenerationConfig: { aspectRatio: "16:9" } }

// More reliable for aspect ratio control
model: "gemini-3-pro-image-preview",
config: { imageGenerationConfig: { aspectRatio: "16:9" } }
```


**Status**: Google confirmed working on fix (Sept 2025).
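
Because the request itself succeeds when the ratio is ignored, the only way to notice is to inspect the returned image. The sketch below reads the width and height from a PNG header and compares them with the requested ratio. It assumes the decoded output is a PNG (check `inlineData.mimeType` first), and both helper names are hypothetical. The 2% tolerance is a guess that allows for the slightly off-ratio sizes in the Resolutions table, such as 1376x768 for 16:9.

```javascript
// Fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Reads width and height from the IHDR chunk, which directly follows the
// signature: bytes 16-19 hold the width, bytes 20-23 the height (big-endian).
function pngDimensions(buffer) {
  if (buffer.length < 24 || !buffer.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error("not a PNG file");
  }
  return { width: buffer.readUInt32BE(16), height: buffer.readUInt32BE(20) };
}

// True when the image's ratio is within `tolerance` (relative) of the request.
function matchesAspectRatio(buffer, ratio, tolerance = 0.02) {
  const { width, height } = pngDimensions(buffer);
  const [w, h] = ratio.split(":").map(Number);
  const expected = w / h;
  return Math.abs(width / height - expected) / expected <= tolerance;
}
```

If `matchesAspectRatio(buffer, "16:9")` returns false for a decoded image, the caller can retry on the Pro model or fall back to the generate-then-widen approach described above.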

### Issue #3: Exceeding 5 Human Reference Images
**Error**: Unpredictable character consistency in generated images
**Source**: [Google AI Image Generation Docs](https://ai.google.dev/gemini-api/docs/image-generation)
**Why It Happens**: Gemini 3 Pro Image supports up to 14 reference images total, but only 5 can be human images for character consistency.
**Prevention**: Limit human images to 5 or fewer. Use remaining slots (up to 14 total) for objects/scenes.

```javascript
// ❌ WRONG - 7 human images exceeds limit
const tooManyHumans = [img1, img2, img3, img4, img5, img6, img7];
const badPrompt = [
  { text: "Generate consistent characters" },
  ...tooManyHumans.map(img => ({ inlineData: { data: img, mimeType: "image/png" } })),
];

// ✅ CORRECT - max 5 human images
// (this slicing presumes `images` lists human references first, then objects)
const humanImages = images.slice(0, 5); // Limit to 5
const objectImages = images.slice(5, 14); // Up to 9 more for objects
const prompt = [
  { text: "Generate consistent characters" },
  ...humanImages.map(img => ({ inlineData: { data: img, mimeType: "image/png" } })),
  ...objectImages.map(img => ({ inlineData: { data: img, mimeType: "image/png" } })),
];
```


### Issue #4: SynthID Watermark Cannot Be Disabled
**Error**: N/A (documented limitation)
**Source**: [Google AI Image Generation Docs](https://ai.google.dev/gemini-api/docs/image-generation)
**Why It Happens**: All generated images automatically include a SynthID watermark for content authenticity tracking.
**Prevention**: Be aware of this limitation for commercial use cases. Watermark cannot be disabled by developers.

### Issue #5: Google Search Grounding Excludes Image Results
**Error**: Generated images don't reflect visual search results, only text
**Source**: [Google AI Image Generation Docs](https://ai.google.dev/gemini-api/docs/image-generation)
**Why It Happens**: When using Google Search tool with image generation, "image-based search results are not passed to the generation model."
**Prevention**: Only text-based search results inform the visual output. Don't expect the model to reference images from search results.

```javascript
// Google Search tool enabled
const response = await ai.models.generateContent({
  model: "gemini-3-pro-image-preview",
  contents: "Generate image of latest iPhone design",
  config: {
    responseModalities: ["TEXT", "IMAGE"],
    tools: [{ googleSearch: {} }], // the JS SDK takes tools inside config
  },
});
// Result: Only text search results used, not image results from web search
```


## Pricing

**Current Pricing** (as of November 2025):
- **Gemini 2.5 Flash Image**: ~$0.039 per image
  - Input: 258 tokens per image
  - Output: 1290 tokens per image
  - Rate: $30.00 per 1M output tokens (1290 × $30.00 / 1,000,000 ≈ $0.039)

**Note**: The generateImages API (Imagen models) does not return usageMetadata in responses. Track costs manually based on pricing above.
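
Manual tracking can be as simple as multiplying the image count by the output token figure and rate listed above. The sketch below does exactly that for Gemini 2.5 Flash Image. `estimateImageCostUsd` is a hypothetical helper, and it ignores input tokens because this document does not list an input rate.

```javascript
// Figures from the pricing list above (Gemini 2.5 Flash Image).
const OUTPUT_TOKENS_PER_IMAGE = 1290;
const USD_PER_MILLION_OUTPUT_TOKENS = 30;

// Hypothetical helper: output-token cost only, in US dollars.
function estimateImageCostUsd(imageCount) {
  if (!Number.isInteger(imageCount) || imageCount < 0) {
    throw new Error("imageCount must be a non-negative integer");
  }
  return (imageCount * OUTPUT_TOKENS_PER_IMAGE * USD_PER_MILLION_OUTPUT_TOKENS) / 1_000_000;
}

console.log(estimateImageCostUsd(1).toFixed(4));  // "0.0387"
console.log(estimateImageCostUsd(100).toFixed(2)); // "3.87"
```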

**Source**: [Google Developers Blog - Gemini 2.5 Flash Image](https://developers.googleblog.com/introducing-gemini-2-5-flash-image/)

## Reference Files

- references/prompting.md - Effective prompt patterns
- references/website-images.md - Hero, service, background templates
- references/editing.md - Multi-turn editing patterns
- references/local-imagery.md - Australian-specific details
- references/integration.md - API code examples

---

**Last verified**: 2026-01-21 | **Skill version**: 2.0.0 | **Changes**: Added SDK migration notice (critical), updated to current model names (gemini-3-pro-image-preview, gemini-2.5-flash-image), added 5 Known Issues (resolution case sensitivity, aspect ratio bug, reference image limits, SynthID watermark, Google Search grounding), added pricing section, added text rendering benchmarks.


---

# Image Generation Rules

Correction rules for Gemini Native Image Generation.

## Model Selection

| If Claude suggests... | Use instead... |
|----------------------|----------------|
| DALL-E for website images | Gemini 3 Image Generation (better text, editing) |
| Midjourney | Gemini 3 Image Generation (API access, editing) |
| gemini-pro-vision for generation | gemini-2.5-flash-image |
| Generic model for 4K | gemini-3-pro-image-preview |

## API Configuration

| If Claude suggests... | Use instead... |
|----------------------|----------------|
| generateImage() method | generateContent() with responseModalities: ["TEXT", "IMAGE"] |
| Missing responseModalities | Always include responseModalities: ["TEXT", "IMAGE"] |
| imageConfig | imageGenerationConfig |
| size: "1024x1024" | aspectRatio: "1:1" (use aspect ratio, not pixel size) |

Correct config structure:

```javascript
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: prompt,
  config: {
    responseModalities: ["TEXT", "IMAGE"],
    imageGenerationConfig: {
      aspectRatio: "16:9",
      // imageSize: "2K", // Pro only
    },
  },
});
```


## Prompting

| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Keyword lists | Descriptive narrative paragraph |
| "high quality, 4k, detailed" | Specific scene description with lighting |
| Generic "professional photo" | Specify: camera, lens, lighting, setting |
| Missing context | Include environment, time of day, atmosphere |
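
To make the narrative style in the table concrete, the sketch below turns structured scene fields into a short descriptive paragraph instead of a keyword list. It is only an illustration: `describeScene`, its field names, and the sentence templates are all hypothetical, and real prompts will usually need hand editing.

```javascript
// Hypothetical helper: composes a narrative prompt from scene details.
function describeScene({ subject, setting, lighting, camera, atmosphere }) {
  if (!subject || !setting) {
    throw new Error("subject and setting are required");
  }
  const sentences = [`${subject} in ${setting}.`];
  if (lighting) sentences.push(`The scene is lit by ${lighting}.`);
  if (camera) sentences.push(`Shot on ${camera}.`);
  if (atmosphere) sentences.push(`The overall atmosphere is ${atmosphere}.`);
  return sentences.join(" ");
}

console.log(
  describeScene({
    subject: "A licensed electrician in a hi-vis vest",
    setting: "the kitchen of a renovated Queenslander home",
    lighting: "soft morning light from a side window",
    camera: "a 35mm lens at eye level",
    atmosphere: "calm and professional",
  })
);
```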

## Australian Imagery

| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Generic power outlet | Australian Type I (angled prongs) |
| Yellow safety vest | Hi-vis orange/yellow Australian standard |
| Left-hand drive vehicle | Right-hand drive (Australian) |
| Generic architecture | Queenslander, Federation, or modern Australian |
| Imperial measurements | Metric signage |

## Text in Images

| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Gemini 2.5 Flash Image for infographics | Gemini 3 Pro Image Preview (better text legibility) |
| Long paragraphs in image | Short labels, headlines only |
| Small text | Large, bold text (minimum 24pt equivalent) |
| Decorative fonts | Clear sans-serif for legibility |

## Multi-Turn Editing

| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Generate new image for each edit | Use chat/multi-turn for refinement |
| Restate full prompt on edit | Reference previous + state change only |
| Single-shot complex edits | Break into multiple turns |

Correct pattern:

```javascript
// `ai` is the GoogleGenAI client; the JS SDK method is sendMessage({ message })
const chat = ai.chats.create({ model: "...", config: { ... } });

// Turn 1: Generate
const response1 = await chat.sendMessage({ message: "Create a hero image..." });

// Turn 2: Edit
const response2 = await chat.sendMessage({ message: "Change the vest color to green" });

// Turn 3: Refine
const response3 = await chat.sendMessage({ message: "Widen the image to 21:9" });
```


## Output Handling

| If Claude suggests... | Use instead... |
|----------------------|----------------|
| Assume single image output | Loop through all parts for images |
| Text-only response check | Check for inlineData in parts |
| Direct base64 to file | Decode: Buffer.from(data, "base64") |
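
The three rules above can be combined into one extraction routine. The following is a minimal sketch that assumes the response shape used in the Quick Start (`candidates[].content.parts[]` with base64 data in `inlineData.data`). `extractImages` is a hypothetical helper, and the `image/png` fallback is an assumption for parts that omit a MIME type.

```javascript
// Hypothetical helper: collects every image part from a response object.
function extractImages(response) {
  const images = [];
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      // Text parts have no inlineData, so they are skipped.
      if (part.inlineData?.data) {
        images.push({
          mimeType: part.inlineData.mimeType ?? "image/png", // assumed default
          buffer: Buffer.from(part.inlineData.data, "base64"),
        });
      }
    }
  }
  return images;
}
```

Each returned `buffer` can be written to disk with `fs.writeFileSync`, using the `mimeType` to choose the file extension.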

## Limitations to Mention

- All images include SynthID watermark (invisible, for AI detection)
- Exact number of output images not guaranteed
- Flash model: up to 3 input images
- Pro model: up to 14 input images (5 high-fidelity objects, 5 humans)
- Best language support: EN, DE, ES, FR, JA, KO, PT, ZH
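
The input image limits lend themselves to a pre-flight check before a request is sent. The sketch below encodes only the figures this document states without ambiguity: 3 input images for Flash, 14 for Pro, and at most 5 human references on Pro. `checkReferenceImages` and the `"flash"`/`"pro"` family labels are hypothetical.

```javascript
// Limits as stated in this document.
const INPUT_LIMITS = new Map([
  ["flash", { total: 3 }],
  ["pro", { total: 14, humans: 5 }],
]);

// Hypothetical helper: returns a list of problems (empty when the set is fine).
function checkReferenceImages(modelFamily, { humans = 0, others = 0 } = {}) {
  const limits = INPUT_LIMITS.get(modelFamily);
  if (!limits) throw new Error(`unknown model family: ${modelFamily}`);
  const problems = [];
  if (humans + others > limits.total) {
    problems.push(`too many input images: ${humans + others} > ${limits.total}`);
  }
  if (limits.humans !== undefined && humans > limits.humans) {
    problems.push(`too many human references: ${humans} > ${limits.humans}`);
  }
  return problems;
}

console.log(checkReferenceImages("pro", { humans: 7, others: 2 }));
```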

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/image-gen/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/jezweb/claude-skills/image-gen/SKILL.md
  • Cursor: ~/.cursor/skills/jezweb/claude-skills/image-gen/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/jezweb/claude-skills/image-gen/SKILL.md

🚀 Install with CLI:
npx skills add jezweb/claude-skills

How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the rule content into your project's custom instructions or follow our Add-Skill CLI guide. This ensures Claude follows your standards during every code generation.

For Cursor & Windsurf

For Cursor or Windsurf, individual skills are best used in the "Rules for AI" section. This unit helps the agent avoid common image-generation mistakes, such as importing the deprecated SDK or calling retired model IDs.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under AI Tools & Agents and is published by JezWeb, maintained in jezweb/claude-skills.
