baoyu-image-gen

Tags: image generation, AI, DALL-E, Imagen, OpenAI, Google APIs, text-to-image, creative AI
⭐ 21.7k stars, updated 2026-06-13

Install this skill

npx skills add jimliu/baoyu-skills

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

The baoyu-image-gen skill generates images from your terminal through the AI SDK, connecting CLI commands to image models from OpenAI (DALL-E, GPT Image) and Google (Gemini, Imagen). Because generation runs as a command, developers can script visual assets into build or data-processing pipelines and set the aspect ratio, quality preset, and output filename without touching a web interface. It suits work where image creation must be repeatable: batches of icons, UI mockups generated from text descriptions, or workflows that pass reference images to keep a consistent style. When no provider is specified, the tool selects one based on the API keys it finds, which simplifies setup for workflows that span several models.

When to Use This Skill

  • Automating the generation of social media assets from text-based content files
  • Creating style-consistent UI mockups by using existing design references
  • Integrating image synthesis into automated build or testing pipelines
  • Batch generating custom placeholder imagery for rapid prototyping

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • "generate a 16:9 hero image for my website"
  • "create a high-quality icon using my reference image"
  • "run a batch image generation from my system prompts"
  • "use google gemini to synthesize this image based on my ref file"
  • "save a 2k resolution landscape image as landscape.png"

Pro Tips

  • Use prompt files (`--promptfiles`) for intricate descriptions, so you can manage and iterate on complex prompts instead of typing long prompts on the command line.
  • Experiment with different `--provider` options (e.g., `openai` vs. `google`) to compare output styles and find the best fit for your specific artistic or technical requirements.
  • Always specify the `--ar` (aspect ratio) and `--quality` parameters to tailor the output to your target platform or use case, whether it's a social media banner or a high-resolution print asset.

What this skill does

  • Generate images via CLI using OpenAI or Google APIs
  • Support for custom aspect ratios and high-resolution 2k quality presets
  • Enable image-to-image workflows using reference files with Google multimodal models
  • Batch prompt processing through file inputs for repetitive tasks
  • Flexible output formats including direct file saving and JSON metadata return

When not to use it

  • Situations requiring fine-grained artistic control like manual brush painting or layers
  • Applications demanding real-time sub-second visual generation
  • Complex animation sequences or video generation workflows

Example workflow

  1. Configure API keys in the environment variable file
  2. Prepare a source description or style guide in a text file
  3. Run the generator command specifying the output filename and aspect ratio
  4. Provide reference images if seeking specific visual consistency
  5. Capture the saved file path from the command output for subsequent processing
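
The workflow above can be scripted. The following is a minimal sketch, not part of the skill itself: `GenOptions` and `buildArgs` are hypothetical names, and the helper only assembles the argument list for `npx` from flags documented in the Options table of this skill.

```typescript
// Hypothetical helper: builds the argument list for `npx` so that the
// command matches the documented CLI (`npx -y bun <skill>/scripts/main.ts ...`).
interface GenOptions {
  prompt: string;
  image: string;                     // output path (--image, required)
  ar?: string;                       // aspect ratio, e.g. "16:9"
  quality?: "normal" | "2k";
  provider?: "google" | "openai";
  refs?: string[];                   // reference images (--ref)
  json?: boolean;                    // request JSON output (--json)
}

function buildArgs(skillDir: string, o: GenOptions): string[] {
  const args = [
    "-y", "bun", `${skillDir}/scripts/main.ts`,
    "--prompt", o.prompt,
    "--image", o.image,
  ];
  if (o.ar) args.push("--ar", o.ar);
  if (o.quality) args.push("--quality", o.quality);
  if (o.provider) args.push("--provider", o.provider);
  if (o.refs && o.refs.length > 0) args.push("--ref", ...o.refs);
  if (o.json) args.push("--json");
  return args;
}

// Example: arguments for a 16:9 image; "/path/to/skill" is a placeholder.
const heroArgs = buildArgs("/path/to/skill", {
  prompt: "A landscape",
  image: "landscape.png",
  ar: "16:9",
});
```

Passing the prompt as a separate array element avoids shell quoting problems when the list is handed to a process-spawning API.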

Prerequisites

  • Node.js environment
  • Active OpenAI or Google Cloud API credentials
  • Bun runtime installed

Pitfalls & limitations

  • Google multimodal features are not available when using OpenAI providers
  • Reliance on CLI flags for configuration can become tedious for complex multi-image batches
  • Default provider auto-selection might override specific model preferences if not explicitly flagged

FAQ

Can I use both OpenAI and Google models interchangeably?
Yes, you can toggle between providers using the --provider flag, provided you have configured the corresponding API keys.
How does the tool handle reference images?
Reference images are supported exclusively through Google's multimodal models, allowing you to pass one or more files to influence the output style.
Is there a way to get output metadata instead of just the image?
Yes, adding the --json flag to your command will return structured metadata regarding the generation instead of just the file path.
What happens if I don't specify a provider?
If only one provider's API key is configured, the script uses that provider. If both keys are available, it defaults to Google.

How it compares

Unlike manual web-based generation, this skill integrates directly into your local shell, allowing for scripted, repeatable, and programmatic image creation that scales with your code.

Source & trust

⭐ 22k stars, updated 2026-06-13
Full skill instructions (original source: jimliu/baoyu-skills)
# Image Generation (AI SDK)

Official API-based image generation via AI SDK. Supports OpenAI (DALL-E, GPT Image) and Google (Imagen, Gemini multimodal).

## Script Directory

**Important**: All scripts are located in the scripts/ subdirectory of this skill.

**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as SKILL_DIR
2. Script path = ${SKILL_DIR}/scripts/<script-name>.ts
3. Replace all ${SKILL_DIR} in this document with the actual path
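
As an illustration of the three steps above, here is a small sketch of the substitution an agent performs. The function names are hypothetical and the example path is a placeholder; only the `${SKILL_DIR}/scripts/<script-name>.ts` layout comes from this document.

```typescript
// Hypothetical helpers illustrating the SKILL_DIR substitution.

// Step 2: build the path of a script inside the skill directory.
function scriptPath(skillDir: string, scriptName: string): string {
  return `${skillDir}/scripts/${scriptName}.ts`;
}

// Step 3: replace every literal "${SKILL_DIR}" placeholder in a command.
// The placeholder is written in double quotes so it is not interpolated.
function resolveSkillDir(text: string, skillDir: string): string {
  return text.split("${SKILL_DIR}").join(skillDir);
}

// Example with a placeholder directory:
const resolvedCommand = resolveSkillDir(
  "npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt \"A cat\" --image cat.png",
  "/home/me/.claude/skills/baoyu-image-gen",
);
```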

**Script Reference**:
| Script | Purpose |
|--------|---------|
| scripts/main.ts | CLI entry point for image generation |

## Quick Start

```bash
# Basic generation (auto-detect provider)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image landscape.png --ar 16:9

# High quality (2k)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k

# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --provider openai

# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google multimodal only)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
```


## Commands

### Basic Image Generation

```bash
# Generate with prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image sunset.png

# Shorthand
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "A cute robot" --image robot.png
```


### Aspect Ratios

```bash
# Common ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A portrait" --image portrait.png --ar 3:4

# Or specify exact size
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Banner" --image banner.png --size 1792x1024
```


### Reference Images (Google Multimodal)

```bash
# Image editing with reference
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image blue.png --ref original.png

# Multiple references
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Combine these styles" --image out.png --ref a.png b.png
```


### Quality Presets

```bash
# Normal quality (default)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality normal

# High quality (2k resolution)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
```


### Output Formats

```bash
# Plain output (prints saved path)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# JSON output
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --json
```


## Options

| Option | Description |
|--------|-------------|
| --prompt <text>, -p | Prompt text |
| --promptfiles <files...> | Read prompt from files (concatenated) |
| --image <path> | Output image path (required) |
| --provider google\|openai | Force provider (default: auto-selected; Google when both API keys are set) |
| --model <id>, -m | Model ID |
| --ar <ratio> | Aspect ratio (e.g., 16:9, 1:1, 4:3) |
| --size <WxH> | Size (e.g., 1024x1024) |
| --quality normal\|2k | Quality preset (default: normal) |
| --ref <files...> | Reference images (Google multimodal only) |
| --n <count> | Number of images |
| --json | JSON output |
| --help, -h | Show help |

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| OPENAI_API_KEY | OpenAI API key | - |
| GOOGLE_API_KEY | Google API key | - |
| OPENAI_IMAGE_MODEL | OpenAI model | gpt-image-1.5 |
| GOOGLE_IMAGE_MODEL | Google model | gemini-3-pro-image-preview |
| OPENAI_BASE_URL | Custom OpenAI endpoint | - |
| GOOGLE_BASE_URL | Custom Google endpoint | - |

**Load Priority**: CLI args > process.env > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
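
The load priority can be pictured as a lookup across ordered layers. This is a sketch under assumptions, not the skill's actual loader: `Layer` and `resolveSetting` are hypothetical names, and treating an empty string as "unset" is an assumption of this example.

```typescript
// Hypothetical sketch of the documented load priority:
// CLI args > process.env > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
type Layer = Record<string, string | undefined>;

// `layers` must be ordered from highest to lowest priority.
// The first layer that defines a non-empty value wins.
function resolveSetting(key: string, layers: Layer[]): string | undefined {
  for (const layer of layers) {
    const value = layer[key];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

// Example: the project .env file overrides the user-level .env file.
const exampleModel = resolveSetting("OPENAI_IMAGE_MODEL", [
  {},                                        // CLI args (none given)
  {},                                        // process.env (not set)
  { OPENAI_IMAGE_MODEL: "gpt-image-1" },     // <cwd>/.baoyu-skills/.env
  { OPENAI_IMAGE_MODEL: "dall-e-3" },        // ~/.baoyu-skills/.env
]);
```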

## Provider & Model Strategy

### Auto-Selection

1. If --provider specified → use it
2. If only one API key available → use that provider
3. If both available → default to Google (multimodal LLMs more versatile)
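
The three rules can be expressed as a small function. This is an illustrative sketch: `selectProvider` is a hypothetical name, and throwing when no key is present is an assumption based on the "Missing API key" entry under Error Handling (the message text is invented for the example).

```typescript
// Hypothetical sketch of the auto-selection rules listed above.
type Provider = "google" | "openai";

function selectProvider(
  flag: Provider | undefined,                    // value of --provider, if given
  keys: { openai?: string; google?: string },    // OPENAI_API_KEY / GOOGLE_API_KEY
): Provider {
  if (flag) return flag;                         // rule 1: explicit flag wins
  const hasGoogle = Boolean(keys.google);
  const hasOpenai = Boolean(keys.openai);
  if (hasGoogle) return "google";                // rule 2 (Google only) and rule 3 (both)
  if (hasOpenai) return "openai";                // rule 2 (OpenAI only)
  // Assumption: with no key at all, fail with a setup hint.
  throw new Error("No API key found: set OPENAI_API_KEY or GOOGLE_API_KEY");
}
```

Passing `--provider` explicitly is the way to avoid the Google default when both keys are configured.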

### API Selection by Model Type

| Model Category | API Function | Example Models |
|----------------|--------------|----------------|
| Google Multimodal | generateText | gemini-2.0-flash-exp-image-generation |
| Google Imagen | experimental_generateImage | imagen-3.0-generate-002 |
| OpenAI | experimental_generateImage | gpt-image-1, dall-e-3 |
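
A dispatch along the lines of this table might look like the sketch below. Matching on the `gemini` prefix is an assumption made for illustration; the document does not say how the script classifies a model ID, and `apiFunctionFor` is a hypothetical name.

```typescript
// Hypothetical sketch: choose the AI SDK function named in the table above.
type ApiFunction = "generateText" | "experimental_generateImage";

function apiFunctionFor(provider: "google" | "openai", modelId: string): ApiFunction {
  // Assumption: Google multimodal models are recognized by a "gemini" prefix.
  if (provider === "google" && modelId.startsWith("gemini")) {
    return "generateText";
  }
  // Google Imagen and all OpenAI image models use the image-only function.
  return "experimental_generateImage";
}
```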

### Available Models

**Google**:
- gemini-3-pro-image-preview - Default, multimodal generation
- gemini-2.0-flash-exp-image-generation - Gemini 2.0 Flash
- imagen-3.0-generate-002 - Imagen 3

**OpenAI**:
- gpt-image-1.5 - Default, GPT Image 1.5
- gpt-image-1 - GPT Image 1
- dall-e-3 - DALL-E 3

## Quality Presets

| Preset | OpenAI | Google | Use Case |
|--------|--------|--------|----------|
| normal | 1024x1024 | Default | Covers, illustrations |
| 2k | 2048x2048 | "2048px" in prompt | Infographics, slides |
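
To make the table concrete, the sketch below maps a preset to the per-provider settings it lists. `GenerationRequest` and `applyQuality` are hypothetical names, and the exact wording appended to the Google prompt is an assumption; the table only says that "2048px" appears in the prompt.

```typescript
// Hypothetical sketch of the quality presets table.
interface GenerationRequest {
  prompt: string;
  size?: string;   // only set for OpenAI in this sketch
}

function applyQuality(
  provider: "google" | "openai",
  quality: "normal" | "2k",
  prompt: string,
): GenerationRequest {
  if (provider === "openai") {
    // OpenAI: the preset becomes an explicit pixel size.
    return { prompt, size: quality === "2k" ? "2048x2048" : "1024x1024" };
  }
  // Google: "normal" leaves the prompt alone; "2k" adds a resolution hint.
  // Assumption: the hint is appended as ", 2048px".
  return { prompt: quality === "2k" ? `${prompt}, 2048px` : prompt };
}
```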

## Aspect Ratio Handling

- **Multimodal LLMs**: Embedded in prompt (e.g., "... aspect ratio 16:9")
- **Image-only models**: Uses aspectRatio or size parameter
- **Common ratios**: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
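
The two handling paths can be sketched as follows. This is illustrative only: the function names and the validation regex are assumptions, and the prompt suffix follows the "... aspect ratio 16:9" example above. An invalid ratio is dropped so that generation proceeds with the default, as described under Error Handling.

```typescript
// Hypothetical sketch of aspect ratio handling.

// Accepts "W:H" where both parts are positive numbers, e.g. "16:9" or "2.35:1".
function parseAspectRatio(ar: string): { w: number; h: number } | null {
  const m = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ar.trim());
  if (!m) return null;
  const w = Number(m[1]);
  const h = Number(m[2]);
  return w > 0 && h > 0 ? { w, h } : null;
}

function applyAspectRatio(
  prompt: string,
  ar: string,
  multimodal: boolean,
): { prompt: string; aspectRatio?: string } {
  // Invalid ratio: leave the request untouched (caller may log a warning).
  if (parseAspectRatio(ar) === null) return { prompt };
  return multimodal
    ? { prompt: `${prompt}, aspect ratio ${ar}` }   // embedded in the prompt
    : { prompt, aspectRatio: ar };                  // passed as a parameter
}
```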

## Examples

### Generate Cover Image

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --prompt "A minimalist tech illustration with blue gradients" \
  --image cover.png --ar 2.35:1 --quality 2k
```


### Generate Social Media Post

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --prompt "Instagram post about coffee" \
  --image post.png --ar 1:1
```


### Edit Image with Reference

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --prompt "Change the background to sunset" \
  --image edited.png --ref original.png --provider google
```


### Batch Generation from Prompt File

```bash
# Create prompt file with detailed instructions
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --promptfiles style-guide.md scene-description.md \
  --image scene.png
```


## Error Handling

- **Missing API key**: Clear error with setup instructions
- **Generation failure**: Auto-retry once, then error
- **Invalid aspect ratio**: Warning, proceed with default
- **Reference images with image-only model**: Warning, ignore refs
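
The "auto-retry once, then error" behavior can be sketched as a small wrapper. `withOneRetry` is a hypothetical name and this is not the skill's own code; it only shows the control flow, with the generation call passed in as a function.

```typescript
// Hypothetical sketch of "retry once, then error".
async function withOneRetry<T>(attempt: () => Promise<T>): Promise<T> {
  try {
    return await attempt();
  } catch {
    // First failure: try exactly one more time.
    // If this second call also fails, its error propagates to the caller.
    return await attempt();
  }
}
```

A caller would wrap the provider request, for example `withOneRetry(() => generate(request))`, where `generate` stands in for whatever function performs the API call.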

## Extension Support

Custom configurations via EXTEND.md.

**Check paths** (priority order):
1. .baoyu-skills/baoyu-image-gen/EXTEND.md (project)
2. ~/.baoyu-skills/baoyu-image-gen/EXTEND.md (user)

If found, load before workflow. Extension content overrides defaults.
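
The lookup order for EXTEND.md could be implemented roughly as below. The function names are hypothetical, and the existence check is injected as a parameter so the sketch stays independent of any file system API; a real caller might pass a function backed by the runtime's file system module.

```typescript
// Hypothetical sketch of the EXTEND.md lookup, project path first.
function extendCandidates(cwd: string, home: string): string[] {
  const rel = ".baoyu-skills/baoyu-image-gen/EXTEND.md";
  return [
    `${cwd}/${rel}`,    // 1. project-level extension
    `${home}/${rel}`,   // 2. user-level extension
  ];
}

// Returns the first candidate for which `exists` reports true, if any.
function firstExisting(
  paths: string[],
  exists: (path: string) => boolean,
): string | undefined {
  return paths.find((p) => exists(p));
}
```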

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Download this skill's SKILL.md file
  2. In your project, create the directory: .agent/skills/baoyu-image-gen/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/jimliu/baoyu-skills/baoyu-image-gen/SKILL.md
  • Cursor: ~/.cursor/skills/jimliu/baoyu-skills/baoyu-image-gen/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/jimliu/baoyu-skills/baoyu-image-gen/SKILL.md

Install with CLI:
npx skills add jimliu/baoyu-skills

How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the rule content into your project's custom instructions or follow our Add-Skill CLI guide. This ensures Claude follows your standards during every code generation.

For Cursor & Windsurf

For Cursor or Windsurf, individual skills are best placed in the "Rules for AI" section, where the agent can consult this skill when it handles creative and visual tasks.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under Creative & Visual and is published by Jim Liu, maintained in jimliu/baoyu-skills.
