baoyu-image-gen
Install this skill
npx skills add jimliu/baoyu-skills
Works across Claude Code, Cursor, Codex, Copilot & Antigravity
The baoyu-image-gen skill enables direct image generation through your terminal using the AI SDK. It bridges the gap between CLI commands and sophisticated visual models from OpenAI and Google. By automating the interaction with providers like DALL-E, GPT Image, and Gemini/Imagen, it allows developers to script visual assets directly into their build or data-processing pipelines. Users can define parameters like aspect ratios, quality presets, and custom output filenames without manual web-interface interaction. It excels in environments where image creation needs to be repeatable, such as generating batches of icons, UI mockups from text descriptions, or integrating multi-modal analysis workflows that involve reference images for style consistency. The tool manages provider selection automatically based on your available API keys, simplifying the setup for multi-model visual workflows.
When to Use This Skill
- Automating the generation of social media assets from text-based content files
- Creating style-consistent UI mockups by using existing design references
- Integrating image synthesis into automated build or testing pipelines
- Batch generating custom placeholder imagery for rapid prototyping
How to Invoke This Skill
Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:
- generate a 16:9 hero image for my website
- create a high-quality icon using my reference image
- run a batch image generation from my system prompts
- use google gemini to synthesize this image based on my ref file
- save a 2k resolution landscape image as landscape.png
Pro Tips
- Use prompt files (`--promptfiles`) for long or intricate descriptions. They are easier to manage and iterate on than long prompts typed directly on the command line.
- Try different `--provider` options (e.g., `openai` vs. `google`) to compare output styles and find the best fit for your artistic or technical requirements.
- Specify `--ar` (aspect ratio) and `--quality` explicitly to tailor the output to your target platform or use case, whether that is a social media banner or a high-resolution print asset.
What this skill does
- Generate images via CLI using OpenAI or Google APIs
- Support for custom aspect ratios and high-resolution 2k quality presets
- Enable image-to-image workflows using reference files with Google multimodal models
- Batch prompt processing through file inputs for repetitive tasks
- Flexible output formats including direct file saving and JSON metadata return
When not to use it
- Situations requiring fine-grained artistic control like manual brush painting or layers
- Applications demanding real-time sub-second visual generation
- Complex animation sequences or video generation workflows
Example workflow
- Configure API keys in the environment variable file
- Prepare a source description or style guide in a text file
- Run the generator command specifying the output filename and aspect ratio
- Provide reference images if seeking specific visual consistency
- Capture the saved file path from the command output for subsequent processing
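The workflow above can be scripted by assembling the CLI arguments in code before handing them to a process runner. The following is a minimal sketch, not part of the skill: `GenOptions` and `buildArgs` are hypothetical names, and only the flag names (`--prompt`, `--promptfiles`, `--image`, `--ar`, `--quality`, `--ref`, `--provider`, `--json`) are taken from the skill's documented options.

```typescript
// Hypothetical helper: the fields mirror the documented CLI flags,
// but this function is an illustration and not shipped with the skill.
interface GenOptions {
  prompt?: string;
  promptFiles?: string[];
  image: string; // output path, required by the CLI
  ar?: string;
  quality?: "normal" | "2k";
  refs?: string[];
  provider?: "google" | "openai";
  json?: boolean;
}

// Turn an options object into the argument list for scripts/main.ts.
function buildArgs(opts: GenOptions): string[] {
  const args: string[] = [];
  if (opts.prompt !== undefined) args.push("--prompt", opts.prompt);
  if (opts.promptFiles?.length) args.push("--promptfiles", ...opts.promptFiles);
  args.push("--image", opts.image);
  if (opts.ar) args.push("--ar", opts.ar);
  if (opts.quality) args.push("--quality", opts.quality);
  if (opts.refs?.length) args.push("--ref", ...opts.refs);
  if (opts.provider) args.push("--provider", opts.provider);
  if (opts.json) args.push("--json");
  return args;
}
```

The resulting array can be appended to `npx -y bun <skill dir>/scripts/main.ts` by whatever process-spawning API your pipeline already uses; keeping the arguments as an array avoids shell-quoting problems with long prompts.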
Prerequisites
- Node.js environment
- Active OpenAI or Google Cloud API credentials
- Bun runtime installed
Pitfalls & limitations
- Google multimodal features are not available when using OpenAI providers
- Reliance on CLI flags for configuration can become tedious for complex multi-image batches
- Default provider auto-selection might override specific model preferences if not explicitly flagged
FAQ
How it compares
Unlike manual web-based generation, this skill integrates directly into your local shell, allowing for scripted, repeatable, and programmatic image creation that scales with your code.
Full skill instructions (original source: jimliu/baoyu-skills)
Official API-based image generation via AI SDK. Supports OpenAI (DALL-E, GPT Image) and Google (Imagen, Gemini multimodal).
## Script Directory
**Important**: All scripts are located in the `scripts/` subdirectory of this skill.

**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as `SKILL_DIR`
2. Script path = `${SKILL_DIR}/scripts/<script-name>.ts`
3. Replace all `${SKILL_DIR}` in this document with the actual path

**Script Reference**:
| Script | Purpose |
|--------|---------|
| `scripts/main.ts` | CLI entry point for image generation |

## Quick Start

    # Basic generation (auto-detect provider)
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
    # With aspect ratio
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image landscape.png --ar 16:9
    # High quality (2k)
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
    # Specific provider
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --provider openai
    # From prompt files
    npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png
    # With reference images (Google multimodal only)
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

## Commands
### Basic Image Generation

    # Generate with prompt
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image sunset.png
    # Shorthand
    npx -y bun ${SKILL_DIR}/scripts/main.ts -p "A cute robot" --image robot.png

### Aspect Ratios

    # Common ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A portrait" --image portrait.png --ar 3:4
    # Or specify exact size
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Banner" --image banner.png --size 1792x1024

### Reference Images (Google Multimodal)

    # Image editing with reference
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image blue.png --ref original.png
    # Multiple references
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Combine these styles" --image out.png --ref a.png b.png

### Quality Presets

    # Normal quality (default)
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality normal
    # High quality (2k resolution)
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k

### Output Formats

    # Plain output (prints saved path)
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
    # JSON output
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --json

## Options
| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required) |
| `--provider google\|openai` | Force provider (default: auto-selected from available API keys; Google when both are set) |
| `--model <id>`, `-m` | Model ID |
| `--ar <ratio>` | Aspect ratio (e.g., 16:9, 1:1, 4:3) |
| `--size <WxH>` | Size (e.g., 1024x1024) |
| `--quality normal\|2k` | Quality preset (default: normal) |
| `--ref <files...>` | Reference images (Google multimodal only) |
| `--n <count>` | Number of images |
| `--json` | JSON output |
| `--help`, `-h` | Show help |

## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI API key | - |
| `GOOGLE_API_KEY` | Google API key | - |
| `OPENAI_IMAGE_MODEL` | OpenAI model | gpt-image-1.5 |
| `GOOGLE_IMAGE_MODEL` | Google model | gemini-3-pro-image-preview |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint | - |
| `GOOGLE_BASE_URL` | Custom Google endpoint | - |

**Load Priority**: CLI args > `process.env` > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`

## Provider & Model Strategy
### Auto-Selection
1. If `--provider` specified → use it
2. If only one API key available → use that provider
3. If both available → default to Google (multimodal LLMs more versatile)
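
The three rules above can be read as a small decision function. This is a sketch under assumptions rather than the skill's actual code: `selectProvider` and its signature are hypothetical, `env` stands in for the already-merged environment, and the error thrown when no key is present is only modeled on the "Missing API key" behavior described under Error Handling.

```typescript
type Provider = "google" | "openai";

// Illustrative only: applies the auto-selection rules in order.
function selectProvider(
  flag: Provider | undefined,
  env: Record<string, string | undefined>,
): Provider {
  // Rule 1: an explicit --provider always wins.
  if (flag) return flag;

  const hasGoogle = Boolean(env["GOOGLE_API_KEY"]);
  const hasOpenAI = Boolean(env["OPENAI_API_KEY"]);

  // Rule 3: with both keys present, default to Google.
  if (hasGoogle && hasOpenAI) return "google";
  // Rule 2: with exactly one key, use that provider.
  if (hasGoogle) return "google";
  if (hasOpenAI) return "openai";

  // Assumed behavior when neither key is set.
  throw new Error("No API key found: set GOOGLE_API_KEY or OPENAI_API_KEY");
}
```

Because rule 3 silently picks Google, passing `--provider openai` explicitly is the safer choice whenever a script depends on an OpenAI model.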
### API Selection by Model Type
| Model Category | API Function | Example Models |
|----------------|--------------|----------------|
| Google Multimodal | `generateText` | gemini-2.0-flash-exp-image-generation |
| Google Imagen | `experimental_generateImage` | imagen-3.0-generate-002 |
| OpenAI | `experimental_generateImage` | gpt-image-1, dall-e-3 |

### Available Models
**Google**:
- `gemini-3-pro-image-preview` - Default, multimodal generation
- `gemini-2.0-flash-exp-image-generation` - Gemini 2.0 Flash
- `imagen-3.0-generate-002` - Imagen 3

**OpenAI**:
- `gpt-image-1.5` - Default, GPT Image 1.5
- `gpt-image-1` - GPT Image 1
- `dall-e-3` - DALL-E 3

## Quality Presets
| Preset | OpenAI | Google | Use Case |
|--------|--------|--------|----------|
| `normal` | 1024x1024 | Default | Covers, illustrations |
| `2k` | 2048x2048 | "2048px" in prompt | Infographics, slides |

## Aspect Ratio Handling
- **Multimodal LLMs**: Embedded in prompt (e.g., `"... aspect ratio 16:9"`)
- **Image-only models**: Uses `aspectRatio` or `size` parameter
- **Common ratios**: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
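
The split between the two model families can be sketched as follows. The names `ModelKind`, `ImageRequest`, and `applyAspectRatio` are hypothetical, and the exact wording appended to the prompt is an assumption based on the `"... aspect ratio 16:9"` example; the real script may phrase it differently.

```typescript
type ModelKind = "multimodal" | "image-only";

interface ImageRequest {
  prompt: string;
  aspectRatio?: string; // only populated for image-only models
}

// Illustrative only: route the ratio into the prompt or into a parameter.
function applyAspectRatio(
  prompt: string,
  ar: string | undefined,
  kind: ModelKind,
): ImageRequest {
  if (!ar) return { prompt };
  if (kind === "multimodal") {
    // Multimodal LLMs have no ratio parameter, so the ratio goes into the text.
    return { prompt: `${prompt} aspect ratio ${ar}` };
  }
  // Image-only models take the ratio as a structured parameter.
  return { prompt, aspectRatio: ar };
}
```

One consequence of this design is that a multimodal model treats the ratio as a request in natural language, so the output dimensions are less strictly guaranteed than with a structured parameter.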
## Examples
### Generate Cover Image

    npx -y bun ${SKILL_DIR}/scripts/main.ts \
      --prompt "A minimalist tech illustration with blue gradients" \
      --image cover.png --ar 2.35:1 --quality 2k

### Generate Social Media Post

    npx -y bun ${SKILL_DIR}/scripts/main.ts \
      --prompt "Instagram post about coffee" \
      --image post.png --ar 1:1

### Edit Image with Reference

    npx -y bun ${SKILL_DIR}/scripts/main.ts \
      --prompt "Change the background to sunset" \
      --image edited.png --ref original.png --provider google

### Batch Generation from Prompt File

    # Create prompt file with detailed instructions
    npx -y bun ${SKILL_DIR}/scripts/main.ts \
      --promptfiles style-guide.md scene-description.md \
      --image scene.png

## Error Handling
- **Missing API key**: Clear error with setup instructions
- **Generation failure**: Auto-retry once, then error
- **Invalid aspect ratio**: Warning, proceed with default
- **Reference images with image-only model**: Warning, ignore refs
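
The two "warn and continue" cases above might look roughly like this. Everything here is an illustrative assumption: `checkOptions`, `RawOptions`, and `CheckedOptions` are hypothetical names, the validity rule (two positive numbers separated by a colon) is a guess at what counts as a valid ratio, and the warning texts are invented for the example.

```typescript
interface RawOptions {
  ar?: string;
  refs?: string[];
  multimodal: boolean; // whether the chosen model accepts reference images
}

interface CheckedOptions {
  ar: string | undefined; // undefined means "fall back to the default"
  refs: string[];
  warnings: string[];
}

// Assumed validity rule: "W:H" with two positive numbers, e.g. 2.35:1.
function isValidRatio(ar: string): boolean {
  const m = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ar);
  return m !== null && Number(m[1]) > 0 && Number(m[2]) > 0;
}

// Downgrade recoverable problems to warnings instead of failing the run.
function checkOptions(raw: RawOptions): CheckedOptions {
  const warnings: string[] = [];

  let ar = raw.ar;
  if (ar !== undefined && !isValidRatio(ar)) {
    warnings.push(`Invalid aspect ratio "${ar}", proceeding with default`);
    ar = undefined;
  }

  let refs = raw.refs ?? [];
  if (refs.length > 0 && !raw.multimodal) {
    warnings.push("Reference images ignored: model is image-only");
    refs = [];
  }

  return { ar, refs, warnings };
}
```

Collecting warnings in the return value, rather than printing them immediately, lets a caller decide whether to surface them in plain output or include them in `--json` output.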
## Extension Support
Custom configurations via EXTEND.md.
**Check paths** (priority order):
1. `.baoyu-skills/baoyu-image-gen/EXTEND.md` (project)
2. `~/.baoyu-skills/baoyu-image-gen/EXTEND.md` (user)

If found, load before workflow. Extension content overrides defaults.
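
The priority lookup can be sketched as a first-match search over the two candidate paths. This is an illustration, not the skill's loader: `extendCandidates` and `findExtend` are hypothetical names, the project path is assumed to be relative to the current working directory, and the `exists` check is passed in so that the logic can be exercised without touching the file system.

```typescript
// Candidate EXTEND.md locations in priority order: project first, then user.
function extendCandidates(cwd: string, home: string): string[] {
  const rel = ".baoyu-skills/baoyu-image-gen/EXTEND.md";
  return [`${cwd}/${rel}`, `${home}/${rel}`];
}

// Return the first candidate for which `exists` reports true, if any.
function findExtend(
  cwd: string,
  home: string,
  exists: (path: string) => boolean,
): string | undefined {
  return extendCandidates(cwd, home).find((p) => exists(p));
}
```

In a real script, `exists` would typically wrap a file-system check, and `cwd` and `home` would come from the running process.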
How to Use This Skill Unit
Option A: Project-Specific (Recommended)
- Click "Download" above
- In your project, create the directory: `.agent/skills/baoyu-image-gen/`
- Save the file as `SKILL.md`
- The agent will automatically discover the skill based on its description.
Option B: Global Installation (All Agents)
Save the file to these locations to make it available across all projects:
- Claude Code: `~/.claude/skills/jimliu/baoyu-skills/baoyu-image-gen/SKILL.md`
- Cursor: `~/.cursor/skills/jimliu/baoyu-skills/baoyu-image-gen/SKILL.md`
- Antigravity: `~/.gemini/antigravity/skills/jimliu/baoyu-skills/baoyu-image-gen/SKILL.md`
Install with CLI: npx skills add jimliu/baoyu-skills