
baoyu-danger-gemini-web

Tags: gemini, google gemini, image generation, text generation, ai, generative ai, creative content, coding assistant, agent skill
21.7k stars · Updated 2026-06-13

Install this skill

npx skills add jimliu/baoyu-skills

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

The baoyu-danger-gemini-web skill provides a command-line interface to the Gemini web service through a reverse-engineered client. Because the client relies on undocumented web protocols rather than official Google APIs, the skill requires a consent check before use. It supports text prompts, multi-turn conversations tracked by session ID, and vision input: you can attach local images as reference context or generate new images and save them to disk. The tool is written in TypeScript and runs on the Bun runtime. Consent is recorded in a consent.json file in the user data directory, and the disclaimer is shown again whenever that file is missing or its disclaimer version does not match. The skill suits local automation workflows that need Gemini's visual and textual reasoning and can tolerate an unofficial client.

When to Use This Skill

  • Automating analysis of local screenshots or diagrams using Gemini's vision engine
  • Creating persistent AI chat history for terminal-based scripting tasks
  • Batch processing text prompts using local source files as context
  • Generating descriptive assets based on existing local images

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • Analyze this image using Gemini
  • Start a new Gemini chat session
  • Generate a variation of this local image
  • Summarize these local text files with Gemini
  • Run a persistent conversation with Gemini

Pro Tips

  • 💡 Pass the same `--sessionId` on every call of a multi-turn conversation to keep context and refine outputs iteratively.
  • 💡 Attach reference images with `--reference` to give Gemini vision input, such as a visual style or example to follow.
  • 💡 Use this skill as the image generation backend for other creative skills such as 'cover-image' or 'article-illustrator'.
  • 💡 Always perform the consent check before running the skill.

What this skill does

  • Text-based prompt completion using current Gemini model variants
  • Multi-turn dialogue persistence through session ID tagging
  • Image-to-text vision analysis by attaching local reference files
  • Direct image generation output saved to local disk
  • Support for reading prompt inputs from multiple local text files

When not to use it

  • Production-critical workflows requiring official SLA and Google API stability
  • Applications that demand high-security handling of sensitive data

Example workflow

  1. Verify the presence of consent.json in the appropriate user data directory
  2. Initiate a text query with a unique --sessionId to track the thread
  3. Attach a local reference image for the model to analyze
  4. Execute the prompt to retrieve text or generate a new image file
  5. Save the output image locally; to start a fresh thread later, use a new --sessionId

Prerequisites

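  • Bun runtime (the documented commands launch it through npx -y bun)
  • A supported browser (Google Chrome, Chromium, or Microsoft Edge) for the first-run Google sign-in
  • Consent file present in the user data directory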

Pitfalls & limitations

  • Google may change web protocols without warning, causing the skill to break
  • No official support exists for authentication or rate-limit handling
  • Requires manual maintenance of the consent.json file

FAQ

Why is a consent check mandatory?
Because this tool uses a reverse-engineered web client rather than an official API, you must explicitly acknowledge the lack of official support and inherent instability.
Can I use this for enterprise production?
No, this is an unofficial tool and is not suitable for environments that require uptime guarantees, official security compliance, or long-term API stability.
How do I clear my chat history?
Simply change or omit the --sessionId flag in your command to initiate a fresh context thread.
Does it support all Gemini models?
It supports specific models defined in the internal configuration, such as gemini-3-pro and gemini-2.5 variants.

How it compares

Unlike using a browser-based chat interface, this skill allows for programmatic automation, pipe-based data flow, and direct integration with local shell workflows.

Source & trust

22k stars · Updated 2026-06-13
📄 Full skill instructions — original source: jimliu/baoyu-skills
# Gemini Web Client

Supports:
- Text generation
- Image generation (download + save)
- Reference images for vision input (attach local images)
- Multi-turn conversations via persisted --sessionId

## Script Directory

**Important**: All scripts are located in the scripts/ subdirectory of this skill.

**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as SKILL_DIR
2. Script path = ${SKILL_DIR}/scripts/<script-name>.ts
3. Replace all ${SKILL_DIR} in this document with the actual path
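
The three steps above amount to a path join and a placeholder substitution. A minimal sketch follows; the function names `resolveScript` and `substituteSkillDir` are hypothetical and are not part of the skill itself.

```typescript
// Hypothetical helpers for illustration; the skill does not ship these names.

/** Build the path of a script inside the skill's scripts/ subdirectory. */
function resolveScript(skillDir: string, scriptName: string): string {
  // Drop trailing slashes so the result never contains "//".
  const base = skillDir.replace(/\/+$/, "");
  return `${base}/scripts/${scriptName}.ts`;
}

/** Replace every literal ${SKILL_DIR} placeholder with the real directory. */
function substituteSkillDir(text: string, skillDir: string): string {
  // A plain (non-template) string keeps "${SKILL_DIR}" as literal text.
  return text.split("${SKILL_DIR}").join(skillDir);
}
```

An agent would call `substituteSkillDir` on each command from this document before running it.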

**Script Reference**:
| Script | Purpose |
|--------|---------|
| scripts/main.ts | CLI entry point for text/image generation |
| scripts/gemini-webapi/* | TypeScript port of gemini_webapi (GeminiClient, types, utils) |

## ⚠️ Disclaimer (REQUIRED)

**Before using this skill**, the consent check MUST be performed.

### Consent Check Flow

**Step 1**: Check consent file

    # macOS
    cat ~/Library/Application\ Support/baoyu-skills/gemini-web/consent.json 2>/dev/null

    # Linux
    cat ~/.local/share/baoyu-skills/gemini-web/consent.json 2>/dev/null

    # Windows (PowerShell)
    Get-Content "$env:APPDATA\baoyu-skills\gemini-web\consent.json" 2>$null


**Step 2**: If the consent file exists with accepted: true and a matching disclaimerVersion: "1.0", print the warning and proceed:

    ⚠️  Warning: Using reverse-engineered Gemini Web API (not official). Accepted on: <acceptedAt date>


**Step 3**: If the consent file doesn't exist or its disclaimerVersion doesn't match:

Display the disclaimer and ask the user:

    ⚠️  DISCLAIMER

    This tool uses a reverse-engineered Gemini Web API, NOT an official Google API.

    Risks:
    - May break without notice if Google changes their API
    - No official support or guarantees
    - Use at your own risk

    Do you accept these terms and wish to continue?

Use the AskUserQuestion tool with these options:
- **Yes, I accept** - Continue and save consent
- **No, I decline** - Exit immediately

**Step 4**: On acceptance, create consent file:

    # macOS
    mkdir -p ~/Library/Application\ Support/baoyu-skills/gemini-web
    cat > ~/Library/Application\ Support/baoyu-skills/gemini-web/consent.json << 'EOF'
    {
      "version": 1,
      "accepted": true,
      "acceptedAt": "<ISO timestamp>",
      "disclaimerVersion": "1.0"
    }
    EOF

    # Linux
    mkdir -p ~/.local/share/baoyu-skills/gemini-web
    cat > ~/.local/share/baoyu-skills/gemini-web/consent.json << 'EOF'
    {
      "version": 1,
      "accepted": true,
      "acceptedAt": "<ISO timestamp>",
      "disclaimerVersion": "1.0"
    }
    EOF


**Step 5**: On decline, output message and stop:
User declined the disclaimer. Exiting.
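
The decision logic in Steps 1 to 4 can be sketched as a few pure functions. This is an illustration only: the paths and JSON fields come from the steps above, while the names `consentPath`, `validConsent`, and `newConsent` are hypothetical, and treating an unparseable file like a missing one is an assumption.

```typescript
// Hypothetical helper names; only the paths and JSON fields come from the text.
type Platform = "darwin" | "linux" | "win32";

interface Consent {
  version: number;
  accepted: boolean;
  acceptedAt: string;
  disclaimerVersion: string;
}

const DISCLAIMER_VERSION = "1.0";

/** Consent file location for each platform listed in Step 1. */
function consentPath(platform: Platform, home: string, appData = ""): string {
  switch (platform) {
    case "darwin":
      return `${home}/Library/Application Support/baoyu-skills/gemini-web/consent.json`;
    case "linux":
      return `${home}/.local/share/baoyu-skills/gemini-web/consent.json`;
    case "win32":
      return `${appData}\\baoyu-skills\\gemini-web\\consent.json`;
  }
}

/** Return the parsed consent when Step 2 is satisfied, otherwise null (Step 3 applies). */
function validConsent(raw: string | null): Consent | null {
  if (raw === null) return null; // file does not exist
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // assumption: an unreadable file is handled like a missing one
  }
  if (typeof data !== "object" || data === null) return null;
  const c = data as Partial<Consent>;
  if (c.accepted !== true || c.disclaimerVersion !== DISCLAIMER_VERSION) return null;
  return c as Consent;
}

/** Build the record that Step 4 writes after the user accepts. */
function newConsent(now: Date): Consent {
  return {
    version: 1,
    accepted: true,
    acceptedAt: now.toISOString(),
    disclaimerVersion: DISCLAIMER_VERSION,
  };
}
```

A caller would read the file at `consentPath(...)`, pass its contents (or `null` when the file is absent) to `validConsent`, and show the disclaimer whenever the result is `null`.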


---

## Quick start

    npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello, Gemini"
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Explain quantum computing"
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cute cat" --image cat.png
    npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

    # Multi-turn conversation (agent generates unique sessionId)
    npx -y bun ${SKILL_DIR}/scripts/main.ts "Remember this: 42" --sessionId my-unique-id-123
    npx -y bun ${SKILL_DIR}/scripts/main.ts "What number?" --sessionId my-unique-id-123


## Commands

### Text generation

    # Simple prompt (positional)
    npx -y bun ${SKILL_DIR}/scripts/main.ts "Your prompt here"

    # Explicit prompt flag
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Your prompt here"
    npx -y bun ${SKILL_DIR}/scripts/main.ts -p "Your prompt here"

    # With model selection
    npx -y bun ${SKILL_DIR}/scripts/main.ts -p "Hello" -m gemini-2.5-pro

    # Pipe from stdin
    echo "Summarize this" | npx -y bun ${SKILL_DIR}/scripts/main.ts


### Image generation

    # Generate image with default path (./generated.png)
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image

    # Generate image with custom path
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cute robot" --image robot.png

    # Shorthand
    npx -y bun ${SKILL_DIR}/scripts/main.ts "A dragon" --image=dragon.png


### Vision input (reference images)

    # Text + image -> text
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Describe this image" --reference a.png

    # Text + image -> image
    npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Generate a variation" --reference a.png --image out.png


### Output formats

    # Plain text (default)
    npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello"

    # JSON output
    npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello" --json


## Options

| Option | Description |
|--------|-------------|
| --prompt <text>, -p | Prompt text |
| --promptfiles <files...> | Read prompt from files (concatenated in order) |
| --model <id>, -m | Model: gemini-3-pro (default), gemini-2.5-pro, gemini-2.5-flash |
| --image [path] | Generate image, save to path (default: generated.png) |
| --reference <files...>, --ref <files...> | Reference images for vision input |
| --sessionId <id> | Session ID for multi-turn conversation (agent generates unique ID) |
| --list-sessions | List saved sessions (max 100, sorted by update time) |
| --json | Output as JSON |
| --login | Refresh cookies only, then exit |
| --cookie-path <path> | Custom cookie file path |
| --profile-dir <path> | Chrome profile directory |
| --help, -h | Show help |

CLI note: scripts/main.ts supports text generation, image generation, reference images (--reference/--ref), and multi-turn conversations via --sessionId.
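
When the CLI is driven from another program, the options table can be mapped onto a typed argument builder. The sketch below is an assumption about how a caller might do that: `GeminiOptions` and `buildArgs` are hypothetical names, and only the flag names and model IDs are taken from the table.

```typescript
// Hypothetical wrapper type; flag names and model IDs come from the options table.
interface GeminiOptions {
  prompt: string;
  model?: "gemini-3-pro" | "gemini-2.5-pro" | "gemini-2.5-flash";
  image?: string | true; // true = bare --image, i.e. the default path generated.png
  reference?: string[];
  sessionId?: string;
  json?: boolean;
}

/** Turn typed options into the argument list that follows scripts/main.ts. */
function buildArgs(opts: GeminiOptions): string[] {
  const args: string[] = ["--prompt", opts.prompt];
  if (opts.model) args.push("--model", opts.model);
  if (opts.reference && opts.reference.length > 0) {
    args.push("--reference", ...opts.reference);
  }
  if (opts.sessionId) args.push("--sessionId", opts.sessionId);
  if (opts.json) args.push("--json");
  // --image is emitted last, mirroring the documented examples in which a
  // bare --image ends the command line.
  if (opts.image === true) args.push("--image");
  else if (opts.image) args.push("--image", opts.image);
  return args;
}
```

The resulting array would be appended to `npx -y bun ${SKILL_DIR}/scripts/main.ts` when spawning the process.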

## Models

- gemini-3-pro - Default, latest model
- gemini-2.5-pro - Previous generation pro
- gemini-2.5-flash - Fast, lightweight

## Authentication

The first run opens a browser to authenticate with Google. Cookies are cached for subsequent runs.

**Supported browsers** (auto-detected in order):
- Google Chrome
- Google Chrome Canary / Beta
- Chromium
- Microsoft Edge

Override the detected browser with the GEMINI_WEB_CHROME_PATH environment variable if needed.

    # Force cookie refresh
    npx -y bun ${SKILL_DIR}/scripts/main.ts --login


## Environment variables

| Variable | Description |
|----------|-------------|
| GEMINI_WEB_DATA_DIR | Data directory |
| GEMINI_WEB_COOKIE_PATH | Cookie file path |
| GEMINI_WEB_CHROME_PROFILE_DIR | Chrome profile directory |
| GEMINI_WEB_CHROME_PATH | Chrome executable path |

## Proxy Configuration

If you need a proxy to access Google services (e.g., in China), set HTTP_PROXY and HTTPS_PROXY environment variables before running:

    # Example with local proxy
    HTTP_PROXY=http://127.0.0.1:7890 HTTPS_PROXY=http://127.0.0.1:7890 npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello"

    # Image generation with proxy
    HTTP_PROXY=http://127.0.0.1:7890 HTTPS_PROXY=http://127.0.0.1:7890 npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

    # Cookie refresh with proxy
    HTTP_PROXY=http://127.0.0.1:7890 HTTPS_PROXY=http://127.0.0.1:7890 npx -y bun ${SKILL_DIR}/scripts/main.ts --login


**Note**: Environment variables must be set inline with the command. Shell profile settings (e.g., .bashrc) may not be inherited by subprocesses.
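
When the command is launched from a program rather than a shell, the equivalent of setting the variables inline is to pass them explicitly in the child process environment. A minimal sketch, assuming a hypothetical helper named `withProxy`:

```typescript
// Hypothetical helper; only the variable names HTTP_PROXY and HTTPS_PROXY
// come from the section above.
type Env = Record<string, string | undefined>;

/** Return a copy of `base` with both proxy variables set to `proxyUrl`. */
function withProxy(base: Env, proxyUrl: string): Env {
  // Spread first so the proxy values win over anything already in `base`.
  return { ...base, HTTP_PROXY: proxyUrl, HTTPS_PROXY: proxyUrl };
}
```

The returned object can be supplied as the `env` option of `spawn` from `node:child_process`, for example `withProxy(process.env, "http://127.0.0.1:7890")`, so the subprocess does not depend on shell profile settings.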

## Examples

### Generate text response

    npx -y bun ${SKILL_DIR}/scripts/main.ts "What is the capital of France?"

### Generate image

    npx -y bun ${SKILL_DIR}/scripts/main.ts "A photorealistic image of a golden retriever puppy" --image puppy.png

### Get JSON output for parsing

    npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello" --json | jq '.text'
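
The same extraction can be done in TypeScript when jq is not available. This sketch relies only on the `text` field shown in the jq example; any other fields in the JSON output are unknown here, and `extractText` is a hypothetical name.

```typescript
/** Extract the `text` field from the CLI's --json output (hypothetical helper). */
function extractText(stdout: string): string {
  const parsed: unknown = JSON.parse(stdout);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const text = (parsed as { text?: unknown }).text;
  if (typeof text !== "string") {
    throw new Error("Missing string field: text");
  }
  return text;
}
```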


### Generate image from prompt files

    # Concatenate system.md + content.md as prompt
    npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image output.png

### Multi-turn conversation

    # Start a session with unique ID (agent generates this)
    npx -y bun ${SKILL_DIR}/scripts/main.ts "You are a helpful math tutor." --sessionId task-abc123

    # Continue the conversation (remembers context)
    npx -y bun ${SKILL_DIR}/scripts/main.ts "What is 2+2?" --sessionId task-abc123
    npx -y bun ${SKILL_DIR}/scripts/main.ts "Now multiply that by 10" --sessionId task-abc123

    # List recent sessions (max 100, sorted by update time)
    npx -y bun ${SKILL_DIR}/scripts/main.ts --list-sessions


On macOS, session files are stored in ~/Library/Application Support/baoyu-skills/gemini-web/sessions/<id>.json. Each file contains:
- id: Session ID
- metadata: Gemini chat metadata for continuation
- messages: Array of {role, content, timestamp, error?}
- createdAt, updatedAt: Timestamps
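
For scripts that inspect saved sessions, the listed fields can be modelled roughly as follows. The field names come from the list above; the concrete types (for example whether timestamps are strings or numbers) are assumptions, and `parseSession` and `successfulMessages` are hypothetical helpers.

```typescript
// Field names come from the list above; the concrete types are assumptions.
interface SessionMessage {
  role: string;
  content: string;
  timestamp: string | number;
  error?: unknown;
}

interface SessionFile {
  id: string;
  metadata: unknown; // opaque Gemini chat metadata used for continuation
  messages: SessionMessage[];
  createdAt: string | number;
  updatedAt: string | number;
}

/** Parse a session file and apply a shallow structural check. */
function parseSession(raw: string): SessionFile {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Session file must contain a JSON object");
  }
  const s = data as Partial<SessionFile>;
  if (typeof s.id !== "string" || !Array.isArray(s.messages)) {
    throw new Error("Session file is missing id or messages");
  }
  return s as SessionFile;
}

/** Messages that did not record an error. */
function successfulMessages(session: SessionFile): SessionMessage[] {
  return session.messages.filter((m) => m.error == null);
}
```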

## Extension Support

Custom configurations via EXTEND.md.

**Check paths** (priority order):
1. .baoyu-skills/baoyu-danger-gemini-web/EXTEND.md (project)
2. ~/.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md (user)

If found, load before workflow. Extension content overrides defaults.
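
The priority order above can be sketched as a lookup over two candidate paths. The function names are hypothetical, and the existence check is injected so the logic stays independent of the file system.

```typescript
// Hypothetical helpers; only the two paths come from the list above.
const SKILL_NAME = "baoyu-danger-gemini-web";

/** Candidate EXTEND.md locations, highest priority first. */
function extendCandidates(projectDir: string, homeDir: string): string[] {
  return [
    `${projectDir}/.baoyu-skills/${SKILL_NAME}/EXTEND.md`, // project
    `${homeDir}/.baoyu-skills/${SKILL_NAME}/EXTEND.md`, // user
  ];
}

/** Return the first candidate that exists, or null when none is found. */
function findExtend(
  candidates: string[],
  exists: (path: string) => boolean,
): string | null {
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

In practice `exists` could be `existsSync` from `node:fs`; a project-level file shadows the user-level one because it is checked first.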

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/baoyu-danger-gemini-web/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/jimliu/baoyu-skills/baoyu-danger-gemini-web/SKILL.md
  • Cursor: ~/.cursor/skills/jimliu/baoyu-skills/baoyu-danger-gemini-web/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/jimliu/baoyu-skills/baoyu-danger-gemini-web/SKILL.md




Source & attribution

This skill is categorized under Creative & Visual and is published by Jim Liu, maintained in jimliu/baoyu-skills.
