Back to Documentation & Writing

baoyu-url-to-markdown

markdownweb scrapingChrome CDPURL conversioncontent extractiondocumentationAI agentautomation
⭐ 21.7kπŸ•’ 2026-06-13Source β†—

Install this skill

npx skills add jimliu/baoyu-skills

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

The baoyu-url-to-markdown skill provides a specialized utility for extracting web content into formatted Markdown using Chrome CDP for full JavaScript rendering. Instead of traditional static scrapers, this tool handles dynamic pages that rely on client-side execution to display information. It features two operating modes: an automatic capture that processes pages upon loading, and a manual wait mode that pauses the script to allow for user interactions like logging in or dismissing popups. Files are automatically organized into a domain-based folder structure with smart naming conventions and timestamp-based collision handling. By maintaining consistent metadata headers and stripping unnecessary elements, the skill produces clean documentation archives directly from live URLs, suitable for technical research or local knowledge base population.

When to Use This Skill

  • β€’Archiving technical documentation from sites requiring login
  • β€’Converting complex web articles into readable offline Markdown
  • β€’Capturing dynamic web pages that fail with standard curl or wget
  • β€’Creating personal knowledge base entries from live research URLs

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • β€œconvert this webpage to markdown
  • β€œscrape this url and save it to my docs
  • β€œcapture this article as a markdown file
  • β€œfetch the content from this link for my notes
  • β€œsave this dynamic webpage into a markdown file

Pro Tips

  • πŸ’‘For complex pages or those requiring authentication, always use the `--wait` flag to manually signal when the page is ready for capture, ensuring all dynamic elements are loaded.
  • πŸ’‘Leverage the `-o` option to specify an output file path, which is crucial when integrating this skill into automated workflows or scripting environments.
  • πŸ’‘Remember that Chrome CDP renders the page fully, so the markdown output will reflect the page's appearance after all scripts have executed, providing a more accurate representation than simple HTML parsers.

What this skill does

  • β€’Executes full JavaScript rendering via Chrome CDP
  • β€’Supports interactive wait-mode for authenticated sessions
  • β€’Generates standardized frontmatter metadata for every page
  • β€’Organizes outputs into domain-specific directory structures
  • β€’Implements automatic conflict resolution with timestamp suffixes

When not to use it

  • βœ•Scraping websites with aggressive anti-bot or CAPTCHA challenges
  • βœ•Extracting data from internal web applications that require specific SSO/MFA workflows

Example workflow

  1. Identify target URL and determine if authentication is required
  2. Execute the main script via CLI using the appropriate mode
  3. If in wait-mode, perform manual login or navigation in the browser
  4. Confirm page readiness to finalize the document capture
  5. Verify the output in the automatically generated domain directory

Prerequisites

  • –Google Chrome or Chromium installed on the system
  • –Bun runtime environment
  • –Appropriate file system permissions for the output directory

Pitfalls & limitations

  • !May struggle with sites that detect automated Chrome sessions
  • !Default timeout settings might be insufficient for slow-loading heavy pages
  • !Requires manual intervention for pages with complex multi-step login flows

FAQ

How do I handle pages that require a login?
Use the --wait flag. This allows you to log into the page manually in the browser before signaling the script to perform the final capture.
Where are my saved files stored?
Files are stored in a folder structure organized by domain, specifically url-to-markdown/<domain>/.
What happens if a filename already exists?
The script automatically appends a timestamp to the filename to prevent overwriting existing data.
Can I customize the chrome path?
Yes, set the URL_CHROME_PATH environment variable to point to your specific browser executable.

How it compares

Unlike standard text-to-markdown prompts which rely on the LLM's cached knowledge or simple HTML parsing, this tool uses actual browser rendering to ensure the captured DOM matches what a user sees on screen.

Source & trust

⭐ 22k starsπŸ•’ Updated 2026-06-13
πŸ“„ Full skill instructions β€” original source: jimliu/baoyu-skills
# URL to Markdown

Fetches any URL via Chrome CDP and converts HTML to clean markdown.

## Script Directory

**Important**: All scripts are located in the scripts/ subdirectory of this skill.

**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as SKILL_DIR
2. Script path = ${SKILL_DIR}/scripts/<script-name>.ts
3. Replace all ${SKILL_DIR} in this document with the actual path

**Script Reference**:
| Script | Purpose |
|--------|---------|
| scripts/main.ts | CLI entry point for URL fetching |

## Features

- Chrome CDP for full JavaScript rendering
- Two capture modes: auto or wait-for-user
- Clean markdown output with metadata
- Handles login-required pages via wait mode

## Usage

# Auto mode (default) - capture when page loads
npx -y bun ${SKILL_DIR}/scripts/main.ts <url>

# Wait mode - wait for user signal before capture
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> --wait

# Save to specific file
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md


## Options

| Option | Description |
|--------|-------------|
| <url> | URL to fetch |
| -o <path> | Output file path (default: auto-generated) |
| --wait | Wait for user signal before capturing |
| --timeout <ms> | Page load timeout (default: 30000) |

## Capture Modes

### Auto Mode (default)

Page loads β†’ waits for network idle β†’ captures immediately.

Best for:
- Public pages
- Static content
- No login required

### Wait Mode (--wait)

Page opens β†’ user can interact (login, scroll, etc.) β†’ user signals ready β†’ captures.

Best for:
- Login-required pages
- Dynamic content needing interaction
- Pages with lazy loading

**Agent workflow for wait mode**:
1. Run script with --wait flag
2. Script outputs: Page opened. Press Enter when ready to capture...
3. Use AskUserQuestion to ask user if page is ready
4. When user confirms, send newline to stdin to trigger capture

## Output Format

---
url: https://example.com/page
title: "Page Title"
description: "Meta description if available"
author: "Author if available"
published: "2024-01-01"
captured_at: "2024-01-15T10:30:00Z"
---

# Page Title

Converted markdown content...


## Mode Selection Guide

When user requests URL capture, help select appropriate mode:

**Suggest Auto Mode when**:
- URL is public (no login wall visible)
- Content appears static
- User doesn't mention login requirements

**Suggest Wait Mode when**:
- User mentions needing to log in
- Site known to require authentication
- User wants to scroll/interact before capture
- Content is behind paywall

**Ask user when unclear**:
The page may require login or interaction before capturing.

Which mode should I use?
1. Auto - Capture immediately when loaded
2. Wait - Wait for you to interact first


## Output Directory

Each capture creates a file organized by domain:

url-to-markdown/
└── <domain>/
└── <slug>.md


**Path Components**:
- <domain>: Site domain (e.g., example.com, github.com)
- <slug>: Generated from page title or URL path (kebab-case)

**Slug Generation**:
1. Extract from page title (preferred) or URL path
2. Convert to kebab-case, 2-6 words
3. Example: "Getting Started with React" β†’ getting-started-with-react

**Conflict Resolution**:
If url-to-markdown/<domain>/<slug>.md already exists:
- Append timestamp: <slug>-YYYYMMDD-HHMMSS.md
- Example: getting-started.md exists β†’ getting-started-20260118-143052.md

## Error Handling

| Error | Resolution |
|-------|------------|
| Chrome not found | Install Chrome or set URL_CHROME_PATH env |
| Page timeout | Increase --timeout value |
| Capture failed | Try wait mode for complex pages |
| Empty content | Page may need JS rendering time |

## Environment Variables

| Variable | Description |
|----------|-------------|
| URL_CHROME_PATH | Custom Chrome executable path |
| URL_DATA_DIR | Custom data directory |
| URL_CHROME_PROFILE_DIR | Custom Chrome profile directory |

## Extension Support

Custom configurations via EXTEND.md.

**Check paths** (priority order):
1. .baoyu-skills/baoyu-url-to-markdown/EXTEND.md (project)
2. ~/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md (user)

If found, load before workflow. Extension content overrides defaults.

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/baoyu-url-to-markdown/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/jimliu/baoyu-skills/baoyu-url-to-markdown/SKILL.md
  • Cursor: ~/.cursor/skills/jimliu/baoyu-skills/baoyu-url-to-markdown/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/jimliu/baoyu-skills/baoyu-url-to-markdown/SKILL.md

πŸš€ Install with CLI:
npx skills add jimliu/baoyu-skills

Read the Master Guide: Mastering Agent Skills β†’

Recommended Rules

View more rules β†’

Recommended Workflows

View more workflows β†’

Recommended MCP Servers

View more MCP servers β†’

Take It Further

Maximize your productivity with these powerful resources

πŸ“‹

Define Your Standards

Set up coding standards to ensure this workflow produces consistent, high-quality results.

Browse Rules Library
πŸ“–

Master Workflows

Learn how to create custom workflows, use Turbo Mode, and build your automation library.

Complete Guide

How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the rule content into your project's custom instructions or follow our Add-Skill CLI guide. This ensures Claude follows your standards during every code generation.

For Cursor & Windsurf

For Cursor or Windsurf, individual skills are best used in the "Rules for AI" section. This specific unit helps the agent avoid documentation & writing issues, leading to cleaner, more efficient code.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under Documentation & Writing and is published by Jim Liu, maintained in jimliu/baoyu-skills.

← Browse All Agent Skills
Sponsored AI assistant. Recommendations may be paid.