baoyu-url-to-markdown
Install this skill
npx skills add jimliu/baoyu-skillsWorks across Claude Code, Cursor, Codex, Copilot & Antigravity
The baoyu-url-to-markdown skill provides a specialized utility for extracting web content into formatted Markdown using Chrome CDP for full JavaScript rendering. Instead of traditional static scrapers, this tool handles dynamic pages that rely on client-side execution to display information. It features two operating modes: an automatic capture that processes pages upon loading, and a manual wait mode that pauses the script to allow for user interactions like logging in or dismissing popups. Files are automatically organized into a domain-based folder structure with smart naming conventions and timestamp-based collision handling. By maintaining consistent metadata headers and stripping unnecessary elements, the skill produces clean documentation archives directly from live URLs, suitable for technical research or local knowledge base population.
When to Use This Skill
- β’Archiving technical documentation from sites requiring login
- β’Converting complex web articles into readable offline Markdown
- β’Capturing dynamic web pages that fail with standard curl or wget
- β’Creating personal knowledge base entries from live research URLs
How to Invoke This Skill
Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:
- βconvert this webpage to markdown
- βscrape this url and save it to my docs
- βcapture this article as a markdown file
- βfetch the content from this link for my notes
- βsave this dynamic webpage into a markdown file
Pro Tips
- π‘For complex pages or those requiring authentication, always use the `--wait` flag to manually signal when the page is ready for capture, ensuring all dynamic elements are loaded.
- π‘Leverage the `-o` option to specify an output file path, which is crucial when integrating this skill into automated workflows or scripting environments.
- π‘Remember that Chrome CDP renders the page fully, so the markdown output will reflect the page's appearance after all scripts have executed, providing a more accurate representation than simple HTML parsers.
What this skill does
- β’Executes full JavaScript rendering via Chrome CDP
- β’Supports interactive wait-mode for authenticated sessions
- β’Generates standardized frontmatter metadata for every page
- β’Organizes outputs into domain-specific directory structures
- β’Implements automatic conflict resolution with timestamp suffixes
When not to use it
- βScraping websites with aggressive anti-bot or CAPTCHA challenges
- βExtracting data from internal web applications that require specific SSO/MFA workflows
Example workflow
- Identify target URL and determine if authentication is required
- Execute the main script via CLI using the appropriate mode
- If in wait-mode, perform manual login or navigation in the browser
- Confirm page readiness to finalize the document capture
- Verify the output in the automatically generated domain directory
Prerequisites
- βGoogle Chrome or Chromium installed on the system
- βBun runtime environment
- βAppropriate file system permissions for the output directory
Pitfalls & limitations
- !May struggle with sites that detect automated Chrome sessions
- !Default timeout settings might be insufficient for slow-loading heavy pages
- !Requires manual intervention for pages with complex multi-step login flows
FAQ
How it compares
Unlike standard text-to-markdown prompts which rely on the LLM's cached knowledge or simple HTML parsing, this tool uses actual browser rendering to ensure the captured DOM matches what a user sees on screen.
π Full skill instructions β original source: jimliu/baoyu-skills
Fetches any URL via Chrome CDP and converts HTML to clean markdown.
## Script Directory
**Important**: All scripts are located in the
scripts/ subdirectory of this skill.**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as
SKILL_DIR2. Script path =
${SKILL_DIR}/scripts/<script-name>.ts3. Replace all
${SKILL_DIR} in this document with the actual path**Script Reference**:
| Script | Purpose |
|--------|---------|
|
scripts/main.ts | CLI entry point for URL fetching |## Features
- Chrome CDP for full JavaScript rendering
- Two capture modes: auto or wait-for-user
- Clean markdown output with metadata
- Handles login-required pages via wait mode
## Usage
# Auto mode (default) - capture when page loads
npx -y bun ${SKILL_DIR}/scripts/main.ts <url>
# Wait mode - wait for user signal before capture
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> --wait
# Save to specific file
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md## Options
| Option | Description |
|--------|-------------|
|
<url> | URL to fetch ||
-o <path> | Output file path (default: auto-generated) ||
--wait | Wait for user signal before capturing ||
--timeout <ms> | Page load timeout (default: 30000) |## Capture Modes
### Auto Mode (default)
Page loads β waits for network idle β captures immediately.
Best for:
- Public pages
- Static content
- No login required
### Wait Mode (
--wait)Page opens β user can interact (login, scroll, etc.) β user signals ready β captures.
Best for:
- Login-required pages
- Dynamic content needing interaction
- Pages with lazy loading
**Agent workflow for wait mode**:
1. Run script with
--wait flag2. Script outputs:
Page opened. Press Enter when ready to capture...3. Use
AskUserQuestion to ask user if page is ready4. When user confirms, send newline to stdin to trigger capture
## Output Format
---
url: https://example.com/page
title: "Page Title"
description: "Meta description if available"
author: "Author if available"
published: "2024-01-01"
captured_at: "2024-01-15T10:30:00Z"
---
# Page Title
Converted markdown content...## Mode Selection Guide
When user requests URL capture, help select appropriate mode:
**Suggest Auto Mode when**:
- URL is public (no login wall visible)
- Content appears static
- User doesn't mention login requirements
**Suggest Wait Mode when**:
- User mentions needing to log in
- Site known to require authentication
- User wants to scroll/interact before capture
- Content is behind paywall
**Ask user when unclear**:
The page may require login or interaction before capturing.
Which mode should I use?
1. Auto - Capture immediately when loaded
2. Wait - Wait for you to interact first## Output Directory
Each capture creates a file organized by domain:
url-to-markdown/
βββ <domain>/
βββ <slug>.md**Path Components**:
-
<domain>: Site domain (e.g., example.com, github.com)-
<slug>: Generated from page title or URL path (kebab-case)**Slug Generation**:
1. Extract from page title (preferred) or URL path
2. Convert to kebab-case, 2-6 words
3. Example: "Getting Started with React" β
getting-started-with-react**Conflict Resolution**:
If
url-to-markdown/<domain>/<slug>.md already exists:- Append timestamp:
<slug>-YYYYMMDD-HHMMSS.md- Example:
getting-started.md exists β getting-started-20260118-143052.md## Error Handling
| Error | Resolution |
|-------|------------|
| Chrome not found | Install Chrome or set
URL_CHROME_PATH env || Page timeout | Increase
--timeout value || Capture failed | Try wait mode for complex pages |
| Empty content | Page may need JS rendering time |
## Environment Variables
| Variable | Description |
|----------|-------------|
|
URL_CHROME_PATH | Custom Chrome executable path ||
URL_DATA_DIR | Custom data directory ||
URL_CHROME_PROFILE_DIR | Custom Chrome profile directory |## Extension Support
Custom configurations via EXTEND.md.
**Check paths** (priority order):
1.
.baoyu-skills/baoyu-url-to-markdown/EXTEND.md (project)2.
~/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md (user)If found, load before workflow. Extension content overrides defaults.
How to Use This Skill Unit
Option A: Project-Specific (Recommended)
- Click "Download" above
- In your project, create the directory:
.agent/skills/baoyu-url-to-markdown/ - Save the file as
SKILL.md - The agent will automatically discover the skill based on its description.
Option B: Global Installation (All Agents)
Save the file to these locations to make it available across all projects:
- Claude Code:
~/.claude/skills/jimliu/baoyu-skills/baoyu-url-to-markdown/SKILL.md - Cursor:
~/.cursor/skills/jimliu/baoyu-skills/baoyu-url-to-markdown/SKILL.md - Antigravity:
~/.gemini/antigravity/skills/jimliu/baoyu-skills/baoyu-url-to-markdown/SKILL.md
π Install with CLI:npx skills add jimliu/baoyu-skills