
web-to-markdown

markdown · web content · web scraping · documentation · cli · puppeteer · readability · content extraction
2.0k stars · MIT · Updated 2026-03-05

Install this skill

npx skills add softaworks/agent-toolkit

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

The web-to-markdown skill converts JavaScript-rendered web pages into structured, readable Markdown. It drives a local Chromium-family browser through the web2md command-line utility, so content that only appears after scripts run is captured. Readability extracts the main content and Turndown converts it to Markdown, producing clean files suited to documentation repositories. Authentication-gated pages and single-page applications can be handled by running an interactive browser session or adding wait conditions. The skill is aimed at developers who have a local browser installed and want automated conversion of online resources into version-controlled files. Every request must name the skill explicitly, so the agent never launches the local browser unless it was asked to.

When to Use This Skill

  • Archiving technical documentation from sites requiring JS execution
  • Converting research articles to Markdown for local knowledge base storage
  • Automating the collection of multiple reports into a version-controlled folder
  • Capturing page content that is only accessible after manual authentication

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • use the skill web-to-markdown to grab this article
  • can you use a skill web-to-markdown to convert the page at this URL?
  • please use the skill web-to-markdown for the following list of links
  • use the skill web-to-markdown to save this site as a markdown file

Pro Tips

  • 💡 Specify `--chrome-path` if you have multiple Chrome installations or a non-standard install path, so that `puppeteer-core` targets the correct browser.
  • 💡 For bulk conversions, use a script to feed multiple URLs to the skill, specifying an output directory (`--out ./some-dir/`) for automatic file naming.
  • 💡 Always verify the Markdown output, especially for complex or interactive pages, and be prepared to edit it by hand if Readability misidentifies the main content.
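
The bulk-conversion tip can be sketched as a small POSIX shell helper. This is a minimal sketch, not part of web2md: the function name `convert_all`, the URL list file, and the output directory are hypothetical, and the only web2md usage assumed is the `--out <dir>/` form described on this page.

```shell
#!/bin/sh
# Batch conversion sketch. convert_all is a hypothetical helper, not a web2md feature.

# convert_all URL_FILE OUT_DIR
# Runs one web2md command per non-empty, non-comment line of URL_FILE.
# Returns 0 only if every conversion succeeded.
convert_all() {
  url_file=$1
  out_dir=$2
  failed=0
  mkdir -p "$out_dir" || return 1
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      ''|'#'*) continue ;;   # skip blank lines and comment lines
    esac
    # The trailing slash asks web2md to auto-name the file by page title.
    # stdin is redirected so web2md cannot consume the rest of the URL list.
    if ! web2md "$url" --out "$out_dir/" </dev/null; then
      echo "failed: $url" >&2
      failed=$((failed + 1))
    fi
  done < "$url_file"
  [ "$failed" -eq 0 ]
}
```

A call such as `convert_all urls.txt ./out` would then process one URL per line. Failures are reported but do not stop the loop, so a single bad page does not abort the batch.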

What this skill does

  • Converts JS-rendered web pages to clean Markdown
  • Supports interactive sessions for handling login walls
  • Provides granular control over browser wait times and selectors
  • Generates output to individual files or directory-based batches
  • Cleans links and formats content using Readability and Turndown

When not to use it

  • Scraping static websites where simple curl or wget operations suffice
  • Performing large-scale production web crawling without explicit user interaction

Example workflow

  1. Verify the user explicitly invoked the skill by name
  2. Validate the target URL format
  3. Check for local installation of web2md using command -v
  4. Execute the conversion with appropriate flags like --wait-until or --interactive
  5. Confirm file creation by checking the target directory path
  6. Return the location of the generated files to the user
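
Steps 2 and 3 of the workflow above can be sketched as pre-flight checks in POSIX shell. The helper names (`is_http_url`, `have_cmd`, `preflight`) and the exit codes are illustrative assumptions; only the `http://`/`https://` prefix rule and the `command -v` check come from the skill itself.

```shell
#!/bin/sh
# Pre-flight checks (hypothetical helpers; none of these names belong to web2md).

# Succeeds only when the argument starts with http:// or https://
# and has at least one character after the scheme.
is_http_url() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the named command can be found by the shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# preflight URL
# Returns 0 when a conversion can proceed, 2 for a bad URL,
# 3 when web2md is not installed.
preflight() {
  url=$1
  if ! is_http_url "$url"; then
    echo "invalid URL: $url" >&2
    return 2
  fi
  if ! have_cmd web2md; then
    echo "web2md not found on PATH" >&2
    return 3
  fi
  return 0
}
```

Distinct return codes let a calling script tell a malformed URL apart from a missing installation and respond with the right instruction to the user.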

Prerequisites

  • Local installation of a Chromium-family browser
  • Installed and configured web2md utility
  • Node.js environment for package management

Pitfalls & limitations

  • Requires manual interaction for login walls via the --interactive flag
  • Failure to use the exact trigger phrase causes the agent to stop
  • Chromium auto-detection might fail on custom system paths

FAQ

What happens if I don't use the specific trigger phrase?
The agent will refuse to execute the task and ask you to re-submit your request using the exact phrase 'use the skill web-to-markdown'.
Can I use this for sites that require logging in?
Yes, use the --interactive flag to launch a headful browser session, complete your login manually, and then press Enter in your terminal to continue the conversion.
Which browsers are supported?
The tool works best with Chromium-family browsers such as Chrome, Chromium, Brave, or Edge via puppeteer-core.

How it compares

Unlike generic web scrapers or simple browser-less requests, this tool uses a live browser engine to handle JavaScript execution, ensuring accurate content capture for complex web applications.

Source & trust

2.0k stars · MIT · Updated 2026-03-05
📄 Full skill instructions — original source: softaworks/agent-toolkit
# web-to-markdown

Convert web pages to clean Markdown by driving a locally installed browser (via web2md).

## Hard trigger gate (must enforce)

This skill MUST NOT be used unless the user explicitly wrote **exactly** a phrase like:
- use the skill web-to-markdown ...
- use a skill web-to-markdown ...

If the user did not explicitly request this skill by name, stop and ask them to re-issue the request including: use the skill web-to-markdown.

## What this skill does

- Handles JS-rendered pages (Puppeteer → user Chrome).
- Works best with Chromium-family browsers (Chrome/Chromium/Brave/Edge) via puppeteer-core.
- Extracts main content (Readability).
- Converts to Markdown (Turndown) with cleaned links and optional YAML frontmatter.

## Non-goals

- Do not use Playwright or other browser automation stacks; the mechanism is web2md.

## Inputs you should collect (ask only if missing)

- url (or a list of URLs)
- Output preference:
  - Print to stdout (`--print`), OR
  - Save to a file (`--out ./file.md`), OR
  - Save to a directory (`--out ./some-dir/` to auto-name by page title)
- Optional rendering controls for tricky pages:
  - `--chrome-path <path>` (if Chrome auto-detection fails)
  - `--interactive` (show Chrome and pause so the user can complete human checks/login, then press Enter)
  - `--wait-until load|domcontentloaded|networkidle0|networkidle2`
  - `--wait-for '<css selector>'`
  - `--wait-ms <milliseconds>`
  - `--headful` (debug)
  - `--no-sandbox` (sometimes required in containers/CI)
  - `--user-data-dir <dir>` (login/session; use a dedicated profile directory)
## Workflow

1) Confirm the user explicitly invoked the skill (`use the skill web-to-markdown`).
2) Validate URL(s) start with `http://` or `https://`.
3) Ensure web2md is installed:
   - Run: `command -v web2md`
   - If missing, instruct the user to install it (assume the project exists at `~/workspace/softaworks/projects/web2md`):
     - `cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm link`
     - Or: `cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm install -g .`
4) Convert:
   - Single URL → file:
     - `web2md '<url>' --out ./page.md`
   - Single URL → auto-named file in directory:
     - `mkdir -p ./out && web2md '<url>' --out ./out/`
   - Human verification / login walls (interactive):
     - `mkdir -p ./out && web2md '<url>' --interactive --user-data-dir ./tmp/web2md-profile --out ./out/`
     - Then: complete the check in the browser window and press Enter in the terminal to continue.
   - Print to stdout:
     - `web2md '<url>' --print`
   - Multiple URLs (batch):
     - Create output dir (e.g. `./out/`) then run one web2md command per URL using `--out ./out/`
5) Validate output:
   - If writing files, verify they exist and are non-empty (e.g. `ls -la <path>` and `wc -c <path>`).
6) Return:
   - The saved file path(s), or the Markdown (stdout mode).
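
The output check in step 5 could be scripted roughly as follows. This is a sketch under the assumption that the output paths are already known to the caller; `check_outputs` is a hypothetical helper name, and it relies only on the standard `[ -s ]` test and `wc -c`.

```shell
#!/bin/sh
# Output validation sketch. check_outputs is a hypothetical helper.

# check_outputs FILE...
# Prints the byte size of each file that exists and is non-empty.
# Returns 1 if any file is missing or empty, 0 otherwise.
check_outputs() {
  status=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      # wc -c reports the size in bytes; tr strips any padding spaces
      echo "ok: $f ($(wc -c < "$f" | tr -d ' ') bytes)"
    else
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For directory output, a call like `check_outputs ./out/*.md` would cover every generated file, assuming at least one `.md` file was written.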

## Defaults (recommended)

- For most pages: `--wait-until networkidle2`
- For heavy apps: start with `--wait-until domcontentloaded --wait-ms 2000`, then add `--wait-for 'main'` (or another stable selector) if needed.
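
One way to keep these defaults in a single place is a small lookup function. This is only a sketch: the profile names (`default`, `heavy`, `heavy-selector`) and the function `wait_flags` are hypothetical, while the flag values are the ones recommended above.

```shell
#!/bin/sh
# Wait-strategy lookup sketch. Profile names are hypothetical labels.

# wait_flags PROFILE [SELECTOR]
# Prints the recommended wait flags for a profile.
# SELECTOR defaults to "main" and must not contain spaces, because the
# output is meant to be word-split by the calling shell.
wait_flags() {
  case "$1" in
    default)
      echo "--wait-until networkidle2" ;;
    heavy)
      echo "--wait-until domcontentloaded --wait-ms 2000" ;;
    heavy-selector)
      echo "--wait-until domcontentloaded --wait-ms 2000 --wait-for ${2:-main}" ;;
    *)
      echo "unknown profile: $1" >&2
      return 1 ;;
  esac
}

# Example use (left unquoted on purpose so the flags split into words):
#   web2md "$url" $(wait_flags heavy) --out ./page.md
```

Starting with `heavy` and moving to `heavy-selector` only when content is still missing mirrors the escalation order suggested in the defaults.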

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/web-to-markdown/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/softaworks/agent-toolkit/web-to-markdown/SKILL.md
  • Cursor: ~/.cursor/skills/softaworks/agent-toolkit/web-to-markdown/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/softaworks/agent-toolkit/web-to-markdown/SKILL.md

🚀 Install with CLI:
npx skills add softaworks/agent-toolkit

How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the rule content into your project's custom instructions or follow our Add-Skill CLI guide. This ensures Claude follows your standards during every code generation.

For Cursor & Windsurf

For Cursor or Windsurf, individual skills are best used in the "Rules for AI" section. This specific unit helps the agent avoid documentation & writing issues, leading to cleaner, more efficient code.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under Documentation & Writing and is published by Softaworks, maintained in softaworks/agent-toolkit.
