markdown-converter

markdown · document conversion · pdf to markdown · data extraction · llm preprocessing · file parser · text analysis · ai tools
273 stars · CC0-1.0 · Updated 2026-04-25

Install this skill

npx skills add intellectronica/agent-skills

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

The markdown-converter skill converts a wide range of file formats into structured Markdown using the markitdown utility. It bridges proprietary document formats and the plain-text formats suited to large language model context windows or static site generation. Because it runs through uvx, there is nothing to install or manage locally beyond uv itself, and the output keeps structural elements such as tables, lists, and headings. The tool extracts content from office documents, images (via OCR), audio, and YouTube URLs, and it can use Azure Document Intelligence for better extraction of complex PDF layouts. It accepts both local files and data piped from stdin, producing a uniform, readable format for downstream processing and documentation tasks without manual copying or reformatting.

When to Use This Skill

  • Ingesting legacy Excel or Word documents into a documentation repository
  • Converting scanned research papers into searchable Markdown text files
  • Extracting metadata and transcriptions from video or audio assets for indexing
  • Normalizing various data formats like JSON or XML for easier textual comparison
  • Preparing large sets of heterogeneous files for RAG system ingestion

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • Convert this PDF into markdown for me
  • Transform the report.docx file to a text-based format
  • Extract the data from this spreadsheet into a markdown table
  • Create a markdown summary of this document file
  • Run a conversion on this slide deck to get the text out

Pro Tips

  • 💡 When converting from `stdin` (e.g., piped content), the file type isn't obvious, so pass `-x` or `-m` to hint the extension or MIME type for more accurate parsing.
  • 💡 Combine this skill with a text summarization agent to distill converted documents into concise summaries of complex source material.
  • 💡 For scanned documents or PDFs that extract poorly, use the `-d` flag to route extraction through Azure Document Intelligence for better OCR and layout recovery.
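
The first tip can be sketched as a small wrapper. This is an illustration only: `convert_stdin` and the `MARKITDOWN` variable are hypothetical names, and the leading-dot normalization simply mirrors the `-x .pdf` form used in the skill's own examples.

```shell
#!/usr/bin/env bash
# MARKITDOWN is a hypothetical override hook; by default it runs the real tool.
MARKITDOWN="${MARKITDOWN:-uvx markitdown}"

# convert_stdin EXT: convert piped content, hinting its type by extension.
# (convert_stdin is an illustrative helper, not part of the skill.)
convert_stdin() {
  local ext="$1"
  case "$ext" in
    .*) ;;               # already has a leading dot
    *)  ext=".$ext" ;;   # normalize "pdf" to ".pdf"
  esac
  $MARKITDOWN -x "$ext"  # stdin is passed through to the converter
}
```

A call such as `cat document | convert_stdin pdf > output.md` would then behave like the `-x .pdf` example in the skill instructions.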

What this skill does

  • Transforms office suites including Word, Excel, and PowerPoint into Markdown
  • Performs OCR on image files to extract embedded text
  • Extracts textual content from YouTube URLs and audio files
  • Supports piping inputs from stdin for automated processing pipelines
  • Integrates with Azure Document Intelligence for sophisticated PDF table and structure recovery

When not to use it

  • When you need to preserve original visual styling, fonts, or exact layout positioning
  • When dealing with encrypted or password-protected document files

Example workflow

  1. Locate the source document file in your working directory
  2. Execute the conversion command pointing to the file path
  3. Specify an output destination to save the generated markdown file
  4. Review the resulting file to ensure table and list formatting is accurate
  5. Refine the output by adding manual corrections if complex visuals were present
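
Steps 1 to 4 above could be scripted roughly as follows. This is a sketch under stated assumptions: `convert_to_md` and `MARKITDOWN` are hypothetical names, the default output name (`<source>.md`) is an arbitrary choice, and the empty-file check is only a crude stand-in for a real review.

```shell
#!/usr/bin/env bash
# MARKITDOWN is a hypothetical override hook; by default it runs the real tool.
MARKITDOWN="${MARKITDOWN:-uvx markitdown}"

# convert_to_md SRC [DEST]: illustrative helper, not part of the skill.
convert_to_md() {
  local src="$1"
  local dest="${2:-$src.md}"   # assumed default: append .md to the source name

  # Step 1: make sure the source document exists
  if [ ! -f "$src" ]; then
    echo "source not found: $src" >&2
    return 1
  fi

  # Steps 2-3: run the conversion and save it to the destination
  $MARKITDOWN "$src" -o "$dest" || return 1

  # Step 4: an empty result usually means nothing was extracted
  if [ ! -s "$dest" ]; then
    echo "warning: $dest is empty, review the source" >&2
    return 2
  fi
  echo "$dest"
}
```

Step 5, correcting complex visuals by hand, stays a manual task; the script only flags the obvious failure of an empty output file.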

Prerequisites

  • uv installed on the system
  • Azure Document Intelligence endpoint (optional for advanced PDFs)
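
A quick way to confirm the first prerequisite is to check that the `uvx` launcher is on `PATH`. The helper name `check_uvx` below is hypothetical and the messages are illustrative.

```shell
#!/usr/bin/env bash
# check_uvx: report whether the uvx launcher (installed with uv) is available.
# (check_uvx is an illustrative helper, not part of the skill.)
check_uvx() {
  if command -v uvx >/dev/null 2>&1; then
    echo "uvx found"
  else
    echo "uvx not found; install uv first" >&2
    return 1
  fi
}
```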

Pitfalls & limitations

  • PDFs with complex multi-column layouts may produce disorganized text without the Azure plugin
  • Large media files can result in long processing times during the transcription phase
  • Non-textual elements in spreadsheets may be lost during the conversion process

FAQ

Does this require me to install extra packages?
No. Beyond uv itself, nothing needs installing: uvx runs the utility on demand and manages environment isolation automatically.
Can I convert multiple files at once?
The base tool processes files individually, though you can loop through directory contents using shell scripts.
How does this handle images inside documents?
It processes embedded images and utilizes OCR to capture text that would otherwise be inaccessible in standard copy-paste operations.
Is the output quality consistent?
It is highly reliable for standard text, but complex layouts or heavy graphic designs may require manual review after conversion.
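
For the multiple-files question above, a shell loop along these lines is one option. It is a minimal sketch: `convert_dir` and `MARKITDOWN` are hypothetical names, and restricting the loop to `.docx` files is an arbitrary choice for the example.

```shell
#!/usr/bin/env bash
# MARKITDOWN is a hypothetical override hook; by default it runs the real tool.
MARKITDOWN="${MARKITDOWN:-uvx markitdown}"

# convert_dir DIR: convert every .docx directly inside DIR to a sibling .md.
# (convert_dir is an illustrative helper, not part of the skill.)
convert_dir() {
  local dir="$1" f
  for f in "$dir"/*.docx; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # Keep going after a failure so one bad file does not stop the batch
    $MARKITDOWN "$f" -o "${f%.docx}.md" || echo "failed: $f" >&2
  done
}
```

Each file is still converted individually, as the answer above describes; the loop only automates the repetition.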

How it compares

Unlike manual copy-pasting, which often breaks table alignment and list structure, this utility parses the source programmatically and preserves the document's structure: headings, tables, lists, and links.

Source & trust

📄 Full skill instructions — original source: intellectronica/agent-skills
# Markdown Converter

Convert files to Markdown using uvx markitdown — no installation required.

## Basic Usage

```shell
# Convert to stdout
uvx markitdown input.pdf

# Save to file
uvx markitdown input.pdf -o output.md
uvx markitdown input.docx > output.md

# From stdin
cat input.pdf | uvx markitdown
```

## Supported Formats

- **Documents**: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
- **Web/Data**: HTML, CSV, JSON, XML
- **Media**: Images (EXIF + OCR), Audio (EXIF + transcription)
- **Other**: ZIP (iterates contents), YouTube URLs, EPub

## Options

```
-o OUTPUT        # Output file
-x EXTENSION     # Hint file extension (for stdin)
-m MIME_TYPE     # Hint MIME type
-c CHARSET       # Hint charset (e.g., UTF-8)
-d               # Use Azure Document Intelligence
-e ENDPOINT      # Document Intelligence endpoint
--use-plugins    # Enable 3rd-party plugins
--list-plugins   # Show installed plugins
```

## Examples

```shell
# Convert Word document
uvx markitdown report.docx -o report.md

# Convert Excel spreadsheet
uvx markitdown data.xlsx > data.md

# Convert PowerPoint presentation
uvx markitdown slides.pptx -o slides.md

# Convert with file type hint (for stdin)
cat document | uvx markitdown -x .pdf > output.md

# Use Azure Document Intelligence for better PDF extraction
uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"
```

## Notes

- Output preserves document structure: headings, tables, lists, links
- First run caches dependencies; subsequent runs are faster
- For complex PDFs with poor extraction, use -d with Azure Document Intelligence
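
The last note could be wrapped so that Azure Document Intelligence is used only when an endpoint is configured. This is a sketch: `convert_pdf`, `MARKITDOWN`, and the `DOCINTEL_ENDPOINT` environment variable are hypothetical names chosen for illustration, and any Azure credentials setup is outside its scope.

```shell
#!/usr/bin/env bash
# MARKITDOWN is a hypothetical override hook; by default it runs the real tool.
MARKITDOWN="${MARKITDOWN:-uvx markitdown}"

# convert_pdf SRC DEST: illustrative helper, not part of the skill.
# DOCINTEL_ENDPOINT is an assumed variable name, not one the tool reads itself.
convert_pdf() {
  local src="$1" dest="$2"
  if [ -n "${DOCINTEL_ENDPOINT:-}" ]; then
    # Endpoint configured: request Azure Document Intelligence extraction
    $MARKITDOWN "$src" -o "$dest" -d -e "$DOCINTEL_ENDPOINT"
  else
    # No endpoint: fall back to the default extraction
    $MARKITDOWN "$src" -o "$dest"
  fi
}
```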

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/markdown-converter/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/intellectronica/agent-skills/markdown-converter/SKILL.md
  • Cursor: ~/.cursor/skills/intellectronica/agent-skills/markdown-converter/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/intellectronica/agent-skills/markdown-converter/SKILL.md

🚀 Install with CLI:
npx skills add intellectronica/agent-skills

How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the rule content into your project's custom instructions or follow our Add-Skill CLI guide. This ensures Claude follows your standards during every code generation.

For Cursor & Windsurf

For Cursor or Windsurf, individual skills are best used in the "Rules for AI" section.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under AI Tools & Agents and is published by intellectronica, maintained in intellectronica/agent-skills.
