MarkItDown Skill

OpenClaw agent skill for converting documents to Markdown. Documentation and utilities for Microsoft's MarkItDown library. Supports PDF, Word, PowerPoint, Excel, images (OCR), audio (transcription), HTML, YouTube.

Install

openclaw skills install markitdown-skill

Note: This skill provides only documentation and a batch script. Conversion itself is performed by the markitdown CLI and library, which are installed separately via pip.

When to Use

Use markitdown for:

  • 📄 Fetching documentation (README, API docs)
  • 🌐 Converting web pages to Markdown
  • 📝 Document analysis (PDFs, Word, PowerPoint)
  • 🎬 YouTube transcripts
  • 🖼️ Image text extraction (OCR)
  • 🎤 Audio transcription

Quick Start

# Convert file to markdown
markitdown document.pdf -o output.md

# Convert URL
markitdown https://example.com/docs -o docs.md

Supported Formats

| Format | Features |
|---|---|
| PDF | Text extraction, structure |
| Word (.docx) | Headings, lists, tables |
| PowerPoint | Slides, text |
| Excel | Tables, sheets |
| Images | OCR + EXIF metadata |
| Audio | Speech transcription |
| HTML | Structure preservation |
| YouTube | Video transcription |

Installation

The skill requires Microsoft's markitdown CLI:

pip install 'markitdown[all]'

Or install specific formats only:

pip install 'markitdown[pdf,docx,pptx]'

Common Patterns

Fetch Documentation

markitdown https://github.com/user/repo/blob/main/README.md -o readme.md

Convert PDF

markitdown document.pdf -o document.md

Batch Convert

# Using included script
python ~/.openclaw/skills/markitdown/scripts/batch_convert.py docs/*.pdf -o markdown/ -v

# Or shell loop
for file in docs/*.pdf; do
  markitdown "$file" -o "${file%.pdf}.md"
done
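
The shell loop above writes each Markdown file next to its source. A minimal Python sketch of the same idea, collecting the output in a separate directory instead, might look like the following. It assumes only that the markitdown executable is on PATH and accepts -o as shown above. The names output_path_for and convert_all are illustrative and are not taken from the skill's batch_convert.py.

```python
import subprocess
from pathlib import Path


def output_path_for(src: Path, out_dir: Path) -> Path:
    """Map e.g. docs/report.pdf to <out_dir>/report.md (hypothetical helper)."""
    return out_dir / (src.stem + ".md")


def convert_all(sources, out_dir):
    """Run the markitdown CLI once per file and return the files that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in map(Path, sources):
        dest = output_path_for(src, out_dir)
        # Same invocation as the shell loop: markitdown <file> -o <output>
        proc = subprocess.run(["markitdown", str(src), "-o", str(dest)])
        if proc.returncode != 0:
            failed.append(src)
    return failed
```

A call such as convert_all(sorted(Path("docs").glob("*.pdf")), "markdown") would mirror the loop above. Collecting failures rather than stopping at the first one lets a long batch finish before any problem files are inspected.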

Python API

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
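
In scripts, the converted text usually needs to be written to disk rather than printed. The sketch below wraps the call above in a hypothetical helper, save_markdown, that accepts any object with a convert() method whose result exposes text_content, as MarkItDown() does in the example above. The helper name and the UTF-8 output encoding are assumptions.

```python
from pathlib import Path


def save_markdown(converter, source, dest):
    """Convert `source` and write the resulting Markdown text to `dest`.

    `converter` is expected to behave like MarkItDown(): convert(source)
    returns an object whose `text_content` attribute holds the Markdown.
    """
    result = converter.convert(str(source))
    dest = Path(dest)
    # Create the output directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: UTF-8 is the desired output encoding.
    dest.write_text(result.text_content, encoding="utf-8")
    return dest
```

With the library installed, this could be called as save_markdown(MarkItDown(), "document.pdf", "out/document.md").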

Troubleshooting

"markitdown not found"

pip install 'markitdown[all]'
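
If the error persists after installing, the package may have been installed into a different Python environment than the one the shell is using. A small diagnostic sketch, using only the standard library, can report whether the markitdown executable and the importable module are visible from the current interpreter. The function name markitdown_status is hypothetical.

```python
import importlib.util
import shutil
import sys


def markitdown_status():
    """Report whether markitdown is visible from this interpreter."""
    return {
        # The interpreter running this check.
        "python": sys.executable,
        # Full path of the CLI on PATH, or None if it is not found.
        "cli": shutil.which("markitdown"),
        # True when `import markitdown` would find a module.
        "module": importlib.util.find_spec("markitdown") is not None,
    }


for key, value in markitdown_status().items():
    print(f"{key}: {value}")
```

If "module" is True but "cli" is None, the package is importable but its scripts directory is probably not on PATH.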

OCR Not Working

# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

What This Skill Provides

| Component | Source |
|---|---|
| markitdown CLI | Microsoft's pip package |
| markitdown Python API | Microsoft's pip package |
| scripts/batch_convert.py | This skill (utility) |
| Documentation | This skill |

See Also