Microsoft MarkItDown

v1.0.0

Convert various document formats (PDF, Word, PowerPoint, Excel, images, audio, HTML, etc.) to Markdown using Microsoft's markitdown tool. Supports OCR, audio...

by_silhouette@lanyasheng

Install

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lanyasheng/ms-markitdown.

Install the skill "Microsoft MarkItDown" (lanyasheng/ms-markitdown) from ClawHub.
Skill page: https://clawhub.ai/lanyasheng/ms-markitdown
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

openclaw skills install ms-markitdown

ClawHub CLI

npx clawhub@latest install ms-markitdown

Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the content: SKILL.md documents markitdown usage and the included script wraps the markitdown CLI for single and batch conversions. Requested resources and code are proportionate to a converter helper.
Instruction Scope
SKILL.md focuses on installing and using markitdown and shows CLI/Python API examples. The included wrapper only invokes the external markitdown CLI and reads/writes files; it does not access unrelated system files or environment variables. Note: the included test uses a hard-coded user path (/Users/study/.openclaw/skills/markitdown), which is an odd, environment-specific assertion but not indicative of exfiltration.
Install Mechanism
This is an instruction-only skill (no install spec). SKILL.md recommends installing markitdown from PyPI (via pipx or pip), which is a standard distribution channel; the bundle does not download arbitrary archives or run remote installers itself.
Credentials
The skill declares no required environment variables or credentials. The wrapper calls an external CLI (markitdown) but does not request secrets or unrelated tokens.
Persistence & Privilege
The skill is not always enabled and does not modify other skills or system-wide settings. It simply provides a wrapper script invoked on demand.
Assessment
This skill is a thin wrapper around the third‑party 'markitdown' tool and is internally consistent. Before installing:
(1) confirm you trust the markitdown PyPI/GitHub project (pipx is recommended to isolate the installation);
(2) review markitdown's own docs for any network behavior (YouTube download, remote transcription, MCP server), because those operations may fetch or upload content;
(3) be cautious when converting sensitive documents: the conversion tool will read the content and may fetch remote resources if given URLs;
(4) note that the included smoke test contains a hard-coded user path, which is probably an autogenerated, benign artifact but may fail on different machines.
If you need guarantees about data locality or network access, inspect the upstream markitdown source (links are in SKILL.md) before use.
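Point (3) can be enforced mechanically before anything reaches the converter. The sketch below is a hypothetical pre-flight guard, not part of markitdown or this skill: `is_local_source` accepts a plain path (or a Windows drive-letter path) and rejects anything with a URL scheme, so inputs such as YouTube links are refused. It only checks the argument you pass; it does not inspect what the converter does with a document's contents.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_local_source(source: str) -> bool:
    """Return True if `source` looks like a local path rather than a URL."""
    scheme = urlparse(source).scheme
    # No scheme: a plain path such as "docs/report.pdf".
    # One-letter scheme: a Windows drive letter such as "C:\\report.pdf".
    # Any other scheme (https:, file:, data:, ...) is rejected conservatively.
    return scheme == "" or len(scheme) == 1


def require_local(source: str) -> Path:
    """Return `source` as a Path, or raise ValueError if it is not local."""
    if not is_local_source(source):
        raise ValueError(f"refusing to convert non-local source: {source!r}")
    return Path(source)
```

Call `require_local(arg)` on each input and pass the resulting path to the converter only if no exception is raised.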

Tags: conversion, document, latest, markdown, microsoft, pdf, word
118 downloads
0 stars
1 version
Updated 2w ago
v1.0.0
MIT-0

MarkItDown

Microsoft's lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.

Installation

Prerequisites

  • Python 3.10+
  • Java 11+ (for some converters)
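A quick preflight check for these prerequisites can be sketched in Python. `check_prerequisites` is a hypothetical helper, not part of markitdown or this skill. Because the list does not say which converters need Java, the sketch only confirms that some `java` executable is on PATH and leaves the version check aside.

```python
import shutil
import sys
from typing import List

MIN_PYTHON = (3, 10)  # from the prerequisites list above


def check_prerequisites() -> List[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems: List[str] = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {found} found, but 3.10+ is required")
    # Only confirms that a `java` executable is on PATH; it does not check the version.
    if shutil.which("java") is None:
        problems.append("java not found on PATH (needed only for some converters)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
```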

Install via pipx (recommended)

pipx install 'markitdown[all]'

Install via pip

pip install 'markitdown[all]'

Minimal install (specific formats only)

pip install 'markitdown[pdf,docx,pptx]'

Supported Formats

| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Preserves structure, tables, links |
| Word | .docx | Headings, lists, tables |
| PowerPoint | .pptx | Slides to Markdown |
| Excel | .xlsx, .xls | Table data |
| Images | .png, .jpg, etc. | EXIF metadata + OCR |
| Audio | .wav, .mp3 | Speech transcription |
| HTML | .html, .htm | Web content |
| YouTube | URL | Video transcription |
| ZIP | .zip | Iterates over contents |
| EPub | .epub | E-books |
| Text | .csv, .json, .xml | Text-based formats |
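When preparing a batch, it can help to filter inputs against this table before calling the converter. The sketch below transcribes only the extensions the table lists (the open-ended "etc." for images is left out) into a hypothetical lookup. It is not markitdown's own format detection and may reject files the tool would actually accept.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

# Extensions transcribed from the Supported Formats table; illustrative only.
FORMAT_BY_EXTENSION: Dict[str, str] = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".pptx": "PowerPoint",
    ".xlsx": "Excel",
    ".xls": "Excel",
    ".png": "Images",
    ".jpg": "Images",
    ".wav": "Audio",
    ".mp3": "Audio",
    ".html": "HTML",
    ".htm": "HTML",
    ".zip": "ZIP",
    ".epub": "EPub",
    ".csv": "Text",
    ".json": "Text",
    ".xml": "Text",
}


def format_for(path: str) -> Optional[str]:
    """Return the table's format name for `path`, or None if its extension is unlisted."""
    return FORMAT_BY_EXTENSION.get(Path(path).suffix.lower())


def split_supported(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition paths into (supported, unsupported) lists before a batch run."""
    supported: List[str] = []
    unsupported: List[str] = []
    for p in paths:
        (supported if format_for(p) else unsupported).append(p)
    return supported, unsupported
```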

CLI Usage

Basic Conversion

# PDF to Markdown
markitdown document.pdf > output.md

# Word to Markdown
markitdown document.docx -o output.md

# PowerPoint to Markdown
markitdown presentation.pptx -o output.md

Pipe Input

cat document.pdf | markitdown

Image OCR

markitdown screenshot.png -o text.md

YouTube Video

markitdown "https://youtube.com/watch?v=..." -o transcript.md
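The security scan notes that this skill's wrapper script calls the markitdown CLI. The sketch below is not that script; it is a minimal, hypothetical Python equivalent that assumes `markitdown` is on PATH and uses only the `-o` flag shown above. Passing the arguments as a list avoids shell quoting, which matters for URLs such as the YouTube example.

```python
import subprocess
from typing import List, Optional


def build_command(source: str, output: Optional[str] = None) -> List[str]:
    """Build the argv for one conversion, using only the documented -o flag."""
    cmd = ["markitdown", source]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def convert_with_cli(source: str, output: Optional[str] = None) -> str:
    """Run the markitdown CLI and return whatever it printed to stdout."""
    result = subprocess.run(
        build_command(source, output),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Without `output`, the Markdown comes back through stdout, mirroring `markitdown document.pdf > output.md`; with it, the CLI writes the file itself.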

Python API Usage

from markitdown import MarkItDown

# Initialize
md = MarkItDown()

# Convert file
result = md.convert("document.pdf")
print(result.text_content)

# Convert from stream
with open("document.pdf", "rb") as f:
    result = md.convert_stream(f)
    print(result.text_content)
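The same API extends to many files: create one `MarkItDown` instance and reuse it in a loop. This is a minimal sketch; `convert_batch` and its `convert` parameter are hypothetical, and the parameter exists only so the loop can be exercised with a stub instead of a real install.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def convert_batch(
    sources: Iterable[str],
    out_dir: str,
    convert: Optional[Callable[[str], str]] = None,
) -> List[Path]:
    """Convert each source to <out_dir>/<stem>.md and return the written paths."""
    if convert is None:
        from markitdown import MarkItDown  # imported lazily so stubs need no install

        md = MarkItDown()  # one instance reused for every file

        def convert(path: str) -> str:
            return md.convert(path).text_content

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for source in sources:
        out_path = target / (Path(source).stem + ".md")
        out_path.write_text(convert(source), encoding="utf-8")
        written.append(out_path)
    return written
```

Files that share a stem (for example `a/report.pdf` and `b/report.docx`) would overwrite each other's output here, so a real batch wrapper should disambiguate them.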

Options

| Option | Description | Example |
|---|---|---|
| -o, --output | Output file | -o output.md |
| --format | Output format (default: markdown) | --format json |
| --pages | Specific pages | --pages "1,3,5-7" |
| --image-output | Image handling | --image-output external |
| --quiet | Suppress output | --quiet |

MCP Server

MarkItDown provides an MCP (Model Context Protocol) server for integration with LLM applications:

pip install markitdown-mcp

Best Practices

  1. Batch processing: Process multiple files in one call for efficiency.
  2. Format selection: Use a minimal install if you only need specific formats.
  3. OCR quality: Scan documents at 300 DPI or higher.
  4. Output review: Always verify the Markdown output for complex documents.
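The 300 DPI guideline comes down to simple arithmetic: a scan's effective resolution is its pixel width divided by the page's physical width in inches. Here is a minimal sketch with hypothetical helper names, assuming you know the page size (US Letter is 8.5 in wide).

```python
import math

TARGET_DPI = 300  # from the OCR-quality guideline above


def effective_dpi(pixel_width: int, page_width_in: float) -> float:
    """Dots per inch implied by an image's pixel width and the page's physical width."""
    return pixel_width / page_width_in


def min_pixel_width(page_width_in: float, dpi: int = TARGET_DPI) -> int:
    """Smallest pixel width that reaches `dpi` for a page of the given width."""
    return math.ceil(page_width_in * dpi)
```

For example, a 1275-pixel-wide scan of a Letter page works out to 1275 / 8.5 = 150 DPI, half the target; reaching 300 DPI needs at least 2550 pixels across.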

Troubleshooting

Java not found

Install Java 11+:

# macOS
brew install openjdk@17

# Ubuntu
sudo apt install openjdk-17-jdk
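After installing, you can confirm that the JDK on PATH meets the 11+ requirement. `java -version` prints to stderr, and legacy releases report versions like `1.8.0_392` (major 8), while Java 9 and later report forms like `17.0.8` (major 17). Here is a sketch with hypothetical helper names.

```python
import re
import shutil
import subprocess
from typing import Optional

_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output, or None if absent."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        return None
    first, second = int(match.group(1)), match.group(2)
    # Legacy scheme "1.x": the real major version is the second component.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> Optional[int]:
    """Run `java -version` (which writes to stderr) and return its major version."""
    if shutil.which("java") is None:
        return None
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(proc.stderr + proc.stdout)
```

A result of `None` means no `java` was found; otherwise compare it with `>= 11`.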

Permission denied

Use pipx or a virtual environment:

python3 -m venv ~/.venvs/markitdown
source ~/.venvs/markitdown/bin/activate
pip install 'markitdown[all]'
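To confirm that the virtual environment is active before running the converter, compare `sys.prefix` with `sys.base_prefix`; they differ inside a venv created by `python3 -m venv`. Below is a small sketch with hypothetical function names that also checks whether a `markitdown` executable is on PATH.

```python
import shutil
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from the base interpreter's)."""
    return sys.prefix != sys.base_prefix


def markitdown_on_path() -> bool:
    """True if a `markitdown` executable is reachable through PATH."""
    return shutil.which("markitdown") is not None


if __name__ == "__main__":
    print("inside venv:", in_virtualenv())
    print("markitdown on PATH:", markitdown_on_path())
```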
