Install
openclaw skills install microsoft-markitdown
Use MarkItDown to convert files (PDF, Word, Excel, PPT, images, audio, HTML, CSV, JSON, etc.) to Markdown for LLM processing and text analysis. It also supports content extraction from ZIP archives, YouTube videos, and EPUB e-books.
MarkItDown is a Python tool for converting various file types to Markdown format.
Use this skill when users mention scenarios like "convert file to Markdown," "extract document content," "file to Markdown," or "extract text from PDF/Word/Excel."
pip install 'markitdown[all]'
To save space, install only the optional dependency groups you need:
| Tag | Description |
|---|---|
| [pptx] | PowerPoint files |
| [docx] | Word files |
| [xlsx] | Excel files |
| [xls] | Legacy Excel files |
| [pdf] | PDF files |
| [outlook] | Outlook emails |
| [az-doc-intel] | Azure Document Intelligence |
| [audio-transcription] | Audio transcription (wav/mp3) |
| [youtube-transcription] | YouTube video transcription |
# Basic usage: output to stdout
markitdown path-to-file.pdf > document.md
# Specify output file
markitdown path-to-file.pdf -o document.md
# Pipe input
cat path-to-file.pdf | markitdown
# Use Azure Document Intelligence
markitdown path-to-file.pdf -o document.md -d -e "<document_intelligence_endpoint>"
# List installed plugins
markitdown --list-plugins
# Enable plugins
markitdown --use-plugins path-to-file.pdf
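When the CLI is driven from a script, it can help to assemble the command as an argument list rather than a shell string. The following is a minimal sketch that uses only the flags shown above (`-o`, `-d`, `-e`, `--use-plugins`); the helper name `build_markitdown_command` and its parameters are hypothetical and not part of MarkItDown.

```python
import shlex
from typing import List, Optional


def build_markitdown_command(
    source: str,
    output: Optional[str] = None,
    use_plugins: bool = False,
    docintel_endpoint: Optional[str] = None,
) -> List[str]:
    """Assemble a markitdown CLI invocation as an argument list.

    Hypothetical helper: it only arranges the flags shown in this document.
    """
    cmd = ["markitdown"]
    if use_plugins:
        cmd.append("--use-plugins")
    cmd.append(source)
    if output is not None:
        cmd += ["-o", output]
    if docintel_endpoint is not None:
        # -d turns on Document Intelligence, -e supplies its endpoint.
        cmd += ["-d", "-e", docintel_endpoint]
    return cmd


# Print a shell-safe rendering of the command for inspection.
command = build_markitdown_command("path-to-file.pdf", output="document.md")
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` avoids shell quoting problems with file names that contain spaces.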
# Basic conversion with the Python API
from markitdown import MarkItDown
md = MarkItDown(enable_plugins=False)  # Set to True to enable plugins
result = md.convert("test.xlsx")
print(result.text_content)
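The same `convert(...)` / `text_content` pattern extends to a whole folder. The sketch below is one possible arrangement, not an official MarkItDown feature: `convert_folder` and `markitdown_converter` are hypothetical helper names, the suffix list is an arbitrary default, and the converter is passed in as a plain callable so the loop can be exercised without MarkItDown installed.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def convert_folder(
    src_dir: Path,
    out_dir: Path,
    convert: Callable[[str], str],
    suffixes: Tuple[str, ...] = (".pdf", ".docx", ".xlsx"),
) -> List[Path]:
    """Convert matching files in src_dir and write one .md file per input."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for path in sorted(src_dir.iterdir()):
        # Skip subdirectories and files whose extension was not requested.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        target = out_dir / (path.stem + ".md")
        target.write_text(convert(str(path)), encoding="utf-8")
        written.append(target)
    return written


def markitdown_converter() -> Callable[[str], str]:
    """Return a callable backed by MarkItDown (imported only when called)."""
    from markitdown import MarkItDown

    md = MarkItDown(enable_plugins=False)
    return lambda path: md.convert(path).text_content
```

A typical call would be `convert_folder(Path("docs"), Path("out"), markitdown_converter())`. Note that two inputs sharing a stem (for example `report.pdf` and `report.docx`) would write to the same `report.md`, so a real script may want a different naming scheme.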
# Describe an image with an LLM client
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o", llm_prompt="optional custom prompt")
result = md.convert("example.jpg")
print(result.text_content)
# Convert with Azure Document Intelligence
from markitdown import MarkItDown
md = MarkItDown(docintel_endpoint="<document_intelligence_endpoint>")
result = md.convert("test.pdf")
print(result.text_content)
# Enable plugins together with an LLM client
from markitdown import MarkItDown
from openai import OpenAI
md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
)
result = md.convert("document_with_images.pdf")
print(result.text_content)
| Format | Extensions | Dependency Tag |
|---|---|---|
| PDF | .pdf | [pdf] |
| PowerPoint | .pptx | [pptx] |
| Word | .docx | [docx] |
| Excel | .xlsx / .xls | [xlsx] / [xls] |
| Images | .jpg/.png etc. | Built-in (image descriptions require an LLM client and model) |
| Audio | .mp3/.wav | [audio-transcription] |
| HTML | .html | Built-in |
| CSV/JSON/XML | .csv/.json/.xml | Built-in |
| ZIP | .zip | Built-in (iterates contents) |
| YouTube | URL | [youtube-transcription] |
| EPUB | .epub | Built-in |
| Outlook | .msg | [outlook] |
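The dependency tags in the table above can be turned into a small lookup that suggests which optional groups to install for a given set of files. This is an illustrative sketch only: `EXTRA_FOR_SUFFIX` and `pip_requirement` are hypothetical names, the mapping is transcribed by hand from the tables in this document, and it covers only the extensions listed there.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

# Extension -> optional dependency tag, transcribed from this document.
# None marks formats the table lists as built-in.
EXTRA_FOR_SUFFIX: Dict[str, Optional[str]] = {
    ".pdf": "pdf",
    ".pptx": "pptx",
    ".docx": "docx",
    ".xlsx": "xlsx",
    ".xls": "xls",
    ".mp3": "audio-transcription",
    ".wav": "audio-transcription",
    ".msg": "outlook",
    ".jpg": None,
    ".png": None,
    ".html": None,
    ".csv": None,
    ".json": None,
    ".xml": None,
    ".zip": None,
    ".epub": None,
}


def pip_requirement(filenames: Iterable[str]) -> str:
    """Build a pip requirement string covering the given files."""
    extras = set()
    for name in filenames:
        suffix = Path(name).suffix.lower()
        if suffix not in EXTRA_FOR_SUFFIX:
            raise ValueError(f"no table entry for extension {suffix!r}")
        extra = EXTRA_FOR_SUFFIX[suffix]
        if extra is not None:
            extras.add(extra)
    if not extras:
        return "markitdown"
    # Sort so the output is stable regardless of input order.
    return "markitdown[" + ",".join(sorted(extras)) + "]"


print(pip_requirement(["report.pdf", "slides.pptx", "data.csv"]))
```

The returned string can be passed to `pip install` (quoted, as in the install command earlier). When in doubt, `markitdown[all]` remains the simplest choice.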
When a user requests file conversion:
markitdown <file> -o <output.md>