Install
openclaw skills install @mjk39966-glitch/mjk39966-document-parser
Parse and extract content from .docx, .pdf, and .txt documents. Extracts plain text and tables for analysis. Use when the user uploads a document file or asks to analyze/extract/read content from Word documents, PDFs, or text files. Also use when the user asks questions about document content that requires parsing first.
Extract text and tables from documents (.docx, .pdf, .txt) for analysis and question-answering.
Parse a document:
python scripts/parse_document.py /path/to/document.pdf
Output is JSON with extracted text, tables, and metadata.
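To call the parser from Python rather than from a shell, a minimal sketch could look like the one below. It assumes the script prints its JSON to standard output and exits with a non-zero code on failure; neither behavior is confirmed here. The helper name `run_parser` is hypothetical.

```python
import json
import subprocess
import sys


def run_parser(path, script="scripts/parse_document.py"):
    """Run the parser script on `path` and return its output as a dict.

    Assumption: the script writes JSON to stdout (not confirmed by the docs).
    """
    result = subprocess.run(
        [sys.executable, script, path],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


# Example call (requires the skill's script to be present):
# doc = run_parser("/path/to/document.pdf")
# print(doc["metadata"])
```

Using `sys.executable` keeps the child process on the same interpreter, and therefore the same installed dependencies, as the caller.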
First use only: Install dependencies by running:
On Linux or macOS:
bash scripts/install_dependencies.sh
On Windows:
scripts\install_dependencies.bat
This installs: python-docx, PyPDF2, pdfplumber
| Format | Text | Tables | Notes |
|---|---|---|---|
| .txt | ✅ | ❌ | Direct text extraction |
| .docx | ✅ | ✅ | Paragraphs + structured tables |
| .pdf | ✅ | ✅ | Page-by-page extraction |
scripts/parse_document.py
User: "What's the total revenue in quarterly_report.docx?"
Steps:
python scripts/parse_document.py quarterly_report.docx
Default JSON output:
{
  "text": "Full document text...",
  "tables": [
    [["Header 1", "Header 2"], ["Data 1", "Data 2"]]
  ],
  "metadata": {
    "format": "pdf",
    "pages": 3,
    "tables": 1
  }
}
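Each entry in `tables` is a list of rows, and each row is a list of cell values. As a sketch of how that structure might be consumed, the snippet below turns each table into records keyed by column name. It assumes the first row of every table is a header row, which the sample suggests but does not guarantee. The helper `table_to_records` is hypothetical and not part of the skill.

```python
import json

# Sample output copied from the JSON shown above.
sample = """
{
  "text": "Full document text...",
  "tables": [
    [["Header 1", "Header 2"], ["Data 1", "Data 2"]]
  ],
  "metadata": {"format": "pdf", "pages": 3, "tables": 1}
}
"""


def table_to_records(table):
    """Convert one table (a list of rows) into a list of dicts.

    Assumption: the first row holds the column headers.
    """
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


doc = json.loads(sample)
records = [table_to_records(t) for t in doc["tables"]]
print(records)
```

Keyed records make lookups such as "the value in the Revenue column" simpler than indexing by position, at the cost of relying on the header-row assumption.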
Human-readable format (add --format text):
==========================================================
EXTRACTED TEXT:
==========================================================
Document content here...
==========================================================
TABLES FOUND: 2
==========================================================
Table 1:
Name | Age | City
John | 30 | NYC
Jane | 25 | LA
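The table layout above appears to join the cells of each row with ` | `. A rough sketch of that rendering follows. This is an illustration of the format, not the script's actual code, and `render_table` is a hypothetical name.

```python
def render_table(table, index):
    """Render one table (a list of rows) as pipe-separated text lines."""
    lines = [f"Table {index}:"]
    for row in table:
        # Treat missing cells (None) as empty strings so join() does not fail.
        cells = ["" if cell is None else str(cell) for cell in row]
        lines.append(" | ".join(cells))
    return "\n".join(lines)


table = [["Name", "Age", "City"], ["John", 30, "NYC"], ["Jane", 25, "LA"]]
print(render_table(table, 1))
```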
For detailed examples and edge cases, see references/usage_examples.md.
If dependencies are missing, the script returns an error with installation instructions. Run the appropriate install script to resolve.
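To check for the dependencies before running the parser, one option is to look up each package's import name without importing it. The sketch below is an assumption about how such a check could be written, not a description of how the script detects missing packages. `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Maps import name -> pip package name for the three listed dependencies.
REQUIRED_MODULES = {
    "docx": "python-docx",
    "PyPDF2": "PyPDF2",
    "pdfplumber": "pdfplumber",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found")
```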