Install
openclaw skills install @2025biophilia-coder/paper-to-table
Extract structured data from academic papers (PDF/DOCX/TXT) into literature review tables (XLSX/CSV) with fidelity, batch support, and multi-domain handling.
python scripts/read_table.py <table_path>
Output: column names, data type constraints, and the inferred domain.
python scripts/extract_paper.py <paper_path> --structured
Formats are handled automatically: PDF → pdfplumber/PyMuPDF/OCR fallback; DOCX → python-docx; TXT → read directly.
Outputs structured JSON containing full_text and sections (abstract/introduction/methods/results/discussion/conclusion).
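The structured output described above can be illustrated with a minimal sketch. It assumes section headings sit on their own line, optionally numbered; the function name split_sections and the heading pattern are hypothetical and are not taken from scripts/extract_paper.py.

```python
import json
import re

# Section names come from the description above; the heading pattern is an assumption.
SECTION_NAMES = ("abstract", "introduction", "methods", "results", "discussion", "conclusion")

# A heading is a line holding only an optional number and a section name,
# e.g. "Abstract", "2. Methods", "4 Conclusions".
HEADING_RE = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]+)?(" + "|".join(SECTION_NAMES) + r")s?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(full_text: str) -> dict:
    """Return {"full_text": ..., "sections": {name: text}} for the headings found."""
    matches = list(HEADING_RE.finditer(full_text))
    sections = {}
    for i, match in enumerate(matches):
        name = match.group(1).lower()
        # A section runs until the next recognised heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(full_text)
        body = full_text[match.end():end].strip()
        # Keep the first occurrence if a heading name repeats.
        sections.setdefault(name, body)
    return {"full_text": full_text, "sections": sections}


if __name__ == "__main__":
    sample = "Title\nAbstract\nWe study X.\n1. Introduction\nBackground.\n2. Methods\nWe sampled 30 sites.\n"
    print(json.dumps(split_sections(sample), indent=2))
```

Sections whose headings are not found are simply absent from the result, which keeps the sketch from guessing where a section might start.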
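Principle: extract only information the paper states explicitly; never fabricate.
Input: table header + full paper text or sections
Output format (per field):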
{
  "FieldName": {
    "value": "extracted value or N/A",
    "confidence": "HIGH/MEDIUM/LOW",
    "source": "paper location"
  }
}
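A minimal sketch of how the per-field format above might be enforced before a row is written. The function name normalize_record is hypothetical, and the choice to give missing fields LOW confidence and an "N/A" source is an assumption, not something the skill documents.

```python
ALLOWED_CONFIDENCE = {"HIGH", "MEDIUM", "LOW"}


def normalize_record(headers, extracted):
    """Return one field record per table header, using "N/A" for anything not stated."""
    unknown = set(extracted) - set(headers)
    if unknown:
        # Key names must match the table header.
        raise ValueError(f"fields not in table header: {sorted(unknown)}")

    record = {}
    for name in headers:
        field = extracted.get(name, {})
        raw = field.get("value")
        value = "" if raw is None else str(raw).strip()
        if not value or value.upper() == "N/A":
            # Nothing explicitly stated: record N/A instead of inferring a value.
            # LOW confidence and an "N/A" source are assumptions of this sketch.
            record[name] = {"value": "N/A", "confidence": "LOW", "source": "N/A"}
            continue
        confidence = str(field.get("confidence", "")).upper()
        if confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(f"{name}: confidence must be one of {sorted(ALLOWED_CONFIDENCE)}")
        record[name] = {
            "value": value,
            "confidence": confidence,
            "source": str(field.get("source") or "N/A"),
        }
    return record
```

Rejecting unknown keys rather than dropping them silently makes a mismatch between the extraction output and the table header visible early.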
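CRITICAL RULES:
If a field is not stated in the paper, set its value to "N/A"; do not infer.
Extraction priority: Abstract → Methods → Results → Discussion → supplementary material.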
python scripts/write_table.py <table_path> '<json_data>' --validate
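Validates: JSON format, matching key names, no duplicate entries, plausible data types.
Duplicate detection: a title similarity above 85% counts as a duplicate; the entry is skipped with a warning.
Report: papers processed, rows added, entries skipped (duplicate/error), LOW confidence fields, output path.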
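The duplicate rule above (title similarity over 85%) could be sketched as follows. The similarity metric is not specified in this document; the sketch assumes difflib.SequenceMatcher from the standard library, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

DUPLICATE_THRESHOLD = 0.85  # from the rule above; the metric itself is an assumption


def _normalize(title: str) -> str:
    """Lower-case and collapse whitespace so formatting differences do not count."""
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two titles, via SequenceMatcher.ratio()."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def find_duplicate(new_title, existing_titles, threshold=DUPLICATE_THRESHOLD):
    """Return the first existing title whose similarity exceeds the threshold, else None."""
    for existing in existing_titles:
        if title_similarity(new_title, existing) > threshold:
            return existing
    return None


if __name__ == "__main__":
    table_titles = ["Urban Green Space and Mental Health"]
    match = find_duplicate("urban green  space and mental health", table_titles)
    if match is not None:
        print(f"WARNING: skipping duplicate of {match!r}")
```

A character-based ratio is cheap and needs no extra dependencies, but it can miss reworded titles; a token-based or fuzzy-matching library would be a reasonable swap if the real script uses one.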
python scripts/batch_process.py <papers_folder> <table_path> [output_folder]
batch_log_YYYYMMDD_HHMMSS.json
Detailed field definitions and extraction strategies → references/extraction-patterns.md
Quality checklist → references/quality-checklist.md