Install
openclaw skills install @harrylabsj/contract-clause-extractor
Extract & classify key clauses from contract PDFs into a structured risk summary, with bilingual (CN/EN) support.
Turn dense contract PDFs into structured, scannable clause summaries with risk ratings. Extract key clauses across 12 standard categories, flag hidden risks, compare multiple contracts side-by-side, and generate bilingual clause translations, all without replacing legal counsel.
Input: User uploads contract PDF/DOCX, provides URL, or pastes text. Supports single or multiple files for comparison mode.
Action: Identify document structure: page layout, clause numbering pattern (1.1 / Article 1 / 第一条), table presence, signature blocks.
Output: Parsed document with structural metadata. If scanned/image PDF, trigger OCR pipeline.
Logic: Auto-detect language (Chinese, English, or mixed). Handle password-protected PDFs by requesting password.
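The language auto-detection mentioned above could be approximated with a simple character-class ratio. This is a minimal sketch, not the skill's actual implementation: the function name `detect_language`, the `mixed_threshold` parameter, and the 0.2 default are all assumptions made for illustration.

```python
import re

# Han ideographs (CJK Unified Ideographs block) and basic Latin letters.
HAN = re.compile(r"[\u4e00-\u9fff]")
LATIN = re.compile(r"[A-Za-z]")


def detect_language(text: str, mixed_threshold: float = 0.2) -> str:
    """Return 'zh', 'en', 'mixed', or 'unknown' based on character shares.

    Hypothetical helper: the threshold is an illustrative default.
    """
    han = len(HAN.findall(text))
    latin = len(LATIN.findall(text))
    total = han + latin
    if total == 0:
        return "unknown"  # e.g. digits and punctuation only
    han_share = han / total
    if han_share >= 1 - mixed_threshold:
        return "zh"
    if han_share <= mixed_threshold:
        return "en"
    return "mixed"
```

A single Han character usually carries more meaning than a single Latin letter, so a real detector would likely weight the two classes differently or work at the word level.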
Input: Parsed document.
Action: Segment by clause boundaries using numbering patterns, heading styles, and semantic breaks. Preserve parent-child hierarchy for nested clauses.
Output: Indexed clause list with numbering + raw text + parent reference.
Logic: Handle non-standard numbering (Chinese legal: 一、/(一)/ 1. / (1)). Handle cross-page clause splits.
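The numbering-based part of this segmentation step can be sketched with a regular expression over line-leading markers. The sketch below is an assumption-laden illustration: the `Clause` dataclass, the `segment` function, and the `HEADING` pattern are hypothetical names. It covers only `Article N`, `第N条`, and decimal numbering, and derives a parent only for dotted numbers (`1.1` has parent `1`). It does not handle heading styles, semantic breaks, or the 一、/(一)/(1) forms.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Line-leading clause markers: "Article 1", "第一条", "1.", "1.1", "1.1.2".
HEADING = re.compile(
    r"^(?:"
    r"(?P<article>Article\s+\d+)"
    r"|(?P<zh>第[一二三四五六七八九十百零\d]+条)"
    r"|(?P<decimal>\d+(?:\.\d+)*)\.?(?=\s|$)"
    r")\s*"
)


@dataclass
class Clause:
    number: str            # marker as it appears, e.g. "1.1" or "第二条"
    text: str              # raw clause text without the marker
    parent: Optional[str]  # parent marker for dotted numbers, else None


def segment(text: str) -> list[Clause]:
    """Split text into clauses; unmarked lines continue the previous clause."""
    clauses: list[Clause] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        match = HEADING.match(stripped)
        if match is None:
            # Continuation line (for example, a clause split across pages).
            if clauses:
                clauses[-1].text = (clauses[-1].text + " " + stripped).strip()
            continue
        decimal = match.group("decimal")
        number = match.group("article") or match.group("zh") or decimal
        parent = decimal.rsplit(".", 1)[0] if decimal and "." in decimal else None
        clauses.append(Clause(number, stripped[match.end():].strip(), parent))
    return clauses
```

One known weakness of this approach: a continuation line that happens to start with a bare number (such as "90 days after invoice") is misread as a new clause, which is one reason the step also relies on heading styles and semantic breaks.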
Input: Segmented clauses. Action: Classify each clause into one of 12 standard categories using LLM semantic matching:
Output: Clauses grouped by category with confidence scores.
Input: Classified clauses. Action: Score each clause on risk level:
Output: Each clause tagged with risk level + brief explanation of why.
Input: Entire contract + risk-annotated clauses. Action: Pattern-based scanning for structural risks:
Output: "Hidden Risk Alerts" section with specific clause references and severity rating.
Input: Risk-annotated clauses. Action: Generate a structured extraction table:
| # | Clause Category | Original Text (excerpt) | Summary | Risk | Modification Suggestion |
|---|---|---|---|---|---|
| 1 | Payment | "乙方应在收到发票后90日内付款" | 90-day payment term | 🟡 | Negotiate to 30 days standard |
| 2 | Liability | "赔偿上限为合同金额的1倍" | Liability cap = 1× contract value | 🟢 | Standard protection |
Output: Complete extraction table. Option to export as CSV/XLSX.
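The CSV export option could look roughly like the following. This is a sketch using Python's standard `csv` module. The column names are taken from the table above, while the `to_csv` helper and the row dictionaries are illustrative assumptions rather than the skill's real export code.

```python
import csv
import io

COLUMNS = [
    "#",
    "Clause Category",
    "Original Text (excerpt)",
    "Summary",
    "Risk",
    "Modification Suggestion",
]


def to_csv(rows: list[dict]) -> str:
    """Serialize extraction-table rows to CSV text (hypothetical helper)."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The two sample rows from the table above.
rows = [
    {
        "#": "1",
        "Clause Category": "Payment",
        "Original Text (excerpt)": "乙方应在收到发票后90日内付款",
        "Summary": "90-day payment term",
        "Risk": "🟡",
        "Modification Suggestion": "Negotiate to 30 days standard",
    },
    {
        "#": "2",
        "Clause Category": "Liability",
        "Original Text (excerpt)": "赔偿上限为合同金额的1倍",
        "Summary": "Liability cap = 1× contract value",
        "Risk": "🟢",
        "Modification Suggestion": "Standard protection",
    },
]
csv_text = to_csv(rows)
```

When writing `csv_text` to a file intended for Excel, opening the file with `encoding="utf-8-sig"` is a common way to help Excel recognize the Chinese text and emoji as UTF-8.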
Input: 2+ contracts with their extraction tables. Action: Align clauses by category, then:
Output: Side-by-side comparison table with diff highlights.
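Aligning clauses by category before diffing can be sketched as a join over two category-to-summary maps. The source does not spell out the comparison sub-steps, so the `compare` function and its four status labels below are assumptions, shown only to make the alignment idea concrete.

```python
from typing import Optional


def compare(a: dict[str, str], b: dict[str, str]) -> list[tuple[str, str, str, str]]:
    """Align two {category: summary} maps into side-by-side rows.

    Each row is (category, summary_a, summary_b, status), where status is
    'same', 'changed', 'only_a', or 'only_b' (hypothetical labels).
    """
    rows = []
    for category in sorted(a.keys() | b.keys()):
        left: Optional[str] = a.get(category)
        right: Optional[str] = b.get(category)
        if left is None:
            status = "only_b"
        elif right is None:
            status = "only_a"
        elif left == right:
            status = "same"
        else:
            status = "changed"
        rows.append((category, left or "", right or "", status))
    return rows


v1 = {"Payment": "Net-90", "Termination": "Unilateral"}
v2 = {"Payment": "Net-30", "Confidentiality": "Mutual"}
diff = compare(v1, v2)
```

Exact string equality is a crude test for "same". Deciding whether a changed clause is more or less favorable would still need the semantic analysis described in the risk-scoring step.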
Input: Extraction table + target language selection.
Action: Translate clause summaries and key terms while preserving legal terminology consistency. Build an ad-hoc bilingual term glossary for the document.
Output: Bilingual extraction table (Original → Summary in Target Language). Key terms glossary.
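One way to check terminology consistency while building the glossary is to collect every (source term, translation) pair and flag source terms that received more than one translation. The sketch below assumes the translation step yields such pairs. The `build_glossary` helper and the conflicting "Recipient" rendering are illustrative, not taken from the skill.

```python
from collections import defaultdict


def build_glossary(pairs: list[tuple[str, str]]) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Return (glossary, conflicts) from (source_term, translation) pairs.

    Terms translated one way go into the glossary; terms with several
    competing translations are reported as conflicts for review.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for source, target in pairs:
        seen[source].add(target)
    glossary = {s: next(iter(t)) for s, t in seen.items() if len(t) == 1}
    conflicts = {s: sorted(t) for s, t in seen.items() if len(t) > 1}
    return glossary, conflicts


pairs = [
    ("保密信息", "Confidential Information"),
    ("接收方", "Receiving Party"),
    ("接收方", "Recipient"),  # illustrative inconsistent rendering
    ("保密信息", "Confidential Information"),
]
glossary, conflicts = build_glossary(pairs)
```

Conflicting entries are natural candidates for the "Translation may need legal review" marker.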
Input: All analysis results. Action: Compile into a comprehensive extraction report:
Output: Complete extraction report.
User: "Quickly extract the key clauses from this contract and flag the risk points [upload: supply-agreement.pdf]" Expected Output:
Executive Summary: Supply Agreement | Parties: Company A vs Company B | Term: 1 year | Overall Risk: 🟡 Medium
Clause Extraction (18 clauses, 12 categories):
🔴 High Risk (2):
- Indemnity: "甲方承担一切赔偿责任" — Unlimited indemnity, one-sided
- Termination: "乙方可随时终止合同" — Unilateral termination without cause
🟡 Medium Risk (5):
- Payment: Net-90 terms, market standard is Net-30
- Force Majeure: Overly broad definition includes "market conditions"
🟢 Low Risk (11): Standard commercial terms
⚠️ Hidden Risk Alert: No confidentiality clause for Party A (imbalanced)
Top 3 Modification Priorities: 1. Cap indemnity 2. Add mutual termination 3. Shorten payment to Net-30
User: "Compare the key differences between these two contracts [upload: contract-v1.pdf, contract-v2.pdf]" Expected Output: Side-by-side comparison table with 7 categories showing differences, highlighting where v2 is more/less favorable than v1, with a "verdict" column indicating which version is preferred per category.
User: "I'm nervous about signing this 30-page service agreement. Help me check it for traps [upload: service-agreement.docx]" Expected Output: Hidden risk report focused on 6 structural risk patterns, each with: the offending clause text, why it's problematic, and suggested alternative wording.
User: "Extract the core clauses of this Chinese contract and translate them into English for our overseas legal team [upload: nda-zh.pdf]" Expected Output: Bilingual table with Chinese original + English summary for key clauses. Glossary: 保密信息→Confidential Information, 接收方→Receiving Party, etc. Flag terms where translation may create ambiguity.
User: "Check whether this contract is missing any clauses a standard commercial contract should have [upload: vendor-contract.pdf]" Expected Output: Checklist of 12 standard clause categories with ✓/✗ status. For missing categories, explain the risk of omission and suggest a model clause.
User: "I'm negotiating this contract with the supplier tomorrow. Help me prepare negotiation points [upload: draft-contract.docx]" Expected Output: Prioritized negotiation playbook: Tier 1 (non-negotiable risks → must fix), Tier 2 (market-standard adjustments → push for), Tier 3 (nice-to-have → concede gracefully), with talking points for each.
Scenario: Early-stage startup receives a 15-page SaaS vendor agreement. No in-house legal. Input: Upload PDF of vendor contract. Concern: "As a small company, are we going to get burned by a big vendor's contract?" Steps:
Scenario: Job seeker receives offer + employment contract. Wants to understand restrictions. Input: "Review this employment contract for me, focusing on the non-compete and intellectual property clauses [upload: employment-contract.pdf]" Steps:
Scenario: User about to sign a 24-month commercial lease. Input: "This is an office lease agreement. Extract the key information for me [upload: lease-agreement.pdf]" Steps:
contract-clause-extractor.sh classify contract.pdf - parses and extracts all clauses into 12 categories
contract-clause-extractor.sh risk contract.pdf - annotates each clause with 🔴/🟡/🟢 risk levels
contract-clause-extractor.sh summarize contract.pdf - shows the structured extraction table with modification suggestions

| Condition | Behavior |
|---|---|
| Contract >100 pages | Process in chunks; summarize by chapter, flag time estimate |
| Scanned/image PDF (no text layer) | Trigger OCR; warn of possible extraction errors |
| Password-protected PDF | Request password; never attempt to crack |
| Non-contract document uploaded | Detect and warn: "This does not appear to be a legal contract" |
| Contract in unsupported language | Attempt processing; flag lower confidence for non-CN/EN languages |
| Handwritten annotations in PDF | Flag as "may contain markings" — OCR may miss handwritten text |
| Corrupted/unreadable PDF | Error with suggested fixes (re-export, convert format) |
| Multiple unrelated contracts in one PDF | Auto-detect and offer to process separately |
| User asks for legal advice | Redirect: "This is clause extraction + risk flagging, not legal advice. Consult a qualified lawyer." |

| Error Code | Scenario | Handling |
|---|---|---|
| E-PARSE-FAIL | PDF structure cannot be parsed | Offer manual text input; suggest re-exporting PDF from source |
| E-OCR-FAIL | OCR on scanned document fails | Return images with note; suggest higher-quality scan |
| E-PASSWORD | Password-protected PDF without password | Prompt for password; never attempt brute-force |
| E-NO-CLAUSES | Document has no detectable clause structure | Process as paragraph-level; flag as "unstructured document" |
| E-UNSUPPORTED-FORMAT | Uploaded file is not PDF/DOCX/TXT | List supported formats; suggest conversion |
| E-AMBIGUOUS-CLASSIFICATION | Clause spans multiple categories | Tag with multiple categories; flag for human review |
| E-BILINGUAL-CONFIDENCE | Low confidence on legal term translation | Mark with ⚠️ "Translation may need legal review" |
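As one illustration of how an entry in the error table might surface in code, the sketch below raises a structured error for `E-UNSUPPORTED-FORMAT`. The `ExtractorError` class and `check_format` function are hypothetical names. Only the error code, the supported formats (PDF/DOCX/TXT), and the handling text come from the table.

```python
from pathlib import Path

# Supported upload formats, per the E-UNSUPPORTED-FORMAT row.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt"}


class ExtractorError(Exception):
    """Hypothetical exception carrying an error code and its handling hint."""

    def __init__(self, code: str, handling: str):
        super().__init__(f"{code}: {handling}")
        self.code = code
        self.handling = handling


def check_format(filename: str) -> str:
    """Return the lowercased suffix, or raise if the format is unsupported."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ExtractorError(
            "E-UNSUPPORTED-FORMAT",
            "List supported formats; suggest conversion",
        )
    return suffix
```

Carrying the code and the handling hint on the exception lets a caller decide how to present the message without parsing strings.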