Install
`openclaw skills install @harrylabsj/document-translation-assistant`

Translate technical & legal documents while preserving original formatting, terminology consistency, and domain context.

Translate documents without breaking them. Preserve markdown structure, code blocks, tables, links, and image references while maintaining terminology consistency across the entire document. It is purpose-built for technical and legal content, where accuracy matters more than fluency.
Input: User provides:
- Domain mode: tech | legal | marketing | general
- Output mode: bilingual (side-by-side) | translated-only | both

Output: Parsed document with structure tree (headings, paragraphs, code blocks, tables, lists, links, images). Logic: Auto-detect the source language with a confidence score. If confidence is below 90%, confirm with the user.
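The confidence rule above can be illustrated with a toy heuristic that counts letters per Unicode script. This is a minimal sketch, not the skill's detector: it assumes the language can be guessed from the dominant script (so Japanese written mostly in kanji would be misread as Chinese), and `detect_language` and `needs_confirmation` are hypothetical names.

```python
import unicodedata
from collections import Counter

# Threshold from the rule above: below 90% confidence, ask the user.
CONFIRM_THRESHOLD = 0.90


def script_of(ch):
    """Map a letter to a coarse language guess by Unicode script; None for non-letters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith(("HIRAGANA", "KATAKANA")):
        return "ja"
    if name.startswith("HANGUL"):
        return "ko"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "zh"
    if name.startswith("LATIN"):
        return "en"
    return "other"


def detect_language(text):
    """Return (language, confidence): the dominant script and its share of all letters."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    if not counts:
        return "unknown", 0.0
    lang, n = counts.most_common(1)[0]
    return lang, n / sum(counts.values())


def needs_confirmation(text):
    """True when the guess is too uncertain to proceed without asking the user."""
    _, confidence = detect_language(text)
    return confidence < CONFIRM_THRESHOLD
```

For mixed input such as `"README 使用说明"`, six Latin letters against four ideographs yield `("en", 0.6)`, which falls below the threshold and triggers a confirmation prompt.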
Input: Parsed document. Action: Build a structure tree identifying translatable vs non-translatable nodes:
| Node Type | Translatable | Example |
|---|---|---|
| Headings | ✅ Translate | ## Getting Started → ## 快速开始 |
| Paragraph text | ✅ Translate | Body text, descriptions |
| List items | ✅ Translate | Bullet points, numbered lists |
| Table cell text | ✅ Translate | Cell content (not table structure) |
| Code blocks | ❌ Preserve | All code, commands, config |
| Inline code | ❌ Preserve | `npm install`, `const x = 1` |
| Links | ❌ Preserve URLs | `[text](url)`: translate text, keep URL |
| Images | ❌ Preserve | `![alt](url)`: translate alt text, keep URL |
| Frontmatter | ⚠️ Selective | Translate description, keep slug/tags |
| HTML tags | ❌ Preserve | `<div>`, `<span>`: translate text content only |
| Placeholders | ❌ Preserve | `{{variable}}`, `%s`, `{0}` |
Output: Structure tree with translatable segments marked for processing.
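A line-based classifier is enough to sketch how the structure tree separates translatable from non-translatable nodes. This is an illustrative sketch with hypothetical names (`Node`, `build_structure`). It assumes plain CommonMark-style input, works at node level only (a heading is marked translatable even though its `#` marker is kept), and ignores frontmatter, raw HTML, and indented code, all of which a real parser would handle.

```python
import re
from dataclasses import dataclass

LIST_ITEM = re.compile(r"([-*+]|\d+\.)\s")


@dataclass
class Node:
    kind: str           # heading | list_item | table_row | code | paragraph | blank
    text: str
    translatable: bool


def build_structure(markdown):
    """Classify each line into a node and mark whether it should be translated."""
    nodes = []
    in_fence = False
    for line in markdown.splitlines():
        s = line.strip()
        if s.startswith("```"):
            in_fence = not in_fence  # fence lines open/close code mode and are kept as-is
            nodes.append(Node("code", line, False))
        elif in_fence:
            nodes.append(Node("code", line, False))
        elif not s:
            nodes.append(Node("blank", line, False))
        elif s.startswith("#"):
            nodes.append(Node("heading", line, True))
        elif LIST_ITEM.match(s):
            nodes.append(Node("list_item", line, True))
        elif s.startswith("|"):
            # Separator rows such as |---|---| carry no text to translate.
            is_separator = set(s) <= set("|-: ")
            nodes.append(Node("table_row", line, not is_separator))
        else:
            nodes.append(Node("paragraph", line, True))
    return nodes
```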
Input: Translatable segments + domain mode. Action: Extract domain-specific terminology:
Output: Candidate term list with occurrence count and context snippets.
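Counting occurrences and collecting context snippets can be sketched as below. The sketch assumes a candidate list already exists (for example, from a hypothetical per-domain dictionary); how the skill actually proposes candidates is not specified here, and `extract_terms` is an illustrative name.

```python
def extract_terms(segments, candidates, window=10, max_snippets=2):
    """Count each candidate term across segments and keep a few context snippets.

    Terms that never occur are dropped; the rest are sorted by frequency,
    most frequent first, matching the ordering of the review table.
    """
    found = {}
    for term in candidates:
        count, snippets = 0, []
        for seg in segments:
            start = seg.find(term)
            while start != -1:
                count += 1
                if len(snippets) < max_snippets:
                    lo = max(0, start - window)
                    snippets.append(seg[lo:start + len(term) + window])
                start = seg.find(term, start + len(term))
        if count:
            found[term] = {"count": count, "contexts": snippets}
    return dict(sorted(found.items(), key=lambda kv: kv[1]["count"], reverse=True))
```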
Input: Candidate terms + user input. Action: Present extracted terms to user for confirmation:
Extracted 23 specialized terms:
| # | Source Term | Occurrences | Suggested Translation | Your Translation |
|---|------------|-------------|----------------------|-------------------|
| 1 | 微服务 | 15 | microservices | [confirm/edit] |
| 2 | 熔断器 | 8 | circuit breaker | [confirm/edit] |
| 3 | 服务降级 | 5 | service degradation | [confirm/edit] |
User can:
Output: Finalized terminology glossary, saved per project for reuse. Logic: Terms with multiple possible translations are flagged for user review. Common terms with a single standard translation are applied automatically.
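The auto-apply versus flag-for-review rule can be expressed directly. This is a minimal sketch that assumes each extracted term arrives with a list of candidate translations; that input shape and the `finalize_glossary` name are hypothetical.

```python
def finalize_glossary(suggestions, user_edits=None):
    """Split extracted terms into auto-applied entries and ones needing review.

    suggestions: term -> list of candidate translations.
    user_edits:  term -> translation the user confirmed or edited in the review table.
    Returns (glossary, needs_review).
    """
    user_edits = user_edits or {}
    glossary, needs_review = {}, []
    for term, options in suggestions.items():
        if term in user_edits:
            glossary[term] = user_edits[term]   # an explicit user choice always wins
        elif len(set(options)) == 1:
            glossary[term] = options[0]         # one standard translation: auto-apply
        else:
            needs_review.append(term)           # several (or no) candidates: flag it
    return glossary, needs_review
```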
Input: Translatable segments + terminology glossary + domain mode. Action: Translate each segment with:
- Inline formatting preserved: **bold**, *italic*, `code`
- Placeholders preserved: {{name}}, %d, positional arguments

Output: Translated segments with format-preservation validation. Logic: Process in chunks of ~5 segments to maintain local context. Large documents are processed in batches with a progress indicator.
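Chunking and the format-preservation check can be sketched as follows. The regular expression is an assumption that covers only the examples named above (inline code, `{{name}}`, `%s`/`%d`, positional `{0}`), plus a count of `**` bold markers. A production check would more likely derive protected spans from the structure tree. All names are hypothetical.

```python
import re

# Tokens that must survive translation unchanged (illustrative pattern only).
PROTECTED = re.compile(r"`[^`]+`|\{\{\w+\}\}|%[sd]|\{\d+\}")


def chunk(segments, size=5):
    """Split segments into consecutive groups of `size` to keep local context together."""
    return [segments[i:i + size] for i in range(0, len(segments), size)]


def format_preserved(source, translated):
    """True if the translation keeps every protected token and the same number of bold markers."""
    same_tokens = sorted(PROTECTED.findall(source)) == sorted(PROTECTED.findall(translated))
    same_bold = source.count("**") == translated.count("**")
    return same_tokens and same_bold
```

Sorting the token lists lets the check tolerate reordering, which translation often requires (for example, a placeholder moving to the front of a Chinese sentence).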
Input: All translated segments + glossary. Action: Post-translation scan:
Output: Consistency report:
✅ Terminology check: 23/23 terms consistent
⚠️ Inconsistency found: "负载均衡" → "load balancing" (18 occurrences)
→ "load balancer" (2 occurrences — FIXED)
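The FIXED step in the report above amounts to rewriting known variants to the glossary's canonical form and counting the rewrites. This minimal sketch assumes the variants are already known (detecting them is the harder part and is not shown) and that the target language separates words with spaces, because regex word boundaries do not fall between CJK characters. `unify_variants` is a hypothetical name.

```python
import re


def unify_variants(text, canonical, variants):
    """Rewrite each known variant of a term to its canonical translation.

    Returns (new_text, report), where report maps variant -> number of
    occurrences rewritten. Matching is case-sensitive and anchored on
    word boundaries, so it suits space-delimited target languages only.
    """
    report = {}
    for variant in variants:
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b")
        # A function replacement avoids backslash escapes in `canonical`.
        text, n = pattern.subn(lambda _m: canonical, text)
        if n:
            report[variant] = n
    return text, report
```

Blind rewriting is only safe between true variants. "load balancer" versus "load balancing" may differ in grammatical role, so a pair like that is better confirmed by the user than replaced automatically.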
Input: Translated full document. Action: Read the entire translated document to fix context breaks:
Output: Coherence-adjusted translation. Logic: AI reads the full translated document as a human editor would, flagging sections that read as disjointed.
Input: Coherent translated document. Action: Generate output in requested format(s):
Bilingual mode (default for review):
## Getting Started | ## 快速开始
This guide will help you set up the project. | 本指南将帮助您搭建项目。
1. Clone the repository | 1. 克隆仓库
`git clone https://...` | `git clone https://...`
Translated-only mode (for publication): The full document in target language, preserving all formatting.
Output: File(s) saved to the source document's directory or to a user-specified one.
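The bilingual layout above pairs each source segment with its translation on one line. Here is a minimal sketch of both output modes, assuming translated segments arrive as `(source, target)` pairs in document order. The function names are hypothetical, and literal `|` characters are escaped so they cannot be confused with the column separator.

```python
def render_bilingual(pairs):
    """Side-by-side view: one 'source | target' line per segment pair."""
    def esc(s):
        # Escape literal pipes so they are not read as the column separator.
        return s.replace("|", "\\|")
    return "\n".join(f"{esc(src)} | {esc(tgt)}" for src, tgt in pairs)


def render_translated_only(pairs):
    """Publication view: target segments only, in original order."""
    return "\n".join(tgt for _, tgt in pairs)
```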
Input: Finalized glossary + project identifier. Action: Save terminology glossary for future reuse:
{
"project": "user-service-docs",
"domain": "tech",
"source_lang": "zh",
"target_lang": "en",
"terms": {
"微服务": "microservices",
"服务网格": "service mesh",
"熔断器": "circuit breaker"
},
"updated": "2026-06-17"
}
Output: Saved glossary. Next translation for this project auto-loads it. Logic: Glossary stored locally. User controls save/delete/export.
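Saving and reloading the glossary, together with the conflict check behind `E-GLOSSARY-CONFLICT`, might look like this. The sketch assumes the JSON shape shown above and the `.translation-glossary.json` file name used in the scenarios below. `merge_glossary` is hypothetical, and when it finds a conflict it writes nothing so the user can decide.

```python
import json
from datetime import date
from pathlib import Path


def merge_glossary(path, new_terms, project, domain, source_lang, target_lang):
    """Merge new_terms into the glossary file at path.

    Returns the list of terms whose saved translation differs from the new
    one. If that list is non-empty the file is left untouched so the user
    can choose or merge; otherwise the merged glossary is written with
    today's date in "updated".
    """
    path = Path(path)
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {"project": project, "domain": domain, "source_lang": source_lang,
                "target_lang": target_lang, "terms": {}}
    saved = data["terms"]
    conflicts = [t for t, tr in new_terms.items() if t in saved and saved[t] != tr]
    if conflicts:
        return conflicts
    saved.update(new_terms)
    data["updated"] = date.today().isoformat()
    # ensure_ascii=False keeps CJK source terms human-readable in the file.
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return []
```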
User: "Translate this Chinese README into English for me. [upload: README_zh.md]" Expected Output: Bilingual view with all code blocks, commands, and links preserved. Terminology glossary auto-extracted (微服务→microservices, 部署→deploy) and applied consistently.
User: "This Chinese contract needs to be translated into English for our overseas colleagues. Keep the legal terminology accurate. [upload: contract_zh.docx]" Expected Output: Translated DOCX with formal legal tone. Glossary terms: 甲方→Party A, 违约责任→Breach of Contract, 不可抗力→Force Majeure. Warning: "This is a translation for reference. Not a certified legal translation."
User: "Our product docs need a Chinese-English bilingual version, and every future update must be translated in sync. [path: ~/docs/]" Expected Output: All markdown files in the docs/ directory translated, with glossaries saved. Future runs: detect changed files only, translate the diffs, and maintain consistency.
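The "future runs" behavior can be approximated by hashing each file and comparing against the hashes saved from the previous run. This sketch works at whole-file granularity (a real implementation would also diff segments within a changed file) and only looks at `*.md` files. The manifest format and the `scan_changes` name are hypothetical.

```python
import hashlib
from pathlib import Path


def scan_changes(doc_dir, manifest):
    """Find markdown files that are new or modified since the last run.

    manifest maps relative POSIX paths to SHA-256 hex digests from the
    previous run. Returns (changed_paths, new_manifest); persist
    new_manifest only after the changed files are translated successfully.
    """
    doc_dir = Path(doc_dir)
    current = {
        f.relative_to(doc_dir).as_posix(): hashlib.sha256(f.read_bytes()).hexdigest()
        for f in doc_dir.rglob("*.md")
    }
    changed = sorted(p for p, digest in current.items() if manifest.get(p) != digest)
    return changed, current
```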
User: "In documents we translated earlier, the same term was translated several different ways. Please make them consistent. [upload: translated_zh.md]" Expected Output: Scan for inconsistent terms → present conflict list → user chooses preferred translation → apply uniformly. Report: "Fixed 14 inconsistencies across 'API网关' (was: api-gateway, API Gateway, ApiGateway → now: API Gateway)."
User: "After translating, check whether the document formatting was broken. [upload: translated.md + original.md]" Expected Output: Structure diff: heading count, code block count, link count, table row count. "All structural elements preserved (42 headings, 7 code blocks, 12 links, 3 tables). ✅"
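The structure diff in this test case compares element counts between the original and the translation. This minimal sketch assumes fenced code blocks use backticks and counts only inline `[text](url)` links, not images or reference-style links. It also counts table rows rather than whole tables. `structure_counts` and `structure_diff` are hypothetical names.

```python
import re

HEADING = re.compile(r"#{1,6}\s")
# Inline links only; the lookbehind skips images written as ![alt](url).
LINK = re.compile(r"(?<!!)\[[^\]]*\]\([^)]*\)")


def structure_counts(markdown):
    """Count headings, fenced code blocks, inline links, and table rows outside code."""
    counts = {"headings": 0, "code_blocks": 0, "links": 0, "table_rows": 0}
    in_fence = False
    for line in markdown.splitlines():
        s = line.strip()
        if s.startswith("```"):
            if not in_fence:
                counts["code_blocks"] += 1  # count each block once, at its opening fence
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' comments inside code are not headings
        counts["links"] += len(LINK.findall(s))
        if HEADING.match(s):
            counts["headings"] += 1
        elif s.startswith("|"):
            counts["table_rows"] += 1
    return counts


def structure_diff(original, translated):
    """Return {element: (original_count, translated_count)} for every mismatch."""
    a, b = structure_counts(original), structure_counts(translated)
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}
```

An empty result corresponds to the "All structural elements preserved" report. Any entry is the kind of evidence `E-FORMAT-CORRUPT` would show to the user.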
User: "This open-source project's README needs to be translated into Chinese, Japanese, and Korean. [upload: README.md]" Expected Output: Three translated files (README_zh.md, README_ja.md, README_ko.md) with per-language glossaries. Note: "Japanese and Korean translations may have lower confidence; review recommended."
Scenario: A maintainer wants to make a Chinese open-source project accessible internationally. Input: "Translate all of the project's documentation: README, CONTRIBUTING, and every file under docs/." Steps:
`.translation-glossary.json` in project root.

Scenario: A Chinese company shares an NDA with a US partner, and both sides need to understand the content. Input: "Translate this NDA. The legal terminology must be accurate and the formatting must stay intact. [upload: nda_zh.docx]" Steps:

Scenario: The engineering team updates the docs weekly, and translations lag behind. Input: "We update our Chinese docs every week. Each time, translate the new and modified parts for me. [path: ~/docs/]" Steps:
`~/docs/.translation-glossary.json`.

- `translation-assistant.sh parse README_zh.md`: parses the document, detects structure and language
- `translation-assistant.sh glossary README_zh.md`: extracts and reviews the terminology glossary
- `translation-assistant.sh translate README_zh.md en --domain tech`: generates the translated document with format preservation

| Condition | Behavior |
|---|---|
| Document >50K words | Process in batches with progress indicator; warn of ~X minutes processing time |
| Scanned/image PDF (no text layer) | Trigger OCR; warn of lower translation quality due to OCR errors |
| Code-heavy document (>50% code) | Skip code blocks; translate only comments and prose; note low text ratio |
| Mixed language document | Detect primary language; flag mixed sections for special handling |
| Document with embedded JSON/YAML | Preserve JSON/YAML structure; translate only string values if user requests |
| Unsupported output format requested | Offer conversion path (e.g., "Can output MD, DOCX, HTML. For PDF, convert after.") |
| Zero domain terms detected | Skip glossary step; proceed with direct translation in general mode |
| Target language is same as source | Warn: "Source and target are the same language. Continue with proofreading mode?" |

| Error Code | Scenario | Handling |
|---|---|---|
| E-PARSE-FAIL | Document structure unparseable | Offer plain text fallback (loses formatting); warn user |
| E-FORMAT-CORRUPT | Post-translation format validation fails | Show diff of structural elements; offer manual fix or revert |
| E-GLOSSARY-CONFLICT | Same term has conflicting translations in saved glossary | Present conflict; ask user to choose or merge |
| E-OCR-FAIL | OCR on scanned document produces garbage | Return original images + note; suggest better scan quality |
| E-ENCODING | Document has non-standard encoding | Auto-detect and convert to UTF-8; warn if characters lost |
| E-TERM-OVERLAP | Domain term is also a common word (e.g., "bug" in tech) | Flag as ambiguous; ask user to clarify context or accept heuristic |
| E-TRANSLATION-LOW-CONFIDENCE | AI translation confidence below threshold for a segment | Mark segment with ⚠️ "Low confidence" annotation in output |
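For `E-ENCODING`, a conservative first step is to try strict decodes in a fixed order and fall back to a lossy decode only when every candidate fails. This sketch assumes inputs are UTF-8 (with or without a BOM) or legacy Simplified Chinese GB18030/GBK. Other legacy encodings such as Big5 would need a statistical detector, because GB18030 accepts many byte sequences that are not actually GB18030 text. The names are hypothetical.

```python
# Trial order (an assumption): UTF-8 with optional BOM, then GB18030, a
# superset of GBK/GB2312 that covers most legacy Simplified Chinese files.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030")


def decode_to_text(raw):
    """Decode bytes to str; returns (text, encoding_used, lossy).

    lossy=True means no candidate decoded strictly and undecodable bytes
    were replaced with U+FFFD, so the user should be warned.
    """
    for enc in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(enc), enc, False
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace"), "utf-8", True
```

Once decoded, the text can be written back with `encoding="utf-8"` to complete the "convert to UTF-8" step.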