Install
openclaw skills install corporate-doc-builderGenerate polished .docx documents by injecting Markdown content into an existing Word template, preserving the template's cover page, TOC, fonts, headers, and footers. Use when the user has a .docx template plus reference materials (documents, spreadsheets, slides, or source code) and wants a production-ready Word deliverable. Covers the full pipeline from template analysis through chapter drafting to python-docx injection.
openclaw skills install corporate-doc-builderTurning a corporate .docx template plus scattered source materials into a finished document is error-prone. Models routinely break template styles, exceed token limits, hallucinate TOC entries, and produce images that overlap with text.
This skill codifies a battle-tested 6-stage pipeline that avoids these traps:
1. Template Analysis -> Extract TOC, styles, placeholders
2. Spec -> Write a design spec for the document
3. Plan -> Break work into per-chapter tasks (optional for simple docs)
4. Research -> Summarize source materials into reusable notes
5. Authoring -> Write each chapter as an independent Markdown file
6. Injection -> Render diagrams + inject Markdown into the template
Core principle: Template styles are preserved via python-docx copy-and-inject, never via pandoc whole-file conversion. All diagrams use Mermaid.
Activate this skill when ANY of the following apply:
.docx template path AND source/reference materials.docx template + .xlsx feature lists + .pptx architecture diagrams or similar mixed enterprise assetsDo NOT activate when:
.docx (use the docx skill instead)Each stage has a pre-flight checklist and exit criteria. Do not skip stages or run them in parallel.
Goal: Understand the template structure and agree on a TOC mapping with the user.
Companion skill: If
superpowers:brainstormingis available, invoke it at the start of this stage to systematically explore user intent and requirements before committing to a TOC mapping.
| Item | Action |
|---|---|
| Output directory | ls to verify it exists; fix typos before proceeding |
| Source material paths | ls each path to confirm accessibility |
| Template file | ls -la <template>.docx to confirm it exists and is not locked |
| Historical drafts | If prior output exists, read the first chapter to verify it belongs to THIS document |
from docx import Document
doc = Document(template_path)
for p in doc.paragraphs:
if p.style.name.startswith("Heading") or p.style.name.startswith("toc"):
print(p.style.name, p.text)
Some templates use custom styles (e.g., CJ1, CJ2) instead of standard Heading styles. Scan all paragraph styles and identify which ones act as headings.
Templates often contain placeholder images and tables. Extract them to plan which chapters need diagrams or data tables:
# Count images
print(f"Images: {len(doc.inline_shapes)}")
# Count tables
print(f"Tables: {len(doc.tables)}")
Templates often say "keep titles consistent." The real meaning is:
Goal: Write a design spec that anchors all subsequent work.
Write to <output>/spec/<YYYY-MM-DD>-<topic>-spec.md. Include at minimum:
Self-check before submitting for review: scan for leftover placeholders, internal inconsistencies, scope creep, and ambiguity.
Goal: Break the work into per-chapter tasks for complex documents.
Skip this stage for simple documents (fewer than 5 chapters). For larger documents, write to <output>/plans/<YYYY-MM-DD>-<topic>-plan.md with tasks grouped into:
Each task should have bite-sized steps (2-5 minutes each). Per-chapter independent delivery + independent review is the key token budget protection mechanism.
Goal: Extract and summarize source materials into reusable research notes.
Suggested output files in <output>/research/:
| File | Content |
|---|---|
code-architecture.md | Top-level module structure, key packages, tech stack, critical data flows |
reference-docs-summary.md | Heading outline + key table/figure index for each reference document |
feature-mapping.md | Feature list (from xlsx/pptx) mapped to target TOC chapters |
<topic>-inventory.md | Domain-specific inventory (e.g., interface list, data model list, API catalog) |
Reference documents are fact anchors, not content sources. Extract headings, table titles, and key data. Never copy full text into research notes.
Every TOC entry must have a traceable source (source code path, reference document section, or feature list row). If a TOC entry has no source, delete it from the TOC rather than drafting content without evidence.
# Extract headings from .docx
from docx import Document
doc = Document(path)
for p in doc.paragraphs:
if p.style.name.startswith("Heading"):
print(p.style.name, p.text)
# Extract structured data from .xlsx
import openpyxl
wb = openpyxl.load_workbook(path)
for sh in wb.sheetnames:
for row in wb[sh].iter_rows(values_only=True):
print(row)
# Bulk-extract embedded images from .docx
# unzip -j <path>.docx 'word/media/*' -d ./extracted_imgs/
Goal: Write each chapter as an independent Markdown file.
<output>/<doc>_md/
ch01_<topic>.md
ch02_<topic>.md
ch03_<topic>_p1.md # Split large chapters into parts
ch03_<topic>_p2.md
...
chNN_<topic>.md
appendix_a.md
full_draft.md # Final concatenation
Companion skill: If
claude-mermaid:mermaid-diagramsis available, invoke it before writing Mermaid blocks. It provides syntax best practices, diagram type selection, and live preview tools that produce significantly higher-quality diagrams.
```mermaid code blocks in MarkdownsequenceDiagram does NOT support style directives; avoid themcat ch01_*.md ch02_*.md ... chNN_*.md appendix_*.md > full_draft.md
After merging, review once for: TOC continuity, chapter numbering consistency, and Mermaid block count.
full_draft.md created with correct chapter orderGoal: Render Mermaid diagrams to PNG, then inject Markdown into the template to produce the final .docx.
python scripts/render_mermaid.py <full_draft.md> <images_dir>
This extracts all ```mermaid blocks and renders each to diagram_1.png, diagram_2.png, etc.
python scripts/inject_docx.py \
--md-dir <markdown_dir> \
--template <template.docx> \
--output <output.docx> \
--chapters ch01.md ch02.md ... appendix_a.md
The script: copies the template, clears body content after the TOC, injects Markdown as styled paragraphs, embeds Mermaid PNGs, and forces TOC field refresh.
This is critical. Before running injection, check the template's paragraph styles for issues that will corrupt the output:
from docx import Document
doc = Document(template_path)
normal = doc.styles['Normal']
pf = normal.paragraph_format
print(f"Normal: line_spacing_rule={pf.line_spacing_rule}, line_spacing={pf.line_spacing}")
for style in doc.styles:
if style.name and style.name.startswith("Heading"):
pPr = style.element.find('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pPr')
if pPr is not None:
numPr = pPr.find('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}numPr')
if numPr is not None:
print(f"WARNING: {style.name} has numPr (auto-numbering)")
Check for these three issues and apply fixes:
| Issue | Symptom | Fix |
|---|---|---|
Normal style has line_spacing_rule = EXACTLY | Images are clipped to line height and overlap with text | Override image paragraphs with line_spacing_rule = SINGLE |
Heading styles have numPr elements | Double numbering: "1.1.1 2.1.1 Title" | Strip numPr from all Heading styles before injection |
| Non-Mermaid code blocks ignored | JSON/SQL/pseudocode blocks are blank in .docx | Render code blocks as shaded monospace paragraphs |
See the Template Style Pitfalls section for details.
.docx opens correctly in Word/LibreOfficeThese issues were discovered across 4 production document generations. They are universal to any .docx template injection workflow.
Root cause: Many corporate templates set the Normal paragraph style to line_spacing_rule = EXACTLY with a fixed height (e.g., 26pt). When an image is inserted into a paragraph inheriting this style, the paragraph height is locked to 26pt regardless of image size. The image overflows and overlaps subsequent text.
Fix: Explicitly set line_spacing_rule = SINGLE on every image paragraph:
from docx.enum.text import WD_LINE_SPACING
from docx.shared import Pt, Cm
def add_image(doc, img_path, max_w_cm=14.0, max_h_cm=12.0):
p = doc.add_paragraph()
p.alignment = WD_ALIGN_PARAGRAPH.CENTER
pf = p.paragraph_format
pf.space_before = Pt(6)
pf.space_after = Pt(6)
pf.line_spacing_rule = WD_LINE_SPACING.SINGLE # Override EXACTLY
run = p.add_run()
w_cm, h_cm = image_size_cm(img_path, max_w_cm, max_h_cm)
run.add_picture(img_path, width=Cm(w_cm), height=Cm(h_cm))
Also cap max_h_cm at 12 (not 18) to prevent a single image from filling the entire page.
Root cause: Some templates configure Heading styles with numPr (automatic numbering at the style level). When the Markdown heading text already contains manual numbering (e.g., "2.1.1 System Architecture"), the output shows "1.1.1 2.1.1 System Architecture" - the style's auto-number prepended to the manual number.
Fix: Strip numPr from all Heading styles before injecting content:
WNS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
def strip_heading_auto_numbering(doc):
for style in doc.styles:
if style.name and style.name.startswith("Heading"):
pPr = style.element.find(f'{WNS}pPr')
if pPr is not None:
numPr = pPr.find(f'{WNS}numPr')
if numPr is not None:
pPr.remove(numPr)
Root cause: Injection scripts that only handle Mermaid fenced blocks often skip other code blocks (JSON, SQL, pseudocode, curl examples), leaving blank spaces in the output.
Fix: Collect non-Mermaid code block lines and render them as shaded monospace paragraphs:
from docx.shared import Pt, RGBColor
from docx.oxml.ns import qn
def add_code_block(doc, lines):
code_text = "\n".join(lines)
p = doc.add_paragraph()
pPr = p._element.get_or_add_pPr()
shd = docx.oxml.OxmlElement('w:shd')
shd.set(qn('w:val'), 'clear')
shd.set(qn('w:color'), 'auto')
shd.set(qn('w:fill'), 'F2F2F2')
pPr.append(shd)
run = p.add_run(code_text)
run.font.name = "Consolas"
run.font.size = Pt(9)
run.font.color.rgb = RGBColor(0x33, 0x33, 0x33)
Root cause: Template headers, footers, and cover pages contain placeholder text ("XXX Project", "XXXX System"). If not replaced, the output ships with the wrong project name.
Fix: Scan and replace header/footer text during the injection step:
for section in doc.sections:
for header_para in section.header.paragraphs:
for run in header_para.runs:
if "XXX" in run.text:
run.text = run.text.replace("XXX", actual_project_name)
LLM context windows have hard limits. These strategies prevent token overflow during document generation:
| Strategy | Stage |
|---|---|
| Per-chapter independent Markdown files | Authoring |
| Research notes are summaries, not full-text copies | Research |
Read source code on demand (ls + Read), never dump entire directories | Research |
| Compress long lists into tables | All stages |
| Per-chapter review checkpoints | Authoring |
| Never load all chapters into a single request | Authoring / Injection |
| Symptom | Root Cause | Fix |
|---|---|---|
| Path typos or directories do not exist | No pre-flight path validation | ls every path in the whitelist before starting |
| Wrong draft used as starting point | Did not verify which document a prior draft belongs to | Read the first chapter to confirm the topic |
| Template placeholder text appears in output | Treated "keep titles consistent" as "keep content identical" | Keep top-level titles; rewrite sub-sections for the actual project |
| Chapter organization does not match the product | Organized by code modules instead of user-facing capabilities | Organize by product capability, not by engineering repo structure |
| Token limit errors | Too many chapters loaded at once | Per-chapter files + summarized research |
| pandoc destroys template fonts/headers/footers | Used pandoc instead of python-docx | Always use python-docx template copy + injection |
| Reference doc text copied verbatim into chapters | Treated source material as content rather than fact anchors | Research phase produces summaries only |
| TOC entries have no source evidence | Concept-level headings imported without code/doc backing | Research phase: annotate every entry with a source; delete unsupported entries |
| "1.1.1 2.1.1 Title" double numbering | Template Heading styles have numPr auto-numbering | Strip numPr before injection |
| JSON/SQL/pseudocode blocks are blank in .docx | Injection script skips non-Mermaid code blocks | Render code blocks as shaded monospace paragraphs |
| Images overlap with text or are clipped | Template Normal style uses EXACTLY line spacing | Set image paragraph line_spacing_rule = SINGLE; cap max_h_cm at 12 |
This skill includes ready-to-use Python scripts in the scripts/ directory:
scripts/render_mermaid.pyExtracts all ```mermaid blocks from a Markdown file and renders each to diagram_N.png using mmdc (Mermaid CLI).
python scripts/render_mermaid.py <markdown_file> <output_image_dir>
Requirements: npx (Node.js), which auto-installs @mermaid-js/mermaid-cli.
scripts/inject_docx.pyCopies a .docx template, clears the body after the TOC, and injects Markdown content as properly styled Word elements.
python scripts/inject_docx.py \
--md-dir ./output/chapters_md \
--template ./templates/design_spec.docx \
--output ./output/design_spec.docx \
--chapters ch01.md ch02.md ch03.md appendix_a.md \
--header-replace "XXX=My Project Name"
Features:
numPr auto-numbering removalSINGLE line spacing (prevents clipping)Requirements: python-docx, Pillow.
scripts/puppeteer-config.jsonDisables Chromium sandboxing for mmdc in Linux/container environments:
{ "args": ["--no-sandbox"] }
This skill is fully self-contained — it works without any companion skills installed. However, if the following skills are available in your environment, they significantly improve specific stages:
| Skill | Stage | Benefit |
|---|---|---|
claude-mermaid:mermaid-diagrams | Stage 5 (Authoring) | Provides Mermaid syntax best practices, diagram type selection guidance, and live preview/save tools (mermaid_preview / mermaid_save). Produces higher-quality diagrams than writing Mermaid from scratch. |
superpowers:brainstorming | Stage 1 (Template Analysis) | Structured brainstorming workflow that explores user intent, requirements, and design alternatives before committing to a TOC mapping. Reduces rework. |
superpowers:writing-plans | Stage 3 (Plan) | Structured planning workflow for multi-step implementation tasks. Helps break complex documents into well-scoped per-chapter tasks. |
How to use them: If a companion skill is available, invoke it via the Skill tool at the relevant stage. If it is not available, follow the inline guidance in this skill — the core instructions for each stage already cover the essential techniques.
Example: During Stage 5, if claude-mermaid:mermaid-diagrams is installed, invoke it before writing Mermaid blocks. If not, follow the Mermaid guidelines in the Authoring section directly.
Use this checklist when starting any new document:
line_spacing_rule and Heading numPrfull_draft.md and review.docx and verify: cover page, TOC refresh, image layout, code blocks