{"skill":{"slug":"tex2docx","displayName":"Convert TeX, the format commonly used for papers, to Word; the key capability is converting math formulas (as equations, not images), tables, figures, and so on","summary":"Convert LaTeX (.tex) academic papers to Word (.docx) with editable OMML equations, native Word tables, embedded figures, IEEE two-column format, and bibliogr...","description":"---\nname: tex2docx\ndescription: Convert LaTeX (.tex) academic papers to Word (.docx) with editable OMML equations, native Word tables, embedded figures, IEEE two-column format, and bibliography. Use when a user provides a .tex file and asks for a Word/DOCX version, or when converting academic LaTeX papers to editable Office format.\n---\n\n# tex2docx — LaTeX to Word Converter\n\n## Requirements\n\n- **pandoc** (system install): `winget install pandoc` or pandoc.org\n- **Python packages**: `pip install python-docx lxml pypandoc_binary`\n\n## Usage\n\n```bash\npython scripts/tex2docx.py input.tex [output.docx]\n```\n\nIf `output.docx` is omitted, output is `input.docx` in the same directory.\n\n## How It Works (Three Phases)\n\n```\n.tex ──→ [pandoc] ──→ OMML equations (13+ Word-editable formulas)\n  │\n  └──→ [Custom parser] ──→ Native Word tables ├──→ Final .docx\n                           Embedded figures     │   (merged)\n                           Formatted refs       │\n                           IEEE layout & font  ┘\n```\n\n### Phase 1 — Pandoc\nRuns pandoc via pypandoc. The input file must be in its own directory (with a `figures/` subfolder if images exist). 
The script `chdir`s to the tex directory before running pandoc so image paths resolve correctly.\n\n### Phase 2 — Custom LaTeX Parser\nRegex-based extraction of:\n- **Tables**: `\\begin{table}` → Word Table objects (full borders, centered, 8pt Times New Roman)\n- **Figures**: `\\includegraphics{}` + `\\caption{}` → PNG/PDF embeds with italic captions\n- **References**: `\\thebibliography` → formatted entries with hanging indent\n- **Sections**: `\\section{}`, `\\subsection{}` → bold headings\n- **Metadata**: `\\title`, `\\author`, `\\abstract`, `\\IEEEkeywords`\n\n### Phase 3 — Merge\nOMML equation paragraphs from pandoc are inserted into the cleanly-built document. Body paragraphs get a 0.25in first-line indent. All LaTeX commands (`\\textbf`, `\\toprule`, `\\ref`, `\\cite`, `\\begin{itemize}`, etc.) are stripped from text content.\n\n## Output Format\n\n| Feature | Detail |\n|---------|--------|\n| Font | Times New Roman (10pt body, 9pt table/figure, 8pt refs) |\n| Layout | A4, two-column IEEE conference style |\n| Equations | OMML (double-click to edit in Word) |\n| Tables | Native Word tables, all borders |\n| Figures | PNG/PDF embedded with \"Fig.\" captions |\n| References | Hanging indent, `[bN]` format |\n| First indent | 0.25in on body paragraphs |\n\n## Verification\n\n```bash\npython scripts/verify.py output.docx\n```\n\nReports paragraph/table/image/equation counts and checks for LaTeX residue.\n\n## Chinese (ctex) Support\n\nFully supports Chinese LaTeX documents using the `ctex` package:\n- Chinese section titles (引言, 方法, 实验, 结论, etc.) are recognized\n- `\\section*{}` (star variant) is supported\n- Chinese table headers are preserved\n- Chinese text in titles is rendered via `w:eastAsia` font fallback\n- `\\title{...}` and `\\author{...}` residue paragraphs are filtered\n\n## Limitations\n\n- **Inline math** (`$...$`) becomes plain text (italic), not OMML; only `\\begin{equation}`, `\\begin{align}`, and `\\[...\\]` become editable equations\n- **No .bib support**: references 
must be in `\\thebibliography{}` environment\n- **PNG images preferred**: script tries PNG then PDF fallback\n- **Pandoc path**: the system pandoc binary must be discoverable by pypandoc\n\n## Script: `scripts/tex2docx.py`\n\nSelf-contained (660+ lines). Key internal functions:\n\n| Function | Role |\n|----------|------|\n| `extract_tex()` | Parse all structural elements from .tex |\n| `extract_omml()` | Pull OMML XML from pandoc output |\n| `build_docx()` | Construct final document with all components |\n| `clean()` | Strip LaTeX commands to plain text |\n| `add_table()` | Build Word table with borders |\n| `add_figure()` | Embed image + caption |\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":382,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1778049689317,"updatedAt":1778492854429},"latestVersion":{"version":"1.0.0","createdAt":1778049689317,"changelog":"Initial release of tex2docx — LaTeX to Word converter.\n\n- Converts academic LaTeX (.tex) files to Word (.docx) with IEEE-style formatting\n- Preserves editable OMML equations, native Word tables, embedded figures, and bibliography\n- Supports two-column layout, Times New Roman font, and paragraph/table formatting\n- Includes Chinese (ctex) document support for section titles, headers, and text\n- Requires Python, pandoc, and a few Python packages (python-docx, lxml, pypandoc_binary)\n- Provides a verification script to check output quality and LaTeX residue","license":"MIT-0"},"metadata":null,"owner":{"handle":"wsyummy","userId":"s174g42p44rt00dg2x4jscq7k1853h83","displayName":"huangchuzhi","image":"https://avatars.githubusercontent.com/u/90503267?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1780090751842}}