Install
openclaw skills install @lywhlao2025/epub-bilingual-convert-skill
Process EPUB files into structured extraction JSON, guide AI paragraph-aligned translation and summary filling, then assemble bilingual EPUBs with target-language summaries and report.txt. Use when an agent needs to convert books, magazines, newsletters, Calibre-generated EPUBs, The Economist-style EPUBs, or other EPUB sources into bilingual editions while preserving spine order, images, CSS, TOC/index pages, and one-to-one paragraph alignment.
Use this skill to convert an EPUB into a bilingual EPUB through four separate stages:
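1. Extract article content with scripts/extract.py.
2. Estimate the translation token budget with scripts/estimate_tokens.py.
3. Fill the translation fields in extraction.json.
4. Assemble the bilingual EPUB and report.txt with scripts/assemble.py.
Keep extraction, translation, and assembly responsibilities separate. Extraction must not translate. Assembly must not invent translations. Translation must not alter source fields or structure.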
Require an EPUB file and target language. If the user omits the target language, ask for it before processing.
Use an output directory supplied by the user. If none is supplied, create a sibling directory named output beside the EPUB.
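The output directory rule can be sketched in a few lines of Python. The function name resolve_output_dir is hypothetical and is not part of the skill's scripts; it only illustrates the fallback to a sibling directory named output.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(epub_path: str, user_dir: Optional[str] = None) -> Path:
    """Return the user-supplied directory, else a sibling 'output' directory.

    Hypothetical helper: the skill's own scripts may resolve this differently.
    """
    if user_dir:
        return Path(user_dir)
    # Sibling of the EPUB file: same parent directory, fixed name "output".
    return Path(epub_path).parent / "output"
```

A caller would typically follow this with resolve_output_dir(...).mkdir(parents=True, exist_ok=True) before running extraction.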
If the user provides an EPUB path for translation or conversion, create or refresh extraction.json from that EPUB before estimating. If the user explicitly provides an extraction.json path, estimate and continue from that file instead of re-extracting.
Before running the scripts, ensure Python 3 is available and install the required parser libraries if they are missing:
python3 -m pip install beautifulsoup4 lxml
The scripts use only the Python standard library plus beautifulsoup4 and lxml.
python3 scripts/extract.py /path/to/input.epub /path/to/output --target-language Chinese
Estimate the translation budget before translating:
python3 scripts/estimate_tokens.py /path/to/output/extraction.json
Show the estimate to the user and wait for explicit confirmation. Only after the user confirms, fill target_language fields in /path/to/output/extraction.json, then run:
python3 scripts/assemble.py /path/to/output/extraction.json
The final outputs are:
- /path/to/output/bilingual_<input filename>.epub
- /path/to/output/summary/*.txt
- /path/to/output/report.txt
Run scripts/extract.py. It reads EPUB files as zip archives, finds the OPF path via META-INF/container.xml, follows OPF spine order, classifies HTML/XHTML pages, extracts article-like pages, copies first images to summary/, and writes UTF-8 extraction.json.
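The container.xml to OPF to spine lookup described above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in scripts/extract.py: the function name spine_hrefs is hypothetical, and the sketch skips percent-decoding of hrefs and any page classification.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_hrefs(epub) -> list:
    """Return archive paths of spine documents in reading order (hypothetical helper)."""
    with zipfile.ZipFile(epub) as z:
        # container.xml names the OPF package file via the full-path attribute.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.attrib["full-path"]
        opf = ET.fromstring(z.read(opf_path))

    # The manifest maps item ids to hrefs relative to the OPF file.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in opf.findall("opf:manifest/opf:item", OPF_NS)
    }
    base = posixpath.dirname(opf_path)
    # The spine lists item ids in reading order; resolve each to an archive path.
    return [
        posixpath.normpath(posixpath.join(base, manifest[ref.attrib["idref"]]))
        for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)
        if ref.attrib["idref"] in manifest
    ]
```

Paths inside the archive always use forward slashes, which is why the sketch uses posixpath rather than os.path.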
The extractor supports:
- Pages marked up with ul.calibre_feed_list and data-testid="article-title".
- Pages marked up with h1.chapter_title, h1.chapter_head, or content-heavy spine pages.
For unfamiliar EPUB sources, inspect the archive before changing logic:
python3 -c "import zipfile; z=zipfile.ZipFile('/path/to/input.epub'); print('\n'.join(z.namelist()))"
If extraction quality is poor, adjust page classification, title extraction, paragraph filtering, TOC detection, or image path handling in scripts/extract.py after observing the source structure.
For every EPUB translation or conversion request, run scripts/estimate_tokens.py on extraction.json to estimate the translation-stage token budget before translating:
python3 scripts/estimate_tokens.py /path/to/output/extraction.json
When the user provides an EPUB, create or refresh extraction.json from the EPUB first. When the user explicitly provides extraction.json, use that file directly.
After showing the estimate report, stop and ask the user to confirm before starting translation. Do not begin filling translations until the user explicitly confirms after seeing the estimate. Valid confirmations include short replies such as 继续 ("continue"), ok, 可以 ("fine"), 确认 ("confirm"), 开始翻译 ("start translating"), proceed, or go ahead.
The user's initial request to translate or convert an EPUB is not enough to bypass this gate. The confirmation must happen after the concrete estimate report is shown.
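A minimal sketch of this gate follows. The function name is_confirmation and the estimate_shown flag are hypothetical, and the reply set contains only the examples listed above; the list is described as non-exhaustive, so an agent may reasonably accept other clear affirmations.

```python
# Example replies from the text above; the document does not say this list is complete.
CONFIRMATIONS = {"继续", "ok", "可以", "确认", "开始翻译", "proceed", "go ahead"}


def is_confirmation(reply: str, estimate_shown: bool) -> bool:
    """Accept a reply only if it arrives after the estimate report was shown."""
    if not estimate_shown:
        # The initial request never counts, whatever its wording.
        return False
    normalized = reply.strip().rstrip("。.!！").casefold()
    return normalized in CONFIRMATIONS
```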
If the user only asks for token estimation and does not ask to translate or convert, run extraction if needed, show the estimate, and stop without asking for translation confirmation.
The estimator is a lightweight character heuristic, not a tokenizer-exact counter. It estimates only translation-stage model usage.
It does not estimate local EPUB extraction, parsing, or assembly work because those are local scripts. It also does not output cost amounts.
Default estimation behavior:
- Input is extraction.json, not the raw EPUB.
- Default --max-source-chars 8000.
- Default --retry-buffer 0.15.
Useful options:
python3 scripts/estimate_tokens.py /path/to/output/extraction.json --max-source-chars 6000 --retry-buffer 0.2 --top 10
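The exact formula inside scripts/estimate_tokens.py is not given here, so the following is only a sketch of what a character heuristic of this kind might look like. The chars_per_token ratio of 4.0 and the assumption that output is about as long as input are both illustrative guesses, not values taken from the script; only the 8000 and 0.15 defaults come from the text above.

```python
import math


def estimate_translation_tokens(articles, max_source_chars=8000,
                                retry_buffer=0.15, chars_per_token=4.0):
    """Rough translation-stage estimate from source paragraph lengths.

    ASSUMPTIONS: chars_per_token and the input == output size rule are
    illustrative; the real script may use different ratios.
    """
    source_chars = 0
    chunks = 0
    for article in articles:
        article_chars = sum(len(p) for p in article["paragraphs"])
        source_chars += article_chars
        # Each article needs at least one request, more if it exceeds the chunk size.
        chunks += max(1, math.ceil(article_chars / max_source_chars))

    input_tokens = math.ceil(source_chars / chars_per_token)
    output_tokens = input_tokens  # ASSUMPTION: translation about as long as source
    total = math.ceil((input_tokens + output_tokens) * (1 + retry_buffer))
    return {
        "source_chars": source_chars,
        "chunks": chunks,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total,
    }
```

Lowering max_source_chars raises the chunk count, and raising retry_buffer inflates the total, which matches the direction of the two command-line options.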
Translation execution policy:
For large EPUBs, prefer a resumable batch workflow that works through extraction.json in batches instead of trying to translate the whole book in one pass.
See docs/superpowers/specs/2026-05-31-resumable-translation-batches-design.md for the full testable design.
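One possible shape for a resumable loop is sketched below. It is not the design from the spec file named above: translate_article is a hypothetical callback that fills the target-language fields of one article in place, batch_size is an arbitrary choice, and "not yet translated" is assumed to mean translated_paragraphs is still null.

```python
import json
from pathlib import Path


def pending_articles(data):
    """Articles whose translated_paragraphs is still null (assumed 'not done' marker)."""
    return [a for a in data["articles"] if a.get("translated_paragraphs") is None]


def translate_in_batches(json_path, translate_article, batch_size=5):
    """Translate pending articles batch by batch, saving after each batch."""
    path = Path(json_path)
    data = json.loads(path.read_text(encoding="utf-8"))
    pending = pending_articles(data)
    for start in range(0, len(pending), batch_size):
        for article in pending[start:start + batch_size]:
            translate_article(article)  # hypothetical callback, mutates the article
        # Write to a temp file and swap it in, so a crash cannot truncate the JSON.
        tmp = path.with_suffix(".json.tmp")
        tmp.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
        tmp.replace(path)
    return len(pending)
```

Because completed articles are skipped on load, rerunning the function after an interruption continues from the first unfinished batch.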
Read extraction.json and fill only these fields for each article:
- title_dest_language
- section_dest_language
- translated_paragraphs
- summary_dest_language
Do not modify:
- num
- title
- section
- href
- paragraphs
- plain_text
- image_filename
Translate in three focused passes instead of trying to satisfy every quality goal at once:
Faithfulness (信) -> Draft A
Elegance (雅) checks and refines Draft A -> Draft B
Readability (达) checks and refines Draft B -> Final translation
Keep paragraph count, paragraph order, facts, and meaning unchanged across all passes. If a later pass conflicts with faithfulness, revert to the faithful wording.
Translate each source paragraph into Draft A while focusing only on faithfulness.
After this pass, Draft A should be structurally complete and faithful, even if the wording is not yet polished.
Review Draft A and generate Draft B by improving elegance while preserving faithfulness.
After this pass, Draft B should sound more refined but must not change source meaning, add commentary, or lose details from Draft A.
Review Draft B and generate the final translation by improving target-language readability.
Only write the final pass into translated_paragraphs. Do not expose Draft A or Draft B in extraction.json.
Hard constraint:
len(translated_paragraphs) == len(paragraphs)
Each translated_paragraphs[i] must translate exactly paragraphs[i]. Do not merge, split, reorder, omit, add commentary, add notes, or put summaries/titles inside paragraph translations.
Bilingual reading order must always be source language first, target language second. For every paragraph pair, render or preview paragraphs[i] before translated_paragraphs[i]. For titles and TOC labels, use {source title} | {target title}. Do not place target-language content before the corresponding source-language content in bilingual previews or EPUB body content.
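The ordering rule is easy to get backwards in previews, so a small sketch may help. Both function names are hypothetical; the only facts taken from the text are the source-first order and the {source title} | {target title} format.

```python
def bilingual_title(source_title: str, target_title: str) -> str:
    """Format a title or TOC label with the source language first."""
    return f"{source_title} | {target_title}"


def preview_pairs(paragraphs, translated_paragraphs):
    """Interleave paragraphs so each source paragraph precedes its translation."""
    if len(paragraphs) != len(translated_paragraphs):
        raise ValueError("paragraph counts differ; fix the translation first")
    lines = []
    for source, target in zip(paragraphs, translated_paragraphs):
        lines.append(source)   # source language first
        lines.append(target)   # target language second
    return lines
```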
Generate summary_dest_language from plain_text in the target language. Default to 150-300 target-language characters when the article is long; shorter is acceptable for short articles.
After each article, self-check:
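Part of this self-check can be automated. The sketch below covers only what the surrounding text states outright: source fields stay untouched, the paragraph counts match, and the translation fields are filled. The function name self_check is hypothetical, and treating only the title and summary as mandatory is an assumption, since a source article may have no section. Faithfulness itself still needs a reading pass.

```python
SOURCE_FIELDS = ("num", "title", "section", "href",
                 "paragraphs", "plain_text", "image_filename")


def self_check(original: dict, filled: dict) -> list:
    """Return a list of problems found in one translated article (empty if none)."""
    problems = []
    for field in SOURCE_FIELDS:
        if original.get(field) != filled.get(field):
            problems.append(f"source field changed: {field}")

    translated = filled.get("translated_paragraphs")
    if not isinstance(translated, list):
        problems.append("translated_paragraphs is missing")
    elif len(translated) != len(filled.get("paragraphs") or []):
        problems.append("paragraph count mismatch")
    elif any(not isinstance(t, str) or not t.strip() for t in translated):
        problems.append("empty translated paragraph")

    # ASSUMPTION: title and summary are always required; section may be empty.
    for field in ("title_dest_language", "summary_dest_language"):
        if not filled.get(field):
            problems.append(f"missing {field}")
    return problems
```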
When only doing the translation stage, output valid JSON only, without Markdown fences or explanation.
Run scripts/assemble.py after all translation fields are complete:
python3 scripts/assemble.py /path/to/output/extraction.json
The assembler:
- Rewrites titles and TOC labels as {source title} | {target title}.
- Inserts <p class="dest_translation">...</p> after each source paragraph.
- Injects .dest_translation CSS once into HTML/XHTML heads.
- Writes report.txt with EPUB title, paths, target language, article/image counts, missing first images, and untranslated/incomplete articles.
If assemble.py reports incomplete translations, fix extraction.json instead of changing source fields or bypassing validation.
extraction.json uses this shape:
{
  "epub_title": "...",
  "input_epub": "/abs/path/to/input.epub",
  "output_dir": "/abs/path/to/output",
  "target_language": "Chinese",
  "total_articles": 1,
  "articles": [
    {
      "num": 1,
      "title": "Source Title",
      "section": "Source Section",
      "href": "relative/path/article.xhtml",
      "paragraphs": ["Source paragraph"],
      "plain_text": "Source plain text, max 8000 chars",
      "image_filename": null,
      "title_dest_language": null,
      "section_dest_language": null,
      "translated_paragraphs": null,
      "summary_dest_language": null
    }
  ]
}
Minimum validation before delivery:
python3 scripts/extract.py /path/to/input.epub /path/to/output --target-language Chinese
python3 scripts/estimate_tokens.py /path/to/output/extraction.json
python3 -m json.tool /path/to/output/extraction.json >/dev/null
python3 scripts/assemble.py /path/to/output/extraction.json
For assembly validation without real translation, use temporary placeholder values only in a disposable copy of extraction.json. Do not present placeholder output as a completed translation.
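A disposable copy with placeholders could be produced along these lines. The function name make_placeholder_copy and the "[PLACEHOLDER]" marker are hypothetical. Note one caveat the sketch does not resolve: the copied JSON still carries the original output_dir value, and it is not stated here whether assemble.py writes relative to that field or to the JSON file's location, so check that before running it.

```python
import json
import tempfile
from pathlib import Path

MARK = "[PLACEHOLDER] "  # hypothetical marker so fake output is never mistaken for real


def make_placeholder_copy(json_path) -> Path:
    """Write a placeholder-filled copy of extraction.json into a new temp directory."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    for article in data["articles"]:
        article["title_dest_language"] = MARK + (article.get("title") or "")
        article["section_dest_language"] = MARK + (article.get("section") or "")
        # One placeholder per source paragraph keeps the one-to-one alignment.
        article["translated_paragraphs"] = [MARK + p for p in article["paragraphs"]]
        article["summary_dest_language"] = MARK + "summary"

    out = Path(tempfile.mkdtemp(prefix="epub_placeholder_")) / "extraction.json"
    out.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return out
```

The original file is only read, never written, so the real extraction.json keeps its null translation fields.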