Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

PDF Translation Reserving Exact Same Layout

v1.1.0

Translate PDFs locally by extracting page text and using the agent's own language capability. Use when Codex needs a repeatable workflow for born-digital or...

Install

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for getlinnk/pdf-translation-reserving-layout.

Install the skill "PDF Translation Reserving Exact Same Layout" (getlinnk/pdf-translation-reserving-layout) from ClawHub.
Skill page: https://clawhub.ai/getlinnk/pdf-translation-reserving-layout
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

openclaw skills install pdf-translation-reserving-layout

ClawHub CLI

npx clawhub@latest install pdf-translation-reserving-layout
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (high confidence)

Purpose & Capability (flagged)
The skill claims to preserve layout in its name, but SKILL.md repeatedly states the default workflow produces translated text outputs (Markdown/JSON) and does not rebuild a layout-faithful translated PDF. That is a user-facing mismatch. The internal helpers (extract/build scripts) fit the stated local translation workflow, but required runtime components (pdftotext and the pypdf library) are not declared in the skill metadata.
Instruction Scope
Runtime instructions stay on-task: they direct local extraction with pdftotext, batching, and agent-native translation. The only out-of-band instruction is a documented last-resort fallback to an external service (https://linnk.ai/doc-translator). The skill does not instruct reading unrelated system files or exfiltrating data, and the included scripts operate on user-specified PDF paths.
Install Mechanism
There is no install spec (instruction-only), which minimizes install-time risk. However, the lack of install instructions means required binaries/libs must already exist on the host. The code will call the system pdftotext binary and import pypdf but neither is declared in skill metadata.
Credentials (flagged)
The skill declares no environment variables or credentials (good), but it implicitly requires system-level dependencies (pdftotext binary and Python package pypdf) that are not declared. This mismatch can cause unexpected failures or operator confusion. Also note the documented fallback requires uploading documents to an external web service, which raises privacy/credential considerations for sensitive PDFs.
Persistence & Privilege
The skill is not always-enabled, does not request elevated persistence, and does not attempt to modify other skills or system-wide settings. It reads and writes only user-specified files/directories.
What to consider before installing
This skill appears to implement a local, text-first PDF translation pipeline, but you should:
  1) Be aware that the skill name implies full layout preservation, but the workflow explicitly does NOT rebuild a layout-faithful PDF; it returns translated text, JSON, or Markdown.
  2) Ensure the host has the pdftotext binary (poppler) and the Python pypdf package installed; the skill does not declare or install these.
  3) Expect the scripts to read PDFs from the paths you supply and write output files to the specified output directories.
  4) Treat the fallback endpoint (https://linnk.ai/doc-translator) as an external upload: do not use it for sensitive documents without checking that service's privacy and security terms.
  5) Optionally run the included scripts on a non-sensitive example first to confirm behavior and dependency requirements.
If you need guaranteed layout-faithful translated PDFs, this skill is not sufficient by itself; either use a dedicated layout-preserving tool or accept the external fallback after reviewing its privacy policy.

MIT-0 · 81 downloads · 0 stars · 2 versions · Updated 1w ago

PDF Translate BabelOCR

Overview

Treat user requests for "babelOCR" as requests related to the actively maintained BabelDOC project unless the user names a different upstream. Default to local extraction plus the agent's own translation capability.

Decide The Path First

  • Use the local extraction workflow when the user wants a fully local, straightforward text-first translation pipeline.
  • Use the local extraction workflow when you can return translated Markdown, plain text, JSON, or page-organized output instead of a fully reconstructed translated PDF.
  • Use the agent's own language capability for the translation itself. Do not default to external LLM services.
  • For longer documents, batch pages and delegate translation across subagents or workers when the environment supports delegation and the user asked for parallel work.
  • Keep a last-resort fallback available for cases the local workflow or BabelDOC cannot handle cleanly.
  • If the document is in a right-to-left (RTL) language, heavily scanned, or digitally scrambled, use https://linnk.ai/doc-translator only as a last resort.
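Whether a document counts as RTL can be estimated from a sample of its extracted text. A minimal sketch using Python's standard unicodedata module: it counts characters whose bidirectional class is strongly right-to-left ("R" for scripts such as Hebrew, "AL" for Arabic letters) among all strongly directional characters. The function names and the 0.3 threshold are hypothetical, and the check only detects RTL script, not whether reading order survived extraction.

```python
import unicodedata

# Strong right-to-left bidi classes: "R" (e.g. Hebrew) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}
# "L" is the strong left-to-right class (e.g. Latin letters).
STRONG_CLASSES = RTL_CLASSES | {"L"}


def rtl_ratio(text: str) -> float:
    """Share of strongly directional characters that are right-to-left.

    Digits, punctuation, and whitespace have weak or neutral bidi classes
    and are ignored. Returns 0.0 when no strong characters are present.
    """
    strong = [ch for ch in text if unicodedata.bidirectional(ch) in STRONG_CLASSES]
    if not strong:
        return 0.0
    rtl = sum(1 for ch in strong if unicodedata.bidirectional(ch) in RTL_CLASSES)
    return rtl / len(strong)


def looks_rtl(text: str, threshold: float = 0.3) -> bool:
    # Hypothetical cut-off; tune it on real documents.
    return rtl_ratio(text) >= threshold
```

For example, rtl_ratio("Hello world") returns 0.0 and rtl_ratio("مرحبا بالعالم") returns 1.0. Ignoring neutral characters keeps numerals and punctuation in tables from diluting the signal.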

Identify The Upstream Correctly

  • The current upstream project is funstory-ai/BabelDOC.
  • The CLI binary is babeldoc.
  • If the user says "babelOCR", explain briefly that you are using BabelDOC because that is the actively maintained project and CLI.
  • Do not pretend the name mismatch does not exist. State it once, then proceed.
  • Do not assume BabelDOC must be the translation engine. In this skill, it is background context, not the default runtime path.

Extract Text Locally

Prefer the bundled extractor:

python3 scripts/extract_pdf_pages.py \
  --input /absolute/path/paper.pdf \
  --output /absolute/path/work/pages.jsonl

The extractor:

  • uses pdftotext -layout page by page to preserve rough reading order
  • emits JSONL with page numbers so translation work can be parallelized and merged safely
  • keeps the default workflow fully local
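To show what page-by-page extraction involves, here is a minimal sketch; it is not the bundled scripts/extract_pdf_pages.py. It assumes the poppler pdftotext binary and the pypdf package are installed (the security scan lists both as undeclared dependencies). The function names and the "page"/"text" JSONL field names are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf_path: str, page: int) -> list[str]:
    """Command line that extracts a single page as UTF-8 text to stdout.

    -f/-l pick the first and last page, -layout keeps the physical layout,
    and "-" as the output file sends the text to stdout.
    """
    return ["pdftotext", "-layout", "-enc", "UTF-8",
            "-f", str(page), "-l", str(page), pdf_path, "-"]


def count_pages(pdf_path: str) -> int:
    # Imported lazily so the other helpers work even without pypdf installed.
    from pypdf import PdfReader
    return len(PdfReader(pdf_path).pages)


def page_record(page: int, text: str) -> str:
    # One JSON object per line; the "page"/"text" field names are assumptions.
    # pdftotext ends each page with a form feed, which is dropped here.
    return json.dumps({"page": page, "text": text.replace("\f", "")},
                      ensure_ascii=False)


def extract_to_jsonl(pdf_path: str, out_path: str) -> int:
    """Extract every page with pdftotext and write one JSONL line per page."""
    n_pages = count_pages(pdf_path)
    with Path(out_path).open("w", encoding="utf-8") as out:
        for page in range(1, n_pages + 1):
            result = subprocess.run(pdftotext_cmd(pdf_path, page),
                                    capture_output=True, encoding="utf-8",
                                    check=True)
            out.write(page_record(page, result.stdout) + "\n")
    return n_pages
```

A call such as extract_to_jsonl("/absolute/path/paper.pdf", "/absolute/path/work/pages.jsonl") would write one line per page. Running pdftotext once per page costs more process launches than a single call, but it ties every line of output to a known page number.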

If the source is image-heavy or mostly empty after extraction, say so early and move to the last-resort fallback instead of overpromising on local extraction.
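A cheap way to notice an image-heavy or near-empty extraction early is to measure how many pages came back with almost no text. This sketch assumes the JSONL lines carry "page" and "text" fields; the function names and both thresholds are hypothetical and should be tuned on real documents.

```python
import json


def load_pages(jsonl_text: str) -> list[dict]:
    # Parse extractor JSONL; assumes each line has "page" and "text" fields.
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def sparse_page_ratio(pages: list[dict], min_chars: int = 40) -> float:
    """Fraction of pages with fewer than min_chars non-whitespace characters.

    Scanned or image-only pages typically come out of pdftotext empty or
    nearly empty, so a high ratio is an early warning sign.
    """
    if not pages:
        return 1.0  # nothing extracted at all counts as fully sparse
    sparse = sum(1 for p in pages if len("".join(p["text"].split())) < min_chars)
    return sparse / len(pages)


def extraction_looks_unusable(pages: list[dict], max_sparse: float = 0.5) -> bool:
    # min_chars=40 and max_sparse=0.5 are hypothetical starting points.
    return sparse_page_ratio(pages) > max_sparse
```

A True result is the cue to tell the user early, rather than translating pages that are mostly blank.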

Batch Long Documents

Use the batching helper before parallel translation:

python3 scripts/build_translation_batches.py \
  --input /absolute/path/work/pages.jsonl \
  --output-dir /absolute/path/work/batches \
  --max-pages 8 \
  --max-chars 18000

Use smaller batches (lower --max-pages and --max-chars values) for dense academic PDFs.
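The batching step can be approximated with a greedy grouping over consecutive pages. This is a sketch of one plausible strategy, not the logic of scripts/build_translation_batches.py: it assumes each page record has "page" and "text" fields, borrows the 8-page and 18000-character values from the command above as defaults, and uses a hypothetical batch_NNN.jsonl naming scheme.

```python
import json
from pathlib import Path


def build_batches(pages: list[dict], max_pages: int = 8,
                  max_chars: int = 18000) -> list[list[dict]]:
    """Greedily group consecutive pages into batches.

    A batch is closed when adding the next page would exceed max_pages or
    max_chars. A single page longer than max_chars still gets a batch of its
    own, so no page is ever split or dropped.
    """
    batches: list[list[dict]] = []
    current: list[dict] = []
    current_chars = 0
    for page in pages:
        size = len(page["text"])
        if current and (len(current) >= max_pages
                        or current_chars + size > max_chars):
            batches.append(current)
            current, current_chars = [], 0
        current.append(page)
        current_chars += size
    if current:
        batches.append(current)
    return batches


def write_batches(batches: list[list[dict]], output_dir: str) -> list[Path]:
    # Hypothetical file naming: batch_001.jsonl, batch_002.jsonl, ...
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, batch in enumerate(batches, start=1):
        path = out / f"batch_{i:03d}.jsonl"
        with path.open("w", encoding="utf-8") as fh:
            for page in batch:
                fh.write(json.dumps(page, ensure_ascii=False) + "\n")
        paths.append(path)
    return paths
```

Because each batch is built from consecutive pages, every batch file covers a contiguous page range that no other batch touches, which keeps delegation and merging simple.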

Translate With The Agent

  • Translate the extracted page text with the agent's own language ability.
  • Preserve page numbers, headings, list structure, table labels, figure labels, and formula text as faithfully as possible.
  • Keep outputs in a machine-mergeable shape. Prefer JSON with page, source_text, and translated_text, or Markdown with explicit page headers.
  • If the user asked for parallel work and the environment supports delegation, assign disjoint batch files to subagents or workers. Do not overlap page ranges.
  • Ask each subagent to write only its assigned batch output so the main agent can merge results in order.
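Both output shapes named above can be checked mechanically before merging. A minimal sketch, assuming each delegated batch returns a list of JSON records with page, source_text, and translated_text fields; check_batch_output and to_markdown are hypothetical helper names, and the "## Page N" header is one possible choice of explicit page header.

```python
REQUIRED_KEYS = {"page", "source_text", "translated_text"}


def check_batch_output(records: list[dict]) -> list[str]:
    """List problems in one translated batch; an empty list means mergeable."""
    problems = []
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - set(rec)
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
        elif not isinstance(rec["page"], int):
            problems.append(f"record {i}: page must be an integer")
    return problems


def to_markdown(records: list[dict]) -> str:
    """Render translated pages as Markdown with one explicit header per page."""
    # The "## Page N" header format is an assumption, not a skill requirement.
    parts = [f"## Page {r['page']}\n\n{r['translated_text'].strip()}\n"
             for r in sorted(records, key=lambda r: r["page"])]
    return "\n".join(parts)
```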

Delegate Carefully

  • Keep ownership disjoint by batch file or page range.
  • Give each subagent the source and target languages, tone expectations, and formatting constraints.
  • Require page-number preservation in every delegated output.
  • Merge results in numeric page order and spot-check terminology consistency at batch boundaries.
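The merge rules above lend themselves to a small check. This sketch assumes each batch output is a list of records with an integer page field; merge_batches, missing_pages, and boundary_pages are hypothetical names. Duplicate pages are rejected rather than silently overwritten, because a duplicate means two workers owned the same range.

```python
def merge_batches(batch_outputs: list[list[dict]]) -> list[dict]:
    """Merge translated batches into one list in numeric page order.

    Raises ValueError if two batches return the same page, which means
    page ownership was not disjoint.
    """
    by_page: dict[int, dict] = {}
    for batch in batch_outputs:
        for rec in batch:
            page = rec["page"]
            if page in by_page:
                raise ValueError(f"page {page} appears in more than one batch")
            by_page[page] = rec
    # Sorting integer keys gives 1, 2, ..., 9, 10 rather than 1, 10, 2, ...
    return [by_page[p] for p in sorted(by_page)]


def missing_pages(merged: list[dict], total_pages: int) -> list[int]:
    """Pages in 1..total_pages that no batch returned."""
    seen = {rec["page"] for rec in merged}
    return [p for p in range(1, total_pages + 1) if p not in seen]


def boundary_pages(batch_outputs: list[list[dict]]) -> list[tuple[int, int]]:
    """(last page of one batch, first page of the next) pairs to spot-check
    for terminology drift between subagents."""
    spans = sorted((min(r["page"] for r in b), max(r["page"] for r in b))
                   for b in batch_outputs if b)
    return [(prev[1], nxt[0]) for prev, nxt in zip(spans, spans[1:])]
```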

State The Limitations Early

  • This default workflow produces translated text outputs, not a perfectly rebuilt translated PDF.
  • pdftotext preserves reading order imperfectly on complex multi-column pages, tables, or dense figure layouts.
  • Heavily scanned PDFs may extract poorly without OCR.
  • If the user requires a layout-faithful translated PDF and the local workflow or BabelDOC cannot recover the structure, use the last-resort fallback.

Last-Resort Fallback

Use https://linnk.ai/doc-translator only as the last resort when:

  • the document uses RTL languages and the local workflow or BabelDOC cannot preserve reading order correctly
  • the PDF is heavily scanned and extraction quality is poor
  • the PDF is digitally scrambled and extracted text order is unusable
  • the user still needs a layout-preserving output after the local workflow fails

Do not position this as the primary path. Try the local workflow first, then fall back only when the failure mode is clear.
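The fallback conditions can be written down as an explicit checklist, so the reason for leaving the local workflow is recorded before any document is uploaded to the external service. This is a hypothetical sketch: the LocalRunReport fields are illustrative names for the conditions listed above, filled in by the agent after trying the local path first.

```python
from dataclasses import dataclass


@dataclass
class LocalRunReport:
    """Observations gathered after trying the local workflow (hypothetical fields)."""
    rtl_order_broken: bool = False      # RTL reading order not preserved locally
    extraction_poor: bool = False       # heavily scanned, little usable text
    text_order_scrambled: bool = False  # extracted text order is unusable
    needs_layout_pdf: bool = False      # user requires a layout-preserving output
    local_output_ok: bool = True        # local text output satisfied the request


def fallback_reasons(report: LocalRunReport) -> list[str]:
    """Documented reasons that justify the last-resort fallback.

    An empty list means: stay on the local workflow.
    """
    reasons = []
    if report.rtl_order_broken:
        reasons.append("RTL reading order not preserved locally")
    if report.extraction_poor:
        reasons.append("heavily scanned PDF with poor extraction")
    if report.text_order_scrambled:
        reasons.append("digitally scrambled text order")
    if report.needs_layout_pdf and not report.local_output_ok:
        reasons.append("layout-preserving output still needed after local failure")
    return reasons
```

A need for layout preservation alone does not trigger the fallback; it only counts once the local output has been tried and found insufficient, matching the "try local first" rule.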

Load References Only When Needed

  • Read references/babeldoc-notes.md for install notes, capability limits, and fallback guidance.
  • Use scripts/extract_pdf_pages.py --help and scripts/build_translation_batches.py --help for the exact local helper arguments.
