Resilient PDF

v1.1.0

Recover PDF extraction and summarization workflows when native PDF handling fails, hangs, times out, or rejects large files. Use when working with local or r...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for cdmichaelb/resilient-pdf.

Install the skill "Resilient PDF" (cdmichaelb/resilient-pdf) from ClawHub.
Skill page: https://clawhub.ai/cdmichaelb/resilient-pdf
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install resilient-pdf

ClawHub CLI


npx clawhub@latest install resilient-pdf

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
The name/description match the provided SKILL.md and the included script. The script implements URL download, local extraction via a detected 'uvx' binary, chunking, and a lightweight summary — all coherent with a 'resilient PDF' fallback workflow. There are no unrelated credentials, binaries, or config paths requested.
Instruction Scope
Runtime instructions are narrowly scoped: download a remote PDF or read a local path, run scripts/extract_pdf.py to produce markdown/chunks/summary, and inspect outputs. The instructions and script only access the provided PDF, workspace paths, and the user's home path to look for ~/.local/bin/uvx. They do not attempt to read unrelated config, secrets, or other system data.
Install Mechanism
There is no install spec; the skill is instruction-only with one helper script. The only install hint is a pip command to obtain 'uv' (uvx) if missing. No downloads from untrusted URLs or archive extraction are present in the install step.
Credentials
No environment variables, credentials, or config paths are required. The script does check for uvx in PATH and ~/.local/bin, and writes outputs to workspace locations requested by the operator — these are proportional to the stated task.
Persistence & Privilege
The skill does not request permanent presence (always:false) and does not modify other skills or system-wide settings. It writes files only to operator-specified output or chunk directories and may create parent directories as needed, which is appropriate for a local extraction workflow.
Assessment
This skill appears to do exactly what it claims: download PDFs you point it at, run a local extractor (uvx/markitdown), produce markdown/chunks and a small first-pass summary. Before using it: (1) only pass URLs you trust (it will download arbitrary remote content), (2) review or sandbox the uvx binary you will invoke — uvx will be executed as a subprocess and its provenance matters, (3) prefer running installs (pip ...) as a non-root user and inspect packages before installation, and (4) remember PDFs themselves can contain malicious content, so run this on untrusted PDFs in an isolated environment if you have concerns.

Like a lobster shell, security has layers — review code before you run it.

latestvk97brrdk9whawcmvjgg52qv60d84j2m2
121 downloads
0 stars
2 versions
Updated 2w ago
v1.1.0
MIT-0

Resilient PDF

Use this skill as a fallback workflow for PDFs that break normal analysis paths.

Overview

Prefer the built-in pdf tool first when it is likely to work. If it fails, hangs, times out, or the file is too large, switch to this local workflow.

Read references/patterns.md if you need the rationale, chunking heuristics, or fallback guidance.

Workflow

  1. Confirm the PDF source.

    • If remote, download it into the workspace first.
    • If local, confirm the path and file size.
  2. Decide whether the normal path is already broken.

    • Trigger this skill when the built-in pdf tool aborts, provider-native upload fails, or file limits make direct analysis unlikely to work.
  3. Run the helper extractor.

    • Use scripts/extract_pdf.py to extract markdown locally.
    • Use --url to download a remote PDF first.
    • Add --chunk-dir when the output will be too large to read in one pass.
    • Add --summary-out to generate a lightweight first-pass summary artifact.
  4. Inspect the extracted output.

    • Read the head, table of contents, or key sections first.
    • Do not trust a summary until the extracted text looks sane.
  5. Summarize or analyze.

    • For short outputs, read the extracted markdown directly.
    • For long outputs, read selected chunks or key sections.
    • Use the generated first-pass summary as a navigation aid, not as final truth.
    • Keep quoted claims and numeric claims grounded in the extracted text.
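
Steps 4 and 5 ask for a sanity check before any summary is trusted. The sketch below is one possible way to do that check on the extracted markdown. It is not part of the skill: the function names `sanity_report` and `looks_sane` and the thresholds are hypothetical and should be tuned per document.

```python
import re


def sanity_report(markdown_text: str) -> dict:
    """Collect rough quality signals for extracted PDF text (illustrative heuristics)."""
    total = len(markdown_text)
    # Share of characters that are letters, digits, or whitespace.
    readable = sum(ch.isalnum() or ch.isspace() for ch in markdown_text)
    # Markdown-style headings give a quick table-of-contents view.
    headings = re.findall(r"^#{1,6}\s+(.+)$", markdown_text, flags=re.MULTILINE)
    return {
        "chars": total,
        "readable_ratio": readable / total if total else 0.0,
        "headings": headings[:20],
        "head": markdown_text[:500],
    }


def looks_sane(report: dict, min_chars: int = 200, min_ratio: float = 0.8) -> bool:
    """Assumed thresholds: enough text, and mostly readable characters."""
    return report["chars"] >= min_chars and report["readable_ratio"] >= min_ratio
```

A failed check is a signal to open the extracted file and read sample passages by hand, not proof that the extraction is unusable.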

Helper script

Local file command:

python3 skills/resilient-pdf/scripts/extract_pdf.py <file.pdf> --out <output.md> --json

Remote URL command:

python3 skills/resilient-pdf/scripts/extract_pdf.py \
  --url <https://example.com/file.pdf> \
  --out <output.md> \
  --download-to <downloaded.pdf> \
  --json

Chunked plus summary command:

python3 skills/resilient-pdf/scripts/extract_pdf.py <file.pdf> \
  --out <output.md> \
  --chunk-dir <chunk-dir> \
  --summary-out <summary.md> \
  --chunk-chars 120000 \
  --chunk-overlap 4000 \
  --json
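
To make `--chunk-chars` and `--chunk-overlap` concrete, here is a minimal sketch of fixed-size character chunking with overlap. It illustrates the general technique only: `chunk_text` is a hypothetical name, and the real script may split differently (for example, on paragraph boundaries).

```python
def chunk_text(text: str, chunk_chars: int = 120_000, chunk_overlap: int = 4_000) -> list[str]:
    """Split text into windows of chunk_chars that share chunk_overlap characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= chunk_overlap < chunk_chars:
        raise ValueError("chunk_overlap must be in [0, chunk_chars)")

    step = chunk_chars - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks
```

Under this scheme, a 300,000-character extraction with the values shown above produces three chunks, and each chunk repeats the last 4,000 characters of the one before it so that sentences cut at a boundary remain readable in context.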

The script:

  • accepts either a local file path or --url
  • downloads remote PDFs when needed
  • looks for uvx
  • invokes uvx --from 'markitdown[pdf]' markitdown
  • writes extracted markdown
  • optionally writes chunk files
  • optionally writes a lightweight first-pass summary markdown file
  • emits a machine-readable JSON result
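
The uvx lookup and invocation steps can be sketched as follows. This is an assumption-laden illustration, not the script's actual code: `find_uvx` and `build_markitdown_command` are hypothetical helpers, and the lookup order (PATH first, then `~/.local/bin/uvx`) is inferred from the security scan notes above.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_uvx() -> Optional[str]:
    """Return a path to uvx, checking PATH first and then ~/.local/bin."""
    found = shutil.which("uvx")
    if found:
        return found
    candidate = Path.home() / ".local" / "bin" / "uvx"
    if candidate.is_file():
        return str(candidate)
    return None


def build_markitdown_command(uvx_path: str, pdf_path: str) -> list[str]:
    """Build the argument list; markitdown prints the converted markdown to stdout."""
    return [uvx_path, "--from", "markitdown[pdf]", "markitdown", pdf_path]
```

Passing the command as a list to `subprocess.run` (rather than as a shell string) avoids quoting problems with file names that contain spaces. If `find_uvx()` returns `None`, the guidance in the next section applies.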

If dependencies are missing

If uvx is not available, tell the operator the exact command to install it:

python3 -m pip install --user --break-system-packages uv

Do not silently install dependencies unless the user asked you to.

Output expectations

A successful run should give you:

  • downloaded PDF path when using --url
  • extracted markdown path
  • byte count
  • text character count
  • optional chunk paths
  • optional first-pass summary path

Use those outputs as the source of truth for later summarization.
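
Before relying on those outputs, it can help to confirm that every reported file actually exists. The sketch below does this without assuming any particular JSON key names, since the exact schema of the `--json` result is not documented here. `missing_artifacts` is a hypothetical helper that treats any string value ending in `.md` or `.pdf` as a file path.

```python
import json
from pathlib import Path


def missing_artifacts(result_json: str) -> list[str]:
    """Return reported .md/.pdf paths from a JSON result that are not on disk."""

    def walk(value):
        # Yield every string found anywhere in the decoded JSON value.
        if isinstance(value, str):
            yield value
        elif isinstance(value, list):
            for item in value:
                yield from walk(item)
        elif isinstance(value, dict):
            for item in value.values():
                yield from walk(item)

    result = json.loads(result_json)
    paths = [
        v for v in walk(result)
        if v.lower().endswith((".md", ".pdf")) and "://" not in v  # skip URLs
    ]
    return [p for p in paths if not Path(p).is_file()]
```

An empty list means every path-like value in the result points at a real file; anything else is worth checking before summarization starts.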

Notes

  • This skill does not replace the built-in pdf tool. It is the fallback when that path is unreliable.
  • Prefer workspace-local outputs so later reads and summaries are reproducible.
  • If the extracted markdown is noisy, inspect section headers and sample passages before making strong claims.
