Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Webpage Export

Export webpages into clean local TXT, DOCX, and PDF files with source metadata, fallback extraction logic, and browser-assisted recovery for difficult pages....

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Yeziwnl (@lilw-yezi)
Security Scan
VirusTotal: Suspicious
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (export webpages to TXT/DOCX/PDF) align with the included script and reference docs. The script fetches arbitrary URLs, extracts text/metadata, optionally uses Chrome/Chromium and Playwright for rendering, and emits JSON metadata — all expected for this purpose. Minor mismatch: registry metadata lists no required binaries, but SKILL.md and the script clearly require python3, curl, node+playwright, and Chrome/Chromium (and textutil on macOS).
Instruction Scope
SKILL.md instructions are narrowly scoped to running scripts/export_webpage.py with flags and reading the included references. The runtime behavior (curl fetch, HTML parsing, optional headless browser execution, local file writes) is explicitly documented. The SKILL.md warns the browser fallback will execute page JavaScript. The instructions do not direct the agent to read unrelated system files or transmit data to unexpected external endpoints.
Install Mechanism
This is an instruction-only skill with a bundled script and no install spec. That is low-risk, but practical friction exists: the skill expects runtime dependencies (python3, curl, node, the Node 'playwright' package, and Chrome/Chromium) without providing automated installation. There's no evidence of downloads from untrusted hosts or hidden installers in the bundle.
Credentials
The skill declares no required credentials or special env vars; the script only reads PATH and HOME from the environment and sets CHROME_BIN for child processes (pointing to a local Chrome path it finds). No secrets or unrelated credentials are requested.
Persistence & Privilege
The skill is not marked always:true and does not request system- or agent-wide configuration changes. It runs on-demand and writes outputs under local output folders; no elevated persistence or cross-skill config writes are present in the provided files.
Assessment
This skill appears to do what it claims, but take these precautions before installing or using it:

  1. Ensure that python3, curl, node, the Node 'playwright' package, and Chrome/Chromium (plus textutil on macOS) are installed. The registry metadata does not list these, so the skill is likely to fail if any are missing.
  2. Run the tool in a controlled workspace with an explicit --outdir, and avoid running it against untrusted or internal URLs: the headless browser executes page JavaScript, which can trigger network calls or other side effects originating from the target page.
  3. Because the skill owner is unknown, review the code and test on safe pages first.
  4. If you need automated installation of dependencies, add an install spec or request one from the publisher before wider deployment.
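The caution about internal URLs can be enforced before the export script is ever called. The following is a minimal sketch, not part of the skill: `looks_internal` is a hypothetical helper, the hostname suffixes are assumptions, and it only inspects the URL text. It does not resolve DNS names, so a public-looking hostname that points at an internal address would still pass.

```python
import ipaddress
from urllib.parse import urlsplit

# Hostname suffixes treated as internal; this list is an assumption.
INTERNAL_SUFFIXES = (".local", ".internal", ".localhost")


def looks_internal(url: str) -> bool:
    """Return True when the URL should not be handed to the exporter.

    Hypothetical pre-flight check: rejects non-HTTP(S) schemes, localhost,
    and literal private, loopback, link-local, or reserved IP addresses.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return True
    host = parts.hostname  # lowercased, brackets stripped for IPv6
    if not host:
        return True
    if host == "localhost" or host.endswith(INTERNAL_SUFFIXES):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved
```

A caller would skip the export and report the refusal whenever `looks_internal(url)` returns True.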

Current version: v1.0.1
latestvk97aqt6621kbp4ft6m56a70gqh8393ak

SKILL.md

Webpage Export

Use this skill to turn a webpage URL into local files that downstream agents can archive, send, or reference.

Core workflow

  1. Run scripts/export_webpage.py <url> to create a TXT snapshot first.
  2. Treat TXT as the baseline extracted record.
  3. Add --docx when the user wants a Word document.
  4. Add --pdf when Chrome/Chromium is available and the user wants a PDF.
  5. Keep the generated JSON metadata file; it records extraction quality, paths, warnings, and partial-failure status for downstream agents.
  6. Save outputs to an explicit --outdir when the user provides one; otherwise let the script use its local default export folder under the current working directory.
  7. For accuracy-sensitive work, keep original title, original URL, and extracted source metadata.
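The workflow above can be driven from Python by assembling the command line from the user's choices. This is a minimal sketch: `build_export_command` and `run_export` are hypothetical helper names, and only the flags documented in this SKILL.md (`--docx`, `--pdf`, `--outdir`) are used. The meaning of the script's exit codes is not documented here, so the sketch simply returns whatever the process reports.

```python
import subprocess
from typing import List, Optional

SCRIPT = "scripts/export_webpage.py"  # path as given in this SKILL.md


def build_export_command(url: str, docx: bool = False, pdf: bool = False,
                         outdir: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one export run (hypothetical helper)."""
    cmd = ["python3", SCRIPT, url]  # TXT is always produced as the baseline
    if docx:
        cmd.append("--docx")
    if pdf:
        cmd.append("--pdf")
    if outdir is not None:
        cmd += ["--outdir", outdir]  # otherwise the script picks its default folder
    return cmd


def run_export(url: str, **options) -> int:
    """Run the export and return the process exit code unchanged."""
    completed = subprocess.run(build_export_command(url, **options), check=False)
    return completed.returncode
```

Passing the arguments as a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.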

Commands

TXT only

python3 scripts/export_webpage.py "<url>"

TXT + DOCX

python3 scripts/export_webpage.py "<url>" --docx

TXT + PDF

python3 scripts/export_webpage.py "<url>" --pdf

TXT + DOCX + PDF with explicit output folder

python3 scripts/export_webpage.py "<url>" --docx --pdf --outdir ./exports/temp

Runtime requirements

  • Requires python3.
  • Requires curl for baseline webpage fetching.
  • PDF export requires Chrome or Chromium.
  • Browser-assisted fallback requires node and the playwright package.
  • DOCX export on macOS requires textutil.
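Because the registry metadata does not declare these dependencies, a pre-flight check can catch missing tools early. The sketch below is an illustration only: the Chrome executable names are assumptions (on macOS, Chrome is usually installed as an app bundle and may not be on PATH at all), and the check does not verify that the Node `playwright` package is installed.

```python
import shutil
import sys
from typing import Dict, List, Optional

# Candidate executable names are assumptions; adjust for your system.
CHROME_CANDIDATES = ["google-chrome", "chromium", "chromium-browser", "chrome"]


def first_available(names: List[str]) -> Optional[str]:
    """Return the path of the first executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def check_requirements() -> Dict[str, Optional[str]]:
    """Map each required tool to its resolved path, or None when absent."""
    found = {
        "python3": shutil.which("python3"),
        "curl": shutil.which("curl"),
        "node": shutil.which("node"),
        "chrome": first_available(CHROME_CANDIDATES),
    }
    if sys.platform == "darwin":
        found["textutil"] = shutil.which("textutil")  # DOCX export on macOS
    return found


def missing(found: Dict[str, Optional[str]]) -> List[str]:
    """List the tool names that could not be located, in sorted order."""
    return sorted(name for name, path in found.items() if path is None)
```

A caller might print `missing(check_requirements())` and skip `--pdf` when `chrome` appears in the result.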

Safety and execution notes

  • This skill fetches arbitrary URLs and may use a headless browser for difficult pages.
  • Browser-assisted fallback executes page JavaScript and should be used only when needed.
  • Prefer explicit --outdir values for production or shared environments.

What the script does

  • Fetch the page with curl
  • Extract title/source/publish-time when available
  • Try multiple body candidates before falling back to a full-page text snapshot
  • Score extraction quality and emit warnings for suspicious/partial results
  • Strip HTML into readable text for a TXT snapshot
  • Convert TXT to DOCX using textutil on macOS
  • Render webpage to PDF using Chrome/Chromium headless printing when available
  • Emit a JSON metadata file with status, paths, word count, quality, and warnings
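To illustrate the "strip HTML into readable text" step, here is a minimal sketch built on the standard library's `html.parser`. It is not the script's actual implementation, which is not shown on this page. The class name, the set of block-level tags, and the whitespace rules are all assumptions made for the example.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TextExtractor(HTMLParser):
    """Collect the <title> and visible text, skipping script/style content."""

    SKIP = {"script", "style", "noscript"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.title_parts: List[str] = []
        self.text_parts: List[str] = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in self.BLOCK:
            self.text_parts.append("\n")  # block tags start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in self.BLOCK:
            self.text_parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside script/style/noscript
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def html_to_text(html: str) -> Tuple[Optional[str], str]:
    """Return (title or None, body text with one non-empty line per block)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split()) or None
    lines = [" ".join(line.split()) for line in "".join(parser.text_parts).splitlines()]
    return title, "\n".join(line for line in lines if line)
```

Returning `None` for an absent title, rather than an empty string or a guess, keeps the result compatible with the accuracy rule stated later in this document.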

Format choice

  • Prefer TXT as the baseline extracted record.
  • Prefer DOCX when the user wants an editable or shareable document.
  • Prefer PDF when the user wants page-like rendering or easier direct viewing.
  • For important work, do not treat PDF as the only source of truth.

Chrome/Chromium PDF path

When the user wants PDF, prefer Chrome/Chromium headless printing because it preserves Chinese text and webpage layout better than ad-hoc PDF generation.
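For orientation, headless printing generally comes down to a single Chrome invocation. The sketch below uses the standard Chrome switches `--headless`, `--disable-gpu`, and `--print-to-pdf`; it is an assumption about the general technique, not a copy of the logic in references/chrome-pdf-guide.md. The helper names and the 60-second timeout are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_chrome_pdf_command(chrome_bin: str, url: str, pdf_path: str) -> List[str]:
    """Assemble a headless print-to-PDF command (hypothetical helper)."""
    out = Path(pdf_path).resolve()  # Chrome is given an absolute output path
    return [
        chrome_bin,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={out}",
        url,
    ]


def render_pdf(chrome_bin: str, url: str, pdf_path: str, timeout_s: float = 60.0) -> bool:
    """Run Chrome and report success only if a non-empty PDF was written."""
    cmd = build_chrome_pdf_command(chrome_bin, url, pdf_path)
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s, check=False)
    except (subprocess.TimeoutExpired, OSError):
        return False
    out = Path(pdf_path)
    return result.returncode == 0 and out.exists() and out.stat().st_size > 0
```

Checking the output file as well as the exit code matters because Chrome can emit warnings and still produce a usable PDF, or exit cleanly without writing one.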

Read references/chrome-pdf-guide.md when:

  • you need the exact Chrome PDF logic
  • PDF output is incomplete or suspicious
  • Chrome emits warnings and you need to judge whether the result is still usable
  • you need fallback decisions

Accuracy and fallbacks

Read references/accuracy-and-fallbacks.md when:

  • source accuracy matters
  • webpage metadata is incomplete
  • a field cannot be extracted cleanly
  • you need fallback behavior after a partial extraction

Delivery decisions

Read references/delivery-rules.md when:

  • deciding whether to deliver TXT, DOCX, PDF, or a combination
  • preparing files for downstream agents or user delivery
  • choosing archive placement under the local workspace

Limitations

  • Some highly dynamic or anti-bot pages may extract only partially.
  • PDF depends on Chrome/Chromium being installed.
  • DOCX depends on macOS textutil.
  • If a page is blocked when fetched by another lightweight method, try this skill's curl-based extraction path before giving up.

Accuracy rule

Accuracy is the top standard. Keep original title, original URL, and extracted source metadata. If any field is uncertain, mark it as missing instead of guessing.
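The rule can be expressed as a small normalization step applied before metadata is written. This is a sketch under stated assumptions: the field names, the `"missing"` sentinel, and the `warnings` list are hypothetical and are not taken from the script's actual JSON schema.

```python
import json
from typing import Dict, List, Optional

MISSING = "missing"  # hypothetical sentinel; the script's real marker may differ


def build_source_record(url: str, title: Optional[str] = None,
                        source: Optional[str] = None,
                        publish_time: Optional[str] = None) -> Dict[str, object]:
    """Keep extracted values verbatim and mark absent ones instead of guessing."""
    fields = {"title": title, "source": source, "publish_time": publish_time}
    record: Dict[str, object] = {"url": url}  # the original URL is always kept
    warnings: List[str] = []
    for name, value in fields.items():
        cleaned = value.strip() if isinstance(value, str) else ""
        if cleaned:
            record[name] = cleaned
        else:
            record[name] = MISSING
            warnings.append(f"{name} could not be extracted")
    record["warnings"] = warnings
    return record


def to_json(record: Dict[str, object]) -> str:
    """Serialize the record, preserving non-ASCII titles as written."""
    return json.dumps(record, ensure_ascii=False, indent=2)
```

Downstream agents can then test for the sentinel rather than trying to tell an empty string from a failed extraction.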

Files

5 total