Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Universal Doc Processor

v2.0.0

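General-purpose document-processing skill: intelligent document analysis and modification with support for all formats, no size limit, and batch processing. Provides file staging and on-demand execution, and follows a state-management mechanism. Trigger scenarios: - After the user uploads a file in any format, wait for the user's explicit task instruction - The user needs document analysis, modification, summarization, extraction, translation, or similar operations - The user needs to supply key information before the task can run. Core rules: - On receiving files, only parse...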

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for cyril-ruidong/universal-doc-processor.

Install the skill "Universal Doc Processor" (cyril-ruidong/universal-doc-processor) from ClawHub.
Skill page: https://clawhub.ai/cyril-ruidong/universal-doc-processor
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install universal-doc-processor

ClawHub CLI

npx clawhub@latest install universal-doc-processor

Security Scan

VirusTotal: Pending
OpenClaw: Suspicious (medium confidence)

Purpose & Capability
The name/description match the implementation: the SKILL.md and scripts/processor.py implement multi-format parsing, state management, and ask for user instructions before processing. The parsing routines (PDF, Word, Excel, PPT, text, CSV, JSON, binary fallback) align with the described capabilities.
Instruction Scope
SKILL.md confines behavior to: receive files, store metadata, wait for user task, ask follow-up questions, and then execute. The code follows that flow. However SKILL.md claims automatic cleanup after 72 hours but the provided code is truncated and I did not find an explicit cleanup/garbage-collection implementation in the visible code — so retention semantics are asserted but not proven. Also the code will open arbitrary filesystem paths provided in file_list, which is required for file-processing but expands the attack surface if untrusted paths are passed into the skill.
Install Mechanism
No install spec (instruction-only + a local script) — this reduces supply-chain risk. The code conditionally imports third-party libraries (PyPDF2, python-docx, pandas/openpyxl, python-pptx). Those dependencies are not declared/installed by the skill; runtime will fall back to binary previews if libs are missing. This is coherent but means behavior will vary by environment; it does not pull remote code itself.
Credentials
The skill requests no environment variables or credentials. That is proportional to a document-processing skill. Note: it still reads files from file paths supplied to it (open/read), so the real risk depends on how the hosting environment supplies those paths and whether arbitrary system paths can be injected.
Persistence & Privilege
The skill retains uploaded files in its in-memory file list and explicitly supports multi‑round tasks. SKILL.md promises 72-hour cleanup, but the visible code does not show a robust persistence/cleanup mechanism. Retaining user documents increases privacy risk — consider whether files are stored encrypted, on-disk, or only in memory, and who can access them.
What to consider before installing
This skill appears to do what it says (multi-format parsing, state-managed "wait for user instruction" flow) and does not request secrets. Key things to check before installing or enabling it:

  1. Confirm how your platform provides file paths to the skill, and ensure untrusted inputs cannot cause it to open arbitrary system files.
  2. Ask where uploaded files are stored (memory vs disk), whether they are encrypted, and whether the promised 72-hour deletion is actually implemented.
  3. If you need full parsing features, ensure the necessary Python packages (PyPDF2, python-docx, pandas/openpyxl, python-pptx, chardet) are present, or accept that fallback behavior will be a binary preview.
  4. Test with non-sensitive files first to verify behavior and output.
  5. If you require stricter guarantees, request adding explicit sandboxing, strict path validation (only allow skill-provided upload directories), and a clearly implemented cleanup routine.

If you cannot get those assurances, treat the skill as high privacy risk and avoid uploading sensitive documents.
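The "strict path validation" recommended above could look roughly like the following. This is a minimal sketch, not code from the skill: the function names and the idea of a single `upload_dir` are assumptions made for illustration.

```python
from pathlib import Path


def is_inside_upload_dir(candidate: str, upload_dir: str) -> bool:
    """Return True only if `candidate` resolves to a location under `upload_dir`."""
    root = Path(upload_dir).resolve()
    target = Path(candidate).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a traversal
    # attempt is judged by where it really points.
    return root in target.parents


def filter_file_list(file_list, upload_dir):
    """Keep only the paths that pass the containment check (hypothetical helper)."""
    return [p for p in file_list if is_inside_upload_dir(p, upload_dir)]
```

A host platform would run a check like this on every entry of `file_list` before handing it to the skill, so that a path such as `/etc/passwd` is dropped rather than opened.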

Like a lobster shell, security has layers — review code before you run it.

latest vk973e70wea3xxap7xbkte1x42d84ctt6
70 downloads
0 stars
1 version
Updated 3w ago
v2.0.0
MIT-0

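UniversalDocProcessor - General-Purpose Document Processing Skill

Skill Overview

A general-purpose document-processing framework that supports intelligent analysis and processing of files in many formats and includes a state-management mechanism.

Core Capabilities

  1. Multi-format parsing: PDF, Word, Excel, PPT, TXT, Markdown, CSV, JSON, and others
  2. State management: WAITING_FOR_FILES → FILES_RECEIVED → EXECUTING_TASK → NEED_INFO
  3. On-demand execution: files are not processed automatically on receipt; the skill waits for a user instruction
  4. Information completion: asks questions first when the task information is insufficient
  5. Multi-round tasks: files are retained after a task completes, so further tasks can be run on them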

State Machine Definition

┌─────────────────────┐
│  WAITING_FOR_FILES  │ ← initial state
└─────────┬───────────┘
          │ files received
          ▼
┌─────────────────────┐
│   FILES_RECEIVED    │ ← files stored, awaiting instruction
└─────────┬───────────┘
          │ user issues a task
          ▼
┌─────────────────────┐
│   EXECUTING_TASK    │ ← task in progress
└─────────┬───────────┘
          │ done / insufficient info
          ▼
┌─────────────────────┐
│      NEED_INFO      │ ← waiting for the user to supply information
└─────────┬───────────┘
          │ information supplied
          ▼
      (back to EXECUTING_TASK)
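The diagram above can be read as a small transition table. The sketch below is one possible encoding, not the skill's actual code: the event names are hypothetical, and the `task_done` edge back to FILES_RECEIVED is an assumption drawn from the multi-round capability rather than from the diagram itself.

```python
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_FILES = auto()
    FILES_RECEIVED = auto()
    EXECUTING_TASK = auto()
    NEED_INFO = auto()


# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (State.WAITING_FOR_FILES, "files_received"): State.FILES_RECEIVED,
    (State.FILES_RECEIVED, "task_given"): State.EXECUTING_TASK,
    (State.EXECUTING_TASK, "info_missing"): State.NEED_INFO,
    (State.NEED_INFO, "info_supplied"): State.EXECUTING_TASK,
    # Assumption: after a finished task the files stay available for a new task.
    (State.EXECUTING_TASK, "task_done"): State.FILES_RECEIVED,
}


def next_state(state: State, event: str) -> State:
    """Look up the next state, rejecting transitions the table does not list."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state.name}")
```

Keeping the transitions in a table makes it easy to check that no path goes from WAITING_FOR_FILES straight to EXECUTING_TASK, which is the "do not process on receipt" rule.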

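Processing Flow

Step 1: File receipt (WAITING_FOR_FILES → FILES_RECEIVED)

When files are received

  • Parse the file content
  • Store the file metadata
  • Return a confirmation message

Reply template

Received {file count} file(s). Waiting for your specific task instruction.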
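As a rough illustration of Step 1, the sketch below records metadata for each incoming path and builds the confirmation message. The `FileMeta` fields and the `receive_files` name are assumptions, the message is an English rendering of the reply template, and content parsing is left out.

```python
import os
from dataclasses import dataclass


@dataclass
class FileMeta:
    path: str
    name: str
    extension: str  # lower-cased, including the leading dot


def receive_files(paths):
    """Store metadata for each file and return it with the confirmation text."""
    metas = [
        FileMeta(
            path=p,
            name=os.path.basename(p),
            extension=os.path.splitext(p)[1].lower(),
        )
        for p in paths
    ]
    message = f"Received {len(metas)} file(s). Waiting for your specific task instruction."
    return metas, message
```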

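Step 2: Waiting for a task (FILES_RECEIVED)

Keep waiting

  • Do not analyze proactively
  • Do not process proactively
  • Respond only to user instructions

The user may request

  • Analysis ("analyze this document", "extract the key information")
  • Modification ("revise paragraph 3 for me", "tidy up the formatting")
  • Conversion ("translate into English", "convert to PDF")
  • Extraction ("extract the tables", "extract the images")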

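Step 3: Information check (EXECUTING_TASK)

When the information is complete

  • Execute the task
  • Generate the result
  • Return the file

When the information is insufficient

  • Switch to the NEED_INFO state
  • Ask specific follow-up questions

Step 4: Information completion (NEED_INFO)

After the user replies

  • Parse the supplementary information
  • Check completeness again
  • If complete, continue execution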

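Supported File Formats

| Format | Extensions | Parsing capability |
|---|---|---|
| PDF | .pdf | Text extraction, page information |
| Word | .docx, .doc | Text, tables, paragraphs |
| Excel | .xlsx, .xls | Data, formulas, charts |
| PPT | .pptx, .ppt | Slides, text |
| Text | .txt, .log | Plain text, encoding detection |
| Markdown | .md | Structured content |
| CSV | .csv | Tabular data |
| JSON | .json | Structured data |
| Other | Attempts a binary read | Basic handling |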
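One way to realize the format table is a lookup from file extension to parser, with a binary preview as the fallback. The sketch below covers only the formats the standard library can read; the function names are hypothetical, and the PDF, Word, Excel, and PPT parsers are omitted because they depend on third-party packages.

```python
import csv
import json
from pathlib import Path


def parse_text(path):
    return Path(path).read_text(encoding="utf-8", errors="replace")


def parse_csv(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def parse_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_binary_preview(path, limit=64):
    """Fallback: return the first `limit` bytes as a hex string."""
    with open(path, "rb") as f:
        return f.read(limit).hex()


PARSERS = {
    ".txt": parse_text,
    ".log": parse_text,
    ".md": parse_text,
    ".csv": parse_csv,
    ".json": parse_json,
}


def select_parser(path):
    """Pick a parser by lower-cased extension; unknown types get the fallback."""
    return PARSERS.get(Path(path).suffix.lower(), parse_binary_preview)


def parse_file(path):
    return select_parser(path)(path)
```

Matching on the lower-cased suffix means `REPORT.CSV` and `report.csv` are handled the same way.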

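Task Types and Required Information

| Task type | Required information | Optional information |
|---|---|---|
| Analysis | Analysis goal (summary/extraction/comparison) | Analysis depth, output format |
| Modification | What to modify, how to modify it | Format requirements, scope to preserve |
| Translation | Target language | Preserve original formatting, translation style |
| Extraction | Type of content to extract | Start position, output format |
| Conversion | Target format | Style preservation, page setup |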
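The "required information" column lends itself to a simple completeness check that decides between executing and switching to NEED_INFO. The following is a sketch under assumed names: the task keys and field names are illustrative and do not come from the skill's code.

```python
# Required fields per task type, mirroring the table above (names are hypothetical).
REQUIRED_INFO = {
    "analyze": ["goal"],
    "modify": ["content", "method"],
    "translate": ["target_language"],
    "extract": ["content_type"],
    "convert": ["target_format"],
}


def missing_info(task_type, provided):
    """Return the required fields that are absent or empty in `provided`."""
    if task_type not in REQUIRED_INFO:
        raise ValueError(f"unknown task type: {task_type!r}")
    return [field for field in REQUIRED_INFO[task_type] if not provided.get(field)]
```

An empty result means the task can run; a non-empty result lists exactly what to ask the user for.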

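Output Conventions

File output

  • Keep the original file format (generate a new format only when conversion is requested)
  • Append a suffix to the file name, such as _processed or _translated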
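The naming rule above can be sketched as a small helper. The function name and the optional `new_extension` parameter are assumptions for illustration.

```python
from pathlib import Path


def output_name(original, suffix, new_extension=None):
    """Insert `suffix` before the extension; swap the extension only on conversion."""
    p = Path(original)
    ext = p.suffix if new_extension is None else new_extension
    return str(p.with_name(f"{p.stem}{suffix}{ext}"))
```

For instance, `output_name("report.docx", "_translated")` keeps the `.docx` extension, while passing `new_extension=".pdf"` covers the conversion case.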

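Message output

  • Before execution: briefly describe the task plan
  • After execution: describe the result and the output files
  • When a problem occurs: state exactly which information is missing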

Edge Case Handling

Encoding problems

  • Detect the file encoding automatically (chardet)
  • If detection fails, try UTF-8, then GBK
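The encoding rule above might be implemented along these lines. This is a sketch, not the skill's code: chardet is treated as optional, the function names are hypothetical, and the final lossy decode is an added assumption so that the function always returns text.

```python
def detect_encoding(data: bytes):
    """Return chardet's guess, or None if chardet is missing or has no guess."""
    try:
        import chardet  # optional third-party dependency
    except ImportError:
        return None
    return chardet.detect(data).get("encoding")


def decode_with_fallbacks(data: bytes, encodings=("utf-8", "gbk")):
    """Try each encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            continue
    # Assumption: as a last resort, decode lossily instead of failing.
    return data.decode("utf-8", errors="replace"), "utf-8-lossy"


def decode_bytes(data: bytes):
    """Use the detected encoding first, then the UTF-8 and GBK fallbacks."""
    guess = detect_encoding(data)
    order = ([guess] if guess else []) + ["utf-8", "gbk"]
    return decode_with_fallbacks(data, order)
```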

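Oversized files

  • Single file larger than 50 MB: process in chunks
  • Batch processing: parse files one at a time to avoid running out of memory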
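Chunked processing of a large file is commonly done with a generator that reads a fixed number of bytes at a time. The sketch below assumes 50 MB means 50 × 1024 × 1024 bytes and uses a 1 MB default chunk size; both values and the function names are illustrative.

```python
import io

CHUNK_THRESHOLD = 50 * 1024 * 1024  # assumed byte value for the 50 MB limit


def needs_chunking(size_bytes: int) -> bool:
    """Files strictly larger than the threshold are processed in chunks."""
    return size_bytes > CHUNK_THRESHOLD


def iter_chunks(stream, chunk_size=1024 * 1024):
    """Yield successive blocks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Usage with an in-memory stream standing in for a large file.
demo = io.BytesIO(b"abcdefghij")
sizes = [len(c) for c in iter_chunks(demo, chunk_size=4)]
```

Because the generator holds only one chunk at a time, memory use stays bounded by `chunk_size` regardless of file size.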

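Parsing failure

  • Record the reason for the failure
  • Return a user-friendly message
  • Suggest converting the file to another format and retrying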
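The three points above can be combined in a wrapper that never lets a parser exception escape. This is a sketch under assumptions: `safe_parse`, the result keys, and the hint wording are all hypothetical.

```python
import logging

logger = logging.getLogger("doc_processor")  # hypothetical logger name


def safe_parse(path, parser):
    """Run `parser(path)` and turn any exception into a structured result."""
    try:
        return {"ok": True, "content": parser(path), "error": None, "hint": None}
    except Exception as exc:  # deliberately broad: a bad file must not crash the skill
        reason = f"{type(exc).__name__}: {exc}"
        logger.warning("Failed to parse %s: %s", path, reason)
        return {
            "ok": False,
            "content": None,
            "error": reason,
            "hint": "The file could not be parsed. Try converting it to another format and retry.",
        }
```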

Session timeout

  • No activity for 72 hours: clear the stored files
  • The user sends files again: restart the flow
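The security scan did not find a cleanup routine in the visible code, so the following is only a sketch of how the 72-hour rule might be enforced. The `FileStore` class and its methods are hypothetical, the clock is injectable so the expiry logic can be tested, and the sketch clears only the in-memory list; deleting files on disk would need extra steps.

```python
import time

SESSION_TTL_SECONDS = 72 * 60 * 60  # 72 hours


class FileStore:
    """In-memory file list that is purged after a period of inactivity."""

    def __init__(self, now=time.time):
        self._now = now  # injectable clock
        self.files = []
        self.last_activity = now()

    def touch(self):
        """Record user activity."""
        self.last_activity = self._now()

    def add(self, path):
        self.files.append(path)
        self.touch()

    def purge_if_expired(self):
        """Clear stored files once the inactivity window has elapsed."""
        if self._now() - self.last_activity >= SESSION_TTL_SECONDS:
            self.files.clear()
            return True
        return False
```

A host would need to call `purge_if_expired()` periodically or on each incoming message, since nothing in this sketch runs on its own.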

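Usage Examples

Example 1: Basic analysis

User: [uploads a PDF file]
Skill: Received 1 file. Waiting for your specific task instruction.

User: Extract the key points of this document for me
Skill: [runs the analysis task]
      Analysis complete. Output file: xxx_summary.docx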

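Example 2: Information completion

User: [uploads a Word file]
Skill: Received 1 file. Waiting for your specific task instruction.

User: Make some changes for me
Skill: Please tell me specifically:
      1. Which parts need to be changed? (whole document / specific paragraphs / specific content)
      2. What exactly should the change be? (delete / replace / rewrite / adjust formatting)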

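Example 3: Multi-round tasks

User: [uploads an Excel file]
Skill: Received 1 file. Waiting for your specific task instruction.

User: Generate a statistical summary of this data
Skill: [done] Summary file generated

User: Now make me a chart as well
Skill: [continuing] OK, generating the chart from the stored file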

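Key Rules

  1. No proactive processing: on receipt, files are only stored until an instruction arrives
  2. Information completion: ask first when the task description is vague
  3. Context retention: keep files after a task completes to support multiple rounds
  4. Friendly errors: return a message when parsing fails instead of crashing
  5. Format preservation: keep the original file's formatting characteristics as far as possible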
