Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

tender-similarity-analyzer

v2.1.0

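Local cross-document duplication checking for tender (bid) documents, accurate to the paragraph level, with heading filtering, short-paragraph merging, and a visual HTML similarity report.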

1 star · 94 downloads · 0 current · 0 all-time

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for wuliwenjing/tender-similarity-analyzer.

Install the skill "tender-similarity-analyzer" (wuliwenjing/tender-similarity-analyzer) from ClawHub.
Skill page: https://clawhub.ai/wuliwenjing/tender-similarity-analyzer
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install wuliwenjing/tender-similarity-analyzer

ClawHub CLI


npx clawhub@latest install tender-similarity-analyzer
Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)

Purpose & Capability
Name/description (local tender-document similarity, paragraph-level) align with the included engine modules (ngram, TF-IDF, SimHash), file extractors (pdf/docx/txt) and report generator. The presence of FormatPreservingEditor and EditHistory is plausible if the tool offers optional automated edits, but SKILL.md explicitly states edits are only suggested and that source files are not modified — the code provides write/save capabilities, which is an inconsistency.
Instruction Scope
SKILL.md repeatedly promises 'files are only read locally' and 'network isolation (requests/urllib/httpx disabled)'. However, the repo contains check_dependencies.py, which can run pip install via subprocess.run (downloading from PyPI), and recommends an AI-driven '安装依赖' ("install dependencies") auto-install option. The codebase also includes FormatPreservingEditor (replace_text_*/save) and EditHistory.save_to_file, functions that can modify or overwrite files. SKILL.md instructs running scripts/main.py; the main script was truncated in the review, so it's unclear whether the tool will default to safe read-only behavior or may perform writes or network activity depending on flags. This grants the agent several levers that contradict the textual guarantees.
Install Mechanism
There is no platform install spec, but check_dependencies.py includes an automatic pip-install pathway (subprocess.run calling python -m pip install ...). That is a standard mechanism but it will fetch packages from the network. requirements.txt contains heavyweight packages (sentence-transformers, scikit-learn, pdfplumber) that may trigger further model downloads (sentence-transformers may auto-download models at runtime). The code does not hard-code third-party download URLs, but auto-install + model loading imply network activity despite the SKILL.md network-isolation claim.
Credentials
The skill does not request environment variables, credentials, or config paths in the manifest. The requested Python dependencies are consistent with text extraction and ML-based similarity. No unexplained credentials or system-level paths are requested.
Persistence & Privilege
Skill is not marked always:true and does not request platform-level persistence. There is no evidence it alters other skills' configs. However it contains code able to write files (editor/save, edit history save), so granting the skill file-system access or allowing unreviewed runs could result in file modifications; this is a behavior-level risk, not a platform-privilege flag in the manifest.
What to consider before installing
Key things to consider before installing or running this skill:

- Review scripts/main.py (not fully shown) to see whether default operation is read-only and whether any command-line flags enable writing back to documents or auto-installation. Do not run the skill on your real documents until you confirm behavior.
- The repository contains an auto-install helper (scripts/check_dependencies.py) that runs pip via subprocess.run. Avoid using the skill's AI-driven '安装依赖' ("install dependencies") auto-install option; instead install dependencies yourself in a controlled virtualenv or container so you can review network activity and package versions.
- Although SKILL.md promises network isolation, the codebase may still cause network activity indirectly (pip install, sentence-transformers model downloads). If you require strict offline operation, run the tool in an isolated environment with networking disabled and pre-install dependencies.
- The code includes a FormatPreservingEditor with replace/save methods and EditHistory.save_to_file. Even if the default UI promises only suggestions, the code can modify and save documents. Run the tool on copies of documents and verify any 'apply changes' workflow requires explicit user confirmation.
- Inspect scripts/security/network_isolator.py and sandbox.py to see how network isolation is implemented and whether it actually prevents subprocess pip installs or model downloads. If those modules are not robust, assume network calls may occur.
- If you will allow the skill to run autonomously, reduce risk by restricting agent capabilities (disable auto-install, require user invocation for any file-write actions) and run in a sandbox/container.

Additional useful checks that would increase confidence: a full review of scripts/main.py to confirm default modes, the contents of scripts/security/* to verify isolation, and runtime tests (in an isolated environment) to observe whether any outbound network traffic occurs during dependency installation or model loading.
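
One way to act on the advice to run the tool on copies is to hash every file before and after a run and compare the results. The sketch below uses only the Python standard library; `snapshot` and `changed_files` are hypothetical helper names for illustration and are not part of the skill.

```python
import hashlib
from pathlib import Path


def snapshot(directory: str) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    root = Path(directory)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[str(path.relative_to(root))] = digest
    return digests


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, removed, or modified between two snapshots."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

Take a snapshot of the copied directory, run the skill with the report written somewhere else (for example via the documented `-o` option), then snapshot again. An empty list from `changed_files` suggests that nothing in the directory was added, removed, or modified during that run.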

Like a lobster shell, security has layers — review code before you run it.

Tags: document, latest, plagiarism, tender
94 downloads
1 star
3 versions
Updated 3w ago
v2.1.0
MIT-0

tender-similarity-analyzer

Tender document duplication checking and analysis tool: local cross-document checking with paragraph-level duplicate detection.


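🎯 Features

| Feature | Description |
| --- | --- |
| 🔍 Core function | Cross-document duplication checking, accurate to the paragraph |
| 📊 Report output | Styled HTML duplication report with charts |
| ✍️ Automatic revision | Optional: generates revision suggestions automatically (suggestions only, nothing is written to files) |
| 🔒 Security guarantee | Files are only read locally, with network isolation |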

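🚀 Quick Start

(`查重` is the trigger command; it means "check for duplication".)

You: 查重 ~/Desktop/投标文件
AI: Scanning... found 3 files
    Running duplication check... ✅ Done!
    [sends HTML report]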

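📖 Usage Guide

Basic command

Command format:
  查重 <directory path> [options]

Examples:
  查重 ~/Desktop/投标项目
  查重 /Users/xxx/Desktop/tender_files --recursive

Optional parameters

--include docx,pdf    File formats to include (default: all supported formats)
--output <path>       Where to write the report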
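
The `--include` option can be pictured as an extension filter over a directory listing. The following is a minimal sketch under stated assumptions: `collect_files` and its parameters are hypothetical names, the extension set is copied from the supported-formats list in this README, and treating `--recursive` as "descend into subdirectories" is a guess based on the flag name. The skill's actual file discovery in scripts/main.py may differ.

```python
from pathlib import Path

# Extensions taken from the supported-formats list in this README.
SUPPORTED = {"docx", "doc", "pdf", "txt", "md"}


def collect_files(directory, include=None, recursive=False):
    """Return sorted paths under `directory` whose extension is allowed.

    `include` is a comma-separated string such as "docx,pdf"; when it is
    omitted, every supported extension is accepted.
    """
    allowed = SUPPORTED
    if include:
        wanted = {
            ext.strip().lower().lstrip(".")
            for ext in include.split(",")
            if ext.strip()
        }
        allowed = wanted & SUPPORTED  # silently ignore unsupported formats
    root = Path(directory).expanduser()
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p
        for p in candidates
        if p.is_file() and p.suffix.lower().lstrip(".") in allowed
    )
```

For example, `collect_files("./docs", include="docx,pdf")` would return only the `.docx` and `.pdf` files directly inside `./docs`.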

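📊 Report Contents

The report includes:

  • Overall statistics: number of files, number of paragraphs, duplication rate
  • Key metric cards: duplication-rate progress bar, paragraph-pair distribution, similarity gauge
  • Status verdict: pass / warning / fail
  • Duplication details: side-by-side original text for each duplicated paragraph
  • Revision suggestions: AI-generated rewrite directions (suggestions only, nothing is written to files)

Verdict criteria

| Status | Body-text duplication rate | Meaning |
| --- | --- | --- |
| ✅ Pass | < 10% | Meets tender requirements |
| ⚠️ Warning | 10% ~ 30% | Review recommended |
| ❌ Fail | ≥ 30% | Major revision required |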
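
The verdict criteria above amount to a two-cutoff classification. A minimal sketch, assuming the duplication rate is passed as a fraction between 0 and 1 (the function name `verdict` and the returned labels are illustrative, not the skill's own identifiers):

```python
def verdict(duplication_rate: float) -> str:
    """Classify a body-text duplication rate given as a fraction in [0, 1]."""
    if not 0.0 <= duplication_rate <= 1.0:
        raise ValueError("duplication_rate must be between 0 and 1")
    if duplication_rate < 0.10:
        return "pass"  # below 10%
    if duplication_rate < 0.30:
        return "warning"  # 10% up to, but not including, 30%
    return "fail"  # 30% or more
```

Note that the boundaries follow the table: exactly 10% is already a warning, and exactly 30% is already a fail.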

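🔒 Security Mechanisms

✅ Default security guarantees

During a duplication analysis:

  • ✅ Files are read locally only; no content is uploaded
  • ✅ Network isolation is enabled (requests/urllib/httpx are disabled)
  • ✅ No source files are modified
  • ✅ No external requests are sent

📦 Installing dependencies (first use)

On first run the tool reports any missing dependencies, which must be installed manually:

pip install -r requirements.txt

This is standard Python practice: it only downloads packages and does not involve the contents of your files.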


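🔧 How the Algorithm Works (v4, optimized)

Three-layer funnel model

Raw text
  ↓
┌──────────────────────────────────────────────────────────────┐
│ Layer 1: paragraph classification + short-paragraph merging  │
│   - Detect headings (section number / length / format)       │
│   - Filter heading-vs-heading matches                        │
│   - Merge short paragraphs into complete semantic units      │
├──────────────────────────────────────────────────────────────┤
│ Layer 2: fast N-gram pre-screen (3-gram)                     │
│   - Jaccard similarity ≥ 0.25 proceeds to the next stage     │
├──────────────────────────────────────────────────────────────┤
│ Layer 3: precise TF-IDF comparison                           │
│   - Character-level (2-4) TF-IDF vectors                     │
│   - Dynamic threshold (0.30~0.50), adapts to length          │
└──────────────────────────────────────────────────────────────┘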
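
Layers 2 and 3 of the funnel can be sketched as a cheap set-overlap test followed by a weighted vector comparison. The code below is a dependency-free illustration, not the skill's implementation: the README lists scikit-learn for TF-IDF, whereas this sketch computes a smoothed TF-IDF by hand. All function names are hypothetical, and the fixed `threshold=0.40` stands in for the length-dependent threshold.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character n-grams of text, ignoring whitespace."""
    compact = "".join(text.split())
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


def jaccard(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two strings."""
    set_a, set_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def tfidf_vectors(paragraphs, n_min=2, n_max=4):
    """Build one sparse TF-IDF vector (a dict) per paragraph over char n-grams."""
    counts = [
        Counter(g for n in range(n_min, n_max + 1) for g in char_ngrams(p, n))
        for p in paragraphs
    ]
    # Document frequency: in how many paragraphs each n-gram appears.
    doc_freq = Counter(g for c in counts for g in c)
    total = len(paragraphs)
    # Smoothed IDF keeps a positive weight for n-grams found in every paragraph.
    return [
        {g: tf * (math.log((1 + total) / (1 + doc_freq[g])) + 1) for g, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_pairs(paragraphs, prefilter=0.25, threshold=0.40):
    """Yield (i, j, score) for pairs that pass both the Jaccard and TF-IDF stages."""
    vectors = tfidf_vectors(paragraphs)
    for i in range(len(paragraphs)):
        for j in range(i + 1, len(paragraphs)):
            if jaccard(paragraphs[i], paragraphs[j]) < prefilter:
                continue  # cheap rejection before the costlier comparison
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                yield i, j, score
```

The point of the two stages is cost: the Jaccard test needs only set operations and discards most unrelated pairs, so the vector comparison runs on a much smaller candidate set.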

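Key optimizations

| Optimization | Description |
| --- | --- |
| Heading filtering | Recognizes chapter/section numbering (第X章 / 1.1 / X.X) and short headings |
| Short-paragraph merging | Paragraphs under 50 characters are merged with their context to avoid false positives from fragments |
| Dynamic threshold | Short texts use a high threshold (0.50); long texts use a low threshold (0.32) |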
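
The three optimizations listed above could look roughly like the sketch below. Several details are assumptions made for illustration and are not stated in the README: the heading regex covers only numbered headings, the 20-character heading limit, the 50- and 300-character bounds, and the linear interpolation between the two thresholds are all guesses, and every function name is hypothetical.

```python
import re

# Assumed patterns: "第X章"/"第X节" chapter titles and dotted numbers like "1.1".
HEADING_RE = re.compile(r"^(第[一二三四五六七八九十百\d]+[章节]|\d+(\.\d+)*[\s、.])")


def is_heading(paragraph, max_len=20):
    """Treat short lines that start with a chapter/section number as headings."""
    text = paragraph.strip()
    return len(text) <= max_len and bool(HEADING_RE.match(text))


def merge_short(paragraphs, min_len=50):
    """Drop headings and fold paragraphs shorter than min_len into a neighbour."""
    merged = []
    buffer = ""
    for para in paragraphs:
        text = para.strip()
        if not text or is_heading(text):
            continue
        buffer += text
        if len(buffer) >= min_len:
            merged.append(buffer)
            buffer = ""
    if buffer:
        # A trailing short fragment joins the previous unit, if there is one.
        if merged:
            merged[-1] += buffer
        else:
            merged.append(buffer)
    return merged


def dynamic_threshold(length, short_len=50, long_len=300, high=0.50, low=0.32):
    """Interpolate linearly from `high` (short text) down to `low` (long text)."""
    if length <= short_len:
        return high
    if length >= long_len:
        return low
    fraction = (length - short_len) / (long_len - short_len)
    return high - fraction * (high - low)
```

A stricter threshold for short text is a common design choice because a few shared stock phrases can dominate the similarity score of a short paragraph, while long paragraphs need much more overlap to reach the same score.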

📁 Supported File Formats

| Format | Extensions |
| --- | --- |
| Word document | .docx / .doc |
| PDF | .pdf |
| Plain text | .txt |
| Markdown | .md |

📝 Command-Line Usage

cd ~/.openclaw/workspace/skills/tender-similarity-analyzer

# Install dependencies before first use
pip install -r requirements.txt

# Scan a local directory
python scripts/main.py --dir ~/Desktop/投标文件

# Restrict the file formats
python scripts/main.py --dir ./docs --include docx,pdf

# Choose the output report file
python scripts/main.py --dir ~/Desktop/tender -o my_report.html

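⚙️ Dependencies

Required dependencies

| Dependency | Purpose |
| --- | --- |
| python-docx | Reading Word documents |
| pdfplumber | Extracting text from PDFs |
| scikit-learn | Computing TF-IDF vectors |
| Jinja2 | Rendering the report template |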

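📌 FAQ

Q: How many files are needed for a check? A: At least 2 files.

Q: Do headings count toward the duplication rate? A: No. Headings are detected and filtered out; only body paragraphs are counted.

Q: Does file content leave the local machine? A: No. Files are only read locally, and network isolation ensures nothing is sent out.

Q: Where is the report saved? A: In the current directory by default; use the -o option to choose another location.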


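🔄 Version History

| Version | Date | Changes |
| --- | --- | --- |
| v1.0 | 2026-03-20 | Initial release |
| v1.1 | 2026-03-25 | Heading/body classification and filtering |
| v1.2 | 2026-04-01 | Short-paragraph merging, dynamic threshold |
| v2.0 | 2026-04-01 | Restyled report interface |
| v2.1 | 2026-04-01 | Removed external alerts; dependencies are now installed manually |

Version: v2.1
Date: 2026-04-01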
