Mingshu Classifier

v2.0.1

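Classifies and grades files by sensitivity. Following GB/T 35273 (Personal Information Security Specification), it scans the files in a specified directory, automatically identifies each file's sensitivity level, and labels it. Detection covers both file names and file contents, across formats including docx, txt, md, csv, and json. Trigger phrases: file classification, grading, labeling, sensitivity grading, data grading, information grading, file scanning, compliance check, privacy assessment, PII classification.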


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for kkming1998/mingshu-classifier.

Install the skill "Mingshu Classifier" (kkming1998/mingshu-classifier) from ClawHub.
Skill page: https://clawhub.ai/kkming1998/mingshu-classifier
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install mingshu-classifier

ClawHub CLI


npx clawhub@latest install mingshu-classifier
Security Scan
Capability signals
Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Pending
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the included script and rules: the Python script recursively scans a user-specified directory, reads file names and supported file contents, and applies GB/T 35273 keyword rules to label files S/G. No unrelated capabilities (cloud access, system-wide config changes, or external service integration) are requested.
Instruction Scope
SKILL.md instructs the agent to run the included scan_files.py against a target directory, optionally restricting to name-only checks. The instructions are limited to scanning and outputting results locally; there are no steps that read unrelated system config, call external endpoints, or exfiltrate data. The script does read file contents for supported types, which is expected for this purpose.
Install Mechanism
No install spec is provided (instruction-only skill with a bundled script). That minimizes install-time risk. The only optional dependency is python-docx (used to read .docx files); the script gracefully degrades to name-only mode if python-docx is not present.
Credentials
The skill requests no environment variables, credentials, or config paths. The script's keyword lists include words such as 'password', 'api_key', and 'token' for detection purposes — this is consistent with detecting sensitive items but may cause many matches when scanning code/config files; nevertheless the requested environment access is minimal and proportionate.
Persistence & Privilege
The skill does not request permanent presence (always=false), does not modify other skills or system settings, and runs only when invoked by the user/agent. It performs read-only operations on files per the README and prints/saves results locally.
Assessment
This skill appears coherent and limited to local file scanning. Before installing or running it: 1) review and run the script on a small, non-sensitive test directory first (or use --name-only) to confirm behavior; 2) avoid scanning system roots or directories with many secrets/configs you don't intend to analyze (the script will match substrings like 'password', 'token', 'api_key'); 3) if you need .docx content scanning, install python-docx in a controlled environment; 4) inspect the bundled script yourself if you have concerns — it currently contains no network calls or unexpected behavior, but you should only run code from sources you trust; 5) be cautious about exporting the CSV/JSON results to external/shared locations because they may contain sensitive matches.

Like a lobster shell, security has layers — review code before you run it.

Tags: GB35273, PII, compliance, latest, mingshu, security
92 downloads
0 stars
5 versions
Updated 6d ago
v2.0.1
MIT-0

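Mingshu Classification and Grading - Mingshu Classifier

Automatically classifies, grades, and labels the files in a directory according to GB/T 35273, the Personal Information Security Specification.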

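Use cases

  • Scan the files in a directory and automatically identify their sensitivity level
  • Grade and label files for compliance
  • Check whether file names conform to data security conventions
  • Assess the personal-information sensitivity of files in bulk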

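Grading standard

Following GB/T 35273, files are sorted into two categories by the sensitivity of the personal information they contain:

Category  Name                            Description
S         Sensitive personal information  Contains sensitive personal information (ID card, bank card, biometrics, location tracks, contact lists, etc.)
G         General personal information    Contains general personal information (name, mobile number, email, user information, etc.) or no personal information at all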

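Workflow

1. Collect user input

Confirm the following:

  • Target directory: path of the directory to scan (required)
  • File type filter: scans .docx by default; can be extended with glob patterns (e.g. *.docx, *.pdf)
  • Output format: prints to the terminal by default; can export to CSV or JSON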

2. Run the scan

Invoke the scan script:

python3 scripts/scan_files.py <target_directory> [--pattern "*.docx"] [--output result.csv] [--format csv] [--name-only]

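The script will:

  1. Recursively walk the target directory
  2. Extract text from each file name and from the file contents
  3. Match keywords against the file name and the file contents separately
  4. Give S priority: a file is graded S if either its name or its contents hits any S keyword
  5. Output the classification result, with file-name keywords and content keywords reported separately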
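
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/scan_files.py: the helper names `iter_files` and `read_text` and the `TEXT_SUFFIXES` set are assumptions made for the example, and .docx extraction is left out here.

```python
from fnmatch import fnmatch
from pathlib import Path

# Assumption: suffixes read as plain text; the real script's list may differ.
TEXT_SUFFIXES = {".txt", ".md", ".csv", ".json"}


def iter_files(root, pattern="*"):
    """Yield every file under root (recursively) whose name matches the glob pattern."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and fnmatch(path.name, pattern):
            yield path


def read_text(path):
    """Return the text of a plain-text file, or '' when the content is not readable."""
    if path.suffix.lower() not in TEXT_SUFFIXES:
        return ""
    try:
        return path.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return ""
```

Returning an empty string for unreadable types means such files can still be graded from their names alone, which matches the name-only fallback described for unsupported formats.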

Parameters:

  • --name-only: grade by file name only, without reading file contents (faster)
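
For orientation, the command-line flags shown in the invocation above could be declared with `argparse` along these lines. This is a sketch only: the flag names come from the usage line, but the defaults, help strings, and the `build_parser` function itself are assumptions rather than the script's actual parser.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        description="Grade files by personal-information sensitivity."
    )
    parser.add_argument("target_directory", help="directory to scan (required)")
    parser.add_argument("--pattern", default="*.docx",
                        help="glob pattern for file names")
    parser.add_argument("--output", default=None,
                        help="export results to this file")
    parser.add_argument("--format", choices=["csv", "json"], default="csv",
                        help="export format")
    parser.add_argument("--name-only", action="store_true",
                        help="grade by file name only, skip file contents")
    return parser
```

With this sketch, `build_parser().parse_args(["./docs", "--name-only"])` would yield a namespace whose `name_only` attribute is true and whose `pattern` stays at the default.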

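3. Present the results

Show the scan results to the user as a table containing:

  • File path
  • File name
  • Sensitivity category (S/G)
  • Keywords matched in the file name
  • Keywords matched in the file contents
  • Match source (filename / content / filename+content)
  • Recommended handling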
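
One way to picture a single result row is as a dictionary with one key per column listed above. The sketch below is illustrative: the key names, the `;` separator for keywords, and the helpers `match_source` and `build_row` are all assumptions, and the recommended-handling text is passed in by the caller because the source does not list its possible values.

```python
import os


def match_source(name_hits, content_hits):
    """Describe where the keywords were found."""
    if name_hits and content_hits:
        return "filename+content"
    if name_hits:
        return "filename"
    if content_hits:
        return "content"
    return ""


def build_row(path, level, name_hits, content_hits, recommendation=""):
    """Assemble one result row (hypothetical field names)."""
    return {
        "file_path": str(path),
        "file_name": os.path.basename(str(path)),
        "level": level,
        "name_keywords": ";".join(name_hits),
        "content_keywords": ";".join(content_hits),
        "match_source": match_source(name_hits, content_hits),
        "recommendation": recommendation,
    }
```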

4. Export a report (optional)

If the user passes the --output parameter, the results are exported to a file:

  • CSV format (default)
  • JSON format
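
The export step could look roughly like the sketch below, which writes a list of row dictionaries as either CSV or JSON using only the standard library. The function name `export_results` and the row shape are assumptions for illustration; the bundled script may structure its output differently.

```python
import csv
import json


def export_results(rows, output_path, fmt="csv"):
    """Write result rows (a list of dicts) to output_path as CSV or JSON."""
    if fmt == "json":
        with open(output_path, "w", encoding="utf-8") as fh:
            # ensure_ascii=False keeps Chinese keywords readable in the file.
            json.dump(rows, fh, ensure_ascii=False, indent=2)
        return
    fieldnames = list(rows[0].keys()) if rows else []
    # newline="" lets the csv module control line endings itself.
    with open(output_path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Because the exported file repeats the matched keywords, it can itself hint at sensitive content, so it is worth keeping it out of shared locations.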

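Keyword rules

The detailed keyword grading rules are stored in references/classification_rules.md and are matched as follows:

  1. Keywords are matched separately against the file name (without extension) and the file contents
  2. S takes priority: a file is graded S if either its name or its contents hits any S keyword
  3. Files that hit only general personal information keywords are graded G
  4. Files that hit no keyword at all default to G (general personal information)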
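
The priority logic above reduces to a small function. The sketch below uses case-insensitive substring matching and a handful of sample keywords taken from the category descriptions; the real lists live in references/classification_rules.md, so `S_KEYWORDS`, `G_KEYWORDS`, `find_hits`, and `classify` here are illustrative assumptions only.

```python
# Hypothetical sample keywords; the real lists are in
# references/classification_rules.md and are much longer.
S_KEYWORDS = ["身份证", "银行卡", "id card", "bank card"]
G_KEYWORDS = ["姓名", "手机号", "邮箱", "phone", "email"]


def find_hits(text, keywords):
    """Return the keywords that occur in text (case-insensitive substring match)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def classify(file_stem, content=""):
    """Return 'S' if the name or the content hits any S keyword, otherwise 'G'."""
    s_hits = find_hits(file_stem, S_KEYWORDS) + find_hits(content, S_KEYWORDS)
    return "S" if s_hits else "G"
```

Note that rules 3 and 4 lead to the same grade, so the function only needs to test for S hits; G keywords matter for reporting which terms matched, not for the grade itself.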

Dependencies

  • python-docx: used to read the contents of .docx files (if it is not installed, the script automatically falls back to name-only mode)
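
A common way to implement this kind of optional dependency is a guarded import. The sketch below shows the pattern under the assumption that content reading is simply skipped when the import fails; `read_docx_text` and its `docx_module` parameter are hypothetical names, not the script's own.

```python
try:
    import docx  # provided by the python-docx package
except ImportError:
    docx = None  # signals that only file names can be checked


def read_docx_text(path, docx_module=docx):
    """Return the paragraph text of a .docx file, or None if python-docx is missing."""
    if docx_module is None:
        return None
    document = docx_module.Document(str(path))
    return "\n".join(paragraph.text for paragraph in document.paragraphs)
```

A caller can treat a `None` result as the signal to grade the file from its name alone.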

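Notes

  • Grading results are for reference only; review them against the actual file contents
  • Scanning is read-only and does not modify any file
  • Chinese and English are both supported in file names and contents
  • Content reading is not yet supported for .doc (legacy format) and .pdf; these files are graded by file name only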
