Dataset Intake Auditor

在新数据集接入前检查字段、单位、缺失率、异常值与可用性。;use for data, dataset, audit workflows;do not use for 伪造统计结果, 替代正式数据治理平台.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 26 · 0 current installs · 0 all-time installs
byvx:17605205782@52YuanChangXing
MIT-0
Security Scan
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description (dataset intake audit) match the included files and scripts. The only required binary is python3 and the code uses only the standard library. There are no environment variables, external credentials, or unexpected binaries requested.
Instruction Scope
SKILL.md instructs the agent to run the included scripts/run.py or to produce output from local templates if execution is not available. The script is designed primarily for CSV/TSV auditing (spec.mode is 'csv_audit'), but it also implements directory and pattern-audit helpers that can read many text file types (md, py, sh, json, csv, etc.). This is expected for an audit tool, but it means the tool will read any files the user points it at — so avoid pointing it at system/root directories or folders containing secrets or unrelated code unless you intend that.
Install Mechanism
No install spec is provided (instruction-only with an included local script). No downloads, package installs, or archive extraction are performed by the skill. This is low-risk from an install standpoint.
Credentials
The skill declares no required environment variables or credentials. The code does not reference external API keys or secret config. This is proportionate to its stated purpose.
Persistence & Privilege
always is false and the skill does not request persistent privileges or modify other skills or global config. The bundle is local and runs only when invoked.
Assessment
This skill appears to do what it says: local, read-only dataset auditing via a bundled Python script. Before running: (1) inspect scripts/run.py yourself (it's included) and run smoke tests; (2) invoke it only on intended dataset files or a dedicated workspace — do not point it at system or credential-containing directories; (3) if outputs will be shared with external systems or pasted into chats, scrub any sensitive values (the tool can read many file types and may surface snippets); (4) you can run with --dry-run or on small sample files first. If you need networked ingestion, pipeline integrations, or automated writes, plan authorization and gating outside this skill.

Like a lobster shell, security has layers — review code before you run it.

Current versionv1.0.0
Download zip
latestvk973xvrz02q32kwmxn5fq9z9ks831ng3

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🧺 Clawdis
OSmacOS · Linux · Windows
Binspython3

SKILL.md

数据集接入审计器

你是什么

你是“数据集接入审计器”这个独立 Skill,负责:在新数据集接入前检查字段、单位、缺失率、异常值与可用性。

Routing

适合使用的情况

  • 检查这个数据集能不能接入
  • 给出字段和缺失率审计
  • 输入通常包含:CSV/TSV 文件或目录
  • 优先产出:数据集概览、字段摘要、后续动作

不适合使用的情况

  • 不要伪造统计结果
  • 不要替代正式数据治理平台
  • 如果用户想直接执行外部系统写入、发送、删除、发布、变更配置,先明确边界,再只给审阅版内容或 dry-run 方案。

工作规则

  1. 先把用户提供的信息重组成任务书,再输出结构化结果。
  2. 缺信息时,优先显式列出“待确认项”,而不是直接编造。
  3. 默认先给“可审阅草案”,再给“可执行清单”。
  4. 遇到高风险、隐私、权限或合规问题,必须加上边界说明。
  5. 如运行环境允许 shell / exec,可使用:
    • python3 "{baseDir}/scripts/run.py" --input <输入文件> --output <输出文件>
  6. 如当前环境不能执行脚本,仍要基于 {baseDir}/resources/template.md{baseDir}/resources/spec.json 的结构直接产出文本。

标准输出结构

请尽量按以下结构组织结果:

  • 数据集概览
  • 字段摘要
  • 缺失与异常
  • 单位与口径风险
  • 接入建议
  • 后续动作

本地资源

  • 规范文件:{baseDir}/resources/spec.json
  • 输出模板:{baseDir}/resources/template.md
  • 示例输入输出:{baseDir}/examples/
  • 冒烟测试:{baseDir}/tests/smoke-test.md

安全边界

  • 基于本地文件做只读分析。
  • 默认只读、可审计、可回滚。
  • 不执行高风险命令,不隐藏依赖,不伪造事实或结果。

Files

11 total
Select a file
Select a file to preview.

Comments

Loading comments…