{"skill":{"slug":"data-classification","displayName":"Data Classification","summary":"Classifies and grades a single data field name or a database table SQL file; supports GB/T 43697-2024 general data classification and grading and JR/T 0197-2020 financial data classification and grading.","description":"---\nname: data-classification\ndescription: Use for data classification, data grading, and combined classification-and-grading tasks. Use when the user asks to classify, grade, or classify and grade a single data field name, a field list, or a database table SQL/DDL file; supports ordinary data classification and grading, GB/T 43697-2024 general data classification and grading, financial data classification and grading, JR/T 0197-2020 financial data security levels, and the financial dual-label system of “general data label + financial data label”.\n---\n\n# Data Classification\n\n## Purpose\n\nClassify user-provided field names or SQL DDL into:\n\n1. **Ordinary/general data classification and grading**: GB/T 43697-2024 style category + level (`一般数据 / 重要数据 / 核心数据`).\n2. **Financial dual-label system**: general label + JR/T 0197-2020 financial label (`一级/二级/三级/四级子类` + `最低安全级别1-5`).\n\nThis skill produces **classification suggestions**, not final regulatory determinations. Mark uncertain items for business-owner review.\n\n## Quick workflow\n\n1. Identify input type:\n   - Single field name: classify directly.\n   - SQL/DDL file: extract table names, column names, types, and comments.\n2. Run the helper when useful:\n   ```bash\n   python3 skills/data-classification/scripts/classify_data.py --field \"customer_id\" --mode finance\n   python3 skills/data-classification/scripts/classify_data.py --sql path/to/schema.sql --mode finance --format markdown\n   ```\n3. Review financial rows against JR/T 0197-2020 Appendix A before falling back to heuristics:\n   - `references/jrt0197-appendix-a-full.csv` is the machine-readable full Appendix A table.\n   - `references/jrt0197-appendix-a-compact.md` is the human-readable compact Appendix A table.\n   - `references/financial-dual-label.md` contains dual-label workflow and fallback heuristics.\n   - `references/general-rules.md` contains GB/T 43697-2024 logic.\n4. Return a field-level result that covers **every input field**. Do not replace the full field list with a summary.\n5. 
Choose output delivery by field count internally, but do not explain this threshold policy to the user:\n   - **≤20 fields**: output the complete field-level table inline in chat; do not create/attach files unless the user explicitly asks for an export/file.\n   - **>20 fields**: save the complete field-level result as a CSV file, attach it with a `MEDIA:` line using the CSV file's absolute filesystem path, show the first 20 classified fields inline, and include a coverage statement in the message. Do not inline rows after the first 20. Never provide only a plain local path as the download method. Do not write the coverage statement into the CSV file itself.\n6. Run a coverage check before finalizing: compare parsed/input field count with classified output row count. If any field is missing, fix the output or explicitly mark the field as `[blocked: 未解析/缺少字段信息]`.\n7. Ask for business context only if the field name/comment is too ambiguous.\n\n## Output requirements\n\nFor a **single field**, include:\n\n- 字段名\n- 通用分类：行业领域、描述对象/数据主体、内容类别\n- 通用分级：一般/重要/核心 + 理由\n- 置信度与需确认点\n- 金融标签（仅金融场景输出）：推荐的一级/二级/三级/四级子类 + 最低安全级别\n- 候选金融标签（仅金融场景输出）：当字段可落入多个 JR/T 分类时，一并列出候选项并说明推荐依据\n- 双标签结果（仅金融场景输出）：`通用标签 + 金融标签`\n\nFor a **SQL file/table**, classify **all parsed columns from all tables**. Choose the delivery format internally and do not tell the user the threshold/routing rule.\n\n- **≤20 fields**: the complete field-level table inline. Do **not** create/attach files unless the user explicitly asks for an export/file.\n- **>20 fields**: create a complete CSV result file. Return a concise completion note, attach the CSV with `MEDIA:<absolute-csv-path>` on its own line so the UI can render a downloadable link, include the first 20 field-level rows inline, and include the coverage statement in the message. Do not inline rows after the first 20. Do not rely on a bare local path as the user's download link. 
Do not include the coverage statement as a row in the CSV file.\n\nDo not provide only a subset such as “core fields”, “sample rows”, or “summary table” unless the user explicitly asks for a summary.\n\nThe following output columns are **mandatory for every field and must be non-empty in all scenarios**:\n\n1. 字段名\n2. 通用分类\n3. 通用分级\n4. 置信度\n\nFor **financial data/scenarios only**, also include these mandatory non-empty columns:\n\n5. 推荐金融分类标签\n6. JR/T最低级别\n7. 候选金融标签\n\nFor financial fields, match against `references/jrt0197-appendix-a-full.csv` or `references/jrt0197-appendix-a-compact.md` first. Use the `references/financial-dual-label.md` heuristics only when Appendix A has no clear match or when field/table context creates multiple reasonable candidates.\n\nFor **non-financial data**, do **not** output `推荐金融分类标签`, `JR/T最低级别`, or `候选金融标签`.\n\nRecommended non-financial table shape:\n\n| 表名 | 字段名 | 类型/注释 | 通用分类 | 通用分级 | 置信度 | 依据/备注 |\n|---|---|---|---|---|---:|---|\n\nRecommended financial table shape:\n\n| 表名 | 字段名 | 类型/注释 | 通用分类 | 通用分级 | 推荐金融分类标签 | JR/T最低级别 | 候选金融标签 | 双标签 | 置信度 | 依据/备注 |\n|---|---|---|---|---|---|---:|---|---|---:|---|\n\nAfter the table, include a coverage line:\n\n`覆盖校验：输入/解析字段 N 个，已分类 N 个，遗漏 0 个。`\n\nIf output is saved to a file, still include the coverage line in the message and attach the file. For CSV outputs, include `MEDIA:<absolute-csv-path>` on its own line so the user can click/download directly; use the absolute path returned by the file-writing step, not a relative workspace path. File output is allowed for >20 fields as CSV, or whenever the user explicitly requests a file/export. Do not write the coverage line into the CSV file. 
Do not explain that files are chosen because of the field-count threshold unless the user asks why.\n\n## Classification principles\n\n- Coverage is mandatory: every user-provided field/parsed SQL column must receive a classification row.\n- Use **就高从严** (choose the higher, stricter level): if multiple rules match, choose the stricter level as the recommendation, list reasonable candidate labels, and explain why.\n- Treat field names alone as weak evidence; comments and table names improve confidence.\n- Do not infer `核心数据` from a field name alone unless the field clearly describes large-scale national/security/critical-infrastructure data. Usually, mark such fields as `需人工确认`.\n- `重要数据` usually requires scale, coverage, precision, or public/national impact context. For isolated personal or organization fields, default to `一般数据` unless a law/industry rule says otherwise.\n- For financial data, an Appendix A match takes precedence over broad keyword heuristics. The JR/T 0197 level is the **minimum security level**; business context may raise it.\n- Personal financial information, authentication credentials, biometric identifiers, account/payment/transaction data, and credit data should be handled conservatively.\n\n## Helper script notes\n\n`classify_data.py` is deterministic and heuristic. 
It is designed for first-pass tagging:\n\n- Inputs: `--field`, `--fields`, or `--sql`.\n- Modes: `general`, `finance`.\n- Formats: `markdown`, `json`, `csv`.\n- It parses common `CREATE TABLE` DDL and column comments.\n- Low confidence means the assistant should inspect context and possibly ask one focused follow-up.\n\n## References\n\n- `references/general-rules.md`: compact GB/T 43697-2024 classification/grading rules.\n- `references/financial-dual-label.md`: financial dual-label workflow and fallback heuristics.\n- `references/jrt0197-appendix-a-compact.md`: compact human-readable JR/T 0197-2020 Appendix A typical data grading table.\n- `references/jrt0197-appendix-a-full.csv`: full machine-readable JR/T 0197-2020 Appendix A typical data grading table.\n","tags":{"data-classification":"1.0.0","finance":"1.0.0","gbt-43697":"1.0.0","jrt-0197":"1.0.0","latest":"1.0.0"},"stats":{"comments":0,"downloads":327,"installsAllTime":12,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1778402210920,"updatedAt":1778492892498},"latestVersion":{"version":"1.0.0","createdAt":1778402210920,"changelog":"Initial release: supports general data classification/grading and financial dual-label classification.","license":"MIT-0"},"metadata":null,"owner":{"handle":"liangzaiz666","userId":"s17dwtrxptxw9mg7wxbb6znwqs86eaa5","displayName":"liangzai","image":"https://avatars.githubusercontent.com/u/33348350?v=4"},"moderation":null}