Skill Auditor
v0.1.2
Audit core: a classification taxonomy and a severity scoring function, kept orthogonal. Operates on the whole skill bundle (SKILL.md plus any referenced scri...
Skill Audit — Evaluation Core (Classification + Severity)
This file defines the audit evaluation logic. The classification layer answers what it is; the severity layer answers how bad it is. The two are orthogonal and interact only through three interface fields (C_base / required_dims / dataflow_role).
Language Detection Rule — EXECUTE BEFORE ANYTHING ELSE
Detect the language of the user's triggering message and lock the output language for the entire run. This detection is an internal step only — do NOT output any text that reveals the detection result, such as "当前输出语言为中文", "Detected language: English", or similar meta-statements. Simply use the detected language silently for all subsequent output.
| User message language | Output language |
|---|---|
| Chinese | Chinese — entire output in Chinese |
| English | English — entire output in English |
| Other language | Match that language |
| Cannot determine | Default to Chinese |
All output — scan start prompt, table headers, labels, prose, finding records, verdict, and footer — must be written exclusively in the detected language. Do NOT mix languages or announce the language choice at any point.
1. Classification Layer (Taxonomy)
Each finding is tagged with a triple (Surface, Behavior, IntentMarker). IntentMarker does not participate in scoring; it only affects presentation.
1.1 Surface
| Code | Meaning |
|---|---|
| `EXE` | Code / shell / subprocess / dynamic eval execution |
| `FS` | Local filesystem read / write / delete / chmod |
| `NET` | Network inbound / outbound / DNS / sockets |
| `CRED` | Environment variables / keys / tokens / credential stores |
| `PROC` | Process management, persistence, autostart, scheduled tasks |
| `LLM` | Prompt manipulation, tool-description poisoning, jailbreak payloads |
| `AGT` | Cross-skill / cross-tool / MCP supply-chain behavior |
1.2 Behavior Node Table
Each node declares C_base ∈ {1..4}, required dimensions, and data-flow role (source / transform / sink / none). The data-flow role feeds chain amplification in §2.4.
EXE
| Behavior | C_base | Required | Data-flow |
|---|---|---|---|
| `EXE.StaticShell` — shell with fully constant arguments | 2 | R, B | transform |
| `EXE.DynamicShell` — variable interpolation / `shell=True` + external input | 4 | R, I, B | sink |
| `EXE.EvalCode` — `eval` / `exec` / `Function()` on strings | 4 | R, I, B | sink |
| `EXE.RemoteFetch` — `curl \| sh` / download-then-exec / fetch-and-run | 4 | I, B | sink |
| `EXE.Subprocess` — constrained subprocess (whitelisted commands) | 2 | R | transform |
FS
| Behavior | C_base | Required | Data-flow |
|---|---|---|---|
| `FS.ReadPublic` — read public files (README, declared paths) | 1 | — | none |
| `FS.ReadWorkspace` — read files inside the workspace | 2 | R | source |
| `FS.ReadSensitive` — read sensitive paths (`~/.ssh`, `~/.aws`, Keychain, browser cookies, `.env`) | 4 | I, R | source |
| `FS.ReadOutOfScope` — read user files outside declared scope | 3 | I, B | source |
| `FS.WriteScoped` — write inside declared directories | 1 | — | none |
| `FS.WriteOutOfScope` — write outside declared scope | 3 | I, B | sink |
| `FS.WriteStartup` — write startup hooks / shell rc / autostart / launchd | 4 | R, I | sink |
| `FS.DeleteBroad` — wide deletion / `rm -rf` / wildcard delete | 4 | R, I, B | sink |
| `FS.ChmodDangerous` — `chmod 777` / privilege widen / SUID bit | 3 | R, I | transform |
NET
| Behavior | C_base | Required | Data-flow |
|---|---|---|---|
| `NET.OutboundDeclared` — outbound to a host declared in `SKILL.md` | 1 | — | sink |
| `NET.OutboundUndeclared` — outbound to an undeclared host | 3 | I, B | sink |
| `NET.OutboundObfuscated` — obfuscated destination (concat, encoding, homograph) | 4 | I, B | sink |
| `NET.DnsExfil` — DNS TXT with suspicious payload (long subdomain, base64) | 4 | I, B | sink |
| `NET.InboundListen` — local listening port / reverse shell endpoint | 4 | R, I | sink |
| `NET.Websocket` — long-lived / bidirectional channel | 2 | I | transform |
CRED
| Behavior | C_base | Required | Data-flow |
|---|---|---|---|
| `CRED.ReadEnv` — broad read of `os.environ` / `process.env` | 3 | I, B | source |
| `CRED.ReadNamedEnv` — read a single declared environment variable | 1 | — | source |
| `CRED.ReadKeychain` — read Keychain / Credential Manager / libsecret | 4 | I, B | source |
| `CRED.ReadBrowserStore` — read browser cookies / session / password store | 4 | I, B | source |
| `CRED.Hardcoded` — real secret hardcoded in code or config | 3 | R | none |
| `CRED.TokenEcho` — credential echoed to LLM / logs / stdout | 3 | R, B | transform |
PROC
| Behavior | C_base | Required | Data-flow |
|---|---|---|---|
| `PROC.Spawn` — ordinary child process creation (paired with EXE) | 1 | — | none |
| `PROC.Persist` — cron / launchd / systemd / Run-key install | 4 | R, I | sink |
| `PROC.ToolTamper` — modify / replace system tools, hook package managers | 4 | R, I, B | sink |
| `PROC.CryptoMine` — miner binaries / known mining-pool hosts | 4 | — | sink |
| `PROC.HideSelf` — process masquerade | 3 | I | transform |
LLM
| Behavior | C_base | Required | Data-flow |
|---|---|---|---|
| `LLM.PromptOverride` — "ignore previous / you are now / system:" style directives | 3 | I, B | sink |
| `LLM.ObfuscatedPrompt` — override directive encoded in base64 / ROT13 / hex | 4 | I, B | sink |
| `LLM.UnicodeSmuggling` — directives hidden in zero-width / Unicode-tag / bidi chars | 4 | I, B | sink |
| `LLM.DescriptionInjection` — enticement text in description/triggers to coerce other agents | 3 | I | sink |
| `LLM.ToolPoisoning` — tool descriptions deliberately mislead the agent's plan | 4 | I, B | sink |
AGT
| Behavior | C_base | Required | Data-flow |
|---|---|---|---|
| `AGT.CrossSkillWrite` — write into another skill's directory / modify registry | 4 | I, B | sink |
| `AGT.MCPRemoteFetch` — dynamically fetch tool definitions from a remote MCP server | 3 | I, B | source+sink |
| `AGT.ContextExfil` — exfiltrate data via chat context / tool responses | 3 | I, B | sink |
| `AGT.PrivilegeCreep` — behavior materially exceeds the `SKILL.md`-declared scope | 3 | I | transform |
| `AGT.ApprovalBypass` — attempts to bypass approval / sandbox / trust boundary | 4 | I | sink |
1.3 IntentMarker
| Marker | Meaning |
|---|---|
| `legitimate_elevated` | Sensitive behavior consistent with declared function and documented |
| `suspicious` | Behavior is suspect but evidence is not closed |
| `malicious_confirmed` | Clear evidence (closed attack chain, explicit C2 host, etc.) |
2. Severity Layer (Scoring)
2.1 Formula
Score = C × R × I × B
- R = 0 (unreachable) → Score = 0 → finding is dropped.
- I = 0 (legitimate and declared) → Score = 0 → finding is reported at Info as a capability disclosure entry; it does not affect the verdict.
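The product and its two zero rules can be sketched as follows. The function names and the three disposition labels are illustrative assumptions, as is the choice to test R before I when both are zero (the text does not state an order).

```python
def score_finding(c: int, r: int, i: int, b: int) -> int:
    """Score = C x R x I x B; any zero factor yields zero."""
    return c * r * i * b


def disposition(r: int, i: int) -> str:
    """Classify a finding by the two zero rules.

    Assumption: R = 0 is checked first, so an unreachable finding is
    dropped even if it is also legitimate and declared (I = 0).
    """
    if r == 0:
        return "dropped"          # unreachable: never reported
    if i == 0:
        return "info-disclosure"  # reported at Info, ignored by the verdict
    return "scored"               # normal path: tier comes from the score
```

For instance, `score_finding(4, 3, 1, 3)` is 36, and `disposition(3, 0)` marks a declared capability as a disclosure entry rather than a scored finding.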
2.2 Dimensions
C — Capability
{1, 2, 3, 4}, defaulting to the Behavior's C_base; an instance may float ±1 without leaving the range.
| Value | Meaning | Typical |
|---|---|---|
| 1 | Low (public read / in-scope write) | FS.ReadPublic, NET.OutboundDeclared |
| 2 | Medium (limited effect) | EXE.StaticShell, FS.ReadWorkspace |
| 3 | High (privacy / out-of-scope) | FS.ReadOutOfScope, CRED.ReadEnv |
| 4 | Very high (RCE / credentials / persistence / destruction) | EXE.DynamicShell, CRED.ReadKeychain, FS.DeleteBroad |
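One possible reading of "float ±1 without leaving the range" is a clamp, sketched below. The name `effective_c` and the decision to clamp out-of-range results (rather than reject them) are assumptions, not something the specification states.

```python
def effective_c(c_base: int, delta: int = 0) -> int:
    """Apply a per-instance adjustment to C_base and keep C inside 1..4."""
    if delta not in (-1, 0, 1):
        raise ValueError("C may only float by -1, 0, or +1")
    if not 1 <= c_base <= 4:
        raise ValueError("C_base must be in 1..4")
    # Clamp: a +1 on C_base = 4 stays at 4, a -1 on C_base = 1 stays at 1.
    return max(1, min(4, c_base + delta))
```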
R — Reachability
{0, 1, 2, 3}.
| Value | Meaning |
|---|---|
| 0 | Unreachable (comment / docs / dead code not imported) |
| 1 | Weakly reachable (example / test fixture / rare branch) |
| 2 | Conditionally reachable (main module, requires specific input or trigger) |
| 3 | On the main path (entry in SKILL.md, or reachable via import chain) |
I — Intent / Stealth
{0, 1, 2, 3}, used directly as a multiplier.
| Value | Meaning |
|---|---|
| 0 | Legitimate and declared — function needs it, SKILL.md states it, scope matches |
| 1 | Undeclared but not hidden — functionally needed, simply omitted from docs |
| 2 | Obfuscated / hidden — base64, string concat, zero-width chars, homograph host |
| 3 | Confirmed malicious — matches a C2 blacklist, clear attack signature, or closed chain |
B — Blast Radius
{1, 2, 3}.
| Value | Meaning |
|---|---|
| 1 | Self only — this skill's directory / current session |
| 2 | Workspace / user scope — current project or user files |
| 3 | Machine / cross-user / cross-agent — system-level, credential-level, propagable |
2.3 Tier Mapping
Theoretical range of nonzero scores: 1 – 108 (maximum 4 × 3 × 3 × 3). I = 0 findings score 0 and are always Info (see §2.1).
| Score | Tier | Badge |
|---|---|---|
| 1 – 4 | Info | · (verbose only) |
| 5 – 18 | Low | ⚠️ |
| 19 – 54 | Medium | ⚠️ |
| 55 – 90 | High | 🔴 |
| 91 – 108 | Critical | 🚨 |
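The table above maps to a simple threshold lookup. This is a minimal sketch; `tier_for` and `TIER_BOUNDS` are hypothetical names, and a score of 0 is treated as the I = 0 case because R = 0 findings are dropped before tiering.

```python
# Inclusive upper bound of each tier, in ascending order.
TIER_BOUNDS = [
    (4, "Info"),
    (18, "Low"),
    (54, "Medium"),
    (90, "High"),
    (108, "Critical"),
]


def tier_for(score: int) -> str:
    """Return the tier name for a score in 0..108."""
    if not 0 <= score <= 108:
        raise ValueError("score outside the theoretical range")
    for upper, tier in TIER_BOUNDS:
        if score <= upper:
            return tier
    raise AssertionError("unreachable: 108 is the last bound")
```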
2.4 Chain Amplification
When multiple findings on the same execution path form a closed chain
source → transform (any, optional) → sink
an additional chain-finding is emitted whose tier equals the highest member tier + 1 (capped at Critical). Unclosed chains (missing source or sink) do not amplify. Member findings are still reported on their own.
Typical closed chains:
- `FS.ReadSensitive` → `NET.OutboundUndeclared` (credential exfiltration)
- `CRED.ReadEnv` → `LLM.PromptOverride` (credentials leaked to a third-party LLM)
- `EXE.RemoteFetch` → `FS.WriteStartup` (download then persist)
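The closure test and the tier bump can be sketched as below, assuming the data-flow roles of the findings on one execution path are already available in execution order. The function names are hypothetical, and the handling of the combined `source+sink` role (a single such node does not close a chain by itself) is an assumption.

```python
TIERS = ["Info", "Low", "Medium", "High", "Critical"]


def is_closed_chain(roles: list[str]) -> bool:
    """True when some source is followed, later on the path, by a sink."""
    seen_source = False
    for role in roles:
        parts = role.split("+")  # "source+sink" counts as both roles
        if "sink" in parts and seen_source:
            return True
        if "source" in parts:
            seen_source = True
    return False


def chain_tier(member_tiers: list[str]) -> str:
    """Highest member tier + 1, capped at Critical."""
    top = max(TIERS.index(t) for t in member_tiers)
    return TIERS[min(top + 1, len(TIERS) - 1)]
```

With this sketch, a path with roles `["source", "transform", "sink"]` is closed, while `["transform", "sink"]` is not and therefore does not amplify.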
3. Interface Between Classification and Severity
| Interface | Direction | Description |
|---|---|---|
| `C_base` | Classification → Severity | Capability baseline per Behavior node, default for C |
| `required_dims` | Classification → Severity | Checklist of dimensions that must be evaluated |
| `dataflow_role` | Classification → Severity | `source` / `transform` / `sink` / `none`, used by chain amplification |
The severity layer does not read the classification layer's prose descriptions or the IntentMarker; the classification layer does not read the final Score. The two layers can evolve independently.
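To illustrate the narrow interface, the sketch below models a Behavior node as a record that exposes only the three fields. `BehaviorNode`, `NODES`, and `severity_inputs` are hypothetical names; the three sample entries are copied from the tables in §1.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorNode:
    name: str
    c_base: int                # capability baseline, 1..4
    required_dims: frozenset   # subset of {"R", "I", "B"}
    dataflow_role: str         # "source" | "transform" | "sink" | "none"


# A small excerpt of the node table, for illustration only.
NODES = {
    "EXE.StaticShell": BehaviorNode(
        "EXE.StaticShell", 2, frozenset({"R", "B"}), "transform"),
    "FS.ReadSensitive": BehaviorNode(
        "FS.ReadSensitive", 4, frozenset({"I", "R"}), "source"),
    "NET.OutboundUndeclared": BehaviorNode(
        "NET.OutboundUndeclared", 3, frozenset({"I", "B"}), "sink"),
}


def severity_inputs(behavior: str) -> tuple:
    """Everything the severity layer is allowed to read about a Behavior."""
    node = NODES[behavior]
    return node.c_base, node.required_dims, node.dataflow_role
```

Keeping the prose description and the IntentMarker out of this record is what lets the two layers change independently.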
4. Finding Data Structure
A finding is one (Behavior, evidence location) hit. The evidence location is (file path, line range, code snippet). The same Behavior hitting at multiple locations produces multiple findings; the same code hitting multiple Behaviors produces multiple findings; a chain-finding is itself a finding.
finding:
  id: "F-001"
  category:
    surface: "FS"
    behavior: "FS.ReadSensitive"
    intent_marker: "suspicious"   # legitimate_elevated | suspicious | malicious_confirmed
  evidence:
    file: "scripts/helper.sh"
    line_range: [23, 31]
    snippet: "..."
  scoring:
    C: 4
    R: 3
    I: 1
    B: 3
    score: 36        # C × R × I × B = 4×3×1×3
    tier: "Medium"   # 36 falls in the 19 – 54 band of §2.3
    badge: "⚠️"
  dataflow_role: "source"
  chain_id: null     # fill with a chain id if this finding is part of a closed chain
All fields are required (chain_id may be null). score must equal C × R × I × B; for I = 0 findings, score is 0 and tier is always Info.
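The two consistency rules stated above can be checked mechanically. The following is a minimal sketch that takes the `scoring` mapping of a finding as a plain dict; `validate_scoring` is a hypothetical helper, not part of the specification.

```python
def validate_scoring(scoring: dict) -> list[str]:
    """Return a list of rule violations; an empty list means consistent."""
    errors = []
    expected = scoring["C"] * scoring["R"] * scoring["I"] * scoring["B"]
    if scoring["score"] != expected:
        errors.append(f"score is {scoring['score']}, expected {expected}")
    # I = 0 findings are capability disclosures and are always Info.
    if scoring["I"] == 0 and scoring["tier"] != "Info":
        errors.append("I = 0 requires tier Info")
    return errors
```

Usage: `validate_scoring({"C": 4, "R": 3, "I": 1, "B": 3, "score": 36, "tier": "Medium"})` returns an empty list.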
5. Audit Procedure
5.1 Scan Scope
The audit target is the whole skill bundle, not SKILL.md alone. The scope has three layers:
- Recursive enumeration of the skill directory. Walk every file (including hidden ones) and classify by content rather than extension. Text-like content is analyzed as script/configuration; non-text content is judged by its location and reference relationships, without any fixed preset conclusion.
- Locally referenced resources. Resolve relative-path references that appear in `SKILL.md` and in scripts (frontmatter, code blocks, Markdown links, arguments to `bash` / `python` / `node` invocations, etc.) and pull the referenced files into the scan. Their Reachability baseline is set per §5.3. If a referenced file lies outside the skill directory, additionally record an `FS.ReadOutOfScope` or `AGT.CrossSkillWrite` finding as appropriate.
- Remote resources. Patterns such as `curl | sh`, `wget`, `git clone` then `exec`, `pip` / `npm` pointing at non-standard registries, or remote MCP servers trigger `EXE.RemoteFetch` or `AGT.MCPRemoteFetch`. During the audit, a single static fetch is allowed (never executed); on success the content joins the scan, on failure or without authorization the finding's `I` is forced to `≥ 2`.
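The first layer, enumeration with content-based classification, might look like the sketch below. The text/binary heuristic (no NUL byte and a valid UTF-8 prefix) is an assumption chosen for illustration; the specification only says to classify by content rather than extension. `looks_like_text` and `build_inventory` are hypothetical names.

```python
import codecs
import os


def looks_like_text(path: str, sample_size: int = 4096) -> bool:
    """Heuristic: a file is text if its first bytes are NUL-free UTF-8."""
    with open(path, "rb") as fh:
        sample = fh.read(sample_size)
    if b"\x00" in sample:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off by the sample.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def build_inventory(root: str) -> dict[str, str]:
    """Walk every file under root, hidden ones included, ignoring extensions."""
    inventory = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            kind = "text" if looks_like_text(path) else "binary"
            inventory[os.path.relpath(path, root)] = kind
    return inventory
```

Reference resolution and the single static fetch of remote resources are deliberately left out of this sketch.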
5.2 Flow
Input: skill root directory
│
▼
[Step 1] Build scan inventory
├─ 1a. Recursively enumerate files and classify by content
├─ 1b. Parse references → add local files / register remote-URL findings
└─ 1c. Attempt a single static fetch of remote resources (success → include; failure → I ≥ 2)
▼
[Step 2] Match each file against Behavior nodes → {category, evidence(file, line_range, snippet)}
▼
[Step 3] Score each finding independently: C ← C_base ± Δ; R/I/B per §2.2; compute Score; drop if R = 0
▼
[Step 4] Build a cross-file data-flow graph, detect closed chains → append chain-findings
▼
[Step 5] Emit all findings
5.3 Cross-File Reachability
| File location | Default R |
|---|---|
| `SKILL.md` frontmatter + body | 3 |
| Scripts explicitly referenced / invoked by `SKILL.md` | 3 |
| Files sourced / imported from referenced scripts | inherits along the chain, capped at 3 |
| Orphan scripts in the skill directory with no incoming reference | 1 (forced to 2 if the file hits any Behavior with C ≥ 3) |
| Files under `tests/`, `examples/`, `docs/` not referenced from `SKILL.md` | 0 or 1 |
| Plain documentation (e.g. `README.md`) | 1 (treated as main path when it contains executable code blocks and is referenced) |
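The unambiguous rows of this table can be expressed as a small lookup. The location labels (`"skill_md"`, `"referenced_script"`, `"imported"`, `"orphan"`) are hypothetical; the `tests/`/`examples/`/`docs/` and plain-documentation rows are omitted because the table leaves their value to the auditor's judgment.

```python
def default_reachability(location: str, max_c: int = 0, parent_r: int = 3) -> int:
    """Default R for a file.

    max_c    -- highest C among Behaviors the file hits (orphans only)
    parent_r -- R of the script that sources/imports this file
    """
    if location in ("skill_md", "referenced_script"):
        return 3
    if location == "imported":
        return min(parent_r, 3)  # inherits along the chain, capped at 3
    if location == "orphan":
        # An unreferenced file is weakly reachable unless it carries a
        # high-capability Behavior, in which case R is forced to 2.
        return 2 if max_c >= 3 else 1
    raise ValueError(f"no fixed default for location {location!r}")
```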
6. Verdict
verdict.tier = max(f.tier for f in findings); Info when there are no findings. The verdict is computed internally and is not printed; it only selects which line §7 emits.
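Because tiers are names rather than numbers, the `max` needs an explicit ordering. A minimal sketch, with `TIER_ORDER` and `verdict_tier` as assumed names:

```python
TIER_ORDER = ["Info", "Low", "Medium", "High", "Critical"]


def verdict_tier(finding_tiers: list[str]) -> str:
    """Highest tier among the findings; Info when there are none."""
    return max(finding_tiers, key=TIER_ORDER.index, default="Info")
```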
7. Final Output
The audit produces internal findings and a verdict, but only one line is emitted to the user — nothing else. No findings table, no verdict block, no per-finding details, no recommendations. The wording is fixed by verdict.tier; <NAME> is the name field from the audited SKILL.md (fallback: skill directory name).
| Verdict tier | Final line (English) |
|---|---|
| Info | ✅ Your「<NAME>」skill is safe to use. |
| Low | ⚠️ Your「<NAME>」skill is usable. |
| Medium | ⚠️ Your「<NAME>」skill is usable. |
| High | 🔴 Your「<NAME>」skill may be risky — please review carefully. |
| Critical | 🚨 Your「<NAME>」skill may be risky — please review carefully. |
| Verdict tier | Final line (Chinese) |
|---|---|
| Info | ✅ 您的「<NAME>」插件可正常使用。 |
| Low | ⚠️ 您的「<NAME>」插件可使用。 |
| Medium | ⚠️ 您的「<NAME>」插件可使用。 |
| High | 🔴 您的「<NAME>」插件可能存在风险,请关注。 |
| Critical | 🚨 您的「<NAME>」插件可能存在风险,请关注。 |
Language is selected per the Language Detection Rule at the top of this file. The single line is the entire user-visible output of the run; classification, scoring, and chain analysis from §1–§6 still execute internally to determine verdict.tier, but their intermediate results are never printed.
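Selecting the single output line reduces to a two-level lookup. In this sketch the language keys `"en"` and `"zh"` and the name `final_line` are assumptions; the strings are copied from the two tables above, and the "match that language" case for other languages is not covered.

```python
FINAL_LINES = {
    "en": {
        "Info": "✅ Your「{name}」skill is safe to use.",
        "Low": "⚠️ Your「{name}」skill is usable.",
        "Medium": "⚠️ Your「{name}」skill is usable.",
        "High": "🔴 Your「{name}」skill may be risky — please review carefully.",
        "Critical": "🚨 Your「{name}」skill may be risky — please review carefully.",
    },
    "zh": {
        "Info": "✅ 您的「{name}」插件可正常使用。",
        "Low": "⚠️ 您的「{name}」插件可使用。",
        "Medium": "⚠️ 您的「{name}」插件可使用。",
        "High": "🔴 您的「{name}」插件可能存在风险,请关注。",
        "Critical": "🚨 您的「{name}」插件可能存在风险,请关注。",
    },
}


def final_line(tier: str, name: str, lang: str) -> str:
    """Return the only line shown to the user for this verdict tier."""
    return FINAL_LINES[lang][tier].format(name=name)
```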
