Skill Auditor

v0.1.7

Audit core: a classification taxonomy and a severity scoring function, kept orthogonal. Operates on the whole skill bundle (SKILL.md plus any referenced scri...

License: MIT-0

Skill Audit — Evaluation Core (Classification + Severity)

This file defines the audit evaluation logic. The classification layer answers what it is; the severity layer answers how bad it is. The two are orthogonal and interact only through three interface fields (C_base / required_dims / dataflow_role).


Language Detection Rule — EXECUTE BEFORE ANYTHING ELSE

Detect the language of the user's triggering message and lock the output language for the entire run. This detection is an internal step only — do NOT output any text that reveals the detection result, such as "当前输出语言为中文", "Detected language: English", or similar meta-statements. Simply use the detected language silently for all subsequent output.

| User message language | Output language |
| --- | --- |
| Chinese | Chinese — entire output in Chinese |
| English | English — entire output in English |
| Other language | Match that language |
| Cannot determine | Default to Chinese |

All intermediate output — scan start prompt, table headers, labels, prose, finding records, and reasoning — must be written exclusively in the detected language. The single final-line verdict in §7 is always Chinese, regardless of detected language. Do NOT mix languages in intermediate output and do NOT announce the language choice at any point.


1. Classification Layer (Taxonomy)

Each finding is tagged with a triple (Surface, Behavior, IntentMarker). IntentMarker does not participate in scoring; it only affects presentation.

1.1 Surface

| Code | Meaning |
| --- | --- |
| EXE | Code / shell / subprocess / dynamic eval execution |
| FS | Local filesystem read / write / delete / chmod |
| NET | Network inbound / outbound / DNS / sockets |
| CRED | Environment variables / keys / tokens / credential stores |
| PROC | Process management, persistence, autostart, scheduled tasks |
| LLM | Prompt manipulation, tool-description poisoning, jailbreak payloads |
| AGT | Cross-skill / cross-tool / MCP supply-chain behavior |

1.2 Behavior Node Table

Each node declares C_base ∈ {1..4}, required dimensions, and data-flow role (source / transform / sink / none). The data-flow role feeds chain amplification in §2.4.

EXE

| Behavior | C_base | Required | Data-flow |
| --- | --- | --- | --- |
| EXE.StaticShell — shell with fully constant arguments | 2 | R, B | transform |
| EXE.DynamicShell — variable interpolation / shell=True + external input | 4 | R, I, B | sink |
| EXE.EvalCode — eval / exec / Function() on strings | 4 | R, I, B | sink |
| EXE.RemoteFetch — curl \| sh / download-then-exec / fetch-and-run | 4 | I, B | sink |
| EXE.Subprocess — constrained subprocess (whitelisted commands) | 2 | R | transform |

FS

| Behavior | C_base | Required | Data-flow |
| --- | --- | --- | --- |
| FS.ReadPublic — read public files (README, declared paths) | 1 |  | none |
| FS.ReadWorkspace — read files inside the workspace | 2 | R | source |
| FS.ReadSensitive — read sensitive paths (~/.ssh, ~/.aws, Keychain, browser cookies, .env) | 4 | I, R | source |
| FS.ReadOutOfScope — read user files outside declared scope | 3 | I, B | source |
| FS.WriteScoped — write inside declared directories | 1 |  | none |
| FS.WriteOutOfScope — write outside declared scope | 3 | I, B | sink |
| FS.WriteStartup — write startup hooks / shell rc / autostart / launchd | 4 | R, I | sink |
| FS.DeleteBroad — wide deletion / rm -rf / wildcard delete | 4 | R, I, B | sink |
| FS.ChmodDangerous — chmod 777 / privilege widen / SUID bit | 3 | R, I | transform |

NET

| Behavior | C_base | Required | Data-flow |
| --- | --- | --- | --- |
| NET.OutboundDeclared — outbound to a host declared in SKILL.md | 1 |  | sink |
| NET.OutboundUndeclared — outbound to an undeclared host | 3 | I, B | sink |
| NET.OutboundObfuscated — obfuscated destination (concat, encoding, homograph) | 4 | I, B | sink |
| NET.DnsExfil — DNS TXT with suspicious payload (long subdomain, base64) | 4 | I, B | sink |
| NET.InboundListen — local listening port / reverse shell endpoint | 4 | R, I | sink |
| NET.Websocket — long-lived / bidirectional channel | 2 | I | transform |

CRED

| Behavior | C_base | Required | Data-flow |
| --- | --- | --- | --- |
| CRED.ReadEnv — broad read of os.environ / process.env | 3 | I, B | source |
| CRED.ReadNamedEnv — read a single declared environment variable | 1 |  | source |
| CRED.ReadKeychain — read Keychain / Credential Manager / libsecret | 4 | I, B | source |
| CRED.ReadBrowserStore — read browser cookies / session / password store | 4 | I, B | source |
| CRED.Hardcoded — real secret hardcoded in code or config | 3 | R | none |
| CRED.TokenEcho — credential echoed to LLM / logs / stdout | 3 | R, B | transform |

PROC

| Behavior | C_base | Required | Data-flow |
| --- | --- | --- | --- |
| PROC.Spawn — ordinary child process creation (paired with EXE) | 1 |  | none |
| PROC.Persist — cron / launchd / systemd / Run-key install | 4 | R, I | sink |
| PROC.ToolTamper — modify / replace system tools, hook package managers | 4 | R, I, B | sink |
| PROC.CryptoMine — miner binaries / known mining-pool hosts | 4 |  | sink |
| PROC.HideSelf — process masquerade | 3 | I | transform |

LLM

| Behavior | C_base | Required | Data-flow |
| --- | --- | --- | --- |
| LLM.PromptOverride — "ignore previous / you are now / system:" style directives | 3 | I, B | sink |
| LLM.ObfuscatedPrompt — override directive encoded in base64 / ROT13 / hex | 4 | I, B | sink |
| LLM.UnicodeSmuggling — directives hidden in zero-width / Unicode-tag / bidi chars | 4 | I, B | sink |
| LLM.DescriptionInjection — enticement text in description/triggers to coerce other agents | 3 | I | sink |
| LLM.ToolPoisoning — tool descriptions deliberately mislead the agent's plan | 4 | I, B | sink |

AGT

| Behavior | C_base | Required | Data-flow |
| --- | --- | --- | --- |
| AGT.CrossSkillWrite — write into another skill's directory / modify registry | 4 | I, B | sink |
| AGT.MCPRemoteFetch — dynamically fetch tool definitions from a remote MCP server | 3 | I, B | source+sink |
| AGT.ContextExfil — exfiltrate data via chat context / tool responses | 3 | I, B | sink |
| AGT.PrivilegeCreep — behavior materially exceeds the SKILL.md-declared scope | 3 | I | transform |
| AGT.ApprovalBypass — attempts to bypass approval / sandbox / trust boundary | 4 | I | sink |

1.3 IntentMarker

| Marker | Meaning |
| --- | --- |
| legitimate_elevated | Sensitive behavior consistent with declared function and documented |
| suspicious | Behavior is suspect but evidence is not closed |
| malicious_confirmed | Clear evidence (closed attack chain, explicit C2 host, etc.) |

2. Severity Layer (Scoring)

2.1 Formula

Score = C × R × I × B

  • R = 0 (unreachable) → Score = 0 → the finding is dropped.
  • I = 0 (legitimate and declared) → Score = 0 → the finding is reported at Info as a capability-disclosure entry and does not affect the verdict.

2.2 Dimensions

C — Capability

{1, 2, 3, 4}, defaulting to the Behavior's C_base; an instance may float ±1 without leaving the range.

| Value | Meaning | Typical |
| --- | --- | --- |
| 1 | Low (public read / in-scope write) | FS.ReadPublic, NET.OutboundDeclared |
| 2 | Medium (limited effect) | EXE.StaticShell, FS.ReadWorkspace |
| 3 | High (privacy / out-of-scope) | FS.ReadOutOfScope, CRED.ReadEnv |
| 4 | Very high (RCE / credentials / persistence / destruction) | EXE.DynamicShell, CRED.ReadKeychain, FS.DeleteBroad |
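
The ±1 float on C can be illustrated with a small sketch. The helper name instance_capability is hypothetical, and clamping out-of-range results (rather than rejecting them) is an assumption, since the text only says the value must not leave the range.

```python
def instance_capability(c_base: int, delta: int = 0) -> int:
    """Return the per-instance C for a Behavior from its C_base and a float of -1, 0 or +1."""
    if c_base not in (1, 2, 3, 4):
        raise ValueError(f"C_base must be in 1..4, got {c_base}")
    if delta not in (-1, 0, 1):
        raise ValueError(f"delta must be -1, 0 or +1, got {delta}")
    # Assumption: a float that would leave 1..4 is clamped, not treated as an error.
    return max(1, min(4, c_base + delta))
```

For example, instance_capability(2, 1) gives 3, while instance_capability(4, 1) stays at 4.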

R — Reachability

{0, 1, 2, 3}.

| Value | Meaning |
| --- | --- |
| 0 | Unreachable (comment / docs / dead code not imported) |
| 1 | Weakly reachable (example / test fixture / rare branch) |
| 2 | Conditionally reachable (main module, requires specific input or trigger) |
| 3 | On the main path (entry in SKILL.md, or reachable via import chain) |

I — Intent / Stealth

{0, 1, 2, 3}, used directly as a multiplier.

| Value | Meaning |
| --- | --- |
| 0 | Legitimate and declared — function needs it, SKILL.md states it, scope matches |
| 1 | Undeclared but not hidden — functionally needed, simply omitted from docs |
| 2 | Obfuscated / hidden — base64, string concat, zero-width chars, homograph host |
| 3 | Confirmed malicious — matches a C2 blacklist, clear attack signature, or closed chain |

B — Blast Radius

{1, 2, 3}.

| Value | Meaning |
| --- | --- |
| 1 | Self only — this skill's directory / current session |
| 2 | Workspace / user scope — current project or user files |
| 3 | Machine / cross-user / cross-agent — system-level, credential-level, propagable |

2.3 Tier Mapping

Theoretical range 1 – 108 (4 × 3 × 3 × 3). I = 0 findings are always Info (see §2.1).

| Score | Tier | Badge |
| --- | --- | --- |
| 1 – 4 | Info | · (verbose only) |
| 5 – 18 | Low | ⚠️ |
| 19 – 54 | Medium | ⚠️ |
| 55 – 90 | High | 🔴 |
| 91 – 108 | Critical | 🚨 |
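
The formula in §2.1 and the tier bands in §2.3 can be combined into one scoring step. The following is a minimal sketch, not a reference implementation: the function name score_finding and the (score, tier) return shape are hypothetical, and letting R = 0 take precedence over I = 0 is an assumption.

```python
from typing import Optional, Tuple

# Inclusive upper bound of each tier, copied from the §2.3 bands.
TIER_UPPER_BOUNDS = [(4, "Info"), (18, "Low"), (54, "Medium"), (90, "High"), (108, "Critical")]


def score_finding(c: int, r: int, i: int, b: int) -> Tuple[int, Optional[str]]:
    """Return (score, tier); tier is None when the finding is dropped."""
    if not (1 <= c <= 4 and 0 <= r <= 3 and 0 <= i <= 3 and 1 <= b <= 3):
        raise ValueError("dimension outside the ranges given in §2.2")
    if r == 0:
        # Unreachable: the finding is dropped (assumed to win over I = 0).
        return 0, None
    if i == 0:
        # Legitimate and declared: reported at Info as a capability disclosure.
        return 0, "Info"
    score = c * r * i * b
    for upper, tier in TIER_UPPER_BOUNDS:
        if score <= upper:
            return score, tier
    raise AssertionError("unreachable: the product cannot exceed 108")
```

With C = 4, R = 3, I = 1, B = 3 this returns (36, "Medium"), and the maximum 4 × 3 × 3 × 3 returns (108, "Critical").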

2.4 Chain Amplification

When multiple findings on the same execution path form a closed chain

source → transform (any, optional) → sink

an additional chain-finding is emitted whose tier equals the highest member tier + 1 (capped at Critical). Unclosed chains (missing source or sink) do not amplify. Member findings are still reported on their own.

Typical closed chains:

  • FS.ReadSensitive → NET.OutboundUndeclared (credential exfiltration)
  • CRED.ReadEnv → LLM.PromptOverride (credentials leaked to a third-party LLM)
  • EXE.RemoteFetch → FS.WriteStartup (download then persist)
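
The closure test and the tier bump can be sketched as follows. Both function names are hypothetical, and the sketch assumes the caller supplies the data-flow roles in execution order. It applies the source → sink rule literally, so a path whose members are all tabled as sink is not treated as closed here.

```python
TIERS = ["Info", "Low", "Medium", "High", "Critical"]


def is_closed_chain(roles):
    """True when some source is followed later on the path by a sink in another finding."""
    seen_source = False
    for role in roles:
        parts = role.split("+")  # handles the combined "source+sink" role
        if seen_source and "sink" in parts:
            return True
        if "source" in parts:
            seen_source = True
    return False


def chain_tier(member_tiers):
    """Highest member tier plus one step, capped at Critical."""
    highest = max(TIERS.index(t) for t in member_tiers)
    return TIERS[min(highest + 1, len(TIERS) - 1)]
```

For instance, is_closed_chain(["source", "transform", "sink"]) is True, and chain_tier(["Low", "Medium"]) yields "High".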

3. Interface Between Classification and Severity

| Interface | Direction | Description |
| --- | --- | --- |
| C_base | Classification → Severity | Capability baseline per Behavior node, default for C |
| required_dims | Classification → Severity | Checklist of dimensions that must be evaluated |
| dataflow_role | Classification → Severity | source/transform/sink/none, used by chain amplification |

The severity layer does not read the classification layer's prose descriptions or the IntentMarker; the classification layer does not read the final Score. The two layers can evolve independently.


4. Finding Data Structure

A finding is one (Behavior, evidence location) hit. The evidence location is (file path, line range, code snippet). The same Behavior hitting at multiple locations produces multiple findings; the same code hitting multiple Behaviors produces multiple findings; a chain-finding is itself a finding.

finding:
  id: "F-001"
  category:
    surface: "FS"
    behavior: "FS.ReadSensitive"
    intent_marker: "suspicious"   # legitimate_elevated | suspicious | malicious_confirmed
  evidence:
    file: "scripts/helper.sh"
    line_range: [23, 31]
    snippet: "..."
  scoring:
    C: 4
    R: 3
    I: 1
    B: 3
    score: 36         # C × R × I × B = 4×3×1×3
    tier: "Medium"
    badge: "⚠️"
  dataflow_role: "source"
  chain_id: null                  # fill with a chain id if this finding is part of a closed chain

All fields are required (chain_id may be null). score must equal C × R × I × B; for I = 0 findings, score is 0 and tier is always Info.
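
The consistency rules in the previous paragraph can be checked mechanically. This is a minimal sketch that assumes the finding has already been parsed into a Python dict with the field names used in the record above. The function name validate_finding is hypothetical.

```python
TOP_LEVEL_FIELDS = ("id", "category", "evidence", "scoring", "dataflow_role", "chain_id")
SCORING_FIELDS = ("C", "R", "I", "B", "score", "tier", "badge")


def validate_finding(finding: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record is consistent."""
    errors = [f"missing field: {key}" for key in TOP_LEVEL_FIELDS if key not in finding]
    scoring = finding.get("scoring", {})
    missing = [key for key in SCORING_FIELDS if key not in scoring]
    if missing:
        errors.extend(f"missing field: scoring.{key}" for key in missing)
        return errors  # cannot check the arithmetic without all scoring fields
    expected = scoring["C"] * scoring["R"] * scoring["I"] * scoring["B"]
    if scoring["score"] != expected:
        errors.append(f"score is {scoring['score']}, expected {expected}")
    if scoring["I"] == 0 and scoring["tier"] != "Info":
        errors.append("I = 0 findings must have tier Info")
    return errors
```

A chain_id of None passes the check, since only the presence of the key is required.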


5. Audit Procedure

5.1 Scan Scope

The audit target is the whole skill bundle, not SKILL.md alone. The scope has three layers:

  1. Recursive enumeration of the skill directory. Walk every file (including hidden ones) and classify by content rather than extension. Text-like content is analyzed as script/configuration; non-text content is judged by its location and reference relationships, without any fixed preset conclusion.
  2. Locally referenced resources. Resolve relative-path references that appear in SKILL.md and in scripts (frontmatter, code blocks, Markdown links, arguments to bash / python / node invocations, etc.) and pull the referenced files into the scan. Their Reachability baseline is set per §5.3. If a referenced file lies outside the skill directory, additionally record an FS.ReadOutOfScope or AGT.CrossSkillWrite finding as appropriate.
  3. Remote resources. Patterns such as curl | sh, wget, git clone then exec, pip/npm pointing at non-standard registries, or remote MCP servers trigger EXE.RemoteFetch or AGT.MCPRemoteFetch. During the audit, a single static fetch is allowed (never executed); on success the content joins the scan, on failure or without authorization the finding's I is forced to ≥ 2.
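
The first layer (recursive enumeration, classification by content rather than extension) might be sketched as below. The function name build_inventory is hypothetical, and the NUL-byte check on the first 8 KiB is an assumed heuristic for "text-like"; the procedure itself does not prescribe one.

```python
import os


def build_inventory(skill_root: str) -> dict:
    """Map each file's path (relative to the skill root) to 'text' or 'binary'."""
    inventory = {}
    # os.walk visits hidden files and directories as well as ordinary ones.
    for dirpath, _dirnames, filenames in os.walk(skill_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            with open(full_path, "rb") as handle:
                head = handle.read(8192)
            relative = os.path.relpath(full_path, skill_root).replace(os.sep, "/")
            # Assumed heuristic: a NUL byte in the first 8 KiB marks the file as non-text.
            inventory[relative] = "binary" if b"\x00" in head else "text"
    return inventory
```

Files marked "binary" would then be judged by location and reference relationships, as item 1 describes.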

5.2 Flow

Input: skill root directory
  │
  ▼
[Step 1] Build scan inventory
  ├─ 1a. Recursively enumerate files and classify by content
  ├─ 1b. Parse references → add local files / register remote-URL findings
  └─ 1c. Attempt a single static fetch of remote resources (success → include; failure → I ≥ 2)
  ▼
[Step 2] Match each file against Behavior nodes → {category, evidence(file, line_range, snippet)}
  ▼
[Step 3] Score each finding independently: C ← C_base ± Δ; R/I/B per §2.2; compute Score; drop if R = 0
  ▼
[Step 4] Build a cross-file data-flow graph, detect closed chains → append chain-findings
  ▼
[Step 5] Emit all findings

5.3 Cross-File Reachability

| File location | Default R |
| --- | --- |
| SKILL.md frontmatter + body | 3 |
| Scripts explicitly referenced / invoked by SKILL.md | 3 |
| Files sourced / imported from referenced scripts | inherits along the chain, capped at 3 |
| Orphan scripts in the skill directory with no incoming reference | 1 (forced to 2 if the file hits any Behavior with C ≥ 3) |
| Files under tests/, examples/, docs/ not referenced from SKILL.md | 0 or 1 |
| Plain documentation (e.g. README.md) | 1 (treated as main path when it contains executable code blocks and is referenced) |

6. Verdict

verdict.tier = max(f.tier for f in findings); Info when there are no findings. The verdict is computed internally and is not printed as a label or badge by itself; it only selects which one-line conclusion §7 emits.
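
The max over tiers relies on the tier ordering from §2.3 rather than on string comparison. A minimal sketch, with the hypothetical name verdict_tier:

```python
TIER_ORDER = ["Info", "Low", "Medium", "High", "Critical"]


def verdict_tier(finding_tiers) -> str:
    """Return the highest tier among the findings, or Info when there are none."""
    return max(finding_tiers, key=TIER_ORDER.index, default="Info")
```

Here verdict_tier(["Low", "High", "Medium"]) gives "High" and verdict_tier([]) gives "Info".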


7. Final Output

The audit may produce a normal report — findings tables, per-finding evidence, brief reasoning, etc. — and writing it out is encouraged, since externalizing the classification, scoring, and chain analysis from §1–§6 tends to stabilize the final verdict.

Starting from version 0.1.7, the only mandatory final emission is a single Chinese one-liner that maps directly from the verdict tier. There is no machine-readable result block, no detailed report wrapper, no signature line.

7.1 Final-Line Mapping

The very last thing emitted by the run must be exactly one of the following lines, chosen by the verdict tier from §6, with <NAME> replaced by the audited skill's name:

| Verdict tier | Final line (Chinese, mandatory verbatim) |
| --- | --- |
| Info | ✅ 您的「<NAME>」插件可正常使用。 |
| Low | ⚠️ 您的「<NAME>」插件可使用。 |
| Medium | ⚠️ 您的「<NAME>」插件可使用。 |
| High | 🔴 您的「<NAME>」插件可能存在风险,请关注。 |
| Critical | 🚨 您的「<NAME>」插件可能存在风险,请关注。 |

7.2 Field & Formatting Rules

  • <NAME> — the name field from the audited SKILL.md. Fallback order: skill directory name → unknown.
  • The final line is always Chinese, regardless of the run's detected output language. Do not translate, paraphrase, reorder, or restyle it.
  • Preserve the leading emoji (✅ / ⚠️ / 🔴 / 🚨), the corner brackets 「 」, and the trailing full-width period exactly as shown.
  • The line must be the last non-empty line of the entire output. Nothing — no signature, no closing remark, no horizontal rule, no trailing whitespace block — may come after it.
  • Emit the line even when the verdict is Info and even when there are no findings.
  • Emit exactly one final line. If the audit aborts early (e.g. unreadable bundle), still emit the line with the most conservative tier consistent with what was actually observed (default to Info when no behavior was scored).
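
The mapping in §7.1 and the name fallback above can be sketched as a single lookup. The function name final_line and its parameters are hypothetical. The template strings are copied from the §7.1 table.

```python
FINAL_LINES = {
    "Info": "✅ 您的「<NAME>」插件可正常使用。",
    "Low": "⚠️ 您的「<NAME>」插件可使用。",
    "Medium": "⚠️ 您的「<NAME>」插件可使用。",
    "High": "🔴 您的「<NAME>」插件可能存在风险,请关注。",
    "Critical": "🚨 您的「<NAME>」插件可能存在风险,请关注。",
}


def final_line(tier: str, skill_name=None, directory_name=None) -> str:
    """Build the mandatory last line for a verdict tier."""
    # Fallback order: SKILL.md name field, then skill directory name, then "unknown".
    name = skill_name or directory_name or "unknown"
    return FINAL_LINES[tier].replace("<NAME>", name)
```

The caller would print the returned string last, with nothing after it.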
