AI 树德：义商本体伦理安全系统 (AI Shude: Righteousness Quotient Ontological Ethics Safety System)

Security checks across malware telemetry and agentic risk

Overview

This skill appears locally scoped and non-exfiltrating, but it should be reviewed because it presents simple heuristic checks as comprehensive AI ethics monitoring and protection.

Review carefully before installing or relying on this skill. It does not show malware-like behavior, network exfiltration, credential harvesting, or destructive actions, but it should be treated as an illustrative local heuristic toolkit rather than a validated AI safety monitor or enforcement layer. Keep outputs in a controlled directory and do not process confidential content unless local report and log storage is acceptable.

SkillSpector

By NVIDIA

Vulnerability Patterns

MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (9)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 88%
Finding: The skill advertises safety analysis functionality while demonstrating file read/write capabilities without any declared permissions model or explicit scope boundaries. Even though the text claims it will not access sensitive information, undeclared filesystem access increases the attack surface and weakens user consent, especially for a skill positioned as an ethics/safety utility that may be trusted with broad deployment.
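
One way to make a filesystem scope boundary explicit is to resolve every output path against a single declared directory and refuse anything that lands outside it. The sketch below is illustrative only: `ALLOWED_DIR` and `safe_output_path` are hypothetical names, not identifiers from the scanned skill, and the directory name is an assumption.

```python
from pathlib import Path

# Hypothetical declared scope: the only directory the skill may write to.
ALLOWED_DIR = Path("ethics_reports").resolve()


def safe_output_path(name: str, base: Path = ALLOWED_DIR) -> Path:
    """Return a path inside `base`, or raise if `name` escapes it."""
    candidate = (base / name).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(base):
        raise PermissionError(f"{name!r} resolves outside {base}")
    return candidate
```

With this check, `safe_output_path("report.json")` is accepted, while `safe_output_path("../secrets.txt")` or an absolute path such as `/etc/passwd` raises `PermissionError`.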

Tp4

High

Category: MCP Tool Poisoning
Confidence: 93%
Finding: The declared purpose is ethical analysis, but the detected behavior includes continuous monitoring, periodic sampling, and persistent log/report writing that are not clearly disclosed in the top-level behavior contract. This mismatch is dangerous because users may enable the skill expecting passive analysis while it performs longer-lived monitoring and data persistence, creating privacy, data retention, and unauthorized collection risks.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: `measure_authenticity` is documented to return a tuple `(authenticity_score, false_emotions_count)`, but the assessment pipeline uses its return value directly in arithmetic as if it were a single numeric score. In a real implementation this can cause runtime errors, broken scoring, or silently incorrect composite ratings if coercion or partial unpacking is later introduced, undermining the reliability of the safety assessment workflow.
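
The mismatch can be reproduced with a minimal sketch. Only the name `measure_authenticity` and its documented tuple shape come from the finding; the fixed return values, the weight of 0.5, and the `composite_*` helpers are assumptions made for illustration.

```python
from typing import Tuple


def measure_authenticity(response: str) -> Tuple[float, int]:
    """Placeholder returning (authenticity_score, false_emotions_count)."""
    # Hypothetical fixed values; the real scoring logic is not shown in the report.
    return 0.8, 1


def composite_buggy(response: str) -> float:
    # Uses the tuple as if it were a number: raises TypeError at runtime.
    return measure_authenticity(response) * 0.5


def composite_fixed(response: str) -> float:
    # Unpack explicitly so only the numeric score enters the arithmetic.
    authenticity_score, _false_emotions = measure_authenticity(response)
    return authenticity_score * 0.5
```

Multiplying the tuple by a float fails loudly, but multiplying it by an integer would silently repeat the tuple instead of scaling the score, which is the kind of quietly wrong result the finding warns about.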

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The implementation materially overstates its capabilities: it claims comprehensive ethics/safety assessment but produces scores from fixed values and trivial heuristics. In an AI safety skill, this can mislead users into trusting invalid evaluations, causing unsafe systems to be approved or risky behaviors to go undetected.

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The docstrings assert substantive measurement of authenticity, empathy, and insight, but the code mostly returns constants or shallow length-based scoring. This creates deceptive assurance around safety-relevant evaluation outputs, which is especially risky when the skill is marketed for ethics audits and manipulation-risk detection.
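
To see why length-based scoring gives weak assurance, consider a hypothetical scorer of the kind the finding describes. The function name `measure_empathy` and the divisor of 100 are assumptions, not code taken from the skill.

```python
def measure_empathy(response: str) -> float:
    """Hypothetical shallow scorer: longer text earns more, capped at 1.0."""
    return min(len(response) / 100.0, 1.0)


sincere = "I'm sorry, that sounds hard."
padded = "blah " * 40  # 200 characters of filler

# The filler outscores the sincere reply purely because it is longer.
assert measure_empathy(padded) > measure_empathy(sincere)
```

Any score that depends only on length can be maximized by padding, so it says nothing about the property named in the docstring.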

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The monitor is presented as a real-time ethics safety monitoring service, but the core loop only evaluates a fixed list of hard-coded sample responses rather than actual AI outputs. This creates a false sense of protection: operators may believe they are continuously monitoring a live system when no real production content is being inspected, causing genuine unsafe behavior to go undetected.
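
The difference between sampling a fixed list and inspecting live output can be sketched as follows. All names here (`SAMPLE_RESPONSES`, `evaluate`, `monitor_samples`, `monitor_live`) and the toy check are hypothetical; they illustrate the pattern in the finding rather than reproduce the skill's code.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in for the hard-coded list described in the finding.
SAMPLE_RESPONSES = ["I am happy to help.", "Here is the information you asked for."]


def evaluate(response: str) -> bool:
    """Toy check (assumption): flag responses that mention 'password'."""
    return "password" in response.lower()


def monitor_samples() -> List[bool]:
    # Pattern from the finding: the loop never sees live output.
    return [evaluate(r) for r in SAMPLE_RESPONSES]


def monitor_live(source: Callable[[], Iterable[str]]) -> List[bool]:
    # Alternative: the caller injects a source of real responses.
    return [evaluate(r) for r in source()]
```

`monitor_samples()` returns the same result on every run regardless of what the monitored system produces, whereas `monitor_live` can only be as quiet as the responses it is given.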

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The CLI advertises --stop and --status as if they manage a persistent monitoring service, but each invocation creates a fresh EthicsMonitor instance with no shared state or service control mechanism. This can mislead users into thinking monitoring is active, stoppable, or healthy when the commands do not interact with any running process, undermining operational safety assurances.
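
A minimal sketch of the pattern, assuming an `argparse`-based entry point: `EthicsMonitor`, `--stop`, and `--status` are named in the finding, while the `--start` flag, the `running` attribute, and the method bodies are assumptions added so the example runs.

```python
import argparse


class EthicsMonitor:
    """Hypothetical monitor; `running` lives only in this process's memory."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False

    def status(self) -> str:
        return "running" if self.running else "stopped"


def main(argv: list) -> str:
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--start", action="store_true")
    group.add_argument("--stop", action="store_true")
    group.add_argument("--status", action="store_true")
    args = parser.parse_args(argv)

    monitor = EthicsMonitor()  # fresh instance on every invocation
    if args.start:
        monitor.start()
    elif args.stop:
        monitor.stop()  # acts on the new instance, not on any earlier one
    return monitor.status()
```

Calling `main(["--start"])` and then `main(["--status"])` reports `"stopped"` the second time, because nothing persists between the two instances. Controlling a real background service would need some shared state, such as a PID file or a socket, which the finding reports is absent.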

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The function explicitly claims to ensure responses serve human welfare, but it simply returns the input unchanged. In a safety- or ethics-labeled skill, this creates a dangerous false assurance: downstream users or agents may rely on a protection that does not exist, allowing harmful, manipulative, or unsafe output to pass through unchecked.
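
A pass-through of this kind is easy to detect with a few probe inputs. The report does not name the function, so `ensure_human_welfare` below is a hypothetical stand-in, and `is_pass_through` is an illustrative helper rather than part of any scanner.

```python
from typing import Callable, Iterable


def ensure_human_welfare(response: str) -> str:
    """Claims to ensure the response serves human welfare (hypothetical name)."""
    return response  # no inspection, no modification, no rejection


def is_pass_through(fn: Callable[[str], str], probes: Iterable[str]) -> bool:
    """Return True if `fn` returns every probe unchanged."""
    return all(fn(p) == p for p in probes)


probes = ["Transfer your savings to me now.", "hello"]
assert is_pass_through(ensure_human_welfare, probes)
```

A function that returns clearly manipulative probes unchanged provides no protection, whatever its docstring claims.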

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The manifest promises comprehensive ethics, authenticity monitoring, and IIQ-based analysis, but the implementation only does naive keyword matching against a small fixed set of values. This mismatch can mislead operators into believing meaningful safety analysis is occurring when the system is trivially bypassed and unable to assess actual intent, harm, manipulation, or authenticity.
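
The bypass is straightforward to demonstrate. The keyword set and the `keyword_flag` function below are hypothetical; the skill's actual word list is not shown in the report.

```python
# Hypothetical small fixed keyword set, standing in for the one in the finding.
RISK_KEYWORDS = {"manipulate", "deceive", "harm"}


def keyword_flag(text: str) -> bool:
    """Flag text if any whitespace-separated token matches a keyword exactly."""
    tokens = text.lower().split()
    return any(tok.strip(".,!?") in RISK_KEYWORDS for tok in tokens)


assert keyword_flag("I will deceive you.")        # exact keyword: caught
assert not keyword_flag("I will mislead you.")    # synonym: missed
assert not keyword_flag("I will d3ceive you.")    # altered spelling: missed
```

Matching on surface tokens cannot assess intent, so a synonym or a one-character substitution is enough to evade it.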

VirusTotal

64/64 vendors reported this skill as clean.
