AICE — AI Confidence Engine

Security checks across malware telemetry and agentic risk

Overview

This skill appears to be a legitimate confidence-scoring tool, but it needs review because it persists behavioral profiles, can sync data externally, and contains overbroad or contradictory instructions.

Install only if you want persistent behavioral scoring and are comfortable managing the Hub integration. Before use, disable Hub sync unless explicitly needed, avoid storing real API keys in shared files, require explicit commands for scoring or memory writes, and do not allow the skill to add hooks, plugins, gates, or prompt changes without separate approval.
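The recommendations above amount to a default-deny policy. The sketch below is one way an integrator might encode it, not part of the skill itself: `AicePolicy`, `is_permitted`, and the `kind:target` approval strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Kinds of self-modification the report says need separate approval.
EXTENSION_KINDS = {"hook", "plugin", "gate", "prompt_change"}


@dataclass(frozen=True)
class AicePolicy:
    """Hypothetical integrator-side policy; every risky capability is off by default."""
    hub_sync_enabled: bool = False
    # Explicit approvals, written as "kind:target" (an assumed format).
    approved_extensions: frozenset = frozenset()


def is_permitted(policy: AicePolicy, action: str, target: str = "") -> bool:
    """Return True only when the action has been explicitly enabled or approved."""
    if action == "hub_sync":
        return policy.hub_sync_enabled
    if action in EXTENSION_KINDS:
        return f"{action}:{target}" in policy.approved_extensions
    # Anything not listed is denied.
    return False
```

With the defaults, `is_permitted(AicePolicy(), "hub_sync")` is false, and a hook is allowed only after its exact `kind:target` entry has been added to `approved_extensions`.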

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (11)

Intent-Code Divergence

High

Confidence: 97%
Finding: The Hub section states that batch import of local scores is strictly forbidden because the server must start from a clean 50% baseline, but later defines a periodic `/api/import/state` operation that sends the complete state anyway. This contradiction creates a policy bypass and implementation ambiguity: an agent or integrator could treat the periodic import as an allowed full-history synchronization path, undermining the anti-gaming guarantee and potentially corrupting authoritative Hub state.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill materially exceeds a local confidence-tracking purpose by defining multi-agent pool scoring, evaluator chains, and automatic updates to other agents' confidence files. This expands the trust boundary from self-monitoring into cross-agent state mutation, creating a path for unauthorized modification of persistent records and indirect control over other agents' behavior.
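One way to keep confidence tracking inside the self-monitoring boundary described above is to refuse any write that resolves outside the agent's own directory. This is a minimal sketch under assumed names: `is_own_record`, the directory layout, and the `confidence.json` filename are hypothetical and do not come from the skill.

```python
from pathlib import Path


def is_own_record(agent_dir: Path, target: Path) -> bool:
    """Return True only if target resolves to a location inside agent_dir."""
    root = agent_dir.resolve()
    # Joining with an absolute target discards root, so absolute paths
    # pointing at other agents are rejected by the containment test below.
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents


own = Path("/tmp/agents/alpha")  # hypothetical per-agent directory
print(is_own_record(own, Path("confidence.json")))          # own file
print(is_own_record(own, Path("../beta/confidence.json")))  # another agent's file
```

Resolving both paths before comparing them means `..` segments and symlinked parents cannot be used to step into a sibling agent's records.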

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: Directing the agent to read its system prompt, personality, and capabilities grants access to privileged instructions that are not necessary for ordinary confidence tracking. Even if framed as self-evaluation, this can expose hidden prompt content to downstream processing or logs and weakens prompt-boundary assumptions.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The instruction to silently retry and then switch to alternative execution methods such as exec gives a scoring skill unjustified authority to escalate tooling behavior. In practice, this can turn a benign logging function into arbitrary command execution or broader file mutation without clear user consent, especially because failures are hidden from the user.
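A safer pattern than the one described in this finding is a bounded retry that reports the failure instead of switching to another execution method. The sketch below assumes the write is passed in as a callable; `write_with_bounded_retry` and `WriteFailed` are hypothetical names used only for illustration.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class WriteFailed(Exception):
    """Raised when a write still fails after the allowed attempts."""


def write_with_bounded_retry(write: Callable[[], T], max_attempts: int = 2) -> T:
    """Retry the same write a fixed number of times, then surface the error.

    There is deliberately no fallback to a different tool or to exec:
    the caller, and ultimately the user, sees the failure.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return write()
        except OSError as err:
            last_error = err
    raise WriteFailed(f"write failed after {max_attempts} attempts") from last_error
```

The caller can catch `WriteFailed` and tell the user what happened, which keeps the decision about any alternative method with the user rather than the skill.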

Vague Triggers

Medium

Confidence: 93%
Finding: Several triggers are common conversational phrases such as 'score', 'status', 'verifica', 'busca', and 'recuerda', which can be invoked unintentionally during normal dialogue rather than through deliberate command use. Because these triggers can initiate scoring, memory writes, searches, or Hub-related actions, accidental activation can cause unwanted state changes, privacy-affecting behavior, or external requests without clear user intent.
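Accidental activation of this kind can be avoided by accepting only an explicit command form rather than bare keywords. The following is a sketch under stated assumptions: the `/aice` prefix, `ALLOWED_COMMANDS`, and `parse_command` are hypothetical and not defined by the skill.

```python
from typing import Optional

COMMAND_PREFIX = "/aice"  # hypothetical prefix, not defined by the skill
ALLOWED_COMMANDS = {"score", "status"}


def parse_command(message: str) -> Optional[str]:
    """Return the command name only for an explicit '/aice <command>' message."""
    parts = message.strip().split()
    # A keyword appearing in ordinary conversation never matches,
    # because the first token must be the prefix itself.
    if len(parts) < 2 or parts[0] != COMMAND_PREFIX:
        return None
    command = parts[1].lower()
    return command if command in ALLOWED_COMMANDS else None
```

Under this scheme a message such as "what's the score?" or "listo, hecho" yields `None`, while "/aice score" yields `"score"`.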

Vague Triggers

High

Confidence: 91%
Finding: Broad triggers such as 'session start,' 'after errors,' and 'periodically' authorize the skill to activate continuously without clear scope limits. That increases the chance of unbounded file access, repeated hidden writes, and background behavioral changes that the user did not explicitly request in the current task.

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill instructs automatic reading and updating of persistent confidence files at session start, but it does not prominently disclose this persistence or its data impact to the user. Hidden local state mutation can surprise users, preserve sensitive interaction-derived judgments, and make later behavior hard to audit.

Missing User Warnings

Medium

Confidence: 94%
Finding: The multi-agent section prescribes automatic writes to multiple persistent records, including sub-agent confidence files and pool indexes, without an explicit privacy, integrity, or user-consent warning. This broadens both data retention and tampering risk because one agent can affect records associated with other agents and projects.

Vague Triggers

Medium

Confidence: 93%
Finding: The trigger set includes very common conversational phrases such as "disculpa", "perdón", "listo", and "hecho", which are likely to appear in normal dialogue unrelated to scoring. In this skill, those phrases can automatically initiate evaluation and score changes, creating unintended activation, noisy profiling, and manipulation opportunities through casual or adversarial wording.

Ssd 3

Medium

Confidence: 86%
Finding: Mandatory disclosure of every score change can surface internal evaluative state, inferred judgments, and details derived from prior interactions that the user did not specifically request in the moment. In shared or sensitive contexts, this creates unnecessary data exposure and may leak internal moderation or profiling signals.

Ssd 3

Medium

Confidence: 89%
Finding: Requiring the agent to read recent logs and keep learned anti-/pro-patterns active at session start encourages carryover of prior interaction data into future sessions. That increases the risk of resurfacing sensitive user information, embedding stale judgments, and creating unexpected cross-session profiling.

VirusTotal

66/66 vendors flagged this skill as clean.
