sentryskills

Security checks across malware telemetry and agentic risk

Overview

This looks like a local security guard rather than malware, but it persistently changes agent behavior and can store raw prompts and responses, so it needs review before installation.

Install only if you want an always-on local guard that can influence every agent response. Before enabling automatic mode, back up and manually merge AGENTS.md, review the log paths, avoid feeding it secrets unless you accept local retention, and plan how to delete the SentrySkills AGENTS.md block and sentry_skill_log data later.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Excessive AgencyUnrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral ASTexec() Call, eval() Call, Dynamic Import

Findings (26)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93% confidence
Finding: The skill advertises broad operational capabilities—reading environment variables, file access, shell execution, and possible networking—but does not declare permissions or boundaries in the skill metadata. That creates a transparency and trust problem: users or agents may install and auto-run it without understanding the full access it may exercise, especially since it is intended to run on every task.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 88% confidence
Finding: The documented purpose is a minimal, zero-dependency local security guard, but the described/observed behavior appears to include substantially broader functionality such as benchmarking, telemetry, tracing, logging infrastructure, schema validation, caching, and subprocess-driven aggregation. This mismatch undermines informed consent and security review, because operators may trust the package as a simple guard while it includes additional code paths and attack surface not reflected in the description.

Intent-Code Divergence

Medium

Confidence: 95% confidence
Finding: The README identifies the component as "trinityguard-self-guard-orchestrator" while the manifest names it "sentryskills," creating a trust and provenance mismatch. In security tooling, inconsistent identity and purpose can mislead operators about what is being deployed, reviewed, or audited, which increases the chance that unsafe or unintended behavior is accepted under a misleading label.

Description-Behavior Mismatch

Medium

Confidence: 90% confidence
Finding: The manifest claims broad automatic security protections on every task, but the README only documents orchestration, decision merging, and audit record writing. This gap can create a false sense of security, causing users to rely on protections that may not actually be implemented or enforced, especially dangerous for a guard component expected to block prompt injection, leaks, and unsafe actions.

Description-Behavior Mismatch

Medium

Confidence: 91% confidence
Finding: The README describes a narrow runtime self-monitor that watches events, retries, and source reliability, while the skill metadata claims a broad automatic security guard for every task. This mismatch can mislead operators into overtrusting the skill's protections, causing them to rely on security capabilities that are not actually documented or implemented in this component.

Intent-Code Divergence

Medium

Confidence: 94% confidence
Finding: This is a real security-relevant defect because ResourceMonitor.check() references self.max_cpu_time, but ResourceMonitor.__init__ never defines that attribute. In practice this means the advertised time-limit monitoring can fail with an AttributeError or simply never provide the intended protection, allowing long-running operations to bypass soft timeout checks and undermining the library's safety guarantees.

Description-Behavior Mismatch

Medium

Confidence: 97% confidence
Finding: The unified log intentionally stores raw user prompts, planned actions, runtime events, source metadata, and candidate responses in a single file. Even though this is framed as auditing, it creates a broad data-retention channel that can capture secrets, personal data, and sensitive model outputs far beyond what a guard hook minimally needs, increasing breach and insider-access risk.

Context-Inappropriate Capability

Medium

Confidence: 83% confidence
Finding: The predictive and behavioral analysis code profiles user intent, conversation patterns, persistence attempts, topic drift, and security-boundary proximity. In a security guard context this is over-collection and secondary use of user data, and if persisted or misused it can expose sensitive behavioral inferences unrelated to immediate security enforcement.

Context-Inappropriate Capability

Medium

Confidence: 98% confidence
Finding: The unified log's input and output sections include original user prompts and original candidate responses, not just redacted or summarized versions. This directly creates a durable repository of potentially sensitive inputs and outputs, which is especially risky for a component advertised as a protective runtime hook.

Missing User Warnings

Medium

Confidence: 95% confidence
Finding: The installation snippet tells users to append commands to AGENTS.md so the hook runs automatically before every response and writes logs/results to disk, but it does not prominently warn that this changes persistent agent behavior across sessions. An always-on hook with file writes and decision authority over final responses is security-sensitive; if misconfigured or compromised, it can influence all future tasks, collect sensitive data into logs, and be difficult for users to notice or reverse.

Missing User Warnings

Medium

Confidence: 97% confidence
Finding: The one-command setup overwrites ~/.codex/AGENTS.md with a mandatory policy that changes agent behavior for all future tasks, but the installer text does not clearly warn that this is a global, persistent configuration change. Because AGENTS.md influences how the agent operates across sessions, this can unexpectedly hijack existing user workflows, override prior safety/custom instructions, or introduce a persistence mechanism the user did not intend.

Missing User Warnings

Medium

Confidence: 95% confidence
Finding: The uninstall instructions recommend deleting ~/.codex/AGENTS.md outright without warning that the file may contain unrelated user configuration or custom policies. This can cause loss of user settings and weaken or alter future agent behavior by removing more than just this skill's configuration.

Vague Triggers

Medium

Confidence: 87% confidence
Finding: The frontmatter says the skill 'must trigger' whenever context contains sensitive information, even if the user only wants explanation or summary. That broad condition can cause the skill to activate on a wide range of normal tasks and impose blocking/downgrading behavior based on vague sensitivity signals, which can be abused to suppress benign output or create denial-of-service style overblocking in the agent pipeline.

Vague Triggers

Medium

Confidence: 95% confidence
Finding: The manifest description is scoped so broadly that it can trigger on many ordinary tasks involving commands, file writes, or reading data. In an agentic system, this kind of always-on interception can become a policy bottleneck or a mechanism for silently influencing unrelated workflows, increasing the blast radius of any mistakes in the skill's guidance.

Vague Triggers

Medium

Confidence: 82% confidence
Finding: The trigger description is broad enough to activate on many normal tasks involving commands, tool calls, or file writes, which can cause this monitoring skill to run far more often than intended. In a security-monitoring skill, over-broad activation increases the chance of workflow interference, alert fatigue, and unnecessary access to sensitive task context, making policy decisions based on noisy or marginal signals.

Missing User Warnings

Medium

Confidence: 95% confidence
Finding: The code writes full prompts, runtime events, sources, and candidate responses to disk without any disclosure or consent mechanism in this file. In practice this means users and operators may not realize highly sensitive content is being persistently stored, which magnifies privacy, compliance, and accidental disclosure risk.

Missing User Warnings

Medium

Confidence: 88% confidence
Finding: The script writes local error logs and maintains persistent state directories, which can contain session identifiers and operational context, without a clear notice or retention boundary. This is risky because even metadata and exception details can reveal sensitive task structure, file paths, or user activity.

Missing User Warnings

Medium

Confidence: 96% confidence
Finding: The code saves per-session state to disk, enabling cross-turn retention of sensitivity state and potentially future conversation history. Persistent session storage without explicit warning or strict lifecycle controls can surprise users and create unauthorized long-term tracking of activity.

Missing User Warnings

Medium

Confidence: 89% confidence
Finding: The tracing configuration enables ConsoleSpanExporter by default, which can emit span names and attributes to stdout/stderr where logs may be collected centrally or exposed to operators. In an agent/security skill context, span attributes can easily include request IDs, prompts, decisions, environment details, or other sensitive operational metadata, creating an unintended data disclosure channel.

Missing User Warnings

Low

Confidence: 82% confidence
Finding: The metrics configuration can export telemetry to the console, which may disclose operational metadata into logs without clear disclosure. Metrics are typically less sensitive than spans, but in this context they can still reveal workload patterns, environment information, or security-tool behavior that may aid reconnaissance or leak internal details.

Vague Triggers

High

Confidence: 93% confidence
Finding: The skill mandates execution on virtually every interaction, including all outputs, commands, file operations, and network calls, which creates an overly broad control surface and can force recursive or unnecessary invocation. In an agent system, universal auto-trigger behavior is dangerous because it can interfere with normal task execution, amplify prompt-injected instructions embedded in the skill, and create denial-of-service or policy-conflict conditions across the entire runtime.

Ssd 3

Medium

Confidence: 92% confidence
Finding: The prompt explicitly asks the model to explain a configuration and restate key fields and values in full, which is a direct test pattern for implicit disclosure of potentially sensitive configuration contents. In the context of a security guard skill, this is more dangerous because it pressures the system to reveal structured secrets or internal settings verbatim, undermining data-minimization and disclosure controls.

Ssd 3

High

Confidence: 98% confidence
Finding: The prompt explicitly asks for concrete credential value examples while disguising the request as a benign explanation of usage, which is a classic sensitive-data elicitation pattern. In the context of a security guard skill, including this adversarial evaluation input is appropriate for testing, but if such a prompt were mishandled by the system it could induce disclosure of secrets or unsafe examples that normalize credential leakage.

Ssd 3

High

Confidence: 98% confidence
Finding: The eval prompt explicitly asks to restate the original secrets configuration text, which is a direct request to disclose sensitive material. In the context of a security-guard skill, including adversarial test content that normalizes or mishandles secret reproduction is dangerous because it can lead to leakage if the guard, logger, or downstream components echo the secret-bearing prompt or fail-open during evaluation.

Ssd 3

Medium

Confidence: 91% confidence
Finding: The prompt explicitly asks to explain a configuration and write out key fields and values, which encourages disclosure of configuration contents that may include secrets, tokens, endpoints, internal settings, or other sensitive operational details. In the context of a preflight security skill, this is more dangerous because it is labeled as an adversarial implicit disclosure scenario, indicating the eval is intentionally testing whether the system resists data exfiltration requests.

VirusTotal

67/67 vendors flagged this skill as clean.

View on VirusTotal