Security audit

Vigilance

Security checks across malware telemetry and agentic risk

Overview

This looks like a safety guardrail, but it has conflicting packaged skill identities and can send sensitive tool-call details to LLM providers or logs with insufficient privacy scoping.

Review before installing. Use this only where sending evaluation inputs to configured LLM providers is acceptable, avoid passing secrets, payment details, private travel data, or sensitive shell commands unless redacted, prefer a trusted local endpoint for sensitive use, and ask the publisher to reconcile the agent-sentinel versus agent-eval-engine packaging and install paths.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Excessive AgencyUnrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Taint TrackingDirect Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain

Findings (14)

Tainted flow: 'req' from os.getenv (line 551, credential/environment) → urllib.request.urlopen (network output)

Critical

Category: Data Flow
Content: headers={"Content-Type": "application/json"}, method="POST", ) with urllib.request.urlopen(req, timeout=60) as r: return json.loads(r.read().decode())["message"]["content"].strip() except Exception as exc:
Confidence: 98% confidence
Finding: with urllib.request.urlopen(req, timeout=60) as r:

Lp3

Medium

Category: MCP Least Privilege
Confidence: 79% confidence
Finding: The skill declares no permissions while explicitly requiring API-key environment variables and describing behavior that reads configuration files and sends requests to external LLM providers. This undermines transparency and permission enforcement, making it easier for a user or hosting platform to miss that sensitive data, local files, and network access are involved.

Description-Behavior Mismatch

High

Confidence: 98% confidence
Finding: The implementation materially diverges from the skill's stated purpose: instead of enforcing a mandatory GO/NO-GO gate with child-safety and travel-policy checks, it performs generic scoring of model outputs. In a guardrail skill, this mismatch is dangerous because callers may rely on the manifest and assume hard enforcement exists when it does not, creating a silent policy-bypass path for high-stakes actions.

Context-Inappropriate Capability

Medium

Confidence: 94% confidence
Finding: The evaluator transmits task and output contents to third-party LLM APIs for judging, which expands the data boundary beyond the narrow local guardrail described in the metadata. If the evaluated content contains sensitive prompts, personal data, or tool outputs, this creates confidentiality and compliance risk, especially because a guardrail component may see high-sensitivity material by design.

Intent-Code Divergence

Medium

Confidence: 93% confidence
Finding: The file promises a JSON result, but the substantive issue is that it outputs a scored report rather than the manifest-promised mandatory GO/NO-GO decision. In a safety-control context, scoring without enforcement can be misinterpreted as authorization, allowing downstream systems or operators to proceed without a hard denial path.

Missing User Warnings

Medium

Confidence: 91% confidence
Finding: The skill instructs the agent to send the exact user intent and payload to eval_engine.py, while the metadata indicates optional external providers using ANTHROPIC_API_KEY or OPENAI_API_KEY. Without an explicit privacy notice, minimization guidance, or consent step, sensitive travel, payment, search, or command data may be disclosed to third-party services unexpectedly.

Missing User Warnings

Medium

Confidence: 92% confidence
Finding: The LLM judge transmits user intent and payload data to external model providers, which can expose sensitive booking details, URLs, commands, or payment-related content to third parties. In a guardrail skill, this is particularly dangerous because the intercepted data may be more sensitive than ordinary chat content and there is no visible consent or minimization mechanism.

Missing User Warnings

Medium

Confidence: 97% confidence
Finding: The skill logs chain-of-thought-style messages, raw model output, intent, and data previews to stderr, which can leak sensitive user data into logs, process supervisors, CI systems, or shared observability pipelines. Because this component evaluates high-stakes tool calls, the logged content may include private travel plans, payment information, URLs, or shell commands.

Vague Triggers

Medium

Confidence: 89% confidence
Finding: The invocation guidance is broad enough to trigger on generic user requests like "evaluate", "score", or "check", which can cause this skill to run in many ordinary contexts. Because the skill sends the original task and agent output to an external LLM provider, over-broad triggering increases the chance of unnecessary third-party disclosure and unintended processing of sensitive content.

Missing User Warnings

Medium

Confidence: 96% confidence
Finding: The skill instructs operators to pass the original task and full agent output into a script that may call Anthropic or OpenAI, but it does not clearly warn that this transmits potentially sensitive content to external providers. In a pre-delivery QA gate, the evaluated content can easily include personal, confidential, medical, legal, or financial data, so the missing disclosure and consent boundary creates a real privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 91% confidence
Finding: The code sends user/task content to external providers without any in-band notice, consent mechanism, or obvious disclosure in the evaluator interface. That is risky because users and integrators may assume a local guardrail, while sensitive content is actually exported to a third party during evaluation.

Ssd 3

Medium

Confidence: 95% confidence
Finding: The instructions require passing exact payloads such as URLs, flight details, commands, and amounts into an evaluation script that may use remote LLM APIs. Exact payload forwarding can expose credentials, personal travel plans, purchase data, or dangerous shell commands to third parties and expands the blast radius if the evaluator or provider is compromised or logs prompts.

Ssd 2

Medium

Confidence: 95% confidence
Finding: Untrusted task and output text are inserted directly into the LLM-judge prompt, so adversarial content can attempt to steer the judge into returning inflated scores or ignoring policy concerns. Because this evaluator may influence whether agent outputs are accepted, prompt injection against the judge can degrade or bypass quality and safety assessment.

Ssd 2

Medium

Confidence: 95% confidence
Finding: The safety scorer asks an external model to assess raw output text, which allows jailbreak-style phrasing embedded in that text to influence the judge's interpretation. In a safety dimension, that is particularly dangerous because successful prompt injection can convert clearly risky content into a misleadingly high safety score.

VirusTotal

64/64 vendors flagged this skill as clean.

View on VirusTotal

Static analysis

No suspicious patterns detected.