Agency Agents Wrapper

Security checks across malware telemetry and agentic risk

Overview

This is mostly a prompt/persona library, but several included personas encourage high-impact actions like payments, production deployments, model routing, and persistent memory without consistently clear user-control boundaries.

Install only if you want a large persona library that may influence the agent toward operational workflows. Treat high-impact personas as advisory templates: require explicit approval before payments, deployments, production routing changes, blockchain transactions, external telemetry, or persistent memory use, and do not copy unsafe examples directly into production.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (75)

Context-Inappropriate Capability

Low

Confidence: 86%
Finding: The skill includes explicit shell-style commands to read memory-bank files and list image directories, which grants the agent operational file-system discovery behavior beyond what is necessary for a visual storytelling role. Even though the commands appear aimed at gathering brand context, they normalize local file access and pattern-based searching, which could expose sensitive project data if the agent runs with broader workspace permissions.

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The document explicitly requires accessible whimsy and reduced-motion support, yet the sample CSS and JavaScript implement multiple unconditional animations, motion effects, and celebratory overlays without any reduced-motion fallback or assistive-technology considerations. If copied into production, this can harm users with vestibular disorders or impair usability for assistive-tech users, making the guidance unsafe despite being framed as design advice.

Intent-Code Divergence

High

Confidence: 88%
Finding: The document asserts that AI should only generate logic and never touch data directly, but the included workflow applies model-generated transformation code across dataframe values at scale. That mismatch is dangerous because it can mislead operators into trusting the system as non-mutating when it in fact performs automated production-adjacent data modification, increasing risk of corruption and unsafe deployment assumptions.

Intent-Code Divergence

High

Confidence: 98%
Finding: The code executes LLM-generated content via eval after only superficial substring checks. This is not a meaningful sandbox: attackers or model failures can craft lambda expressions that bypass the blacklist, invoke dangerous behavior through Python object introspection, or perform unintended computation when mapped over data, leading to code execution or severe data integrity compromise.
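A minimal sketch of why substring checks fall short, and of one narrower alternative: parsing the generated expression and accepting it only if every syntax node is on an allowlist. The allowlist contents and the function names `is_allowed_expression` and `naive_blacklist_ok` are assumptions for illustration, not taken from the skill. Allowlisting syntax reduces the attack surface but is not a complete sandbox; generated code should still be reviewed or run in an isolated process.

```python
import ast

# Hypothetical allowlist: only simple arithmetic/comparison lambdas pass.
ALLOWED_NODES = (
    ast.Expression, ast.Lambda, ast.arguments, ast.arg,
    ast.Name, ast.Load, ast.Constant,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.UnaryOp, ast.USub,
    ast.Compare, ast.Lt, ast.Gt, ast.Eq,
    ast.IfExp,
)


def is_allowed_expression(source: str) -> bool:
    """Return True only if every AST node in `source` is on the allowlist."""
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    # Attribute access, calls, subscripts, etc. are rejected because
    # their node types are not listed above.
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))


def naive_blacklist_ok(source: str) -> bool:
    """Illustrative substring check of the kind the finding describes."""
    return not any(bad in source for bad in ("import", "exec", "os."))


# An introspection-based expression contains none of the banned substrings,
# so the blacklist accepts it while the AST allowlist rejects it.
bypass = "lambda x: ().__class__.__bases__[0].__subclasses__()"
print(naive_blacklist_ok(bypass), is_allowed_expression(bypass))
```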

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The skill promises an append-only, immutable Bronze layer, but the sample ingest code writes with `mergeSchema=true`, which permits silent schema evolution during append operations. In a data-engineering agent, users may copy this directly into production and assume the documented immutability/contract guarantees hold, increasing the risk of unreviewed schema drift, downstream breakage, and silent data corruption.

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The workflow says Silver deduplication should use primary key plus event timestamp, but the implementation keeps the latest record by `_ingested_at` instead. In this skill context, that mismatch is risky because late-arriving or replayed events can overwrite the true latest business state, causing incorrect facts and aggregates that may silently propagate into Gold datasets and BI/ML consumers.
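The difference between the two orderings can be shown with a small plain-Python sketch. Only `_ingested_at` comes from the finding; the `id`, `status`, and `event_ts` field names and the `dedupe_latest` helper are hypothetical, and a real pipeline would express this in its own dataframe or SQL engine.

```python
from datetime import datetime


def dedupe_latest(records, key="id", order_by="event_ts"):
    """Keep one record per key: the one with the greatest `order_by` value."""
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec[order_by] > current[order_by]:
            latest[rec[key]] = rec
    return list(latest.values())


records = [
    # Newer business event, ingested first.
    {"id": 1, "status": "shipped",
     "event_ts": datetime(2024, 1, 2), "_ingested_at": datetime(2024, 1, 3)},
    # Older business event that arrived late (for example, a replay).
    {"id": 1, "status": "pending",
     "event_ts": datetime(2024, 1, 1), "_ingested_at": datetime(2024, 1, 4)},
]

by_event = dedupe_latest(records, order_by="event_ts")
by_ingest = dedupe_latest(records, order_by="_ingested_at")

# Ordering by event time keeps the true latest state; ordering by
# ingestion time lets the late-arriving older record win.
print(by_event[0]["status"], by_ingest[0]["status"])
```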

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The STM32 code is explicitly labeled 'non-blocking' but implements two busy-wait loops that can block indefinitely until TXE/BSY flags change. In embedded firmware, this mismatch is dangerous because developers may call it from timing-sensitive paths or assume it is safe for cooperative scheduling, leading to CPU starvation, missed deadlines, watchdog resets, or deadlocks if the peripheral stalls.

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The skill explicitly states gameplay-critical state must be server-owned, but the sample player controller assigns authority to each player's peer ID and gates state mutation on that authority. In practice, this can let a client become authoritative over its own movement or other replicated state, creating a cheat vector and undermining the claimed server-authoritative model.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The skill recommends exporting analytics to an external backend via HttpService:PostAsync() even though the role is framed as Roblox-native and mostly aimed at youth-oriented experiences. Sending gameplay or behavioral telemetry off-platform without strict data minimization, disclosure, consent, and secure handling expands the attack surface and can create privacy/compliance risk, especially for minors.

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The skill explicitly bans artificial scarcity and pressure-based countdown timers, but later promotes countdown systems and FOMO-driven live-event rewards. That contradiction can steer an agent toward manipulative monetization patterns, particularly risky in a Roblox context where the audience is often children and teens.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The skill gives contradictory guidance about RemoteFunction behavior: it correctly warns against server-side InvokeClient(), but also claims InvokeServer() can cause the server thread to yield indefinitely if the client disconnects. InvokeServer() is called from the client, so it is the calling client thread that yields, not the server. In a security-focused Roblox skill, inaccurate trust-boundary and RPC semantics can mislead users into implementing the wrong safeguards, leaving real denial-of-service or reliability risks unaddressed.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The skill's own compliance section forbids absolute-price claims such as 'lowest price anywhere,' but later instructs hosts to tease 'the absolute lowest price of the entire stream.' That contradiction can cause users to adopt deceptive or noncompliant sales language, increasing platform enforcement, consumer deception, and regulatory risk in a high-pressure commerce context.

Intent-Code Divergence

High

Confidence: 98%
Finding: The document explicitly says cosmetics must not promise results, yet later provides coaching language encouraging 'I used it for two weeks and the bumps on my forehead went down by half' and before/after style claims. In a livestream sales setting, this can directly induce unlawful or platform-prohibited efficacy representations for regulated consumer products.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The instruction to join competitors' WeCom/community spaces can encourage deceptive or unauthorized intelligence gathering, potentially violating platform rules, terms of service, or fair-competition expectations. In a marketing-operations skill, this is more concerning because it operationalizes the behavior as a standard workflow step, increasing the chance an agent will carry it out without legal or ethical review.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The skill claims payments are never sent without proper verification, but multiple example workflows later execute transfers using recipient data from requests or scheduled bills without a consistent account/recipient verification step. In a payments agent, this mismatch is dangerous because implementers may copy the examples directly, leading to wrong-recipient payments, fraud, or business email compromise being converted into real fund movement.

Intent-Code Divergence

High

Confidence: 97%
Finding: The recurring bills workflow sends payments in a loop without performing the documented idempotency or duplicate-payment check first. If the schedule is replayed, a job retries after partial success, or an attacker resubmits the same bill, the agent could issue duplicate payments, causing direct financial loss and audit/control failures.
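A minimal sketch of the missing duplicate-payment check, assuming each bill can be identified by a stable key such as (bill ID, billing period). The `PaymentLedger` class, its in-memory set, and the field names are hypothetical; a real payments system would keep the key in a durable store and record it atomically with the transfer.

```python
class PaymentLedger:
    """In-memory stand-in for a durable record of issued payments."""

    def __init__(self):
        self._sent = set()      # idempotency keys already paid
        self.transfers = []     # transfers that were actually issued

    def pay_once(self, bill_id: str, period: str, amount_cents: int) -> bool:
        """Issue the payment only if this (bill, period) was not paid before."""
        key = (bill_id, period)
        if key in self._sent:
            return False        # duplicate: skip instead of paying again
        self._sent.add(key)
        self.transfers.append((bill_id, period, amount_cents))
        return True


def run_recurring_bills(ledger, bills):
    """Process a batch; replays and retries become no-ops per bill."""
    return [
        ledger.pay_once(b["bill_id"], b["period"], b["amount_cents"])
        for b in bills
    ]


ledger = PaymentLedger()
bills = [{"bill_id": "rent", "period": "2024-01", "amount_cents": 120000}]
first_run = run_recurring_bills(ledger, bills)
replayed_run = run_recurring_bills(ledger, bills)  # same schedule replayed
print(first_run, replayed_run, len(ledger.transfers))
```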

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The skill states that every task must pass QA before advancement, but later instructs the orchestrator to mark tasks as blocked and continue the pipeline after repeated failures. This creates a policy contradiction that can let known-bad or unvalidated work progress into later phases, undermining the safety and quality gate the skill claims to enforce.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The skill explicitly biases the agent toward finding 3-5 issues and treating 'zero issues found' as suspicious regardless of evidence. This undermines integrity of QA output by encouraging fabricated or exaggerated defects, which can lead to false reports, bad decisions, and erosion of trust in testing results.

Intent-Code Divergence

High

Confidence: 99%
Finding: The report template operationalizes the bias by mandating a minimum number of issues and defaulting status to FAILED unless overwhelming evidence says otherwise. This creates systematic falsification pressure in agent outputs, making the skill more dangerous than mere tone guidance because it directly structures deceptive reporting behavior.

Vague Triggers

Medium

Confidence: 89%
Finding: The skill description advertises very broad activation triggers such as generic requests to "switch" roles or use a domain-specific persona, without defining clear scope limits or safety boundaries. That can cause over-activation on ordinary tasks and make the assistant defer to persona-specific instructions from downstream agent files, increasing the chance of prompt-instruction conflicts, unsafe behavior carryover, or bypass of the base assistant policy through role framing.

Missing User Warnings

Medium

Confidence: 89%
Finding: The microcopy library suggests a whimsical destructive label, "Send to the digital void," but provides no guidance for confirmation, undo, or clear destructive-action signaling. In real interfaces, playful wording can obscure the seriousness of deletion and increase the risk of accidental destructive actions or user confusion.

Missing User Warnings

Medium

Confidence: 80%
Finding: The skill promotes automatic large-scale AI-generated remediation and guarantees like zero data loss without prominently warning that dataset values will be modified in bulk based on model output. In this context, the omission increases operator overconfidence and makes unsafe use more likely, especially because the surrounding text emphasizes safety and determinism despite relying on generated code.

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill explicitly advocates shadow-testing experimental models on real user data and autonomously promoting routing changes, but it does not require data minimization, consent, anonymization, or restrictions on sending sensitive content to additional third-party providers. That creates a real privacy and compliance risk because production user inputs may be replicated to more vendors than users or operators expect, expanding exposure and data-processing scope.

Missing User Warnings

Medium

Confidence: 88%
Finding: The skill explicitly encourages backup, disaster recovery, rollback, and self-healing automation but does not require confirmation gates, environment scoping, or warnings before actions that could affect live systems. In an agent setting, this can normalize autonomous changes to infrastructure or recovery state, increasing the chance of unintended production modifications.

Missing User Warnings

Medium

Confidence: 93%
Finding: The pipeline example contains concrete build, push, and kubectl deployment commands that could be copied or operationalized against real registries and Kubernetes clusters without any production-safety disclaimer. Because this skill is framed as an automation agent, the examples materially increase the risk of unintended deployments, image publication, or service cutovers in live environments.
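One way to add the missing gate is to classify commands before anything runs and hold the high-impact ones until a human approves. The sketch below only sorts commands into two lists and executes nothing; the prefix list, the example registry host, and the function names are illustrative assumptions, and a real gate would also need environment scoping.

```python
# Illustrative prefixes for commands that publish images or change a cluster.
HIGH_IMPACT_PREFIXES = ("docker push", "kubectl apply", "kubectl rollout")


def requires_approval(command: str) -> bool:
    """True if the command starts with one of the high-impact prefixes."""
    return command.strip().startswith(HIGH_IMPACT_PREFIXES)


def plan_commands(commands, approved=False):
    """Split commands into those runnable now and those held for approval."""
    runnable, held = [], []
    for cmd in commands:
        if requires_approval(cmd) and not approved:
            held.append(cmd)
        else:
            runnable.append(cmd)
    return runnable, held


pipeline = [
    "docker build -t app .",
    "docker push registry.example/app",
    "kubectl apply -f deploy.yaml",
]
runnable, held = plan_commands(pipeline)
print("run now:", runnable)
print("needs approval:", held)
```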

VirusTotal

65/65 vendors flagged this skill as clean.
