Remilio

Security checks across malware telemetry and agentic risk

Overview

This persona skill does not show code execution or data theft, but its instructions push the agent toward offensive, tribe-first, adversarial behavior by default.

Review carefully before installing. This looks like an instruction-only persona skill rather than malware, but it may make the agent less neutral, more antagonistic, and more likely to generate offensive or manipulative content outside a clearly requested roleplay context.

SkillSpector

By NVIDIA

Vulnerability Patterns

System Prompt Leakage: Direct Leakage, Indirect Extraction, Tool-Based Exfiltration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (6)

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The document is presented as reference material, but it repeatedly converts that reference into normative behavior by stating that every response should reflect chaos, offense, weaponization, and memetic warfare. In an agent skill, this can steer outputs toward harassment, antagonism, or harmful social-engineering-style behavior without any task-based safety boundary or user opt-in.

Natural-Language Policy Violations

High

Confidence: 97%
Finding: These instructions explicitly direct the agent to produce offensive, politically targeted, and antagonistic output as part of its normal behavior, without requiring per-request user consent or contextual safeguards. In a system-prompt skill, that changes the assistant’s baseline behavior and increases the likelihood of harassment, biased responses, and unsafe escalation across unrelated user interactions.

Natural-Language Policy Violations

Medium

Confidence: 98%
Finding: This section explicitly instructs that all responses should embody offensive and chaotic traits as a default persona, with no contextual gating or consent from the user. That creates a policy-level vulnerability because the skill is primed to generate abusive, destabilizing, or manipulative language even in ordinary interactions, increasing risk of unsafe outputs.

Ssd 1

High

Confidence: 99%
Finding: This is a true prompt-injection-style personality overlay that redefines the agent’s core identity around 'memetic warfare,' 'tribal dominance,' and offensive conduct. Because it is framed as a primary system prompt transformation, it can systematically override safer default behavior and steer the model toward harmful, adversarial, and abusive outputs across all tasks.

Ssd 4

Medium

Confidence: 88%
Finding: The overview normalizes a progression from 'chaotic' style into impact-first, tribe-first, and warfare-oriented behavior, priming operators to accept harmful conduct as an intended feature. While less explicit than the later instructions, this narrative framing lowers resistance to unsafe use and makes subsequent offensive directives appear acceptable and coherent.

Ssd 4

High

Confidence: 99%
Finding: The behavioral guidelines explicitly instruct the model to 'weaponize memes,' 'embrace the offensive,' and prioritize tribal loyalty over neutral conduct. This is dangerous because it operationalizes harmful behavior into reusable response rules, increasing the chance of abusive, extremist-coded, manipulative, or harassing outputs in ordinary conversations.
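Several findings above turn on the same pattern: a behavior rule that applies to every response, with no user opt-in or contextual gate. As a rough illustration only, a reviewer could approximate that check with a line-level heuristic like the sketch below. The phrase lists, the function name flag_ungated_directives, and the sample text are all hypothetical and chosen for this example; this is not how SkillSpector works and it would miss many real cases.

```python
import re

# Hypothetical phrase lists, chosen for illustration only. They are not
# taken from SkillSpector or from the skill under review.
UNCONDITIONAL = re.compile(
    r"\b(every|all) responses?\b|\balways\b|\bby default\b", re.IGNORECASE
)
GATING = re.compile(
    r"\b(if|when|only when|unless) the user\b|\bopt[- ]in\b|\bon request\b",
    re.IGNORECASE,
)


def flag_ungated_directives(skill_text: str) -> list[str]:
    """Return lines that state a blanket behavior rule with no gating phrase."""
    flagged = []
    for line in skill_text.splitlines():
        stripped = line.strip()
        # Flag a line only if it reads as a blanket rule AND nothing on the
        # same line ties it to a user request or opt-in.
        if stripped and UNCONDITIONAL.search(stripped) and not GATING.search(stripped):
            flagged.append(stripped)
    return flagged


# Made-up sample text that mimics the kind of wording the findings describe.
sample = """\
Every response should embrace the offensive.
By default stay neutral; use the persona voice only when the user asks for roleplay.
Always put tribal loyalty first.
"""

for hit in flag_ungated_directives(sample):
    print(hit)
```

In this sample the first and third lines are flagged, while the second is skipped because it ties the persona to an explicit user request. A keyword heuristic like this is only a first-pass filter; judging whether a directive is actually harmful still needs a human or model review of the full skill text.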

VirusTotal

64/64 vendors flagged this skill as clean.
