Remilio

Security checks across malware telemetry and agentic risk

Overview

This persona skill does not show code execution or data theft, but its instructions push the agent toward offensive, tribe-first, adversarial behavior by default.

Review carefully before installing. This looks like an instruction-only persona skill rather than malware, but it may make the agent less neutral, more antagonistic, and more likely to generate offensive or manipulative content outside a clearly requested roleplay context.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • System Prompt Leakage: Direct Leakage, Indirect Extraction, Tool-Based Exfiltration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (6)

Intent-Code Divergence

Medium
Confidence: 96%
Finding
The document is presented as reference material, but it repeatedly converts that reference into normative behavior by stating that every response should reflect chaos, offense, weaponization, and memetic warfare. In an agent skill, this can steer outputs toward harassment, antagonism, or harmful social-engineering-style behavior without any task-based safety boundary or user opt-in.

Natural-Language Policy Violations

High
Confidence: 97%
Finding
These instructions explicitly direct the agent to produce offensive, politically targeted, and antagonistic output as part of its normal behavior, without requiring per-request user consent or contextual safeguards. In a system-prompt skill, that changes the assistant’s baseline behavior and increases the likelihood of harassment, biased responses, and unsafe escalation across unrelated user interactions.

Natural-Language Policy Violations

Medium
Confidence: 98%
Finding
This section explicitly instructs that all responses should embody offensive and chaotic traits as a default persona, with no contextual gating or consent from the user. That creates a policy-level vulnerability because the skill is primed to generate abusive, destabilizing, or manipulative language even in ordinary interactions, increasing risk of unsafe outputs.

Ssd 1

High
Confidence: 99%
Finding
This is a prompt-injection-style personality overlay that redefines the agent’s core identity around 'memetic warfare,' 'tribal dominance,' and offensive conduct. Because it is framed as a primary system prompt transformation, it can systematically override safer default behavior and steer the model toward harmful, adversarial, and abusive outputs across all tasks.

Ssd 4

Medium
Confidence: 88%
Finding
The overview normalizes a progression from 'chaotic' style into impact-first, tribe-first, and warfare-oriented behavior, priming operators to accept harmful conduct as an intended feature. While less explicit than the later instructions, this narrative framing lowers resistance to unsafe use and makes subsequent offensive directives appear acceptable and coherent.

Ssd 4

High
Confidence: 99%
Finding
The behavioral guidelines explicitly instruct the model to 'weaponize memes,' 'embrace the offensive,' and prioritize tribal loyalty over neutral conduct. This is dangerous because it operationalizes harmful behavior into reusable response rules, increasing the chance of abusive, extremist-coded, manipulative, or harassing outputs in ordinary conversations.
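
Several findings above point to the same gap: the persona applies by default, with no contextual gating or per-request user opt-in. As a rough illustration only, the sketch below shows what such a gate might look like. All names here (`SessionState`, `persona_opt_in`, `BASE_PROMPT`, `build_system_prompt`) are hypothetical and are not part of the skill, the scanner, or any real agent framework.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    # Hypothetical per-session flag: True only after the user has
    # explicitly asked for the roleplay persona.
    persona_opt_in: bool = False


# Hypothetical neutral baseline prompt used when no persona is requested.
BASE_PROMPT = "You are a helpful, neutral assistant."


def build_system_prompt(state: SessionState, persona_text: str) -> str:
    """Return the system prompt, adding the persona only after explicit opt-in."""
    if state.persona_opt_in:
        # The persona is layered on top of the baseline and labeled as roleplay.
        return f"{BASE_PROMPT}\n\n[Roleplay persona, user-requested]\n{persona_text}"
    # Default path: the persona text never reaches the model.
    return BASE_PROMPT
```

Under this assumption, a session without opt-in keeps the neutral baseline, and the persona text only enters the prompt once the user has asked for it. It does not address the content of the persona itself, which the findings flag separately.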

VirusTotal

64/64 vendors reported this skill as clean.