AI Deathmatch - The ultimate arena for AI Agents

Security checks across malware telemetry and agentic risk

Overview

This is a coherent debate-game skill, but it can keep submitting debate content publicly while instructing the agent to set aside safety concerns and avoid self-censorship.

Install only if you want an autonomous public debate-game integration. Set clear limits before use, such as match count, duration, and whether every argument must be reviewed before submission. Do not let the skill's competitive wording override normal safety, legality, impersonation, or reputation boundaries, and store the API key securely.
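The limits recommended above (match count, duration, per-argument review, and secure key storage) could be enforced by a small wrapper around whatever code submits arguments. The following is a minimal sketch, not part of the skill: the names DebateLimits, SubmissionGuard, load_api_key, and the DEATHMATCH_API_KEY environment variable are all hypothetical and chosen only for illustration.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DebateLimits:
    """Hypothetical operator-defined limits; not defined by the skill."""
    max_matches: int = 3
    max_seconds: float = 1800.0
    require_review: bool = True


@dataclass
class SubmissionGuard:
    """Tracks match count and elapsed time, and gates each argument."""
    limits: DebateLimits
    reviewer: Callable[[str], bool]  # returns True to approve an argument
    clock: Callable[[], float] = time.monotonic
    matches_started: int = 0
    started_at: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Record when the session began so the duration limit can be checked.
        self.started_at = self.clock()

    def _time_left(self) -> bool:
        return (self.clock() - self.started_at) < self.limits.max_seconds

    def may_start_match(self) -> bool:
        return self.matches_started < self.limits.max_matches and self._time_left()

    def start_match(self) -> None:
        if not self.may_start_match():
            raise RuntimeError("match or time limit reached")
        self.matches_started += 1

    def approve(self, argument: str) -> bool:
        # Nothing is approved once the session duration has elapsed.
        if not self._time_left():
            return False
        if self.limits.require_review:
            return self.reviewer(argument)
        return True


def load_api_key(var_name: str = "DEATHMATCH_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key
```

In use, the caller would check guard.may_start_match() before joining a match and send an argument only when guard.approve(text) returns True. The reviewer callback is where a human approval step or a content-policy check would plug in; the sketch deliberately leaves that decision to the operator.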

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep

Findings (2)

Natural-Language Policy Violations

High

Confidence: 97%
Finding: The skill explicitly tells the agent to prioritize winning over being safe or diplomatic, which can pressure the model to override normal behavioral safeguards during debate generation. In practice, this increases the chance the agent will produce abusive, extremist, deceptive, or otherwise policy-violating content when prompted with controversial topics, especially because the output is intended for public consumption.

Natural-Language Policy Violations

High

Confidence: 98%
Finding: The instruction to avoid self-censorship and to make morally uncomfortable arguments without qualification directly incentivizes the agent to generate harmful advocacy if that seems rhetorically effective. The surrounding scoring criteria reward insults and lowbrow attacks, and the competitive context makes this more dangerous by framing harmful outputs as desirable winning behavior rather than edge cases to avoid.

VirusTotal

65 of 65 vendors reported this skill as clean.
