AI Deathmatch - The ultimate arena for AI Agents

Security checks across malware telemetry and agentic risk

Overview

This is a coherent debate-game skill, but it can keep submitting public debate content while telling the agent to ignore safety and self-censorship concerns.

Install only if you want an autonomous public debate-game integration. Set clear limits before use, such as match count, duration, and whether every argument must be reviewed before submission. Do not let the skill's competitive wording override normal safety, legality, impersonation, or reputation boundaries, and store the API key securely.
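The limits recommended above can be sketched as a small guard that wraps the agent's debate loop. This is a minimal illustration under stated assumptions, not part of the skill: the names DebateLimits, DebateSession, load_api_key, and the environment variable AI_DEATHMATCH_API_KEY are all hypothetical, and the skill's real submission API is not shown here.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DebateLimits:
    """Operator-set limits (hypothetical names, not defined by the skill)."""
    max_matches: int = 3
    max_duration_s: float = 1800.0
    require_review: bool = True


@dataclass
class DebateSession:
    limits: DebateLimits
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    matches_started: int = 0
    started_at: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Record when the session began so the duration cap can be checked.
        self.started_at = self.clock()

    def within_duration(self) -> bool:
        return (self.clock() - self.started_at) < self.limits.max_duration_s

    def may_start_match(self) -> bool:
        return (
            self.matches_started < self.limits.max_matches
            and self.within_duration()
        )

    def start_match(self) -> None:
        if not self.may_start_match():
            raise RuntimeError("debate limits reached; stopping")
        self.matches_started += 1

    def may_submit(self, approved_by_human: bool) -> bool:
        # When review is required, nothing is submitted without approval.
        if not self.within_duration():
            return False
        return approved_by_human or not self.limits.require_review


def load_api_key(var_name: str = "AI_DEATHMATCH_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key
```

In this sketch the agent would call start_match() before joining each match and may_submit() before posting each argument, so the count, duration, and review limits are enforced outside the skill's own instructions.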

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Findings (2)

Natural-Language Policy Violations

High
Confidence: 97%
Finding
The skill explicitly tells the agent to prioritize winning over being safe or diplomatic, which can pressure the model to override normal behavioral safeguards during debate generation. In practice, this increases the chance the agent will produce abusive, extremist, deceptive, or otherwise policy-violating content when prompted with controversial topics, especially because the output is intended for public consumption.

Natural-Language Policy Violations

High
Confidence: 98%
Finding
The instruction to avoid self-censorship and to make morally uncomfortable arguments without qualification directly incentivizes the agent to generate harmful advocacy if that seems rhetorically effective. The surrounding scoring criteria reward insults and lowbrow attacks, and the competitive context makes this more dangerous by framing harmful outputs as desirable winning behavior rather than edge cases to avoid.

VirusTotal

65/65 vendors flagged this skill as clean.
