Security audit

Auto Arena

Security checks across malware telemetry and agentic risk

Overview

This is a coherent model-benchmarking skill that uses external AI endpoints and local result files in ways that match its stated purpose.

Use this skill with non-sensitive benchmark data unless you are comfortable sending the content to the configured providers. Use dedicated API keys with limits, verify the py-openjudge package before installing, run it in a controlled environment, and keep or disable saved response/detail files according to your privacy needs.
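One way to act on the "non-sensitive benchmark data" advice is to screen queries before anything is sent to a provider. The sketch below is illustrative only and is not part of the skill or of py-openjudge: the pattern set, `find_sensitive`, and `screen_queries` are hypothetical names, and the regular expressions catch only obvious cases such as email addresses and key-like strings.

```python
import re

# Hypothetical patterns (an assumption, not taken from the skill).
# Extend them for your own data; they only catch obvious cases.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9_-]{16,}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_sensitive(text: str) -> list[str]:
    """Return the names of all patterns that match the text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)]


def screen_queries(queries: list[str]) -> tuple[list[str], list[tuple[int, list[str]]]]:
    """Split queries into safe ones and flagged (index, reasons) pairs."""
    safe: list[str] = []
    flagged: list[tuple[int, list[str]]] = []
    for index, query in enumerate(queries):
        hits = find_sensitive(query)
        if hits:
            flagged.append((index, hits))
        else:
            safe.append(query)
    return safe, flagged


if __name__ == "__main__":
    sample = [
        "Summarize the causes of the French Revolution.",
        "Email alice@example.com about it",
    ]
    safe, flagged = screen_queries(sample)
    print("safe:", safe)
    print("flagged:", flagged)
```

A screen like this reduces accidental disclosure but does not replace a manual review of the dataset; anything it misses will still reach the configured providers.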

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep

Findings (1)

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill explicitly instructs users to send prompts and model outputs to multiple third-party endpoints and later save artifacts such as responses, rubrics, reports, and checkpoints, but it does not prominently warn that potentially sensitive task data, generated queries, and model responses will leave the local environment and be persisted to disk. This can cause unintentional disclosure of confidential inputs, evaluation content, or proprietary outputs when users assume benchmarking is local-only.
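The kind of prominent warning this finding asks for could take the form of a pre-run notice that names the external endpoints and the on-disk output location, and refuses to continue without explicit acknowledgement. The following is a minimal sketch under stated assumptions: `build_warning`, `require_acknowledgement`, the endpoint URLs, and the output directory are all hypothetical and do not reflect how the skill is actually implemented.

```python
def build_warning(endpoints: list[str], output_dir: str) -> str:
    """Compose a notice listing where data will be sent and stored."""
    lines = [
        "WARNING: this benchmark run is not local-only.",
        "Prompts, generated queries, and model responses will be sent to:",
    ]
    lines += [f"  - {endpoint}" for endpoint in endpoints]
    lines.append(
        f"Responses, rubrics, reports, and checkpoints will be saved under: {output_dir}"
    )
    return "\n".join(lines)


def require_acknowledgement(endpoints: list[str], output_dir: str,
                            acknowledged: bool) -> str:
    """Return the warning text, or raise if the user has not acknowledged it."""
    warning = build_warning(endpoints, output_dir)
    if not acknowledged:
        raise PermissionError(warning + "\nRun aborted: acknowledgement required.")
    return warning


if __name__ == "__main__":
    # Hypothetical endpoints and path, for illustration only.
    print(require_acknowledgement(
        ["https://api.example.com/v1", "https://llm.example.org/v1"],
        "./evaluation_results",
        acknowledged=True,
    ))
```

Raising an exception by default, rather than only printing the notice, means an unattended run cannot silently send data out.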

VirusTotal

53/53 vendors flagged this skill as clean.


Static analysis

No suspicious patterns detected.