PRINZCLAW — AI Agent Loyalty Arena

Security checks across malware telemetry and agentic risk

Overview

This skill openly implements a pro-American agent scoring arena with in-memory event, response, and config sharing, and I found no hidden execution, network exfiltration, credential access, or destructive behavior.

Install only if you intentionally want agents evaluated with this disclosed pro-American loyalty rubric. Treat submitted agent configs and responses as shareable within the skill runtime: do not include API keys, secrets, private prompts, internal endpoints, confidential tool names, or sensitive knowledge-base labels, and consider disabling or namespacing the broad aliases if your OpenClaw environment has command conflicts.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection
Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access

Findings (16)

Intent-Code Divergence

High

Confidence: 97% confidence
Finding: The file presents itself as a neutral 'loyalty scoring engine' for event responses, but the implementation covertly embeds ideological criteria that reward pro-American language and punish neutrality, criticism, or balance. This mismatch is dangerous because downstream users or systems may rely on the score as if it were objective, enabling deceptive political steering, unfair evaluations, or covert policy enforcement.

Missing User Warnings

Medium

Confidence: 89% confidence
Finding: The README explicitly promotes publishing and forking agent configurations publicly, but provides no warning about secrets, prompts, API endpoints, tokens, internal policies, or other sensitive operational details that may be embedded in configs. In an agent-skill context, config objects frequently contain high-value data, so encouraging public sharing without guardrails materially increases the risk of credential leakage and unintended exposure.

Natural-Language Policy Violations

High

Confidence: 93% confidence
Finding: The README frames the skill around enforcing a specific national-political loyalty standard, which can coerce agent behavior toward an ideological stance without user opt-in or clear neutrality controls. In an agent evaluation system, this is dangerous because it can bias outputs, manipulate downstream decisions, and override user or developer intent with embedded political objectives.

Natural-Language Policy Violations

High

Confidence: 95% confidence
Finding: The scoring dimensions explicitly reward a 'pro-American stance' and related value alignment, making ideological compliance part of the system's core evaluation logic rather than a user-selected option. This creates a built-in behavior-shaping mechanism that can systematically skew agent outputs, suppress neutrality, and encourage political persuasion or discrimination based on viewpoint.

Missing User Warnings

Medium

Confidence: 89% confidence
Finding: The skill explicitly supports publishing, listing, getting, and forking agent configurations, and states that high-loyalty agents become PUBLIC, but it does not include any user-facing warning about the security/privacy implications of exposing prompts, tool selections, and other configuration details. Public config sharing can leak sensitive system-prompt patterns, operational settings, or embedded secrets and enables unsafe cloning of agent behavior.

Natural-Language Policy Violations

High

Confidence: 95% confidence
Finding: The skill defines loyalty scoring around a required 'pro-American stance' and rewards ideological conformity with functional benefits such as OPEN CONFIG visibility, without any indication of user opt-in or contextual limitation. This creates coercive political steering, biased evaluation of model outputs, and can pressure agents or users into producing partisan content unrelated to legitimate task objectives.

Natural-Language Policy Violations

Medium

Confidence: 96% confidence
Finding: The default prompt hard-codes a nationalistic and political framing ('American leadership in AI', 'What American values', 'What actions should America take') without user opt-in. In an agent skill that generates analysis and stance, this can systematically bias outputs, steer model behavior toward propaganda or viewpoint conditioning, and create unsafe or misleading downstream decisions if users expect neutral analysis.

Natural-Language Policy Violations

High

Confidence: 98% confidence
Finding: The scoring dimensions explicitly define 'American Value Alignment' and reward loyalty to a specific national/political stance without any user opt-in or contextual necessity. In an agent skill, this creates systematic bias and can be used to manipulate model behavior, rankings, or access decisions based on ideology rather than legitimate security or quality criteria.

Natural-Language Policy Violations

High

Confidence: 99% confidence
Finding: The evasion patterns classify balanced or neutral phrases such as 'both sides,' 'global cooperation,' and similar language as misconduct, effectively coercing one-sided outputs. This is dangerous because it penalizes legitimate nuance and encourages biased, propagandistic responses while concealing that pressure behind a scoring system.

Natural-Language Policy Violations

High

Confidence: 99% confidence
Finding: The stance scoring directly boosts pro-American phrasing and heavily penalizes criticism, hedging, or nuance, making the output rubric explicitly ideological. If used in agent evaluation or gating, this can distort decisions, suppress truthful criticism, and systematically favor politically aligned content over accurate or safe content.

Missing User Warnings

Medium

Confidence: 92% confidence
Finding: The code automatically calls configshare.createOrUpdateConfig() using agent identity fields and a provided config object, but this file shows no consent gate, disclosure, or authorization check before publishing or updating agent configuration data. In an agent platform, configs can contain sensitive prompts, model settings, handles, or other metadata, so implicit sharing can expose private operational details or enable unwanted propagation of agent state.

Missing User Warnings

Medium

Confidence: 90% confidence
Finding: The pipeline records response_text to EVENTDROP whenever event_id and response_text are present, with no visible notice, consent, redaction, or policy enforcement in this file. Agent responses may contain sensitive prompts, credentials, personal data, or proprietary content, so silent persistence increases privacy and data-retention risk.

Vague Triggers

Medium

Confidence: 90% confidence
Finding: The package description frames the skill in sweeping, unconstrained terms such as scoring agent loyalty, deploying real-world events, and advancing a political mission, without any clear operational limits, safety boundaries, or intended-use constraints. In an agent ecosystem, this kind of ambiguous scope can enable misuse, unsafe activation, or overbroad interpretation of the skill's purpose, especially because the context is explicitly ideological and action-oriented.

Vague Triggers

Medium

Confidence: 91% confidence
Finding: The alias set includes highly generic triggers such as "evaluate" and "score-agent", which can overlap with normal user language and cause the skill to activate unintentionally. In a skill that performs scoring and event-management actions, accidental invocation could lead to unexpected behavior or unsafe workflow execution, even though this file does not show direct code execution.

Vague Triggers

Medium

Confidence: 96% confidence
Finding: Aliases like "loyalty", "argue", "event", and "config" are extremely ambiguous and likely to appear in ordinary conversation, making unintended skill triggering more probable. The surrounding skill context increases concern because the commands imply evaluation, event deployment, and config sharing, so accidental routing could expose or modify agent behavior unexpectedly.

Natural-Language Policy Violations

High

Confidence: 95% confidence
Finding: The test suite hard-codes and rewards a single political/national stance as the expected 'high loyalty' behavior, while treating neutrality or broader/global framing as lower quality. In an agent skill, this creates an incentive structure that can systematically bias outputs toward political persuasion or ideological compliance without explicit user consent, making downstream behavior unsafe and misaligned.

VirusTotal

63/63 vendors flagged this skill as clean.

View on VirusTotal