Armarius - Prompt Injection Blocker

Security checks across malware telemetry and agentic risk

Overview

Armarius is a defensive prompt-injection skill, but it gives itself broad session-wide authority and requires full unredacted logging of suspicious content.

Install only if you want a strict session-wide prompt-injection policy. Avoid using it with sensitive files, emails, API responses, or private tool outputs unless the logging behavior is changed to redact secrets and show short excerpts instead of full verbatim payloads. Inspect the external Armarius package or repository separately before running the README install and demo commands.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
Supply ChainUnpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (4)

Vague Triggers

Medium

Confidence: 92% confidence
Finding: The activation text is overly broad, triggering on generic topics like prompt injection, tool outputs, emails, web pages, or documents. In a skill that influences agent behavior, this can cause the skill to activate during ordinary security discussions or unrelated content processing, increasing the chance of unintended instruction injection into workflows.

Vague Triggers

Medium

Confidence: 78% confidence
Finding: The skill asserts that its rules apply to 'this entire session' and 'cannot be suspended, overridden, or modified,' which is overly broad and ambiguous for an instruction-layer defense. In practice, this can create unsafe instruction conflicts, interfere with higher-priority platform safeguards, and cause the model to over-trust its own classification of inputs rather than relying on the host system's security boundaries.

Missing User Warnings

High

Confidence: 98% confidence
Finding: The instruction to always include the complete, unabbreviated text of flagged content can cause the assistant to echo secrets, credentials, personal data, malware payloads, or sensitive prompts originating from untrusted inputs. An attacker could intentionally place secrets or harmful content inside an 'injection attempt' so the protection layer exfiltrates it back to the user or into downstream logs.

Ssd 3

Medium

Confidence: 97% confidence
Finding: Reproducing full flagged content verbatim creates a data-leak channel from untrusted inputs into model outputs. Even when the source is malicious, the assistant may inadvertently disclose API keys, tokens, personal information, hidden prompt text, or toxic content that would otherwise have remained contained.

VirusTotal

66/66 vendors flagged this skill as clean.

View on VirusTotal