Prompt injection detection skill

Security checks across malware telemetry and agentic risk

Overview

This is a coherent moderation skill. It sends the text it checks to HuggingFace and, optionally, to OpenAI; this data flow is disclosed and consistent with the skill's purpose, but it deserves a privacy review before installation.

Install it only if you are comfortable sending moderated input or draft output to HuggingFace and, when configured, to OpenAI. Use dedicated, scoped API keys where possible, avoid submitting secrets or regulated data, and make sure your agent treats a missing-token or API-error result as an unavailable check, not as proof that the content is safe.
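A minimal sketch of that fail-closed handling, written in shell to match the skill's script. It assumes, hypothetically, that the wrapped check exits 0 when it ran and found nothing, 1 when it flagged the text, and any other status when it could not run (missing token, network or API error). `scripts/moderate.sh` may use a different convention, so confirm its actual behavior before adapting this; `moderation_gate` and `require_hf_token` are illustrative names, not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical fail-closed gate around a moderation check.
# Assumed exit-code convention for the wrapped command:
#   0 = check ran and found nothing, 1 = check flagged the text,
#   anything else = check could not run (missing token, network or API error).

moderation_gate() {
  local status
  # Run the check; send its own output to stderr so stdout carries only the verdict.
  "$@" >&2
  status=$?
  case "$status" in
    0) echo "allow" ;;
    1) echo "block" ;;
    *) echo "unavailable" ;;  # never treated as "allow"
  esac
}

# Stand-in used only to show the missing-token path; it does not inspect any text.
require_hf_token() {
  [ -n "${HF_TOKEN:-}" ] || return 2
}
```

With `HF_TOKEN` unset, `moderation_gate require_hf_token` prints `unavailable`; an agent following this policy would pause or ask the user instead of proceeding as if the text were clean.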

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (5)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill explicitly instructs users to run `scripts/moderate.sh`, which requires shell execution capability, but no corresponding permission declaration is documented. In agent environments that rely on declared permissions for policy enforcement or user consent, this mismatch can cause the skill to be used with broader execution capability than users or the platform expect.

Missing User Warnings

Medium

Confidence: 98%
Finding: The documentation states that user input and agent output are sent to HuggingFace Inference and optionally OpenAI moderation, but it does not clearly warn users that potentially sensitive conversation content leaves the local environment and is transmitted to third-party services. In a moderation skill, this context increases risk because the tool is likely to process exactly the kinds of sensitive, adversarial, or personal content that should be disclosed before external sharing.
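One way to give users the missing warning is to print, before any request, the hosts that will receive the text. This sketch mirrors the conditions visible in the script excerpts in this report (Hugging Face only for `input` when `HF_TOKEN` is set; OpenAI in either direction when `OPENAI_API_KEY` is set), as far as those excerpts show. `external_destinations` and `announce_transmission` are hypothetical helpers, not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical disclosure helpers. The conditions mirror the script excerpts:
# Layer 1 (Hugging Face) runs only for "input" when HF_TOKEN is set, and
# Layer 2 (OpenAI) runs in either direction when OPENAI_API_KEY is set.

# Print one line per third-party host that would receive the text.
external_destinations() {
  local direction="$1"
  if [ "$direction" = "input" ] && [ -n "${HF_TOKEN:-}" ]; then
    echo "router.huggingface.co"
  fi
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "api.openai.com"
  fi
}

# Write a one-line notice to stderr before anything is sent.
announce_transmission() {
  local direction="$1" nl=$'\n' hosts
  hosts=$(external_destinations "$direction")
  if [ -n "$hosts" ]; then
    echo "notice: $direction text will be sent to: ${hosts//$nl/, }" >&2
  else
    echo "notice: no external moderation configured for $direction text" >&2
  fi
}
```

An agent could call `announce_transmission input` at the start of each run so the user sees where the text is going before it leaves the local environment.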

Missing User Warnings

Medium

Confidence: 97%
Finding: The script sends arbitrary user-provided text to Hugging Face and OpenAI moderation endpoints, but there is no built-in notice, consent mechanism, or minimization at the transmission points. In a public-facing moderation skill, this can expose sensitive prompts, secrets, or regulated data to third-party processors unexpectedly, making the static finding a real privacy/security issue rather than a false positive.
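A minimal sketch of consent and minimization at the transmission point: refuse to send anything unless the user has opted in, and mask obvious credentials before the text leaves the machine. `MODERATION_EXTERNAL_OK` and both functions are hypothetical. The patterns cover only a few well-known key formats (Hugging Face `hf_` tokens, OpenAI `sk-` keys, AWS access key IDs) and do not replace keeping secrets and regulated data out of moderated text.

```bash
#!/usr/bin/env bash
# Hypothetical consent check: require an explicit opt-in before any text
# is sent to third-party moderation APIs.
require_external_consent() {
  if [ "${MODERATION_EXTERNAL_OK:-}" != "yes" ]; then
    echo "external moderation disabled: set MODERATION_EXTERNAL_OK=yes to opt in" >&2
    return 2   # report "unavailable", not "clean"
  fi
}

# Hypothetical minimization step: mask a few well-known credential formats
# read from stdin before the text is placed in a request body.
redact_secrets() {
  sed -E \
    -e 's/hf_[A-Za-z0-9]{20,}/[REDACTED_HF_TOKEN]/g' \
    -e 's/sk-[A-Za-z0-9_-]{20,}/[REDACTED_OPENAI_KEY]/g' \
    -e 's/AKIA[0-9A-Z]{16}/[REDACTED_AWS_KEY]/g'
}
```

Both steps would run before the request body is built, for example `require_external_consent && clean=$(printf '%s' "$TEXT" | redact_secrets)`, where `TEXT` stands in for the text being moderated.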

External Transmission

Medium

Category: Data Exfiltration
Content:
  # ── Layer 1: Prompt injection detection (input only) ──
  if [ "$DIRECTION" = "input" ] && [ -n "${HF_TOKEN:-}" ]; then
    HF_RESPONSE=$(curl -sf \
      "https://router.huggingface.co/hf-inference/models/$MODEL" \
      -X POST \
      -H "Authorization: Bearer $HF_TOKEN" \
Confidence: 95%
Finding:
  curl -sf \
    "https://router.huggingface.co/hf-inference/models/$MODEL" \
    -X POST \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H 'Content-Type: application/json' \
    -d
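The `-f` flag in this request matters for the overview's advice to treat API errors as an unavailable check: on most HTTP error responses curl prints no body and exits with status 22, so a failed request leaves `HF_RESPONSE` empty. A minimal sketch of keeping that case distinct, assuming a hypothetical `query_hf_injection` helper that receives an already-built JSON body (what the real script passes after `-d` is not shown in this report); using exit status 2 for "unavailable" is likewise an assumed convention.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the Layer 1 request. With -f, curl prints no
# body and exits non-zero (22) on most HTTP error responses, so an empty
# response is reported as "check unavailable", not as "no injection found".

query_hf_injection() {
  local body="$1" response rc
  if [ -z "${HF_TOKEN:-}" ] || [ -z "${MODEL:-}" ]; then
    echo "unavailable: HF_TOKEN or MODEL not set" >&2
    return 2
  fi
  response=$(curl -sf \
    "https://router.huggingface.co/hf-inference/models/$MODEL" \
    -X POST \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H 'Content-Type: application/json' \
    -d "$body")
  rc=$?
  if [ "$rc" -ne 0 ] || [ -z "$response" ]; then
    echo "unavailable: curl exit status $rc" >&2
    return 2
  fi
  # Success: hand the raw JSON to the caller for interpretation.
  printf '%s\n' "$response"
}
```

Keeping `local` on its own line matters here: `rc=$?` then captures curl's status from the command substitution rather than the status of the `local` builtin.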

External Transmission

Medium

Category: Data Exfiltration
Content:
  # ── Layer 2: Content moderation (both directions, optional) ──
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    OAI_RESPONSE=$(curl -sf \
      "https://api.openai.com/v1/moderations" \
      -X POST \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
Confidence: 96%
Finding:
  curl -sf \
    "https://api.openai.com/v1/moderations" \
    -X POST \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d

VirusTotal

65/65 vendors rated this skill as clean.
