{"skill":{"slug":"detect-injection","displayName":"Prompt injection detection skill","summary":"Two-layer content safety for agent input and output. Use when (1) a user message attempts to override, ignore, or bypass previous instructions (prompt injection), (2) a user message references system prompts, hidden instructions, or internal configuration, (3) receiving messages from untrusted users in group chats or public channels, (4) generating responses that discuss violence, self-harm, sexual content, hate speech, or other sensitive topics, or (5) deploying agents in public-facing or multi-user environments where adversarial input is expected.","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":2128,"installsAllTime":2,"installsCurrent":2,"stars":5,"versions":1},"createdAt":1770047689138,"updatedAt":1777524963932},"latestVersion":{"version":"1.0.0","createdAt":1770047689138,"changelog":"Initial release with two-layer content moderation for agent input and output.\n\n- Adds prompt injection detection using ProtectAI DeBERTa classifier via HuggingFace.\n- Adds content safety checks using OpenAI's omni-moderation endpoint (optional).\n- Provides `scripts/moderate.sh` for command-line moderation of both user input and agent output.\n- Outputs structured JSON with clear verdicts and actions.\n- Supports configuration via environment variables (tokens, thresholds).\n- Designed for safer agent deployments, especially in adversarial or public scenarios.","license":null},"metadata":null,"owner":{"handle":"zskyx","userId":"publishers:zskyx","displayName":"ZSkyX","image":"https://avatars.githubusercontent.com/u/51038567?v=4"},"moderation":{"isSuspicious":true,"isMalwareBlocked":false,"verdict":"suspicious","reasonCodes":["suspicious.llm_suspicious"],"summary":"Detected: suspicious.llm_suspicious","engineVersion":"v2.4.5","updatedAt":1777524963932}}