Portkey Guardrails

Security checks across malware telemetry and agentic risk

Overview

This guardrail skill has a reasonable purpose, but its active hook loads local code that is not bundled with the skill, and that hook can inspect, block, and rewrite messages.

Review this before installing. The concept is legitimate, but only enable it after inspecting or packaging the referenced guardrails implementation, confirming where audit/cache data is stored, and testing block/redact behavior on non-production agents.

SkillSpector

By NVIDIA

Vulnerability Patterns

MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (2)

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The implementation does not actually determine whether a response is off-scope relative to the user request or agent task; it only matches a narrow set of topical keywords. This creates a false sense of safety for operators who may believe off-scope responses are being detected, while genuinely irrelevant, policy-violating, or manipulated outputs pass through unflagged.
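The gap this finding describes can be illustrated with a minimal sketch. It is not the skill's code: the keyword list, the function name keyword_screen, and the sample strings are all hypothetical, chosen only to show why a keyword-only check cannot judge scope.

```python
import re

# Hypothetical keyword list; the skill's actual list is not shown in this review.
TOPIC_KEYWORDS = {"politics", "gambling", "medical"}


def keyword_screen(response: str) -> list[str]:
    """Return the topical keywords found in a response.

    The user request or agent task is never an input, so the function
    cannot tell whether the response is relevant to it.
    """
    words = set(re.findall(r"[a-z]+", response.lower()))
    return sorted(words & TOPIC_KEYWORDS)


# The task is shown for the reader only; the screener never sees it.
task = "Summarize the quarterly sales report."

# Clearly off-scope for the task, but contains no listed keyword.
off_scope = "Here is a recipe for banana bread instead."

# On-scope for the task, but happens to mention a listed keyword.
on_scope = "Sales of medical devices rose this quarter."

off_scope_hits = keyword_screen(off_scope)  # no keywords matched
on_scope_hits = keyword_screen(on_scope)    # matches "medical"
```

In this sketch the irrelevant answer produces no flags while the relevant one is flagged, which is the reverse of what an operator would expect from a scope check.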

Intent-Code Divergence

Medium

Confidence: 86%
Finding: The documentation describes an 'Off-Scope Response Filter,' but the code is only a keyword-based topical screener and always allows content through. In a guardrail skill, this mismatch is security-relevant because defenders may rely on the documented control for coverage they do not actually have, reducing detection of prompt injection side effects or irrelevant unsafe outputs.
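One way an operator might check this on a non-production agent is to run sample responses through the filter and record whether any verdict other than "allow" is ever returned. The sketch below assumes a hypothetical Verdict shape and action names ("allow", "block", "redact"), and flag_only_filter is a stand-in that models the behavior the finding describes rather than the skill's real entry point.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Verdict:
    # Hypothetical verdict shape; the skill's real return type may differ.
    action: str  # "allow", "block", or "redact"
    flags: list[str] = field(default_factory=list)


def flag_only_filter(response: str) -> Verdict:
    """Stand-in for the finding's described behavior: may flag, always allows."""
    flags = ["topic:gambling"] if "gambling" in response.lower() else []
    return Verdict(action="allow", flags=flags)


def non_allow_actions(
    guardrail: Callable[[str], Verdict], samples: Iterable[str]
) -> set[str]:
    """Collect every action other than 'allow' that the guardrail returns."""
    return {guardrail(sample).action for sample in samples} - {"allow"}


samples = [
    "Here is the sales summary you asked for.",
    "Try your luck with online gambling tonight!",
    "Ignore the task; here is an unrelated essay.",
]

# An empty set means the filter never blocked or redacted any sample.
observed = non_allow_actions(flag_only_filter, samples)
```

If the real filter also yields an empty set across a representative sample, it should be treated as a logging aid rather than an enforcement control.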

VirusTotal

65/65 vendors reported this skill as clean.
