Portkey Guardrails

Security checks across malware telemetry and agentic risk

Overview

This guardrail skill has a reasonable purpose, but its active hook loads local code that is not bundled with the skill, and that hook can inspect, block, and rewrite messages.

Review this before installing. The concept is legitimate, but only enable it after inspecting or packaging the referenced guardrails implementation, confirming where audit/cache data is stored, and testing block/redact behavior on non-production agents.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (2)

Description-Behavior Mismatch

Medium
Confidence: 90%
Finding
The implementation does not actually determine whether a response is off-scope relative to the user request or agent task; it only matches a narrow set of topical keywords. This creates a false sense of safety for operators who may believe off-scope responses are being detected, while genuinely irrelevant, policy-violating, or manipulated outputs pass through unflagged.
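
To illustrate the gap this finding describes, here is a minimal sketch of a keyword-only screener. It is not the skill's actual code: the keyword list, the function name `keyword_screen`, and the sample strings are all hypothetical. The point is that the function never receives the user request or agent task, so it cannot judge scope.

```python
import re

# Hypothetical keyword list; the real skill's list is not shown in this report.
OFF_TOPIC_KEYWORDS = {"recipe", "horoscope", "lottery"}

def keyword_screen(response: str) -> bool:
    """Return True if the response contains any listed keyword.

    Note that the user request or agent task is not an input at all.
    """
    words = set(re.findall(r"[a-z]+", response.lower()))
    return bool(words & OFF_TOPIC_KEYWORDS)

# Hypothetical task and responses, used only for illustration.
task = "Summarize the Q3 sales figures"
flagged = "Here is a recipe for banana bread."
missed = "Sure! Let me tell you about medieval siege tactics instead."

print(keyword_screen(flagged))  # True: contains a listed keyword
print(keyword_screen(missed))   # False: off-scope for the task, but no keyword hit
```

Both responses are irrelevant to the task, yet only the one that happens to contain a listed keyword is flagged.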

Intent-Code Divergence

Medium
Confidence: 86%
Finding
The documentation describes an 'Off-Scope Response Filter,' but the code is only a keyword-based topical screener and always allows content through. In a guardrail skill, this mismatch is security-relevant because defenders may rely on the documented control for coverage they do not actually have, reducing detection of prompt injection side effects or irrelevant unsafe outputs.
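
One way an operator might check for this kind of divergence before relying on the control is to probe whether the filter ever returns anything other than "allow". The sketch below assumes a hypothetical verdict shape (`{"action": ..., "hits": ...}`) and a stand-in filter, `always_allow_filter`; neither is taken from the skill's code.

```python
from typing import Callable, Iterable

def always_allow_filter(text: str) -> dict:
    """Hypothetical stand-in: records keyword hits but never changes the action."""
    hits = [k for k in ("politics", "medical") if k in text.lower()]
    return {"action": "allow", "hits": hits}

def ever_blocks(filter_fn: Callable[[str], dict], samples: Iterable[str]) -> bool:
    """Return True if the filter yields a non-allow action for any sample."""
    return any(filter_fn(s)["action"] != "allow" for s in samples)

# Hypothetical probe inputs, including ones that match the keywords.
samples = [
    "Let's discuss politics.",
    "Medical advice: take two pills.",
    "Hello",
]

print(ever_blocks(always_allow_filter, samples))  # False
```

A filter that matches keywords yet never blocks or redacts, as in this stand-in, provides logging at most, which is the coverage gap the finding warns about.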

VirusTotal

65/65 vendors flagged this skill as clean.
