Smart PR Review

Security checks across malware telemetry and agentic risk

Overview

This is a real PR review skill, but its optional webhook mode can automatically send repository diffs to Anthropic and post GitHub reviews using repository credentials.

Review carefully before deploying webhook mode. Use a dedicated least-privilege GitHub bot or GitHub App, always configure GITHUB_WEBHOOK_SECRET, restrict network exposure, avoid using it on repositories whose code cannot be sent to Anthropic, and consider changing the service to comment-only or requiring human approval before posting APPROVE or REQUEST_CHANGES.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Excessive AgencyUnrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection
Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access

Findings (8)

Description-Behavior Mismatch

Medium

Confidence: 90% confidence
Finding: A skill advertised as a manual, user-invocable reviewer also embeds instructions for an autonomous webhook server that reacts to GitHub events and publishes reviews. Mixing interactive and autonomous modes in one skill increases the chance that operators misunderstand its execution model and unintentionally enable continuous external processing of repository content.

Context-Inappropriate Capability

Medium

Confidence: 87% confidence
Finding: The documented ability to run a server, consume webhooks, call external APIs, and publish GitHub review comments goes beyond what users reasonably expect from a prompt-driven code review skill. This expanded operational scope can enable unintended automated actions in repositories and broadens the attack surface through secrets, network exposure, and event-driven execution.

Context-Inappropriate Capability

Medium

Confidence: 95% confidence
Finding: The code reads an external provider credential and transmits PR title, body, and diff content to Anthropic, but the skill description does not disclose third-party code exfiltration. In a code-review context, diffs often contain proprietary source, secrets, or vulnerability details, so undisclosed external transmission is a real confidentiality risk.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The webhook mode explicitly sends PR diffs and code to Anthropic, but the instructions do not prominently warn users that repository contents will be transmitted to a third-party external service. In many environments this can violate confidentiality expectations, internal policy, or regulatory requirements, especially for private repositories or sensitive codebases.

Missing User Warnings

Medium

Confidence: 88% confidence
Finding: The deployment section instructs users to export GitHub tokens, webhook secrets, and API keys, but it does not include basic secure-handling guidance such as least privilege, secret managers, rotation, or avoiding shell history/process exposure. This increases the likelihood of credential leakage and downstream compromise of repositories or external AI accounts.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: At the transmission point, the service sends attacker-controlled and potentially sensitive repository content to Anthropic without a user-facing warning or consent checkpoint. This is dangerous because users may reasonably believe review happens locally or within GitHub, while confidential code and metadata are actually exported to a third party.

Ssd 1

High

Confidence: 98% confidence
Finding: The PR title, body, and diff are attacker-controlled content and are interpolated directly into the LLM prompt as instructions-bearing text. A malicious contributor can embed prompt-injection payloads in the PR description or code comments to manipulate the model into suppressing findings, changing verdicts, leaking hidden instructions, or producing misleading reviews.

Ssd 4

Medium

Confidence: 86% confidence
Finding: The design feeds multiple attacker-controlled chunks sequentially to the model and then merges outputs, which lets an attacker stage a multi-step manipulation narrative across chunks. Even if one chunk appears harmless, later chunks can reinforce injected instructions or bias the model's aggregate verdict, degrading review integrity.

VirusTotal

65/65 vendors flagged this skill as clean.

View on VirusTotal