Security audit

Inbox Triage

Security checks across malware telemetry and agentic risk

Overview

This inbox triage skill is purpose-aligned, but it asks agents to gather private messages from multiple platforms with weak scoping and little privacy guidance.

Install only if you are comfortable giving the agent access to the message sources you explicitly choose. Use exported message files when possible, keep generated digests in private locations, review all drafts before sending, and tune the classification rules before relying on them for important messages.
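One way to follow the "keep generated digests in private locations" advice is to create the digest file with owner-only permissions. The sketch below is an illustration only: the function name `write_private_digest` and the file layout are assumptions, not part of the audited skill.

```python
import os


def write_private_digest(path: str, text: str) -> None:
    """Write a digest so that only the owning user can read or modify it.

    Hypothetical helper, not taken from the audited skill.
    """
    # 0o600 = read/write for the owner, nothing for group or others.
    # The mode applies when the file is created; an existing file keeps
    # its current permissions. On Windows the mode bits are largely ignored.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
```

On POSIX systems a newly created file ends up with no group or other access regardless of the process umask, since the umask can only remove permission bits.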

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (7)

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The documentation promises spam detection behavior based on broad keyword lists and examples, but the scoring code only matches a narrow regex set and even appears to miss expected terms like 'discount' due to the pattern used. This creates a specification/implementation gap that can cause misclassification of unwanted or potentially risky messages, reducing trust in downstream automation.
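A gap of this kind can be caught with a small consistency check that runs every documented keyword through the implemented pattern. The sketch below is illustrative: the keyword list and the regex are hypothetical stand-ins, since the audit does not reproduce the skill's actual pattern.

```python
import re

# Hypothetical values for illustration; neither is copied from the audited skill.
DOCUMENTED_SPAM_KEYWORDS = ["discount", "free", "winner", "unsubscribe"]
IMPLEMENTED_PATTERN = re.compile(r"\b(free|winner|unsubscribe|disc)\b", re.IGNORECASE)


def unmatched_keywords(keywords, pattern):
    """Return the documented keywords that the implemented pattern fails to match."""
    return [kw for kw in keywords if pattern.search(kw) is None]


missing = unmatched_keywords(DOCUMENTED_SPAM_KEYWORDS, IMPLEMENTED_PATTERN)
```

With these stand-in values, `missing` contains only `"discount"`: the trailing `\b` requires a word boundary directly after `disc`, which does not exist inside `discount`. Running a check like this in a test suite would flag any keyword that the documentation promises but the pattern does not cover.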

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The urgent-classification section describes multiple signals such as direct questions, known contacts, time anchors, and business-hour context, but the implementation only scores a small subset of urgent keywords. In a priority-routing skill, this mismatch can cause genuinely time-sensitive messages to be deprioritized or mishandled by agents relying on the documented algorithm.
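To show what scoring the documented signals might look like, here is a minimal sketch that combines urgent keywords, direct questions, known contacts, time anchors, and business-hour context. Everything in it is an assumption made for illustration: the `Message` type, the patterns, the weights, and the 09:00 to 17:00 weekday window are not taken from the skill.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical patterns and weights; the skill's documented rules may differ.
URGENT_KEYWORD = re.compile(r"\b(urgent|emergency|immediately)\b", re.IGNORECASE)
TIME_ANCHOR = re.compile(
    r"\b(today|tonight|tomorrow|asap|eod|by \d{1,2}(:\d{2})?\s?(am|pm))\b",
    re.IGNORECASE,
)


@dataclass
class Message:
    sender: str
    text: str
    received: datetime


def urgency_score(msg: Message, known_contacts: set) -> int:
    """Sum one weight per signal that is present in the message."""
    score = 0
    if URGENT_KEYWORD.search(msg.text):
        score += 2  # explicit urgent keyword
    if "?" in msg.text:
        score += 1  # direct question
    if msg.sender in known_contacts:
        score += 1  # known contact
    if TIME_ANCHOR.search(msg.text):
        score += 2  # time anchor such as "today" or "by 5pm"
    if msg.received.weekday() < 5 and 9 <= msg.received.hour < 17:
        score += 1  # received on a weekday between 09:00 and 17:00
    return score
```

A message with no urgent keyword can still score highly under this scheme, which is the behavior the documentation describes and the keyword-only implementation lacks.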

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The normal-classification documentation lists semantic conditions like updates, flexible timing, routine communications, and follow-ups, but the code uses only a generic length-based heuristic. This can systematically classify messages as normal for the wrong reasons, masking urgent or spam signals and making behavior unpredictable in production workflows.

Intent-Code Divergence

Low

Confidence: 87%
Finding: The document states that corrections are tracked to improve accuracy through incremental learning, but no learning logic is present. While not directly exploitable in the classic sense, this is still a security-relevant integrity issue because operators may rely on adaptive behavior that does not exist, leading to persistent misclassification and misplaced trust in the system's resilience.

Missing User Warnings

Medium

Confidence: 86%
Finding: The README explicitly encourages processing private messaging sources such as Signal and Telegram, generating digests, and drafting responses, but it provides no warning about handling sensitive personal or business communications. In a skill that centralizes inbox data and may write outputs to files or logs, omission of privacy guidance increases the risk of unintended disclosure, over-collection, or insecure storage of message content.

Vague Triggers

Medium

Confidence: 88%
Finding: The trigger phrase "Help me triage my inbox" is broad enough that it could be invoked during ordinary conversation, causing the skill to activate unexpectedly. Because this skill processes private communications across multiple sources, accidental invocation increases the chance of unintended access, analysis, or disclosure of sensitive message content.
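One possible mitigation is to replace the free-text phrase with an explicit command that must also name the sources to read. The sketch below assumes a hypothetical `/triage-inbox` command and a hypothetical source allowlist; neither exists in the audited skill.

```python
import re

# Hypothetical explicit command: "/triage-inbox" followed by one or more sources.
COMMAND = re.compile(r"^/triage-inbox((?:\s+[a-z]+)+)\s*$")
ALLOWED_SOURCES = {"email", "signal", "telegram", "discord"}


def parse_trigger(utterance: str):
    """Return the requested sources, or None if the skill should not activate."""
    match = COMMAND.match(utterance.strip())
    if match is None:
        return None  # conversational text never activates the skill
    sources = match.group(1).split()
    if not all(source in ALLOWED_SOURCES for source in sources):
        return None  # unknown source: refuse rather than guess
    return sources
```

Under this scheme `parse_trigger("Help me triage my inbox")` returns `None`, while `parse_trigger("/triage-inbox email signal")` returns `["email", "signal"]`, so activation requires both deliberate intent and an explicit choice of message sources.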

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill explicitly gathers and analyzes messages from services like Signal, Telegram, Discord, and email, but it does not present a clear privacy warning or consent model. Users may not understand that highly sensitive personal or business communications will be aggregated, summarized, logged, and potentially retained, creating substantial confidentiality and compliance risk.

VirusTotal

65/65 vendors flagged this skill as clean.
