holmes skill

Security checks across malware telemetry and agentic risk

Overview

This skill is not clearly malicious, but it needs review because it tells agents to persist case records, self-edit skill files, use broad web search, and commit repository changes without clear approval boundaries.

Install only if you are comfortable with a skill that may write local case records, read those records later, modify its own skill/reference files, use external web search broadly, and ask the agent to commit repository changes. Before use, require explicit approval for file edits, commits, and web searches, and avoid logging private, regulated, or third-party personal information.
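One way to enforce the approval boundary recommended above is to route every side-effecting action (file edit, commit, web search) through a single confirmation gate. The sketch below assumes the host lets you wrap such actions in a Python callable. `require_approval`, `ApprovalDenied`, and the action names are hypothetical and are not part of the holmes skill or any specific agent framework.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ApprovalDenied(Exception):
    """Raised when the user declines a side-effecting action (hypothetical type)."""


def require_approval(action: str, detail: str,
                     ask: Callable[[str], str],
                     run: Callable[[], T]) -> T:
    # Show the user exactly what is about to happen (e.g. the search query,
    # the file path, or the commit message) before anything runs.
    answer = ask(f"Allow {action}? {detail} [yes/no] ")
    # Only an explicit "yes" proceeds; anything else blocks the action.
    if answer.strip().lower() != "yes":
        raise ApprovalDenied(f"{action} was not approved")
    return run()
```

In an interactive session `ask` would simply be `input`; taking it as a parameter keeps the gate testable and lets a host substitute its own confirmation UI.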

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (13)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill appears to perform file read/write operations despite not declaring any permissions, which breaks transparency and prevents users or hosting systems from accurately assessing what the skill can access. Undeclared filesystem capabilities can be abused to create, modify, or inspect local files in ways the user did not authorize or expect.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The skill description is effectively empty while the detected behavior includes persistent local state management, case tracking, file generation, and workflow logic. This mismatch is dangerous because it conceals materially significant behavior from reviewers and users, making risky capabilities appear harmless and increasing the chance of unauthorized or unexpected data storage and file modification.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The README describes autonomous case logging, iterative analysis, and updates to local skill files such as SKILL.md. For an agent skill, self-modifying behavior is risky because it can change future execution semantics, persist unreviewed content, and create a pathway for prompt or data poisoning through prior cases or manipulated inputs.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The README markets the skill for investigating partners, colleagues, and suspicious behavior, which steers use toward surveillance and intrusive inference about real people. In an agent context, this can facilitate privacy violations, stalking-style behavior, or unjustified collection and analysis of sensitive personal information beyond a normal reasoning assistant's purpose.

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The manifest description is overly generic compared with the documented behavior, which includes reading historical case files, modifying SKILL.md and lessons files, and performing Git commits. This mismatch can mislead users or reviewers about the skill's actual authority and side effects, reducing informed consent and making risky repository modifications easier to hide.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The documentation explicitly instructs the agent to run `git add -A && git commit`, but the stated skill purpose does not justify version-control write access. This creates an unnecessary capability to persist changes, potentially committing unintended, low-quality, or sensitive modifications into the repository without a clear authorization boundary.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The manifest describes the skill only as a generic 'holmes skill', but the body materially expands behavior to include mandatory web searches and reading/writing local records such as lessons, learnings, and case logs. This mismatch can mislead operators and policy systems about the skill's real capabilities, increasing the chance that network access and persistent data handling occur without informed approval.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The skill requires `web_search` as a mandatory step for broad categories of tasks, even though the manifest does not justify or scope that capability. This creates unnecessary external data exposure risk because user context, prompts, or derived sensitive details may be transmitted to third-party services during routine problem solving.

Missing User Warnings

Medium

Confidence: 91%
Finding: The README says the script automatically creates directories and case files and that the agent performs iterative updates, but it does not prominently warn users that local files will be created or modified. Hidden or underexplained filesystem side effects are dangerous because users may grant the skill trust it has not earned, and agents may persist data or alter project state unexpectedly.

Missing User Warnings

Medium

Confidence: 90%
Finding: The README encourages recording detailed case data such as problem descriptions, results, and experiment metadata without warning that such records may contain personal, confidential, or regulated information. Persisting these details to local files can create long-lived privacy and data-handling risks, especially if the cases involve real users, incidents, or sensitive operational context.
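As a partial mitigation, case text can be scrubbed before it is persisted. The sketch below masks email addresses only; it is a deliberately narrow heuristic that will miss names, phone numbers, addresses, and most other personal or regulated data, so it does not replace the advice to avoid logging such information at all. `redact_for_case_log` is a hypothetical helper, not part of the holmes skill.

```python
import re

# Heuristic email pattern; illustrative only, it does not cover every valid
# address and does nothing for other kinds of personal data.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_for_case_log(text: str) -> str:
    """Mask email addresses before a case record is written to disk."""
    return EMAIL.sub("[redacted-email]", text)
```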

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill directs the agent to update repository files and commit them, yet provides no user warning or approval checkpoint before modifying contents. In context, this is more dangerous because the workflow is framed as a routine automatic iteration trigger (`% 3 == 0`), which normalizes autonomous writes and increases the chance of unauthorized or unnoticed state changes.
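The missing checkpoint can be illustrated with a small decision function. The sketch assumes the `% 3 == 0` trigger fires on every third recorded case; the enum and function names are hypothetical. Instead of editing SKILL.md and committing whenever the trigger fires, it stops at a "propose" state unless the user has approved that specific change.

```python
from enum import Enum


class IterationAction(Enum):
    SKIP = "skip"        # trigger did not fire for this case count
    PROPOSE = "propose"  # trigger fired; show the diff/commit and wait for the user
    APPLY = "apply"      # trigger fired and the user explicitly approved


def iteration_action(case_count: int, user_approved: bool) -> IterationAction:
    # Assumed reading of the skill's `% 3 == 0` trigger: fire on every third case.
    if case_count % 3 != 0:
        return IterationAction.SKIP
    # Gate the write: without explicit approval, only propose the change.
    return IterationAction.APPLY if user_approved else IterationAction.PROPOSE
```

For example, `[iteration_action(n, False).value for n in range(1, 7)]` yields `['skip', 'skip', 'propose', 'skip', 'skip', 'propose']`: the trigger still fires on schedule, but nothing is written or committed until a user says yes.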

Natural-Language Policy Violations

Medium

Confidence: 94%
Finding: The skill explicitly instructs the model to adopt a fixed Holmes-style voice and language pattern, including certainty-heavy phrasing, without any mechanism to defer to user preference or task appropriateness. This can override user-desired tone, locale, and safety-appropriate communication style, and in this file it is amplified by guidance to avoid uncertainty language and favor absolute assertions, which increases the risk of misleading or hostile responses.

Missing User Warnings

Medium

Confidence: 97%
Finding: The markdown instructs the agent to perform mandatory external web searches but provides no warning that doing so may disclose user-provided or inferred information to external systems. Because the trigger conditions are broad and include ambiguous concepts, the skill can cause frequent network transmission without transparency or consent.

VirusTotal

65/65 vendors flagged this skill as clean.
