Mulch Self Improving Agent

Security checks across malware telemetry and agentic risk

Overview

The main Mulch memory workflow is mostly legitimate, but the package includes broader, under-disclosed behaviors that users should review before installing.

Use project-local, narrowly matched hooks instead of global always-on hooks. Record only sanitized technical lessons, and do not store secrets, credentials, customer data, private conversations, or sensitive code snippets in Mulch. Treat the Telegram notification feature as unverified until you confirm its exact implementation. If you only want the self-improvement memory workflow, remove or separately review the bundled visual-explainer folder: it can scan repos, write persistent HTML files, open a browser, and optionally send project-derived prompts to external image tooling.
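As an illustration of "record only sanitized technical lessons", the sketch below redacts a few secret-like patterns before a lesson is stored. The function name `sanitize_lesson` and the pattern list are hypothetical and not part of the skill; real secret formats vary, so this should be read as a starting point rather than a complete filter.

```python
import re

# Hypothetical redaction rules (assumptions, not taken from the skill).
# Each entry is (compiled pattern, replacement).
_REDACTIONS = [
    # key=value or key: value pairs whose key looks like a credential name
    (re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"),
     r"\1=[REDACTED]"),
    # email addresses
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[EMAIL]"),
    # long hex strings, which are often hashes or keys
    (re.compile(r"\b[A-Fa-f0-9]{32,}\b"), "[HEX]"),
]


def sanitize_lesson(text: str) -> str:
    """Apply every redaction rule in order and return the cleaned text."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```

A filter like this reduces accidental leakage but cannot recognise every sensitive value, so manual review of what gets recorded is still advisable.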

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (39)

Tp4

High

Category: MCP Tool Poisoning
Confidence: 89%
Finding: The documented behavior extends beyond a simple 'capture learnings' skill into automatic detection, OpenClaw bootstrap hook injection, promotion into global memory files, and skill extraction. That mismatch matters because users may enable the skill expecting passive note-taking, while it can influence future agent behavior across sessions and workspaces, increasing the chance of unintended persistence, prompt-surface expansion, and trust boundary violations.

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The document's security section materially understates what the configured hooks do. The setup explicitly registers shell commands as hooks and also references an extraction script that creates scaffolding, so claiming the scripts only emit text can mislead users into granting trust or broader deployment under false assumptions.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The benchmark script persists and later searches personal-profile data such as writing style, social addressing norms, habits, and preferences that are not necessary for the skill's stated purpose of learning from failures or corrections. Because this data is stored via an external CLI and treated as reusable memory, it expands collection beyond legitimate operational need and creates unnecessary privacy and profiling risk.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The script explicitly records relationship and administrative details like a manager's name, team lead, and team communication practices, which are sensitive organizational context not justified by a generic self-improvement benchmark. Storing this kind of workplace graph data can enable profiling, social engineering, or unintended disclosure if the memory store is accessed by other agents, users, or systems.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The prompt explicitly instructs the agent to mine prior conversation and local progress files under the user's home directory to reconstruct rationale. That expands data access beyond the stated self-improvement purpose and can pull sensitive user context into the generated review without need-to-know boundaries or consent.

Context-Inappropriate Capability

Medium

Confidence: 79%
Finding: The optional `surf gemini --generate-image` step sends project-derived prompts to an external tool/service unrelated to the core self-improvement function. This increases data exposure and capability scope, especially if architectural details from the diff are used in the generated image prompt.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This file implements a visual diff-review and code-review artifact generator rather than a self-improvement memory skill. That scope mismatch is dangerous because it grants the package a materially different set of behaviors—repo inspection, review generation, optional external calls, and browser opening—than users would expect from the manifest.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The prompt directs the agent to invoke an external image-generation CLI and to open generated output in a browser, which expands the skill's capability surface beyond its declared self-improvement purpose. Even if not overtly malicious, these side effects can trigger unintended network access, local file creation, and application launching without clear user confirmation.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: This file's behavior is unrelated to the declared self-improvement skill and instead implements a visual diagram generation workflow. That mismatch is dangerous because it can smuggle unexpected capabilities into a trusted skill package, causing reviewers or users to grant broader permissions than they intended.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The prompt instructs the agent to optionally invoke an external CLI (`surf gemini --generate-image`) for a task that is fundamentally a plan/code review rendered as HTML. This expands the skill's execution surface beyond the stated purpose, may send repository-derived context to an external model/tool, and introduces command execution and data exfiltration risk without explicit user consent.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The prompt directs the agent to write output into a fixed persistent location under the user's home directory and open it in a browser, which goes beyond merely generating a review. This creates unrequested filesystem side effects, may overwrite or accumulate sensitive artifacts, and can trigger additional local execution/interaction through the browser without confirmation.
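One way to avoid the side effects described here is to write review output to a fresh temporary directory and to open a browser only on explicit request. The sketch below is an assumption about how that could look; `write_review` and its parameters are hypothetical names, not the skill's actual code.

```python
import tempfile
import webbrowser
from pathlib import Path


def write_review(html: str, open_in_browser: bool = False) -> Path:
    """Write HTML to a new temporary directory and return the file path.

    The browser is launched only when the caller opts in, so the default
    call has no side effect beyond creating one file in a temp directory.
    """
    out_dir = Path(tempfile.mkdtemp(prefix="review-"))
    out_file = out_dir / "review.html"
    out_file.write_text(html, encoding="utf-8")
    if open_in_browser:
        # Explicit opt-in: hand the file URI to the default browser.
        webbrowser.open(out_file.as_uri())
    return out_file
```

Using a new temporary directory per run prevents overwriting earlier artifacts, and returning the path lets the caller decide whether to keep, inspect, or delete the file.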

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The prompt explicitly tells the agent to mine same-session conversation history for decision context, which can pull in unrelated or sensitive user data beyond what is needed to summarize the repository. Because the generated recap is meant to be rendered and stored as HTML, this creates a realistic path for private prompts, secrets, or incidental personal data to be surfaced in output.

Description-Behavior Mismatch

Medium

Confidence: 86%
Finding: The prompt frames the task as generating a self-contained HTML recap, but then instructs the agent to write that output into the user's home directory and open it in a browser. That expands the operation from content generation into filesystem modification and program execution, which increases risk and breaks least-surprise for a user expecting analysis only.

Vague Triggers

Medium

Confidence: 86%
Finding: The 'when to use' guidance is very broad and includes many common agent situations, making the skill likely to activate frequently and collect or propagate low-quality or sensitive context. In a persistence-oriented skill, over-triggering can amplify bad records, store user corrections or operational details unnecessarily, and affect later sessions through priming and promoted memory.

Vague Triggers

Medium

Confidence: 91%
Finding: Auto-detection based on phrases like 'no', 'actually', 'wrong', or retry requests is ambiguous and overlaps with normal conversation, so the skill may misclassify benign dialogue as a learning event. Because the skill writes persistent records and may later prime them into context, false detections can pollute memory, capture unintended content, and normalize collecting user utterances without sufficiently precise consent.
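The ambiguity can be shown with a small sketch. The keyword list below uses the phrases quoted in this finding; both detector functions are hypothetical illustrations, not the skill's implementation, and the stricter rule is still only a heuristic.

```python
import re

# Phrases quoted in the finding; matching them anywhere is the naive approach.
NAIVE_TRIGGERS = ("no", "actually", "wrong")


def naive_is_correction(message: str) -> bool:
    """Flag a message if any trigger word appears anywhere in it."""
    words = re.findall(r"[a-z']+", message.lower())
    return any(word in NAIVE_TRIGGERS for word in words)


def stricter_is_correction(message: str) -> bool:
    """Hypothetical tighter rule: the message must open with a correction cue."""
    pattern = r"\s*(no,|that's wrong|that is wrong|actually,? (you|that))"
    return re.match(pattern, message.lower()) is not None


benign = "No problem, thanks!"
print(naive_is_correction(benign))     # True: a false positive
print(stricter_is_correction(benign))  # False
```

Even the stricter rule will misclassify some messages, so asking the user to confirm before writing a persistent record remains the safer design.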

Vague Triggers

Medium

Confidence: 88%
Finding: The detection trigger section lists common correction phrases without boundaries or counterexamples, which makes accidental invocation likely in everyday conversation. In this skill's context, that is more dangerous than in a non-persistent assistant feature because the resulting records can be retained, queried, promoted into global guidance files, and influence future agent actions.

Vague Triggers

Medium

Confidence: 90%
Finding: Using an empty matcher causes the activator command to run for every prompt, creating a broad interception point with little scoping. In an agent skill focused on self-improvement and session memory, this increases the chance of over-collection, unintended prompt influence, or persistent behavior across unrelated tasks.

Vague Triggers

High

Confidence: 95%
Finding: The user-level configuration combines global scope with an empty matcher, so the hook may fire across all sessions and projects for that user. That broad persistence meaningfully raises risk because any prompt can trigger the command, extending the skill's reach beyond the intended repository or context.
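Empty matchers can be checked for mechanically before a configuration is installed. The sketch below assumes a settings layout of the form `{"hooks": {event: [{"matcher": ..., "hooks": [{"command": ...}]}]}}`; that layout, the event names, and the script paths in the example are assumptions for illustration and may not match the package's actual files.

```python
from typing import Any


def find_broad_hooks(settings: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (event, command) pairs whose matcher is empty or missing."""
    broad = []
    for event, entries in settings.get("hooks", {}).items():
        for entry in entries:
            matcher = (entry.get("matcher") or "").strip()
            if not matcher:
                # An empty matcher means the hook runs for every prompt or tool call.
                for hook in entry.get("hooks", []):
                    broad.append((event, hook.get("command", "")))
    return broad


# Hypothetical configuration: one always-on hook and one narrowly matched hook.
example = {
    "hooks": {
        "UserPromptSubmit": [
            {"matcher": "", "hooks": [{"type": "command", "command": "./scripts/activator.sh"}]}
        ],
        "PostToolUse": [
            {"matcher": "Bash", "hooks": [{"type": "command", "command": "./scripts/error-detector.sh"}]}
        ],
    }
}

print(find_broad_hooks(example))
```

Running the check against both project-level and user-level settings files would show whether an always-on hook has been registered globally.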

Vague Triggers

Medium

Confidence: 88%
Finding: Although presented as a lighter-weight option, the minimal setup still triggers on every prompt because the matcher is empty. Reducing overhead does not reduce scope, so users may incorrectly assume this variant is safer when it still creates constant prompt-time execution.

Vague Triggers

Medium

Confidence: 90%
Finding: The Codex configuration repeats the same empty-matcher pattern, producing an all-prompts trigger in another agent environment. Replicating the broad hook across providers increases the attack surface and normalizes unsafe default scoping.

Missing User Warnings

Medium

Confidence: 93%
Finding: The script sends personal preference, habit, and relationship data to an external CLI-backed memory system without any user-facing notice, consent mechanism, or disclosure about where the data is stored and how it will be used. In the context of an agent skill designed to accumulate memory across sessions, silent collection is more dangerous because it normalizes persistent retention of personal data under the guise of improvement.

Missing User Warnings

Low

Confidence: 85%
Finding: The changelog advertises optional AI-generated illustrations via external tooling/model use but does not mention any user-facing notice, consent, privacy boundary, or data-handling implications. In an agent skill context, this can cause users or downstream agents to send potentially sensitive content to third-party services without clear awareness, making the omission a real but low-severity transparency and privacy issue.

Vague Triggers

Medium

Confidence: 83%
Finding: The automatic invocation rule for 'about to dump a complex table in the terminal (4+ rows or 3+ columns)' is underspecified and broad enough to trigger in many normal workflows. Because this skill writes HTML output and opens it in a browser, an imprecise condition increases the chance of unintended external actions and could expose sensitive work products in generated artifacts.

Vague Triggers

Medium

Confidence: 83%
Finding: The automatic invocation rule for 'about to dump a complex table in the terminal (4+ rows or 3+ columns)' is underspecified and broad enough to trigger in many normal workflows. Because this skill writes HTML output and opens it in a browser, an imprecise condition increases the chance of unintended external actions and could expose sensitive work products in generated artifacts.

Missing User Warnings

Medium

Confidence: 90%
Finding: The README advertises that the skill generates a self-contained HTML page and opens it in the browser, but it does not present this as a security-relevant side effect or warn about local file creation and browser launching. In practice, this can surprise users and lead to unintended disclosure of code review contents, architecture details, or other sensitive data through automatically generated local HTML artifacts.

VirusTotal

64/64 vendors flagged this skill as clean.
