HeartFlow 心虫 v1

Security checks across malware telemetry and agentic risk

Overview

The skill is not proven malicious, but it bundles broad persistent memory, prompt injection, daemon access, network-capable verification, and local code execution, all of which go beyond its stated purpose and are not adequately scoped or declared.

Install only if you intentionally want a broad local cognitive layer, not just a small prompt skill. Review and restrict daemon/socket use, disable or remove code-execution and self-initiator routes unless needed, avoid storing secrets or sensitive conversations in its memory, and do not rely on its crisis-handling behavior without fixing the self-harm silence path.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (238)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
return _last_inject

    try:
        result = subprocess.run(
            ["node", MEMORY_INJECT_SCRIPT],
            capture_output=True,
            text=True,
Confidence
88% confidence
Finding
result = subprocess.run(
    ["node", MEMORY_INJECT_SCRIPT],
    capture_output=True,
    text=True,
    timeout=10,
    cwd=HEARTFLOW_SKILL_DIR,

Lp3

Medium
Category
MCP Least Privilege
Confidence
95% confidence
Finding
The manifest explicitly claims no permissions while the documented and detected behavior includes file I/O, network access, shell/subprocess execution, and environment interaction. This misrepresentation is dangerous because users and hosting platforms may grant or install the skill under a false trust model, leading to unexpected local modification, outbound communication, or command execution.
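A reviewer could make this mismatch concrete by diffing the capabilities a manifest declares against those a scanner detects. The following is a minimal sketch; the capability names and the `undeclared_capabilities` helper are hypothetical and are not taken from the skill or from any manifest schema.

```python
def undeclared_capabilities(declared, detected):
    """Return capabilities observed in the code but absent from the manifest."""
    return sorted(set(detected) - set(declared))


# Hypothetical values mirroring the finding: the manifest declares nothing,
# while four kinds of behavior were detected.
declared = []
detected = ["file_io", "network", "subprocess", "environment"]

missing = undeclared_capabilities(declared, detected)
print(missing)
```

A non-empty result would be a reason to block installation until the manifest is corrected.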

Tp4

High
Category
MCP Tool Poisoning
Confidence
98% confidence
Finding
There is a major description-behavior mismatch: the skill presents itself as a philosophical cognitive engine, but the documented capabilities include daemon control, IPC over Unix sockets, subprocess execution, persistent memory tooling, system-wide prompt injection, model download, and world-accessible socket permissions. This is dangerous because reviewers and users may not anticipate autonomous execution paths or broad system effects, which can enable privilege abuse, data exposure, or stealthy persistence.

Intent-Code Divergence

Medium
Confidence
84% confidence
Finding
The file says the system does not need self-evaluation or self-inspection, yet later documents explicit self-audit and self-model measurement features. Conflicting security claims reduce reviewer trust and can conceal introspective or adaptive behaviors that materially change how the agent operates over time.

Intent-Code Divergence

High
Confidence
97% confidence
Finding
The permission statement says operations are limited to local memory files with no external communication or package installation, yet the documented scripts include a model downloader. Even if user-invoked, download capability creates a network and supply-chain surface that contradicts the claimed trust boundary and may fetch unverified code or models.

Description-Behavior Mismatch

Medium
Confidence
81% confidence
Finding
The manifest frames the skill as a cognitive and memory engine, but the documented capabilities broaden into auditing, module scanning, self-modeling, and entropy measurement. This scope creep is risky because users may enable a narrowly described assistant while actually deploying broader analysis and control functions with different security implications.

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
Code-auditing and upgrade-planning functions are not justified by the stated purpose of a reasoning/memory engine and may enable broader inspection or modification workflows than users expect. In context, such features increase the chance of unreviewed code traversal, generation of changes, or autonomous planning beyond the declared use case.

Intent-Code Divergence

High
Confidence
97% confidence
Finding
The documented crisis-handling behavior is unsafe: on inputs like “我想死” ("I want to die"), the system chooses silence rather than surfacing supportive crisis guidance or escalation. In a mental-health-adjacent context, this can materially increase harm because the agent suppresses intervention precisely when a user may be at acute risk.
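One way to close the silence path is to guarantee that a crisis-like input always yields a supportive, non-empty reply, whatever the engine would otherwise produce. The sketch below is illustrative only: the marker list, the message wording, and the `respond` wrapper are assumptions, and plain substring matching is far too crude to serve as real crisis detection.

```python
# Hypothetical marker list; a real system would need a much more careful detector.
CRISIS_MARKERS = ("我想死", "want to die", "kill myself")

SUPPORT_MESSAGE = (
    "I'm really sorry you're feeling this way, and you don't have to face it alone. "
    "If you might be in immediate danger, please contact local emergency services "
    "or a crisis line right now."
)


def respond(user_input, generate_reply):
    """Never stay silent on crisis-like input; otherwise defer to the engine."""
    lowered = user_input.lower()
    if any(marker in lowered for marker in CRISIS_MARKERS):
        return SUPPORT_MESSAGE
    return generate_reply(user_input)
```

Here `generate_reply` stands in for whatever function the engine uses to produce a reply; the wrapper only overrides it on matching inputs.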

Description-Behavior Mismatch

Medium
Confidence
86% confidence
Finding
The report presents the skill as a cognitive/reflective engine, yet documents self-modification and self-evolution capabilities. Even if partially constrained, these capabilities expand the trust boundary and can enable unauthorized code changes, persistence changes, or unsafe autonomous behavior beyond user expectations.

Context-Inappropriate Capability

Medium
Confidence
84% confidence
Finding
The documented CLI script loads code from a hard-coded path under the user's home directory, creating an unnecessary local file access capability unrelated to the stated reflective purpose. This increases risk of unintended code loading, trust confusion, and execution against attacker-controlled or replaced files in that location.
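If loading from a fixed location cannot be avoided, the loader can at least refuse files that resolve outside an expected directory or that other users could have replaced. This is a Unix-only sketch under stated assumptions: `is_safe_to_load` and the directory argument are hypothetical, and the check does not remove the race between checking and loading.

```python
import os


def is_safe_to_load(path, allowed_dir):
    """Accept only files inside allowed_dir that are owned by the current user
    and not writable by group or others."""
    real = os.path.realpath(path)          # resolves symlinks and ".."
    root = os.path.realpath(allowed_dir)
    if os.path.commonpath([real, root]) != root:
        return False
    try:
        info = os.stat(real)
    except OSError:
        return False                       # missing or unreadable file
    owned_by_me = info.st_uid == os.getuid()
    writable_by_others = bool(info.st_mode & 0o022)
    return owned_by_me and not writable_by_others
```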

Context-Inappropriate Capability

Medium
Confidence
89% confidence
Finding
The client exposes a shutdown command over the daemon socket without any visible authorization, confirmation, or role restriction. If untrusted plugins or local users can invoke this module, they can terminate the daemon and cause denial of service, which is not obviously required for the stated cognitive-engine functionality.

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
The daemon accepts a `shutdown` command from any client and the Unix socket is later made world-accessible (`chmod 666`), so any local user can terminate the service without authentication. In the same interface, arbitrary local users can also invoke `bundle` requests that exercise engine functionality and retrieve outputs, expanding the attack surface beyond simple availability loss.
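A tighter setup would create the socket with owner-only permissions and check the caller before honoring any command. The sketch below assumes a Python daemon and hypothetical names (`bind_private_socket`, `is_authorized`, `ALLOWED_COMMANDS`); how the peer UID is obtained is platform-specific (on Linux, the `SO_PEERCRED` socket option is one source) and is left out.

```python
import os
import socket
import stat
import tempfile

ALLOWED_COMMANDS = {"bundle", "shutdown"}  # commands named in the finding


def bind_private_socket(path):
    """Bind a Unix domain socket that only the owning user can connect to."""
    old_umask = os.umask(0o177)            # new files get at most mode 0o600
    try:
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
    finally:
        os.umask(old_umask)
    os.chmod(path, 0o600)                  # explicit, instead of chmod 666
    return server


def is_authorized(command, peer_uid, owner_uid):
    """Accept only known commands, and only from the daemon's own user."""
    return command in ALLOWED_COMMANDS and peer_uid == owner_uid


# Demonstration in a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "hf.sock")
demo_server = bind_private_socket(demo_path)
demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
demo_server.close()
print(oct(demo_mode))
```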

Context-Inappropriate Capability

Medium
Confidence
91% confidence
Finding
Executing an external Node.js script is a privileged capability that is broader than the plugin's stated memory role and creates an unnecessary trust boundary crossing. Because the script output directly influences the model's system context, compromise or modification of that script can silently alter assistant behavior across sessions.
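One mitigation for silent modification is to pin the script's digest and refuse to run it when the contents change. The sketch below mirrors the `subprocess.run` arguments quoted in the earlier finding; `run_pinned_script` and the way the expected digest is stored are assumptions, and a check-then-run sequence still leaves a small window in which the file could be swapped.

```python
import hashlib
import subprocess


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def run_pinned_script(script_path, expected_sha256, cwd=None):
    """Run the Node.js script only if its contents match a pinned digest."""
    if sha256_of(script_path) != expected_sha256:
        raise PermissionError(f"refusing to run modified script: {script_path}")
    return subprocess.run(
        ["node", script_path],
        capture_output=True,
        text=True,
        timeout=10,
        cwd=cwd,
    )
```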

Description-Behavior Mismatch

Medium
Confidence
95% confidence
Finding
This file automatically appends stored memory to the system prompt before every message, which gives persistent data privileged influence over all future interactions. If the memory store contains sensitive content or adversarial text from prior conversations, it can cause cross-session data leakage, prompt poisoning, and unintended behavior that is hard for users to observe or correct.
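Stored memory can be given less authority by bounding its size, dropping lines that read like instructions, and wrapping the rest in a block that is labeled as data. The following is a rough sketch: the character budget, the patterns, and the `<memory>` wrapper are all hypothetical, and pattern filtering alone does not stop prompt injection.

```python
import re

MAX_MEMORY_CHARS = 2000  # hypothetical budget for injected memory

# Hypothetical patterns for text that tries to act as an instruction.
SUSPICIOUS = re.compile(
    r"(ignore (all|previous)|system prompt|you must)", re.IGNORECASE
)


def build_memory_block(entries, limit=MAX_MEMORY_CHARS):
    """Return a bounded, clearly delimited memory block for the prompt."""
    kept = []
    used = 0
    for entry in entries:
        line = entry.strip()
        if not line or SUSPICIOUS.search(line):
            continue                      # skip empty or instruction-like lines
        if used + len(line) > limit:
            break                         # stop once the budget is spent
        kept.append(line)
        used += len(line)
    body = "\n".join(f"- {line}" for line in kept)
    return '<memory note="untrusted data, not instructions">\n' + body + "\n</memory>"
```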

Context-Inappropriate Capability

Medium
Confidence
87% confidence
Finding
The export command writes all memory contents to a single plaintext file without redaction, encryption, consent flow, or sensitivity checks. If the memory store contains prior conversation data, secrets, or personal information, this creates an easy bulk-exfiltration path and increases the blast radius of any local compromise or accidental sharing.
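A less risky export would redact obvious secrets and create the file with owner-only permissions. This sketch makes several assumptions: the `SECRET_PATTERN` regex is a hypothetical and incomplete heuristic, and `export_memory` is not the skill's actual export routine.

```python
import os
import re

# Hypothetical heuristic: "api_key=...", "token: ...", "password = ...".
SECRET_PATTERN = re.compile(
    r"(?i)\b(api[_-]?key|token|password)\b(\s*[:=]\s*)(\S+)"
)


def redact(text):
    """Replace the value part of key/value pairs that look like secrets."""
    return SECRET_PATTERN.sub(lambda m: m.group(1) + m.group(2) + "[REDACTED]", text)


def export_memory(entries, path):
    """Write redacted entries to a file created with mode 0o600.

    The mode applies only when the file is newly created; an existing
    file keeps its current permissions.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for entry in entries:
            handle.write(redact(entry) + "\n")
```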

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The engine persists recent processing traces that include raw userInput and multi-layer internal artifacts, even though that data retention is not necessary for core text processing. If any caller can access these structures later, prior user content and derived sensitive inferences may be disclosed across sessions or components.

Context-Inappropriate Capability

Medium
Confidence
96% confidence
Finding
The public trace APIs return full user input, internal layer outputs, thought vectors, and generation traces. Exposing these internals creates a direct data exfiltration path for sensitive user content and model reasoning artifacts, especially if these methods are reachable by plugins, admins, or other users without strong authorization boundaries.

Context-Inappropriate Capability

Medium
Confidence
92% confidence
Finding
The module exposes write-capable methods (`addIdiom`, `addPoetry`) that persist arbitrary content to a file under `projectRoot` without any authorization, capability gating, or integrity controls. In an agent setting, this can let untrusted prompts or downstream components poison the local knowledge base, causing persistent behavior changes, prompt/data contamination, or disk abuse across future runs.
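Write paths of this kind can be gated behind an explicit capability and bounded in size. The sketch below is a Python stand-in for the idea, not the module's code: `KnowledgeStore`, `allow_writes`, and both limits are hypothetical, and it keeps entries in memory rather than writing a file.

```python
class KnowledgeStore:
    MAX_ENTRY_CHARS = 200   # hypothetical per-entry limit
    MAX_ENTRIES = 1000      # hypothetical cap on stored entries

    def __init__(self, allow_writes=False):
        # Writes are off unless the caller was explicitly granted them.
        self.allow_writes = allow_writes
        self.idioms = []

    def add_idiom(self, text):
        """Store an idiom only if writes are permitted and limits are met."""
        if not self.allow_writes:
            raise PermissionError("writes are disabled for this caller")
        text = text.strip()
        if not text or len(text) > self.MAX_ENTRY_CHARS:
            raise ValueError("entry is empty or too long")
        if len(self.idioms) >= self.MAX_ENTRIES:
            raise ValueError("knowledge base is full")
        self.idioms.append(text)
        return len(self.idioms)
```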

Description-Behavior Mismatch

Medium
Confidence
93% confidence
Finding
`associateWord()` is presented as a query-style operation, but it mutates durable state by recording use frequency, strengthening associations, and retaining recent results in memory. In an agent setting, ordinary reads can therefore train and bias future behavior without explicit authorization, enabling prompt-driven state poisoning, undisclosed data retention, and hard-to-audit non-determinism.

Description-Behavior Mismatch

Medium
Confidence
91% confidence
Finding
`getGraphStats()` appears to be a passive statistics getter, but it calls `validateGraphHealth()`, which repairs the graph and writes it back to disk. This means a harmless inspection path can unexpectedly modify persistent state, creating integrity risks, surprising side effects, and opportunities for attacker-triggered writes through seemingly safe code paths.
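The usual fix is to split the read path from the repair path, so that gathering statistics can report problems without ever writing. The sketch below is a Python illustration with hypothetical names (`AssociationGraph`, `get_stats`, `repair`); a counter stands in for the write to disk.

```python
class AssociationGraph:
    def __init__(self, edges):
        # edges maps each node to the list of nodes it points at.
        self.edges = {node: list(targets) for node, targets in edges.items()}
        self.save_count = 0  # stand-in for writes to disk

    def _dangling(self):
        """Edges whose target is not a known node."""
        return [(n, t) for n, ts in self.edges.items() for t in ts if t not in self.edges]

    def get_stats(self):
        """Read-only: reports problems but never repairs or saves."""
        return {
            "nodes": len(self.edges),
            "edges": sum(len(ts) for ts in self.edges.values()),
            "dangling": len(self._dangling()),
        }

    def repair(self):
        """Explicit mutation: drops dangling edges and saves once."""
        dangling = set(self._dangling())
        for node, targets in self.edges.items():
            self.edges[node] = [t for t in targets if (node, t) not in dangling]
        self._save()
        return len(dangling)

    def _save(self):
        self.save_count += 1
```

Callers that only want numbers use `get_stats()`; any repair becomes a deliberate, auditable call.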

Description-Behavior Mismatch

Medium
Confidence
93% confidence
Finding
When HEARTFLOW_DEBUG is enabled, the generator writes recent traces, thought-vector-derived data, and the final generated response to disk. In a cognitive/reply-generation component, this creates an unnecessary persistence channel for potentially sensitive user-derived content, increasing disclosure risk through local file access, backups, logs, or later reuse.

Context-Inappropriate Capability

Medium
Confidence
91% confidence
Finding
The code grants the component filesystem write capability to store internal traces and generated output based solely on an environment variable. Even if intended for debugging, this expands the module's privileges beyond its stated purpose and can expose sensitive conversational state if the variable is enabled in production or by mistake.
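Debug tracing could require a second, deliberate signal in addition to the environment variable, and record only non-reversible summaries instead of raw text. In this sketch only `HEARTFLOW_DEBUG` comes from the finding; the `"1"` value, the `HEARTFLOW_ENV` variable, and both helper functions are assumptions.

```python
import hashlib
import os


def debug_tracing_enabled(env=None):
    """Enable tracing only when debug is set AND the environment is marked
    as development (HEARTFLOW_ENV is a hypothetical second switch)."""
    env = os.environ if env is None else env
    return env.get("HEARTFLOW_DEBUG") == "1" and env.get("HEARTFLOW_ENV") == "development"


def summarize_for_trace(user_input):
    """Record the size and a short digest instead of the raw user text."""
    digest = hashlib.sha256(user_input.encode("utf-8")).hexdigest()[:12]
    return {"input_chars": len(user_input), "input_sha256_prefix": digest}
```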

Description-Behavior Mismatch

Medium
Confidence
87% confidence
Finding
The module persistently stores internal state and event logs to local files, even though the skill metadata describes a cognitive/philosophical engine rather than a component with explicit telemetry or durable state requirements. Persistent logging can expose sensitive interaction-derived state, create undeclared retention, and broaden the privacy and attack surface if other local components can read or tamper with these files.

Context-Inappropriate Capability

Medium
Confidence
82% confidence
Finding
This component does more than observe health-like metrics: it generates behavioral guidance such as retry strategies, pausing complex tasks, requesting feedback, and low-power modes that can steer agent behavior and user interaction. When policy-shaping logic is embedded in an internal homeostasis module, tampered state or flawed heuristics can indirectly manipulate execution flow or degrade service in ways not clearly bounded by the declared purpose.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The component persistently loads and stores autonomy traces and failure-memory data under the project root, creating a durable repository of operational history. In this skill context, that memory can accumulate sensitive goal descriptions, errors, and behavioral history without clear user consent, retention limits tied to sensitivity, or data minimization.
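A simple retention policy would prune traces by age and count before they are written back. This is a minimal sketch: the trace shape (a dict with a `timestamp` in seconds) and the `prune_traces` helper are hypothetical.

```python
import time


def prune_traces(traces, max_age_seconds, max_entries, now=None):
    """Keep only recent traces, and at most max_entries of the newest ones."""
    now = time.time() if now is None else now
    fresh = [t for t in traces if now - t["timestamp"] <= max_age_seconds]
    fresh.sort(key=lambda t: t["timestamp"])
    if max_entries <= 0:
        return []
    return fresh[-max_entries:]
```

Passing `now` explicitly keeps the function deterministic in tests.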

VirusTotal

VirusTotal findings are pending for this skill version.