Mark Heartflow Skill

Security checks across malware telemetry and agentic risk

Overview

The skill appears intended as a local cognitive-memory engine, but it includes powerful local execution, codebase-audit, memory export/import, and daemon controls that are not clearly scoped for users.

Install only if you are comfortable giving this skill local process and filesystem influence. Treat its memory files as sensitive, avoid enabling the daemon or direct dispatch APIs in untrusted contexts, set `SHUTDOWN_TOKEN` if using the daemon, and review or disable code-execution and memory export/import routes before relying on it in a real agent environment.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (562)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: `return _last_inject try: result = subprocess.run(["node", MEMORY_INJECT_SCRIPT], capture_output=True, text=True,` (excerpt truncated)
Confidence: 70%
Finding: `result = subprocess.run(["node", MEMORY_INJECT_SCRIPT], capture_output=True, text=True, timeout=10, cwd=HEARTFLOW_SKILL_DIR,` (excerpt truncated)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 70%
Finding: Without declared permissions, the skill's intent is opaque and cannot be validated.

Context-Inappropriate Capability

Medium

Confidence: 87%
Finding: The manifest describes HeartFlow as a cognitive engine for reflection, psychology, philosophy, and memory, but this report explicitly references `skillGenerator` performing self-modification and requiring user consent. Generating or modifying skill code is a materially different capability from reflective reasoning and is not obviously required by the declared purpose.

Context-Inappropriate Capability

Medium

Confidence: 84%
Finding: The file describes fixes for a `selfEdit` path traversal in `meta-engine.js`, which implies the skill can edit files, including its own. A cognitive engine focused on reflection and AI psychology does not inherently require filesystem self-editing capabilities.
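A common guard against the kind of path traversal mentioned in this finding is to resolve the requested path and confirm it stays inside an allowed base directory. The following is a minimal Python sketch of that general technique, not the skill's actual fix in `meta-engine.js`; the function name `resolve_inside` and the example directory are hypothetical.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return user_path resolved under base_dir, or raise if it escapes.

    Hypothetical helper for illustration only.
    """
    base = Path(base_dir).resolve()
    # Joining and then resolving collapses ".." segments and follows symlinks.
    candidate = (base / user_path).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

With this check, a relative path such as `notes/a.txt` is accepted, while `../../etc/passwd` or an absolute path outside the base directory raises `ValueError`.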

Context-Inappropriate Capability

Medium

Confidence: 81%
Finding: The report mentions `Embedding 外部传输` ("Embedding external transmission") and states that external sending is disabled by default unless the user opts in, which indicates the skill can transmit data externally. The manifest emphasizes internal cognition, reflection, memory, and philosophy, not outbound embedding or external API integrations as a core function.

Context-Inappropriate Capability

Low

Confidence: 72%
Finding: The added `scripts/hf_judge.js` is a standalone CLI script that reads process arguments, loads code from a fixed filesystem path, and exits with process status codes. While possibly useful operationally, a CLI execution surface is not part of the manifest’s stated reflective/psychological function and expands the skill’s capability footprint.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The manifest describes HeartFlow as a cognitive engine for reflection, psychology, philosophy, and memory layers, but this CLI adds a capability to execute another JavaScript file via child_process.execFileSync. Launching subprocesses is not an obvious requirement for delivering analysis/status output and materially expands the skill's operational capability beyond the stated purpose.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The manifest describes HeartFlow as a cognitive engine for reflection, psychology, philosophy, and memory layers, but this daemon adds operational capabilities unrelated to that purpose: spawning a separate Node process to run a script and exposing a socket API that can trigger those actions. Executing an external script via child_process is not an obvious requirement for analyzing text, reflection, or judgment as described in the manifest.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The manifest frames the skill as a thinking/reflection engine, but the daemon accepts a 'shutdown' command that terminates the process. Process lifecycle control is an administrative capability, not an obvious part of self-reflection, psychology analysis, truth checking, or memory synthesis.

Intent-Code Divergence

High

Confidence: 98%
Finding: The inline documentation at L124 and the header comment at L14 state that shutdown requires a SHUTDOWN_TOKEN environment variable. However, the check only rejects requests when expectedToken is set and mismatched; if SHUTDOWN_TOKEN is unset, any client can issue shutdown successfully, which contradicts the documented intent.
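The difference between the reported check and a fail-closed check can be sketched in a few lines of Python. This is an illustration of the logic described in the finding, not the daemon's actual code; both function names are hypothetical, and the constant-time comparison is an added assumption rather than something the report mentions.

```python
import hmac
from typing import Optional


def reported_check(expected: Optional[str], provided: Optional[str]) -> bool:
    """Mirrors the behavior described in the finding: fails open when unset."""
    if expected and provided != expected:
        return False
    return True


def fail_closed_check(expected: Optional[str], provided: Optional[str]) -> bool:
    """Hypothetical alternative: refuse shutdown unless a token is configured."""
    if not expected or not isinstance(provided, str):
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(expected.encode(), provided.encode())
```

When `SHUTDOWN_TOKEN` is unset, `reported_check(None, "anything")` allows the shutdown, whereas `fail_closed_check(None, "anything")` rejects it, which matches the documented intent.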

Intent-Code Divergence

Low

Confidence: 96%
Finding: The startup comment says 'initial state: engine loaded' and the daemon logs 'engine ready', but the code defers engine loading until a 'bundle' request calls loadEngine() at L74. The comment and log message therefore misdescribe the actual lazy-load behavior, and the lazy load itself conflicts with the earlier header comment promising a one-time load at startup.

Description-Behavior Mismatch

Medium

Confidence: 83%
Finding: The manifest describes an internal cognitive engine focused on thinking, reflection, psychology, and philosophy. This file implements a persistent Unix-socket daemon with ping/status/shutdown commands and idle timeout management, which is service-management behavior not conveyed by the manifest description.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The manifest describes HeartFlow as a cognitive/self-reflection engine centered on thinking, memory, psychology, and philosophy. However, the status handler returns operational host/process telemetry such as PID, uptime, and full memory usage, which is infrastructure introspection rather than the declared cognitive functionality.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: A skill framed as an AI cognition, reflection, and memory engine does not obviously require exposing low-level runtime diagnostics to callers. Returning process identifiers and Node.js memory usage gives clients visibility into the hosting process rather than advancing the manifest’s stated thinking-oriented purpose.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The manifest and tool description frame HeartFlow as a cognitive engine with three memory layers and describe `heartflow_memory_search` as cross-layer memory retrieval using semantic or keyword search. However, the `ephemeral` branch uses `searchByTimeRange(query, limit)`, passing the user's textual query into what is explicitly a time-range API, which is behaviorally different from the documented memory search semantics.
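To make the mismatch concrete, the sketch below shows what a keyword search over an ephemeral layer might look like when the time window is a separate, optional filter instead of the destination of the text query. It is a Python illustration under assumed data shapes: the entry fields `text` and `timestamp` and the function `search_ephemeral` are hypothetical and do not come from the skill.

```python
from typing import Optional


def search_ephemeral(entries: list, query: str, limit: int = 5,
                     start: Optional[float] = None,
                     end: Optional[float] = None) -> list:
    """Case-insensitive keyword search with an optional time window."""
    needle = query.lower()
    hits = []
    for entry in entries:
        ts = entry["timestamp"]
        # The time range narrows results; it never replaces the text query.
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        if needle in entry["text"].lower():
            hits.append(entry)
    # Newest matches first, truncated to the requested limit.
    hits.sort(key=lambda e: e["timestamp"], reverse=True)
    return hits[:limit]
```

Keeping the query and the time bounds as distinct parameters makes it harder to pass a textual query into a time-range API by accident.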

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The manifest emphasizes reflective cognition, memory layers, psychology, and philosophy, while the `heartflow_status` tool is documented as a service health check returning version, startup timing, module count, and memory layer state. In full-detail mode, the implementation also returns process PID, uptime, and full `process.memoryUsage()`, which are operational host diagnostics not reflected in the stated skill purpose or tool description.
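One way to keep a health check aligned with its description is to return only the documented fields by default and gate host diagnostics behind an explicit opt-in. The Python sketch below illustrates that idea; the function `build_status`, its field names, and the `include_diagnostics` flag are all hypothetical and are not taken from the `heartflow_status` implementation.

```python
import os
import time

_START = time.monotonic()  # recorded once at import, used for uptime


def build_status(version: str, module_count: int,
                 include_diagnostics: bool = False) -> dict:
    """Return documented health fields; host details only on opt-in."""
    status = {"ok": True, "version": version, "modules": module_count}
    if include_diagnostics:
        # Host/process telemetry is kept out of the default response.
        status["diagnostics"] = {
            "pid": os.getpid(),
            "uptime_s": round(time.monotonic() - _START, 3),
        }
    return status
```

Callers that only need a health check then never see the PID or uptime, and an operator can still request them deliberately.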

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: A skill presented as an AI cognition, reflection, memory, psychology, and philosophy engine has a clear justification for internal analysis and memory inspection, but not for exposing server PID and raw process memory metrics to callers. These diagnostics are infrastructure-observability capabilities rather than obvious requirements for the declared thinking-oriented purpose.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The file-level documentation describes this plugin as automatically reading the memory/ directory and injecting accumulated lessons and preferences into the system prompt. Instead of performing that logic directly, it executes an external JavaScript program via subprocess, which introduces code-execution capability beyond ordinary memory reading/injection and is not declared in the stated purpose.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The module docstring claims that each user message causes the plugin to automatically read the memory/ directory and inject accumulated records. In practice, the main path is a cached subprocess call to heartflow-memory-inject.js, and the fallback reads only memory/memory-inject.txt, so the actual behavior is materially different from the described direct per-message directory read.

Intent-Code Divergence

Low

Confidence: 96%
Finding: The before_message docstring states that greetings and short inputs should still receive the user's name if available. However, when input_type == "greeting", the code unconditionally sets inject_text to an empty string, and there is no separate logic to extract or inject a name.
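The behavior the docstring describes could look roughly like the following Python sketch, in which a greeting skips the full memory block but still carries the user's name when one is known. The function `build_inject_text`, its parameters, and the output format are hypothetical; only the `"greeting"` input type comes from the finding.

```python
from typing import Optional


def build_inject_text(input_type: str, memory_text: str,
                      user_name: Optional[str] = None) -> str:
    """Choose what to inject for a message of the given type."""
    if input_type == "greeting":
        # Skip accumulated memory for greetings, but keep the name if known.
        return f"User name: {user_name}" if user_name else ""
    return memory_text
```

In the code as reported, the greeting branch always yields an empty string, so the name case above is the part that is missing.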

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The file-level documentation explicitly states '注入后自动更新 lastAccessed 时间戳' ("automatically update the lastAccessed timestamp after injection"). However, the implementation only reads memory entries, formats output, writes to stdout, and saves a text file; there is no assignment, persistence call, or method invocation that updates any entry's lastAccessed field. This is an active contradiction between documented intent and actual behavior.
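For comparison, a version that honors the documented intent would stamp each entry as it is injected. The Python sketch below assumes entries are dictionaries with `value` and `lastAccessed` keys; that shape and the function `inject_and_touch` are hypothetical, and writing the updated entries back to storage is left to the caller.

```python
from datetime import datetime, timezone
from typing import Optional


def inject_and_touch(entries: list, now: Optional[datetime] = None) -> str:
    """Format entries for injection and update their lastAccessed stamp."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    lines = []
    for entry in entries:
        lines.append(f"- {entry['value']}")
        # The step the finding reports as missing: record the access time.
        entry["lastAccessed"] = stamp
    return "\n".join(lines)
```

Accepting `now` as a parameter keeps the function deterministic in tests while defaulting to the current UTC time in normal use.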

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The inline comment says 'identity./philosophy. 开头的记忆不注入' ("memories whose keys start with identity. or philosophy. are not injected"), implying explicit exclusion of those memory namespaces. In practice, the code selects CORE entries solely by tags such as lesson, user_correction, and user_preference, with no key-prefix check for identity. or philosophy.; matching entries could still be injected if they carry one of the allowed tags. That is a contradiction between the stated filtering rule and the implemented one.
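A filter that implements both rules, the tag allow-list and the key-prefix exclusion, might look like the Python sketch below. The tag names and prefixes come from the finding; the entry shape (`key`, `tags`) and the function `select_core_entries` are hypothetical.

```python
ALLOWED_TAGS = {"lesson", "user_correction", "user_preference"}
EXCLUDED_PREFIXES = ("identity.", "philosophy.")


def select_core_entries(entries: list) -> list:
    """Keep entries with an allowed tag whose key is not in an excluded namespace."""
    selected = []
    for entry in entries:
        # str.startswith accepts a tuple, so both prefixes are checked at once.
        if entry["key"].startswith(EXCLUDED_PREFIXES):
            continue
        if ALLOWED_TAGS.intersection(entry.get("tags", [])):
            selected.append(entry)
    return selected
```

With the prefix check in place, an entry keyed `identity.name` is skipped even when it is tagged `lesson`, which is the case the tag-only selection lets through.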

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The manifest describes an internal cognitive engine centered on reflection, memory layers, psychology, and philosophy. This CLI adds an operational capability to export all CORE/LEARNED/EPHEMERAL memory into a shareable plaintext file, which is not an obvious or necessary part of the stated thinking-oriented purpose and materially increases data exfiltration risk.

Intent-Code Divergence

Low

Confidence: 97%
Finding: The top-level documentation says the write command accepts optional tags, implying separate tag handling. In code, `cmdWrite(hfm, args[1], args.slice(2).join(' '))` passes everything after the key as a single value string, and `cmdWrite` hardcodes tags to `['manual']`, so user-supplied tags are not honored.
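The gap can be illustrated with a small argument parser that separates the value from optional tags and falls back to `manual` only when none are given. This is a Python sketch, not a port of the skill's `cmdWrite`; the `--tags a,b` syntax is an assumption made for illustration, since the report does not say how tags are meant to be supplied, and `parse_write_args` is a hypothetical name.

```python
def parse_write_args(args: list) -> tuple:
    """Split ['key', 'word', ..., '--tags', 'a,b'] into (key, value, tags)."""
    key, rest = args[0], list(args[1:])
    tags = ["manual"]  # default mirrors the hardcoded tag in the finding
    if "--tags" in rest:
        i = rest.index("--tags")
        raw = rest[i + 1] if i + 1 < len(rest) else ""
        parsed = [t.strip() for t in raw.split(",") if t.strip()]
        tags = parsed or tags
        # Remove the flag and its argument so they do not leak into the value.
        rest = rest[:i] + rest[i + 2:]
    return key, " ".join(rest), tags
```

Under this assumed syntax, `write user.pref likes tea --tags preference,drink` would store the value `likes tea` with two tags instead of folding everything into one string tagged `manual`.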

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The manifest describes a cognitive engine for self-reflection, dream synthesis, AI psychology, philosophy, and memory-layered thinking. This file instead implements a generic habit/goal tracker with local JSON persistence, goal CRUD, status transitions, and reporting, which is a materially different operational behavior rather than an obvious implementation detail of the declared purpose.

VirusTotal

65/65 vendors flagged this skill as clean.
