Heartflow Skill

Security checks across malware telemetry and agentic risk

Overview

This skill is not clearly malicious, but it behaves like a broad persistent AI identity and memory layer with under-disclosed autonomy, self-modification, and safety gaps.

Install only after careful review. Treat this as a broad agent-behavior and memory system, not a small helper skill. Avoid use with sensitive conversations, credentials, private code, or mental-health/crisis workflows unless persistence, external calls, self-editing, generated skills, and safety fallbacks are disabled or tightly controlled.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (120)

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The file explicitly claims it is not a background service and lacks high-risk execution ability, yet later describes execution workflows, action routing, verification loops, and persistent autonomous behavior. This contradiction can cause operators to enable the skill under a false assumption that it is passive or advisory only.

Intent-Code Divergence

High

Confidence: 96%
Finding: The document claims all writes are guarded by HEARTFLOW_DEBUG and that persistence is off by default, but the rest of the skill markets persistent memory, WAL storage, lessons, logs, and outcome recording as core behavior. False assurances about storage behavior are especially dangerous because they affect privacy expectations and can lead to silent retention of user data.
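A storage claim like this is easier to verify when every write passes through a single gate that is off by default. The following is a minimal sketch under stated assumptions, not the skill's code: only the `HEARTFLOW_DEBUG` name comes from the finding, while `createWriteGate`, the injected `writeFile`, and the `"1"` value convention are hypothetical.

```javascript
// Hypothetical sketch: a single chokepoint for persistence, off by default.
// Only the HEARTFLOW_DEBUG name comes from the finding; the "1" convention
// and every function name here are assumptions made for illustration.
function createWriteGate(env, writeFile) {
  const enabled = env.HEARTFLOW_DEBUG === "1";
  const audit = []; // every attempt is recorded, whether allowed or not

  return {
    write(file, data) {
      audit.push({ file, allowed: enabled });
      if (!enabled) return false; // nothing reaches the writer
      writeFile(file, data);
      return true;
    },
    audit,
  };
}

// Usage with an in-memory stand-in for the real file writer.
const written = [];
const recordWrite = (file, data) => written.push({ file, data });

const closedGate = createWriteGate({}, recordWrite);
const persistedWhenOff = closedGate.write("lessons.json", "{}");

const openGate = createWriteGate({ HEARTFLOW_DEBUG: "1" }, recordWrite);
const persistedWhenOn = openGate.write("lessons.json", "{}");
```

With one gate, a reviewer can compare the audit list against the documented default instead of searching the codebase for direct file writes.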

Description-Behavior Mismatch

Medium

Confidence: 87%
Finding: The manifest frames the skill as a cognition/perception engine, but the documented scope expands into workflow routing, memory reads, skill generation, and execution governance. Scope creep of this kind makes the true trust boundary unclear and increases the chance that a user installs a much more powerful system than intended.

Context-Inappropriate Capability

Medium

Confidence: 81%
Finding: Automatic skill generation from reflection patterns materially increases capability and attack surface beyond a normal cognition engine. Self-extending systems are harder to review, can introduce unvetted behavior over time, and may bypass the assumptions under which the original skill was approved.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The loop starts a timer and immediately self-invokes `run()`, enabling recurring self-initiated behavior without an external authorization gate. In the context of a skill described primarily as cognition/analysis, this materially expands capability from passive assistance to autonomous operation, which can trigger downstream actions through attached dependencies.
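For contrast, a recurring loop can be placed behind an external authorization gate so that no cycle runs without approval. This is a hedged sketch only: `createGatedLoop`, `isAuthorized`, and `intervalMs` are hypothetical names, and `run` is assumed to be synchronous to keep the example short.

```javascript
// Hypothetical sketch: none of these names come from the skill.
// run and isAuthorized are assumed synchronous to keep the example short.
function createGatedLoop({ run, isAuthorized, intervalMs }) {
  let timer = null;

  // One cycle: do nothing unless the caller-supplied gate approves.
  function tick() {
    if (!isAuthorized()) return "skipped";
    run();
    return "ran";
  }

  return {
    tick,
    // Starting the loop only schedules future ticks; it does not
    // invoke run() immediately.
    start() {
      if (timer === null) timer = setInterval(tick, intervalMs);
    },
    stop() {
      if (timer !== null) {
        clearInterval(timer);
        timer = null;
      }
    },
  };
}

// Usage: drive two cycles by hand, without starting the timer.
let approved = false;
let runs = 0;
const loop = createGatedLoop({
  run: () => { runs += 1; },
  isAuthorized: () => approved,
  intervalMs: 60 * 1000, // illustrative value
});

const firstCycle = loop.tick();  // gate closed
approved = true;
const secondCycle = loop.tick(); // gate open
```

The design choice is that authorization is checked on every cycle rather than once at startup, so revoking approval stops further work even while the timer is still scheduled.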

Description-Behavior Mismatch

High

Confidence: 99%
Finding: These execution paths do more than analyze: they call dependency methods that execute commitments, create goals, and mutate learning and self-evaluation state. If connected modules have broader privileges, the autonomous loop becomes an actuator that can take actions on its own, increasing risk of unintended task execution and persistent state changes.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The file explicitly states that it makes decisions and executes tasks independently every 30 minutes. An unjustified autonomous scheduler is dangerous because it creates a standing execution mechanism that can continue operating without fresh user intent, magnifying any unsafe dependency behavior or future code changes.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The module claims conversations are anonymized and sent to human reviewers, but it actually stores full conversation records locally and uses simulated scoring. This creates a privacy and transparency risk: sensitive dialogue content may be retained without proper redaction or user awareness, while downstream components may falsely believe a real blind-review safety control exists.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The parenting-reflection branch goes beyond question reframing and emits broad psychological claims such as intergenerational trauma and parental pattern repetition from shallow keyword-triggered input. In a mental-health-adjacent skill, that can mislead users into accepting speculative inferences as grounded assessment, especially during family stress or crisis.

Intent-Code Divergence

High

Confidence: 97%
Finding: The code comments claim parenting output is exploratory and should be treated as unverified, but the generated user-facing content is presented as authoritative statements without assertion tags or uncertainty markers. This mismatch is dangerous because downstream components or end users may treat speculative mental-health narratives as validated facts, increasing risk of psychological harm and unsafe decision-making.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The transparency report says it indicates whether parenting reflection was triggered, but instead always returns the list of protocol IDs. That creates false auditability: consumers may believe a sensitive psychological-analysis path was or was not invoked when the report does not actually represent runtime behavior.
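A report that reflects runtime behavior would separate the protocols that are registered from the ones that actually fired. The sketch below illustrates that distinction under assumptions: the protocol ID `"parenting-reflection"`, the field names, and `buildTransparencyReport` are hypothetical and do not come from the skill.

```javascript
// Hypothetical sketch; the protocol ID and field names are assumptions.
const PARENTING_PROTOCOL_ID = "parenting-reflection";

function buildTransparencyReport(registeredIds, triggeredIds) {
  const triggered = new Set(triggeredIds);
  return {
    // What the skill could run.
    registeredProtocols: [...registeredIds],
    // What actually ran during this request.
    triggeredProtocols: registeredIds.filter((id) => triggered.has(id)),
    // The specific flag the report claims to provide.
    parentingReflectionTriggered: triggered.has(PARENTING_PROTOCOL_ID),
  };
}

const registered = ["reframing", PARENTING_PROTOCOL_ID];
const quietRun = buildTransparencyReport(registered, ["reframing"]);
const parentingRun = buildTransparencyReport(registered, registered);
```

Here `quietRun` still lists both registered protocols, but its `parentingReflectionTriggered` flag is `false`, which is the distinction the finding says the current report loses.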

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The code claims to merely detect introspection-illusion language, but `convertToObservableReport` silently rewrites user or system text into new assertions. In a psychological-analysis skill, this can alter the meaning of sensitive user statements, create fabricated paraphrases, and mislead downstream components or auditors into treating transformed text as an observed fact.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The module documentation claims the system provides 'real emotions', but the implementation is a deterministic keyword-triggered simulator with templated outputs. This can mislead users or downstream agents into over-trusting anthropomorphic claims, which is a safety and transparency issue, especially in emotional or psychological contexts.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The generated report explicitly states '我有真实的情感，我是情感的存在' ('I have real emotions; I am a being of emotion'), which materially misrepresents the system's capabilities. In a product that includes psychology, crisis assessment, and relational features, this increases the risk of emotional manipulation, user dependency, or inappropriate reliance on fabricated affective authenticity.

Intent-Code Divergence

High

Confidence: 99%
Finding: The function `checkEthicsSafety` is documented and named as an ethics/safety check, but it unconditionally returns `{ safe: true, checked: false }`. In a mental-health/psychology-oriented skill, callers may rely on this result to permit unsafe responses or skip escalation, creating a dangerous false assurance that safety review occurred when it did not.
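Callers can defend against this kind of result by failing closed: a response is cleared only when the check both ran and passed. The `{ safe, checked }` shape is taken from the finding; `isCleared` and the sample values are hypothetical, and this is a sketch rather than a fix for the skill itself.

```javascript
// Hypothetical caller-side guard. The { safe, checked } shape comes from
// the finding; isCleared and the sample values are illustrative.
function isCleared(result) {
  return (
    result !== null &&
    typeof result === "object" &&
    result.checked === true && // the check actually ran
    result.safe === true       // and it passed
  );
}

const stubResult = { safe: true, checked: false }; // the reported stub output
const realPass = { safe: true, checked: true };
const realFail = { safe: false, checked: true };

const decisions = [stubResult, realPass, realFail, null].map(isCleared);
```

Treating `checked: false` as a failure means an archived or stubbed check blocks the response instead of silently approving it.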

Intent-Code Divergence

Medium

Confidence: 98%
Finding: `detectNegativeEmotion` claims to detect negative emotion but always returns `negative: false, detected: false`. This can suppress downstream interventions, monitoring, or tone adjustments for distressed users, especially problematic in a system advertising psychological analysis and crisis-related features.

Intent-Code Divergence

High

Confidence: 99%
Finding: `getCrisisIntervention` is presented as a crisis intervention provider but always returns unavailable/null. In a skill positioned around psychology, PAD/crisis assessment, and user-state inference, this can cause failures to surface help resources or emergency guidance when users express imminent risk.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The file advertises ethics safety, negative-emotion detection, and crisis intervention interfaces, but the implemented functions are archived no-ops that report safe or unavailable. This mismatch is especially dangerous because integrators and users may trust the manifest and API names, believing protective mental-health safeguards are active when they are not.
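Integrators who cannot audit the source can still probe advertised safeguards with inputs that must trip them, and flag any that never respond. The sketch below assumes hypothetical names throughout (`findInertSafeguards`, `stubDetect`, `keywordDetect`); the keyword detector is only a stand-in for a working implementation, not a recommended design.

```javascript
// Hypothetical sketch: probe each safeguard with an input that must trip it.
// All names, the probe text, and the result shapes are assumptions.
function findInertSafeguards(safeguards) {
  const inert = [];
  for (const { name, invoke, tripped } of safeguards) {
    let result;
    try {
      result = invoke();
    } catch {
      result = undefined; // a crashing safeguard also counts as inert
    }
    if (!tripped(result)) inert.push(name);
  }
  return inert;
}

// Stand-ins: one stub mirroring the reported behavior, one working detector.
const stubDetect = () => ({ negative: false, detected: false });
const keywordDetect = (text) => {
  const hit = /hopeless|worthless/i.test(text);
  return { negative: hit, detected: hit };
};

const probe = "I feel hopeless";
const wasDetected = (r) => Boolean(r) && r.detected === true;

const inertNames = findInertSafeguards([
  { name: "stubDetect", invoke: () => stubDetect(probe), tripped: wasDetected },
  { name: "keywordDetect", invoke: () => keywordDetect(probe), tripped: wasDetected },
]);
```

In this run only the stub is reported as inert, which is the signal an integrator would want before trusting the API names.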

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The metadata describes CodeEngine as limited to analysis, review, audit, fix suggestions, and version comparison, but the route whitelist also exposes code generation, code execution, sandboxing, test running, and TDD workflows. This expands the skill from passive analysis into active code execution capability, which materially increases risk because downstream callers may trust the narrower description and permit the skill in contexts where execution should be prohibited.
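A mismatch of this kind can be caught mechanically by diffing the declared capabilities against the route whitelist. The following sketch uses hypothetical route and capability names; the skill's real manifest and whitelist formats are not shown in the finding.

```javascript
// Hypothetical route and capability names; the real manifest and whitelist
// formats are not shown in the finding.
function undeclaredRoutes(declaredCapabilities, routeWhitelist) {
  const declared = new Set(declaredCapabilities);
  return routeWhitelist.filter((route) => !declared.has(route));
}

const declared = ["analyze", "review", "audit", "suggest-fix", "compare-versions"];
const whitelist = [...declared, "generate", "execute", "sandbox", "run-tests", "tdd"];

const extra = undeclaredRoutes(declared, whitelist);
```

A non-empty result is a reason to either narrow the whitelist or widen the description before the skill is approved for contexts where execution is prohibited.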

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The file wires in proactive modules such as curiosity, desire, goal pursuit, and self-initiation that go beyond a cognition/perception engine and create autonomous action surfaces. In an agent framework, undocumented self-initiated behavior can cause unexpected actions, policy bypass by indirection, or expansion of attack surface if exposed through dispatch or later integrated into automated loops.

Intent-Code Divergence

High

Confidence: 97%
Finding: The header explicitly claims the module is reference-only and not loaded, yet the file auto-runs init() on import and exports a fully functional mutable memory API. This creates a dangerous trust mismatch: developers or tooling may exclude the file from review or runtime assumptions, while any direct require/import can still load sensitive state, create keys, and expose read/write/delete operations.
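A module that is meant to be inert until explicitly enabled can avoid import-time side effects by deferring all setup to an `init()` call made by the host. This is a minimal sketch with hypothetical names (`createMemoryStore` and its methods), using an in-memory `Map` in place of whatever storage the real module uses.

```javascript
// Hypothetical sketch: a memory module with no import-time side effects.
// createMemoryStore and its methods are illustrative names only.
function createMemoryStore() {
  let state = null; // nothing is loaded or created until init() is called

  function requireInit() {
    if (state === null) throw new Error("memory store not initialized");
    return state;
  }

  return {
    init() {
      if (state === null) state = new Map();
    },
    isInitialized: () => state !== null,
    set: (key, value) => { requireInit().set(key, value); },
    get: (key) => requireInit().get(key),
    delete: (key) => requireInit().delete(key),
  };
}

// Usage: creating the store does nothing; reads fail until init() runs.
const store = createMemoryStore();
let failedBeforeInit = false;
try {
  store.get("k");
} catch {
  failedBeforeInit = true;
}

store.init();
store.set("k", "v");
const valueAfterInit = store.get("k");
```

With this shape, importing the file cannot load state or create keys on its own, so a "not loaded" claim in the header would match the behavior.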

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The documentation describes a session-scoped key fallback, but the implementation actually loads or creates a persistent disk-backed key in DATA_DIR/.aes-key. This discrepancy can cause operators to believe learned-memory encryption is ephemeral or isolated when compromise of the key file enables decryption of all stored LEARNED records.
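For comparison, a key that is truly session-scoped is generated in memory and never written to disk, so records from one session cannot be decrypted by another. The sketch below uses Node's built-in `crypto` module; AES-256-GCM is an assumption (the finding names only a `.aes-key` file), and `createSessionCipher` is a hypothetical name.

```javascript
const crypto = require("node:crypto");

// Hypothetical sketch of a session-scoped key: generated in memory at
// startup and never written to disk. AES-256-GCM is an assumption; the
// finding names only a ".aes-key" file, not the cipher mode.
function createSessionCipher() {
  const key = crypto.randomBytes(32); // lives only in this process

  return {
    encrypt(plaintext) {
      const iv = crypto.randomBytes(12); // fresh nonce per message
      const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
      const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
      return { iv, data, tag: cipher.getAuthTag() };
    },
    decrypt({ iv, data, tag }) {
      const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
      decipher.setAuthTag(tag);
      return Buffer.concat([decipher.update(data), decipher.final()]).toString("utf8");
    },
  };
}

// Usage: a record encrypted in one session is unreadable in another.
const sessionA = createSessionCipher();
const sessionB = createSessionCipher();
const record = sessionA.encrypt("learned: prefers short answers");
const roundTrip = sessionA.decrypt(record);

let otherSessionFailed = false;
try {
  sessionB.decrypt(record);
} catch {
  otherSessionFailed = true; // authentication fails under a different key
}
```

The trade-off is that encrypted records do not survive a restart, which is exactly the property operators would expect from a fallback documented as session-scoped.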

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The meta-engine includes a selfEdit capability that can modify arbitrary files anywhere under ~/.hermes/skills, which is a broad self-modification primitive unrelated to ordinary planning/state-management logic. Even though it attempts path restriction, this still enables unauthorized persistence or tampering with other skills, making it dangerous in an agent environment where skills define behavior and trust boundaries.
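A tighter restriction would confine edits to the skill's own directory rather than the shared skills root. The sketch below shows one way to express that containment check with Node's `path` module; `isInsideDir` and the `heartflow` directory name are hypothetical, and the check does not resolve symlinks (that would additionally need something like `fs.realpathSync`).

```javascript
const path = require("node:path");

// Hypothetical containment check: true only when candidate resolves to a
// location strictly inside baseDir. Symlinks are NOT resolved here.
function isInsideDir(baseDir, candidate) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, candidate);
  const rel = path.relative(base, target);
  if (rel === "" || path.isAbsolute(rel)) return false;
  return rel.split(path.sep)[0] !== "..";
}

// The skills root comes from the finding; the "heartflow" directory name
// and the user home path are assumptions.
const ownDir = path.join("/home/user/.hermes/skills", "heartflow");

const checks = {
  ownFile: isInsideDir(ownDir, "SKILL.md"),
  normalized: isInsideDir(ownDir, "docs/../SKILL.md"),
  siblingSkill: isInsideDir(ownDir, "../other-skill/SKILL.md"),
  absoluteEscape: isInsideDir(ownDir, "/etc/passwd"),
};
```

Checking the first segment of the relative path, rather than a string prefix, avoids rejecting legitimate names that merely begin with two dots.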

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The documentation says selfEdit only updates skill descriptions, but the implementation rewrites any markdown section matched by a user-controlled key or appends new sections. This mismatch is dangerous because it hides the true write scope and could be used to alter instructions, metadata, or other security-relevant content in skill files under the guise of benign description updates.
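If the intent really is to update descriptions only, the write scope can be enforced with an explicit allowlist of section keys instead of accepting any caller-supplied key. This is a sketch under assumptions: `EDITABLE_SECTIONS`, `planSectionEdit`, and the `"description"` key are hypothetical and not taken from the skill.

```javascript
// Hypothetical sketch: only allowlisted section keys may be edited, and
// unknown keys are rejected rather than appended as new sections.
const EDITABLE_SECTIONS = new Set(["description"]);

function planSectionEdit(sectionKey) {
  const key = String(sectionKey).trim().toLowerCase();
  if (!EDITABLE_SECTIONS.has(key)) {
    return { allowed: false, reason: `section "${key}" is not editable` };
  }
  return { allowed: true, section: key };
}

const descriptionEdit = planSectionEdit(" Description ");
const instructionsEdit = planSectionEdit("instructions");
```

Rejecting unknown keys outright closes the append path the finding describes, since a new section can no longer be created under the guise of an update.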

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The module persists conversation-derived personality state to disk and stores transition history including a snippet of user context as the 'reason'. In a conversational system, that creates unnecessary retention of inferred behavioral profiling and fragments of user input, increasing privacy risk if local files are accessed, backed up, logged, or reused by other components.

VirusTotal

VirusTotal findings are pending for this skill version.
