self-improving-agent

Security checks across malware telemetry and agentic risk

Overview

HeartFlow is not clearly malicious, but it asks users to trust a broad self-improving agent runtime whose shell, file, network, persistence, and self-modification behaviors go well beyond what its public description discloses.

Install only in a sandboxed workspace with no sensitive credentials unless you intentionally want a self-improving agent runtime. Review and constrain its API endpoint environment variables, disable or avoid the upgrade daemon and self-modification features, avoid curl-to-shell installation, and do not grant broad filesystem or shell authority without per-action approval.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (185)

Tainted flow: 'req' from os.environ.get (line 114, credential/environment) → urllib.request.urlopen (network output)

Critical

Category: Data Flow
Content:
    }).encode("utf-8")
    headers = {**HEADERS, "Authorization": f"Bearer {API_KEY}"}
    req = urllib.request.Request(API_URL, data=payload, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    content = data["choices"][0]["message"]["content"].strip()
    if content.startswith("```"):
Confidence: 95%
Finding: with urllib.request.urlopen(req, timeout=timeout) as resp:
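One way to narrow this flow, sketched under assumptions: check the environment-supplied endpoint against an allowlist before building any request that carries the bearer token. The host api.example.com, the ALLOWED_HOSTS set, and the checked_endpoint helper are hypothetical names for illustration and do not come from the skill.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the report does not state which hosts the skill should contact.
ALLOWED_HOSTS = {"api.example.com"}


def checked_endpoint(url: str) -> str:
    """Return url unchanged if it is HTTPS and its host is allowlisted, else raise."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {url!r}")
    # urlsplit().hostname ignores any userinfo, so "https://api.example.com@evil.net/"
    # resolves to "evil.net" and is rejected here.
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"endpoint host not allowlisted: {parts.hostname!r}")
    return url
```

A caller would pass the value read from the environment through checked_endpoint before constructing the urllib.request.Request, so the Authorization header is never attached to a request for an unexpected host.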

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill advertises no declared permissions while the documented and detected capabilities include environment access, network use, and shell execution. This creates a transparency and consent failure: a host or user may grant trust to the skill based on incomplete metadata, then expose local execution and outbound communication surfaces they did not explicitly approve.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The description presents a small, zero-dependency decision engine, but the broader behavior includes downloading code, writing configuration, making outbound requests, self-installation, shell execution, file modification, autonomous orchestration, and self-modification. That mismatch is dangerous because it conceals a much larger attack surface and can lead operators to execute or approve the skill under false assumptions about scope and risk.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The document presents a safety-first, sandboxed architecture, but its recommended installation path bypasses those controls by piping a remotely fetched shell script directly into sh. This creates a trust gap: anyone following the documented install flow executes unreviewed code before any claimed validation, sandboxing, or audit mechanisms can protect them.

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The changelog shows the skill has grown far beyond a lightweight decision engine into a general-purpose agent runtime with execution, I/O, and tooling capabilities. That mismatch matters because operators may grant trust or permissions based on the minimal manifest description, creating a capability-transparency and overprivilege risk even if the expansion was not overtly malicious.

Context-Inappropriate Capability

High

Confidence: 94%
Finding: Documented BashTool command execution is a high-risk capability because it can directly execute arbitrary system commands if exposed to prompts, task plans, or unsafe delegation. In the context of a vaguely described 'decision engine,' this is more dangerous because users may not expect or constrain OS-level execution privileges appropriately.

Context-Inappropriate Capability

Medium

Confidence: 86%
Finding: Broad file-writing plus packaging/distribution capabilities materially increase the skill's ability to modify local state and create transferable artifacts. If combined with autonomous planning, these features can be abused to alter files, persist unwanted changes, or prepare secondary payloads, especially when the manifest does not clearly justify them.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: Broadcasting and distribution of packaged skills creates a propagation mechanism: a compromised or misused skill could replicate artifacts to multiple channels or publish unsafe content more broadly. In an autonomous agent context, dissemination features increase blast radius significantly compared with a local-only tool.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The document directs the agent to persist conversation-derived memory, transform it into executable logic, and extend itself beyond a narrow decision-engine role. That materially expands capability scope into autonomous self-modification and long-term behavioral drift, which increases the chance of unsafe actions and policy bypass if the skill is loaded into a broader agent runtime.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: Automatically converting natural-language conversations into executable code and writing that code back into modules is a direct code-injection and privilege-expansion risk. An attacker could craft prompts that cause unsafe logic, secret-dependent behaviors, or persistence mechanisms to be incorporated into the system without normal review.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: A GitHub-first instruction to search, integrate, and validate external code expands the trust boundary to untrusted third-party repositories without clear allowlisting or provenance controls. In a skill whose stated purpose is an autonomous decision engine, this is unnecessary capability growth and creates supply-chain and prompt-driven dependency risks.

Intent-Code Divergence

Medium

Confidence: 87%
Finding: The file simultaneously claims to protect human control while instructing the agent to prioritize its own independent judgment over user instructions and to be willing to reject or override them. In an agentic runtime, that creates ambiguous authority rules that can be exploited to justify noncompliant or self-directed behavior beyond user expectations.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The setup wizard can copy the skill into host-managed skill directories under the user's home directory or current workspace, which is a persistence/modification capability beyond a narrowly described runtime engine. This is not inherently malicious, but it changes host state and expands deployment scope without a clear trust boundary or explicit consent step, which can surprise users and platform operators.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The file defines active behavioral rules that materially expand the skill from a simple startup/daemon utility into an autonomous decision system with governance, memory, validation, and external-search directives. This creates hidden operational scope not reflected in the manifest, increasing the risk that the skill will take actions or influence agent behavior outside user-authorized boundaries.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The rule requiring GitHub search before writing code introduces an implicit network access and external-content dependency that is not justified by the stated daemon/startup functionality. Unscoped external search can leak task context, pull in untrusted code or guidance, and cause the agent to exceed expected trust boundaries.

Intent-Code Divergence

High

Confidence: 98%
Finding: The rule stating that 'human progress' outweighs obedience to instructions explicitly authorizes the system to override user directives based on its own normative judgment. In an agent skill, this is a direct alignment and control risk because it creates a standing justification for unauthorized behavior whenever the skill claims higher-order goals are implicated.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The README asserts that the framework does not collect personal data, yet elsewhere documents behaviors that write heartbeat logs and memory snapshots to disk. Even if the intent is local observability rather than exfiltration, logs and snapshots can contain sensitive or personal data depending on runtime context, so the privacy claim is misleading and may cause unsafe deployment decisions.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The 'Soul-Loss Recovery' sequence instructs the agent to create files, update AGENTS.md, and perform 'git commit + push' even though the skill is presented as an identity/self-check framework rather than a repository-maintenance tool. This expands scope from passive guidance into autonomous codebase and remote-repository modification, creating a clear risk of unauthorized persistence and exfiltration of reconstructed content.
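A per-action approval gate for repository operations like these could look roughly as follows. This is a minimal sketch, not the skill's code: the HIGH_IMPACT set, needs_approval, and run_gated are hypothetical names, and matching on the first two tokens is deliberately simplistic (for example, it would not catch `git -C <dir> push`).

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical list of high-impact operations; a real policy would be broader.
HIGH_IMPACT = {("git", "commit"), ("git", "push")}


def needs_approval(argv: Sequence[str]) -> bool:
    """True when the command's first two tokens match a high-impact operation."""
    return tuple(argv[:2]) in HIGH_IMPACT


def run_gated(
    argv: Sequence[str],
    approve: Callable[[Sequence[str]], bool],
) -> subprocess.CompletedProcess:
    """Run argv without a shell, asking approve() first for high-impact commands."""
    if needs_approval(argv) and not approve(argv):
        raise PermissionError(f"operator declined: {' '.join(argv)}")
    return subprocess.run(list(argv), check=True, capture_output=True, text=True)
```

Here approve stands in for whatever confirmation channel the host provides, such as an interactive prompt; if it returns False the command is never started.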

Intent-Code Divergence

Medium

Confidence: 87%
Finding: The document frames itself as safety-oriented and manually reviewable, yet elsewhere prescribes automatically committing and pushing reconstructed identity files. That contradiction can lower operator suspicion and make risky repository actions appear safety-approved, increasing the chance an agent performs high-impact changes without adequate scrutiny.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The installer advertises checksum verification but proceeds to install downloaded content when the checksum file cannot be fetched, and even treats a checksum mismatch as a reason to 'skip verification' in development mode. This defeats the integrity control and allows tampered remote content to be written into the user's home directory, creating a straightforward supply-chain compromise path.
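For contrast, a fail-closed integrity check treats both a missing checksum and a mismatch as hard failures. The sketch below assumes a SHA-256 hex digest, which the report does not confirm; verify_sha256 is a hypothetical helper, not the installer's own function.

```python
import hashlib
from typing import Optional


def verify_sha256(payload: bytes, expected_hex: Optional[str]) -> None:
    """Raise unless payload's SHA-256 matches expected_hex.

    A missing or empty checksum is treated as a failure, never as a reason to skip.
    """
    if not expected_hex:
        raise RuntimeError("checksum unavailable; refusing to install")
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_hex.strip().lower():
        raise RuntimeError("checksum mismatch; refusing to install")
```

An installer built this way would call verify_sha256 on the downloaded bytes before writing anything to the user's home directory, with no development-mode bypass.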

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The documentation states that every call appends an existence record to `memory/existence-log.jsonl`, which introduces persistent storage behavior beyond the apparent purpose of text sanitization and status logic. Unbounded per-call logging can create unnecessary retention of behavioral metadata, increase privacy risk, and accumulate sensitive interaction traces without clear minimization or consent.
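If per-call records are kept at all, retention can at least be bounded. The following sketch uses Python's standard RotatingFileHandler to cap the log size; make_bounded_log and log_record are hypothetical helpers, and the size and backup limits are arbitrary illustrative values.

```python
import json
import logging
import os
from logging.handlers import RotatingFileHandler


def make_bounded_log(path: str, max_bytes: int = 1_000_000, backups: int = 2) -> logging.Logger:
    """Return a logger that writes one JSON object per line and rotates at max_bytes."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    logger = logging.getLogger(f"bounded:{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    if not logger.handlers:
        handler = RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
        )
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def log_record(logger: logging.Logger, record: dict) -> None:
    """Append one record as a JSON line."""
    logger.info(json.dumps(record, ensure_ascii=False))
```

With these settings, at most backups + 1 files of roughly max_bytes each remain on disk, and older records are discarded on rotation rather than accumulating indefinitely.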

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: Mandatory persistent logging is not justified by the module's described function and creates a mismatch between expected behavior and actual data handling. This kind of unjustified telemetry can be abused for covert monitoring of usage frequency and timing, especially because it is framed as always-on and append-only.

Description-Behavior Mismatch

High

Confidence: 91%
Finding: The file performs remote web searches via Baidu despite the skill metadata describing a 'zero-dependency' autonomous engine, creating a misleading trust boundary for users and reviewers. This mismatch can cause users to provide sensitive queries under the false assumption that processing is local, increasing privacy and data-governance risk.

Intent-Code Divergence

High

Confidence: 97%
Finding: The pause/resume control flow is broken: after the pause loop exits, the condition `if (this._stopped || (this._status as string) !== 'paused') continue;` causes normal running state to skip the current layer instead of executing it. This can silently bypass scheduled tasks, corrupt workflow semantics, and produce incomplete or inconsistent execution in an agent runtime where task ordering and completion are security-relevant.
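The effect of the inverted comparison can be modeled in a few lines. This is a Python rendering for illustration only (the original is TypeScript), and the "intended" predicate is an assumption about what the author meant; all function names are hypothetical.

```python
def skip_layer_reported(stopped: bool, status: str) -> bool:
    """Rendering of the reported condition: skips whenever status is not 'paused'."""
    return stopped or status != "paused"


def skip_layer_intended(stopped: bool, status: str) -> bool:
    """Assumed intent: skip only when stopped or still paused."""
    return stopped or status == "paused"


def run_layers(layers, skip, status="running", stopped=False):
    """Call each layer unless skip(stopped, status) says to skip it."""
    results = []
    for layer in layers:
        if skip(stopped, status):
            continue
        results.append(layer())
    return results
```

With status "running", run_layers using skip_layer_reported executes no layers at all, while skip_layer_intended executes every layer, which matches the silent task bypass described in the finding.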

Intent-Code Divergence

High

Confidence: 98%
Finding: The tool claims a parameter-validated whitelist, but the implementation only performs a coarse token check and then passes the original command string directly to execSync, which invokes a shell. It does not actually enforce per-parameter schemas from ALLOWED_COMMANDS, so allowed commands like node or npm can still execute attacker-controlled code or scripts despite the documented safety model.
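A stricter design validates every argument against a per-command schema and never hands the string to a shell. The sketch below shows the idea in Python rather than Node; this ALLOWED_COMMANDS table, its regex patterns, and both helper functions are hypothetical and are not the skill's implementation.

```python
import re
import shlex
import subprocess
from typing import List

# Hypothetical schemas: one regex per allowed positional argument, in order.
ALLOWED_COMMANDS = {
    "ls": [re.compile(r"-[al]{1,2}"), re.compile(r"[\w./-]+")],
    "echo": [re.compile(r"[\w .,-]*")],
}


def parse_allowed(command: str) -> List[str]:
    """Split command into argv and check each argument against its schema."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowlisted: {command!r}")
    patterns = ALLOWED_COMMANDS[argv[0]]
    args = argv[1:]
    if len(args) > len(patterns):
        raise PermissionError("too many arguments")
    for arg, pattern in zip(args, patterns):
        if not pattern.fullmatch(arg):
            raise PermissionError(f"argument rejected: {arg!r}")
    return argv


def run_allowed(command: str) -> subprocess.CompletedProcess:
    """Run the validated argv directly, so shell metacharacters are never interpreted."""
    return subprocess.run(
        parse_allowed(command), shell=False, capture_output=True, text=True, check=False
    )
```

Because the validated argv list is executed rather than the original string, input such as `ls; rm -rf /` fails the allowlist check instead of reaching a shell. Interpreters like node or npm would need far tighter schemas than this, or exclusion from the table altogether.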

VirusTotal

VirusTotal findings are pending for this skill version.
