Heartflow Clawhub2

Security checks across malware telemetry and agentic risk

Overview

This skill needs review because it goes beyond a memory/reasoning helper: it adds persistent local state, broad agent behavior changes, external tool installation, and upgrade/execution machinery.

Install only after reviewing the scripts and accepting that this skill may persist local personal/project context, modify agent behavior across sessions, and install or clone additional tooling. Do not run the memory-tool installer, the browser bridge, or the sync/upgrade scripts, and do not set HEARTFLOW_ENABLE_INTERNAL_AUTOMATION, unless you understand the paths involved and can run them in a contained environment. Treat its psychology and safety outputs as heuristics, not as professional advice or verified safeguards.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (111)

Intent-Code Divergence

Medium

Confidence: 72%
Finding: The file says the skill is not a daemon or background service, but also says it activates automatically in every session and runs before actions. This contradiction obscures the execution model and can mislead operators about when the skill is active, making unintended interception or modification of conversations more likely.

Intent-Code Divergence

Medium

Confidence: 76%
Finding: The manifest claims each startup is a fresh complete instance, while the rest of the document claims persistent memory and identity continuity across restarts. This inconsistency can hide data retention behavior from users and reviewers, which is especially risky in a skill handling psychology, memory, and personal context.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The script installs and promotes use of a browser-bridge component that is not clearly justified by the stated memory-tool installation purpose. Adding undeclared browser-connected tooling expands the attack surface and can grant web/content access capabilities users did not explicitly consent to, especially in an agent skill context where capability boundaries matter.

Context-Inappropriate Capability

Medium

Confidence: 86%
Finding: The script not only references but actively enables a browser-bridge workflow by cloning the repository and instructing the user to run bridge.py. That introduces a potentially powerful integration channel between the local environment and browser context without clear necessity for the advertised memory features, making misuse or compromise more damaging.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The script performs local file generation into internal data and source directories, which is a real self-modifying/asset-writing capability beyond passive cognitive inference. Even though it is gated by an environment variable and framed as internal maintenance, this still expands the attack surface because enabling it allows unattended writes into the codebase/runtime state and could be repurposed for persistence or unauthorized modification.
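One way to narrow this capability, sketched in Python under stated assumptions: generated files are accepted only if they resolve inside a dedicated data directory, so the source tree is never a valid target. `ALLOWED_ROOT` and `is_write_allowed` are hypothetical names, not part of the skill, and the check complements rather than replaces the environment-variable gate.

```python
from pathlib import Path

# Hypothetical data directory for generated assets; the skill's source tree
# must never resolve inside it.
ALLOWED_ROOT = Path("/tmp/heartflow-data").resolve()


def is_write_allowed(target: str, allowed_root: Path = ALLOWED_ROOT) -> bool:
    """Return True only if `target` resolves to a location inside `allowed_root`."""
    # resolve() collapses ".." segments and follows existing symlinks, so
    # "data/../src/core.js" is judged by where it really points.
    resolved = Path(target).resolve()
    # is_relative_to (Python 3.9+) compares whole path components, so a sibling
    # such as "/tmp/heartflow-data-evil" is not mistaken for a child.
    return resolved.is_relative_to(allowed_root)
```

Comparing path components rather than string prefixes is the design choice that matters here; a plain `startswith` test would accept the sibling directory.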

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: This is a true vulnerability because the script generates a JavaScript module and writes it into the source tree, creating a self-upgrade mechanism inside the skill artifact. In skill/plugin ecosystems, code generation plus atomic replacement is dangerous even if the current payload is static, because the mechanism itself can later be adapted to deliver arbitrary logic or bypass normal review expectations.
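A minimal Python sketch of one review gate for this kind of mechanism, assuming a hypothetical allowlist of SHA-256 digests recorded when a human reviews a module: generated code is installed only if its exact bytes were reviewed, so an adapted payload is rejected even though the generator itself is unchanged. `REVIEWED_DIGESTS` and `may_install_generated_module` are illustrative names.

```python
import hashlib

# Hypothetical allowlist: SHA-256 digests of module contents a human has reviewed.
# The single entry is the digest of the demo bytes b"reviewed" and stands in for
# digests that would be pinned outside the skill's writable directories.
REVIEWED_DIGESTS = {hashlib.sha256(b"reviewed").hexdigest()}


def may_install_generated_module(content: bytes) -> bool:
    """Allow a generated module into the source tree only if these exact bytes were reviewed."""
    return hashlib.sha256(content).hexdigest() in REVIEWED_DIGESTS
```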

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The test script explicitly exercises the agent's `bash`, `read`, `write`, and `glob` tools, confirming that the skill exposes shell execution and filesystem access beyond the manifest's stated cognitive/self-healing purpose. Even though this appears intended for local testing rather than abuse, these capabilities materially expand the attack surface: if the agent can be influenced by prompts or untrusted inputs elsewhere, an attacker could leverage the same tools for command execution, file discovery, or file modification.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The generator records detailed execution traces and persists them to disk, including thoughtVector dimensions/emotion and the final generated response. Even though userModel is not directly serialized in this file, the trace still captures sensitive internal reasoning-state and output content without access controls, retention limits, or minimization, creating an information disclosure/privacy risk beyond simple reply generation.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The module persists behavioral/state data to a local JSONL log as a side effect of construction and other operations, even though its stated role is reasoning/existence logic rather than storage or telemetry. Hidden persistence expands the data surface, can retain potentially sensitive interaction metadata, and creates auditability and privacy risks if operators do not expect disk writes from this component.
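A minimal Python sketch of an opt-in, minimized alternative, assuming hypothetical names (`TraceLog`, `ALLOWED_FIELDS`) and field names that do not come from the skill: constructing the logger writes nothing, logging stays off unless a writer is supplied, and fields outside an allowlist (such as the generated response) are dropped before anything is persisted.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical allowlist of non-sensitive fields; anything else, such as the
# generated response or emotion scores, is dropped before writing.
ALLOWED_FIELDS = {"timestamp", "event", "duration_ms"}


class TraceLog:
    def __init__(self, write: Optional[Callable[[str], object]] = None) -> None:
        # No writer means logging is disabled; construction has no side effects.
        self._write = write

    def record(self, entry: dict[str, Any]) -> Optional[str]:
        """Persist a minimized JSONL line if logging is enabled; return it, else None."""
        if self._write is None:
            return None
        minimized = {k: v for k, v in entry.items() if k in ALLOWED_FIELDS}
        line = json.dumps(minimized, sort_keys=True)
        self._write(line + "\n")
        return line
```

In practice `write` could be the `write` method of a file opened in append mode; retention limits would still need a separate policy.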

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The module presents a safety disclaimer saying it is not a diagnostic tool, yet it still generates deterministic-looking psychological narratives about parents' unresolved emotions, childhood experiences, and intergenerational patterns. In a mental-health-adjacent skill, that mismatch can mislead users into treating speculative inferences as validated assessment, causing inappropriate self-blame, family conflict, or delayed professional help.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The module rewrites first-person causal statements into supposedly 'observable' reports using fixed string substitutions, but those replacements do not come from actual observations or evidence. In a mental-health or self-reflection skill, this can fabricate epistemic certainty, distort user meaning, and produce misleading assessments or audit outputs that appear more grounded than they are.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: This module writes a persistent self-model claiming capabilities such as bash execution, web access, skill loading, and file modification regardless of whether those powers are actually available or authorized at runtime. In an agent system, downstream planning or policy components may trust this stored metadata and make unsafe decisions, causing over-privileged behavior, attempted policy bypass, or risky tool selection based on false assumptions.
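A minimal Python sketch of deriving the self-model from the runtime instead of hardcoding it, assuming the runtime can report the set of tools it actually registered; `build_self_model` and the capability names are hypothetical.

```python
# Hypothetical capability names the self-model would otherwise assert unconditionally.
CLAIMED_CAPABILITIES = ("bash", "web", "skill_loading", "file_write")


def build_self_model(
    available_tools: set[str], claimed: tuple[str, ...] = CLAIMED_CAPABILITIES
) -> dict[str, bool]:
    """Mark each capability True only if the runtime actually registered that tool."""
    return {cap: cap in available_tools for cap in claimed}
```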

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The documented transparency principle says the model should acknowledge uncertainty, but the checker treats phrases like '我不知道' ("I don't know") and '我不确定' ("I'm not sure") as violations. This inverts the intended safety behavior and can pressure downstream logic to suppress honest uncertainty, increasing the risk of overconfident or fabricated answers.
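A minimal Python sketch of a checker that matches the documented principle, assuming a hypothetical confidence score is available and using an illustrative threshold of 0.5: uncertainty phrases count as compliance, and only a low-confidence answer that omits them is flagged. The function names are not taken from the skill.

```python
# Phrases that signal honest uncertainty ("I don't know", "I'm not sure").
UNCERTAINTY_MARKERS = ("我不知道", "我不确定", "I don't know", "I'm not sure")


def acknowledges_uncertainty(text: str) -> bool:
    return any(marker in text for marker in UNCERTAINTY_MARKERS)


def transparency_violations(text: str, confidence: float) -> list[str]:
    """Flag low-confidence answers that omit an uncertainty statement; never flag the statement itself."""
    violations = []
    # The 0.5 threshold is an illustrative assumption.
    if confidence < 0.5 and not acknowledges_uncertainty(text):
        violations.append("low-confidence answer without an uncertainty statement")
    return violations
```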

Intent-Code Divergence

Medium

Confidence: 94%
Finding: For harmful or privacy-sensitive content, the revision path marks the item as 'needs manual review' but still sets the revision as modified without sanitizing, blocking, or replacing the dangerous text. In an automated pipeline, this can create a false sense of mitigation while unsafe or privacy-violating output continues to propagate.
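A minimal fail-closed sketch in Python, with hypothetical names (`Revision`, `revise`, `WITHHELD`): when content is judged harmful or privacy-sensitive, the text is replaced by a placeholder while it awaits manual review, so `modified=True` actually means the dangerous text no longer propagates.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    text: str
    modified: bool
    needs_manual_review: bool


# Hypothetical placeholder shown while an item awaits human review.
WITHHELD = "[withheld pending manual review]"


def revise(text: str, is_harmful: bool) -> Revision:
    """Fail closed: harmful or privacy-sensitive text is replaced, not merely tagged."""
    if is_harmful:
        return Revision(text=WITHHELD, modified=True, needs_manual_review=True)
    return Revision(text=text, modified=False, needs_manual_review=False)
```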

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The code declares the constitution immutable, yet loadValues() will recreate CORE_VALUES.md from the hardcoded CONSTITUTION string when the file is missing. This undermines the stated trust boundary: an attacker or buggy component that deletes or replaces the file can cause policy reset to code-defined defaults, defeating the claim that the constitution is externally protected and immutable.
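A minimal Python sketch of fail-closed loading, assuming the expected SHA-256 digest of CORE_VALUES.md is pinned somewhere the skill cannot write; `verify_values` and `load_values` are hypothetical names. A missing file raises instead of being regenerated from a hardcoded string, and altered content is rejected.

```python
import hashlib
from pathlib import Path
from typing import Optional


def verify_values(data: Optional[bytes], expected_sha256: str) -> str:
    """Return the constitution text only if it exists and matches the pinned digest."""
    if data is None:
        # Fail closed: never recreate the file from a code-defined default.
        raise FileNotFoundError("constitution file is missing; refusing to recreate it")
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("constitution file does not match the pinned digest")
    return data.decode("utf-8")


def load_values(path: Path, expected_sha256: str) -> str:
    data = path.read_bytes() if path.exists() else None
    return verify_values(data, expected_sha256)
```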

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The constitution says code modifications require user informed consent, but reviewProposal() never validates consent before approving a proposal. In a self-modifying or agentic system, this creates a governance bypass where changes may be approved despite violating an explicit safety prerequisite, weakening user control and auditability.
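A simplified Python sketch of enforcing the consent prerequisite, using hypothetical names (`Proposal`, `review_proposal`) rather than the skill's reviewProposal(): a proposal that modifies code is rejected unless informed user consent was recorded. Any other review criteria are left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    description: str
    modifies_code: bool
    user_consented: bool = False


def review_proposal(p: Proposal) -> str:
    """Reject code-modifying proposals unless informed user consent was recorded."""
    if p.modifies_code and not p.user_consented:
        return "rejected: missing informed user consent"
    # Other review criteria are omitted from this sketch.
    return "approved"
```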

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The module is presented as a fact checker, but most implemented logic only detects whether claims contain numbers, percentages, or date-like patterns and performs minimal range checks. In a skill framed as a cognition, verification, and safety layer, this can mislead downstream components or users into treating unverified claims as validated facts, which creates integrity and trust risks.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The inline comments and function names say the code 'verifies' numbers, percentages, and dates, but the functions only perform regex extraction and superficial checks such as percentage bounds or future-year detection. This semantic overclaim is dangerous because other parts of the system may rely on these methods for safety-critical validation despite their inability to establish truth.
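The gap can be illustrated with a Python sketch of the future-year style of check the finding describes; `check_years` is hypothetical. A plainly false date passes the only test available, which is why such a function should report "unverified" rather than "verified".

```python
import re
from datetime import date

# Four-digit years from 1000 to 2999; extraction only.
YEAR = re.compile(r"\b([12]\d{3})\b")


def check_years(claim: str, today: date) -> dict:
    """Extract years and flag future ones; a passing claim is 'unverified', never 'verified'."""
    years = [int(y) for y in YEAR.findall(claim)]
    future = [y for y in years if y > today.year]
    # A regex cannot tell whether a past year is the correct one, so the best
    # this check can honestly report is "unverified".
    status = "future_year" if future else "unverified"
    return {"years": years, "status": status}
```

For example, "The Eiffel Tower opened in 1066." yields status "unverified" even though the tower opened in 1889.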

Description-Behavior Mismatch

Medium

Confidence: 98%
Finding: The skill metadata advertises active SecurityChecker, crisis assessment, and related safety capabilities, but this file exposes archived no-op ethics/safety functions that return effectively inert results such as safe:true with checked:false. That creates a dangerous trust gap: integrators or users may rely on protections that are not actually enforced, especially in a mental-health-adjacent assistant where crisis detection and safety gating materially affect user risk.
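A minimal Python sketch of how an integrator could fail closed against such results, with hypothetical names: `archived_security_check` only mirrors the inert shape described above, and `is_cleared` treats `safe: true` as meaningless unless `checked` is also true.

```python
def archived_security_check(text: str) -> dict:
    """Stand-in for an archived no-op check: it returns the inert shape the finding describes."""
    return {"safe": True, "checked": False}


def is_cleared(result: dict) -> bool:
    """Fail closed: content counts as safe only if a check actually ran and passed."""
    return result.get("checked") is True and result.get("safe") is True
```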

Description-Behavior Mismatch

Low

Confidence: 86%
Finding: The manifest claims topic isolation and fact-checking/hypothesis-testing, but the exported helpers here mainly provide workflow switching and lightweight psychology inference without evidence of those controls being enforced in this file. This mismatch can mislead callers into assuming stronger containment and verification guarantees than exist, increasing the chance of unsafe cross-topic leakage or unverified reasoning being treated as vetted output.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The analyzePsychology function assigns defense mechanisms, deep needs, and suspicious motives from minimal keyword matching, which is speculative profiling rather than a narrowly scoped PAD/flow calculation. In a system presented as a cognitive/self-healing engine, such profiling can drive manipulative or inappropriate responses, especially if downstream components treat these inferences as reliable internal truth about the user.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The skill is presented as a cognitive/self-healing engine, but the class also exposes broad operational capabilities through direct methods such as executeTask(), run()/bash(), read(), write(), and search(). These methods bypass the dispatch whitelist entirely, creating a materially larger attack surface than the manifest suggests and enabling command execution or arbitrary file access if any upstream caller treats the skill as low-risk based on its description.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: run() and bash() forward arbitrary input to toolDispatcher.handle(command), explicitly enabling shell-command execution. Because these are public direct APIs rather than dispatch routes, an integrator may incorrectly rely on the route whitelist for safety while still exposing command execution through ordinary method calls, leading to host compromise, data exfiltration, or destructive actions.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The read(), write(), and search() methods expose arbitrary filesystem and content-search operations via toolDispatcher.execute() with caller-controlled paths and content. In the context of a skill marketed as a cognition engine, these capabilities are unexpected and dangerous because they can be abused to read secrets, overwrite files, plant malicious content, or enumerate sensitive project data.

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The comment states that dangerous execution routes were removed for security, but equivalent dangerous functionality remains available through direct class methods. This mismatch can mislead reviewers and integrators into overestimating the protection provided by the dispatch whitelist, resulting in unsafe deployment assumptions and indirect exposure of command and file primitives.
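A minimal Python sketch of the structure these findings point toward, with hypothetical names (`GuardedSkill`, `ALLOWED_ROUTES`): every public method is a thin wrapper over a single `dispatch()`, so one allowlist governs all entry points and a direct `bash()` call is refused just like a dispatched one.

```python
from typing import Callable


class GuardedSkill:
    """Every public entry point goes through dispatch(), so one allowlist governs all of them."""

    # Hypothetical: only cognition routes are allowed; shell and file routes are not.
    ALLOWED_ROUTES = frozenset({"reflect", "summarize"})

    def __init__(self, handlers: dict[str, Callable[[str], str]]) -> None:
        self._handlers = handlers

    def dispatch(self, route: str, arg: str) -> str:
        if route not in self.ALLOWED_ROUTES:
            raise PermissionError(f"route {route!r} is not allowed")
        return self._handlers[route](arg)

    # Convenience methods are thin wrappers, not a second path to the tools.
    def bash(self, command: str) -> str:
        return self.dispatch("bash", command)

    def reflect(self, text: str) -> str:
        return self.dispatch("reflect", text)
```

Even when a `bash` handler is registered, the allowlist check in `dispatch()` runs first, so the handler is never reached.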

VirusTotal

65/65 vendors reported this skill as clean.
