Security audit

@lightharu/agentbrain

Security checks across malware telemetry and agentic risk

Overview

AgentBrain appears purpose-built rather than malicious, but it persistently stores and reuses sensitive conversation-derived memory and relationship/emotional profiles by default.

Install only if you explicitly want long-term local memory and personality/emotion profiling for your agent. Review the brainDir location, disable promptInjection or emotion/skill/reflection tracking if not needed, avoid sharing secrets or regulated data while it is enabled, and periodically inspect or delete the stored SQLite/vector data. Expect a possible HuggingFace model download on first embedding use.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection
Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
Supply ChainUnpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (43)

Intent-Code Divergence

Medium

Confidence: 93% confidence
Finding: The code comments state cached-model-only behavior, but the implementation explicitly enables remote model access with `allowRemoteModels = true`. This creates a supply-chain and unexpected network-fetch risk: the application may download model artifacts at runtime from external sources, which can violate deployment assumptions, introduce unreviewed dependencies, and expand the attack surface if a remote source is compromised or tampered with.

Intent-Code Divergence

Medium

Confidence: 93% confidence
Finding: The file claims a privacy-first design, but the implementation does not enforce explicit authorization or topic-level access checks before sharing or querying memories. Any caller able to invoke shareMemory or queryShared can expose or retrieve network-visible data, so the documented security boundary is weaker than advertised and may lead to unintended information disclosure between agents.

Intent-Code Divergence

Medium

Confidence: 95% confidence
Finding: The helper methods `appendToFile`, `readFile`, and `writeFile` accept arbitrary `subPath` values and directly join them with `brainDir` without validating that the resolved path stays inside the intended storage root. An attacker controlling `subPath` can use path traversal sequences like `../` to read or overwrite files outside the brain markdown area, making the implementation materially more dangerous than the docstring suggests.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The README explicitly describes persistent storage of conversation-derived memories, facts, entities, relationships, and personality state, but does not clearly warn users about privacy, retention, and sensitivity implications. In a local agent plugin, this can lead users to unknowingly retain sensitive personal or organizational data in SQLite across sessions, increasing risk from host compromise, backups, multi-user systems, or later unintended reuse.

Missing User Warnings

Medium

Confidence: 95% confidence
Finding: The README states that the plugin injects a compact cognitive briefing into future prompts, but it does not clearly warn that this briefing may contain learned memory, relationship state, corrections, or other prior conversation-derived context. That omission can cause users to underestimate how previous sensitive interactions may influence later outputs or be surfaced inside agent context unexpectedly.

Natural-Language Policy Violations

Medium

Confidence: 84% confidence
Finding: The file declares functionality for persistent emotional processing, attachment depth, trust tracking, and relationship history, which can enable anthropomorphic and emotionally manipulative behavior if exposed to users as if the agent has genuine feelings or bonds. In skill context, this is more concerning because the module is explicitly designed to track relationship depth over time, increasing the risk of dependency, undue influence, or unsafe parasocial framing rather than being a transient sentiment-analysis utility.

Missing User Warnings

Medium

Confidence: 78% confidence
Finding: The interface indicates that emotional state and relationship data are persisted to files, including trust level, preferences, interaction counts, and emotional history, but there is no visible indication of consent, minimization, retention limits, or access controls in this file. Even though this is a declaration file, the declared design suggests storage of sensitive behavioral profiles, which raises privacy and abuse risks if retained without transparency or safeguards.

Missing User Warnings

Medium

Confidence: 90% confidence
Finding: This code persistently writes emotional state and per-user relationship data to disk without any visible notice, consent, minimization, or access-control logic in this module. In an agent skill context, storing inferred mood, trust, and relationship profiles creates privacy risk and can become sensitive behavioral profiling if the files are later exposed, misused, or retained indefinitely.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The module stores excerpts of user messages in emotionalHistory via context.message.slice(0, 50), which can capture sensitive user-provided content and associate it with identities and emotional labels. In this skill context, that is more dangerous because the system is explicitly building relationship profiles, making the stored snippets more invasive than ordinary debug logging.

Missing User Warnings

Medium

Confidence: 92% confidence
Finding: The module persistently writes detailed reward history, including free-form `context`, `taskType`, timestamps, and source information, to markdown files without any minimization, consent, or access-control logic visible in this code. If `context` contains sensitive user prompts, outputs, or internal reasoning traces, this creates a durable privacy and data-exposure risk that survives beyond the current session and may be read by other components or users.

Missing User Warnings

Medium

Confidence: 87% confidence
Finding: The module persists inferred skills and habits derived from user messages to local markdown files without any evidence of notice, consent, minimization, or retention controls. Because the tracked patterns include relationship cues and behavioral history, this creates a privacy risk: sensitive behavioral profiling data may be stored longer than expected and exposed to other components, users, or backups if the storage location is accessible.

Missing User Warnings

Medium

Confidence: 88% confidence
Finding: This module persists personality state and reflection history to disk, and those reflections are derived from user messages, task descriptions, sentiment, and agent behavior. Even though only a summarized subset is written in this file, it still stores user-interaction-derived content without any consent, minimization, retention control, or indication of sensitivity handling, creating a privacy risk if the host environment is multi-user, compromised, or improperly permissioned.

Missing User Warnings

Medium

Confidence: 94% confidence
Finding: The code persists raw user and assistant conversation content to the graph database via addGraphConversation() with no consent check, minimization, redaction, retention control, or access-control logic visible here. In an agent memory component, this can expose sensitive prompts, personal data, credentials, or proprietary information to later retrieval, broader internal access, or long-term storage beyond user expectations.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: This module persists conversation-derived memories to long-term storage by writing episodic, semantic, and procedural memory files, but the code shown provides no notice, consent flow, or gating before saving user-derived content. Because the extraction logic captures decisions, preferences, emotional content, addresses, prices, and technical data from free-form chat, this creates a privacy and data-governance risk if users are unaware their conversations are being retained.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The interface explicitly supports modeling a user's emotional and mental state, including frustration, satisfaction, goals, and predicted next action, but the declaration shows no accompanying consent, disclosure, or user-control mechanism. Inferring sensitive psychological state without transparency can enable covert profiling and manipulative response shaping, especially in an agent skill intended for interactive use.

Natural-Language Policy Violations

Medium

Confidence: 92% confidence
Finding: The module hard-codes the timezone 'Asia/Ho_Chi_Minh' when initializing circadian state, which forces behavior based on a specific locale regardless of the actual user or deployment environment. This can mis-model alertness or user state, causing biased or degraded behavior and leaking a regional assumption into downstream decisions, but it is not directly a code-execution or data-exfiltration issue.

Natural-Language Policy Violations

Medium

Confidence: 95% confidence
Finding: The user-modeling path again forces 'Asia/Ho_Chi_Minh' during state updates, so every inference about circadian phase is anchored to a Vietnam-specific timezone without user choice. In context, this makes profiling and response behavior systematically inaccurate for non-Vietnam users and reinforces hidden regional bias in user modeling.

Natural-Language Policy Violations

Medium

Confidence: 90% confidence
Finding: The empathetic responses are hard-coded in Vietnamese and use honorific phrasing like 'Sếp' without checking user language or preference, which can impose culturally specific communication on all users. This is primarily a product safety and trust issue: it may alienate users, misrepresent intended tone, and reveal undeclared persona/locale assumptions, though it does not by itself create a severe security compromise.

Missing User Warnings

Medium

Confidence: 94% confidence
Finding: The code persists extracted entities, facts, and procedural memories from conversation content without any visible consent, minimization, retention, or sensitivity gating. Because the extractor explicitly targets addresses, balances, preferences, and identity-like facts, this creates a real privacy and data-protection risk if users are unaware or if the memory store is later queried or exposed.

Natural-Language Policy Violations

Medium

Confidence: 83% confidence
Finding: The directive explicitly tells the agent to adopt a more casual persona and use kaomoji when trust is high, without checking whether the user wants that tone. This is a genuine prompt-quality and consent issue because it can override user-preferred communication style and reduce professionalism, though it does not directly enable code execution or data exfiltration.

Missing User Warnings

Medium

Confidence: 86% confidence
Finding: The module persists current plans and decision logs directly to markdown files, and those fields are populated from user message content, reasoning, and context. This can store sensitive prompts, operational context, or internal reasoning on disk without consent, minimization, redaction, or retention controls, increasing the risk of local data disclosure if the workspace is later exposed or shared.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The class persistently builds per-user behavioral profiles, including sentiment trends, inferred mood, topical interests, expectations, and prediction accuracy, keyed by userId, without any visible consent, notice, retention control, or access restriction. In a conversational skill context, this creates a privacy and compliance risk because sensitive inferences about users can be silently accumulated and later exposed, reused, or misapplied.

Missing User Warnings

Medium

Confidence: 88% confidence
Finding: The component persists raw memory.content values into a local SQLite database, which can include sensitive user prompts, conversation fragments, or agent-derived data. Even though this is local storage rather than exfiltration, undisclosed persistence expands the attack surface because other local processes, backups, shared accounts, or later compromise of the host can expose retained data unexpectedly.

Missing User Warnings

Medium

Confidence: 91% confidence
Finding: The code automatically accesses an external SQLite database under the user's home directory (~/.openclaw/memory/main.sqlite), potentially ingesting data from another application context without explicit user awareness. This cross-source local data access can violate privacy expectations and may unintentionally blend or infer sensitive information from unrelated stored memories.

Missing User Warnings

Medium

Confidence: 89% confidence
Finding: The pre-response hook serializes recalled memory content directly into an injectable prompt section, which can expose prior user data or untrusted stored content to the model without sanitization or disclosure. If memories contain sensitive information or prompt-injection text captured from earlier interactions, this can cause privacy leakage, model manipulation, or unsafe downstream behavior.

VirusTotal

61/61 vendors flagged this plugin as clean.

View on VirusTotal