Security audit

Test Continuity

Security checks across malware telemetry and agentic risk

Overview

The skill largely matches its continuity and follow-up purpose, but it needs review because it stores sensitive conversation state and includes under-disclosed capabilities: credential-backed embedding requests and moving files to the trash.

Install only if you are comfortable with a skill that writes persistent, conversation-derived memory and audit files to your OpenClaw state. Before enabling it for real users, review where state is stored, how users can inspect and delete it, whether proactive messages are opt-in, whether the embedding fallback should be disabled or given dedicated credentials, and whether file_output_sop.py should be removed or gated behind explicit confirmation.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (20)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 92%
Finding: The skill declares runtime requirements and documents shell-based operations that read environment variables, invoke Python scripts, and persist continuity state, but it does not present an explicit permissions model or a clearly bounded capability declaration. That mismatch increases the risk that a host or reviewer underestimates the skill's access to local files, configuration, and command execution, which is especially sensitive for a continuity/memory component that handles user state.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The documented purpose frames the package as a bounded continuity/follow-up layer, but the observed behavior set includes materially broader functions such as recurring scheduling, rule-triggered actions, file movement/deletion, installer behavior, and loading host policy overrides from disk. This is dangerous because operators may grant trust and deploy the skill based on a narrow description while it actually contains automation and filesystem behaviors that can affect integrity, persistence, and user data beyond expected follow-up logic.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: This script performs file-moving and trashing operations that are unrelated to the declared continuity/follow-up purpose of the skill, indicating capability drift or hidden side effects. In an agent skill context, mismatched functionality is dangerous because it can be invoked under benign-looking packaging to manipulate user files or destroy evidence while appearing to belong to a conversational feature.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The code's actual behavior is that of a local file-management tool that relocates files and immediately sends them to the trash, which does not match the advertised conversational continuity functionality. This mismatch increases risk because operators and users may grant trust or permissions based on the stated purpose while the code carries unrelated, destructive filesystem capabilities.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The skill reads embedding configuration and resolves API credentials from OpenClaw config, then performs external embedding requests via a Node subprocess. For a continuity skill, this creates an unexpected exfiltration path where user conversation content and secret-bearing config can be sent to an external service without a narrowly scoped trust check or explicit consent boundary.

Vague Triggers

Medium

Confidence: 84%
Finding: Broad natural-language triggers such as schedule, timezone, quiet-hours, or life-change phrases can overlap with ordinary conversation and cause the skill to enter settings/update flows when the user was only chatting. In a stateful follow-up skill, that can lead to unintended state changes, misclassification of conversation intent, and persistence of sensitive preference or profile updates without sufficiently explicit confirmation.
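
One mitigation for broad triggers is to stage a detected settings change and apply it only after an explicit confirmation turn, so ordinary chat cannot mutate state. The sketch below is illustrative only: `PendingChange`, `stage_change`, `confirm_change`, and the set of accepted replies are hypothetical names and assumptions, not part of the audited skill.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical replies treated as explicit confirmation (an assumption).
CONFIRM_REPLIES = {"yes", "y", "confirm"}


@dataclass
class PendingChange:
    """A settings update inferred from a broad trigger, not yet applied."""
    key: str
    value: str


def stage_change(key: str, value: str) -> PendingChange:
    # Trigger detection only stages the change; nothing is persisted yet.
    return PendingChange(key, value)


def confirm_change(settings: dict, pending: Optional[PendingChange], reply: str) -> dict:
    """Return updated settings only if the user's next reply explicitly confirms."""
    if pending is None or reply.strip().lower() not in CONFIRM_REPLIES:
        return settings  # Unchanged: casual conversation never alters state.
    updated = dict(settings)  # Copy so the caller's dict is not mutated in place.
    updated[pending.key] = pending.value
    return updated
```

For example, a message like "I'm flying to Berlin next week" could stage a timezone change, but the stored setting changes only if the user then answers "yes".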

Missing User Warnings

Medium

Confidence: 90%
Finding: The document explicitly describes persistent writes of user-derived state and later trace/audit logging, but it provides no notice, consent, retention, or access-control guidance. In a continuity/follow-up skill, this increases privacy and compliance risk because sensitive conversational context may be stored across sessions without users understanding that their interactions are being retained.

Missing User Warnings

Medium

Confidence: 95%
Finding: Listing trace and audit log files without warning that user interactions may be recorded normalizes broad logging while omitting privacy safeguards. Because this skill handles follow-up, incidents, hooks, and sensitive events, those logs could capture intimate behavioral data that becomes harmful if over-retained, exposed, or repurposed.

Missing User Warnings

Medium

Confidence: 88%
Finding: The document explicitly states that the skill writes structured continuity state into staging, tracked follow-up, and daily-memory paths, but it does not pair that behavior with any requirement for user notice, consent, retention disclosure, or deletion controls. In a continuity/follow-up skill, this omission is security- and privacy-relevant because operators may deploy persistent behavioral memory about user state and sensitive events without transparent user awareness.

Missing User Warnings

Medium

Confidence: 84%
Finding: The heartbeat configuration recommends proactive delivery to a direct channel and focuses on session isolation, but it does not require explicit user opt-in, channel suitability checks, or clear disclosure that outbound proactive messages may be sent. In this skill context, proactive follow-up is a core feature, so undocumented direct-channel delivery increases the risk of unwanted contact, privacy surprises, and disclosure of sensitive context on user-visible channels.
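
A minimal sketch of the missing checks: proactive delivery stays off by default and is allowed only after explicit opt-in, on an approved channel, and outside quiet hours. `FollowUpPrefs`, `may_send_proactive`, the channel names, and the default 22:00 to 07:00 quiet window are hypothetical assumptions, not the skill's actual configuration.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class FollowUpPrefs:
    opted_in: bool = False  # Default off: proactive contact requires explicit opt-in.
    approved_channels: set[str] = field(default_factory=set)
    quiet_start: time = time(22, 0)
    quiet_end: time = time(7, 0)


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), handling windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def may_send_proactive(prefs: FollowUpPrefs, channel: str, now: time) -> bool:
    # All three conditions must hold before any unsolicited message is sent.
    return (
        prefs.opted_in
        and channel in prefs.approved_channels
        and not in_quiet_hours(now, prefs.quiet_start, prefs.quiet_end)
    )
```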

Missing User Warnings

Medium

Confidence: 88%
Finding: The documentation explicitly describes proactive follow-up behavior that can rely on inferred memory or log clues, but it does not clearly warn operators or users that the system may initiate unsolicited outreach based on behavioral inference. In a continuity/follow-up skill, this creates a meaningful consent and privacy risk because deployers may enable outreach features without understanding the user-expectation and data-use implications.

Missing User Warnings

Medium

Confidence: 90%
Finding: The wake-seed section explains how the system may surface wake-window follow-up behavior, but it does not clearly state that this can result in user-facing contact during configured wake periods. Because this skill is specifically designed for continuity, carryover, and proactive follow-up, ambiguous documentation increases the chance of unexpected or privacy-invasive messaging in real deployments.

Missing User Warnings

Medium

Confidence: 95%
Finding: The entry-processing flow can create events from user dialogue and persist them to a file-backed store without any user-facing notice or consent step. Because the stored fields include user-derived titles, cause summaries, session/channel metadata, and follow-up state, this creates a privacy risk of silent retention of potentially sensitive personal conversation content.

Missing User Warnings

Medium

Confidence: 97%
Finding: The memory-trace logic writes user-derived event details such as titles, statuses, and closure reasons into daily markdown files on disk, again without any visible disclosure or approval. This expands the privacy exposure beyond the JSON event store into secondary trace files, increasing the chance of unintended access, over-retention, and data leakage through backups or other tooling that reads the memory directory.
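
A retention control would bound how long these secondary trace files survive. The sketch below assumes, hypothetically, that daily traces are stored as `YYYY-MM-DD.md` files in a single memory directory; that naming, the `prune_daily_traces` helper, and the `keep_days` parameter are illustrative assumptions rather than the skill's confirmed layout.

```python
from datetime import date, timedelta
from pathlib import Path


def prune_daily_traces(memory_dir: Path, today: date, keep_days: int) -> list[Path]:
    """Delete date-named .md trace files older than keep_days and return the removed paths."""
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for path in sorted(memory_dir.glob("*.md")):
        try:
            file_day = date.fromisoformat(path.stem)
        except ValueError:
            continue  # Leave files that do not follow the assumed date naming alone.
        if file_day < cutoff:
            path.unlink()  # Hard delete: retention means the data is actually gone.
            removed.append(path)
    return removed
```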

Missing User Warnings

High

Confidence: 99%
Finding: The function moves an arbitrary source file into a workspace location and then immediately sends the moved file to the trash, with no confirmation, safeguard, or even an inline warning describing the destructive workflow. Even though send2trash operations are reversible in some environments, the file is still removed from its original location, which can cause data loss, workflow disruption, or covert destruction when triggered by an agent or automation.
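
A sketch of the gating the overview asks for: refuse paths outside the workspace and do nothing without explicit confirmation. The trash backend is injected so the check is testable; in a real deployment it could be `send2trash.send2trash` from the send2trash package. `trash_with_confirmation` and its parameters are hypothetical, not the skill's API.

```python
from pathlib import Path
from typing import Callable


def trash_with_confirmation(
    target: Path,
    workspace: Path,
    confirmed: bool,
    trash: Callable[[str], None],
) -> bool:
    """Send `target` to the trash only if it lies inside `workspace` and the user confirmed."""
    target = target.resolve()
    workspace = workspace.resolve()
    # Resolving first means '..' segments and symlinks cannot escape the workspace.
    if not target.is_relative_to(workspace):
        raise PermissionError(f"{target} is outside the workspace {workspace}")
    if not confirmed:
        return False  # No confirmation: leave the file untouched.
    trash(str(target))
    return True
```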

Missing User Warnings

Medium

Confidence: 97%
Finding: The harness dynamically imports and executes Python code from SCRIPT_PATH, which is sourced from an environment variable. If an attacker can influence that environment variable or the referenced file, they gain arbitrary code execution when the harness runs; the skill context makes this more dangerous because the imported module is then given access to filesystem paths, session data, and environment-derived configuration.
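
One way to narrow this path is to resolve the environment-supplied location and refuse to import anything that is not a `.py` file under an allowlisted directory. This is a sketch under assumptions: `load_harness_module` and `allowed_dir` are hypothetical, and the check reduces but does not remove risk, since a writable file inside the allowed directory could still be tampered with.

```python
import importlib.util
import os
from pathlib import Path
from types import ModuleType


def load_harness_module(allowed_dir: Path, env_var: str = "SCRIPT_PATH") -> ModuleType:
    """Import the module named by env_var only if it resolves to a .py file under allowed_dir."""
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    script = Path(raw).resolve()
    root = allowed_dir.resolve()
    # resolve() collapses '..' and follows symlinks, so escapes are caught here.
    if script.suffix != ".py" or not script.is_relative_to(root):
        raise PermissionError(f"refusing to import {script}: not a .py file under {root}")
    spec = importlib.util.spec_from_file_location(script.stem, str(script))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {script}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```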

Missing User Warnings

Medium

Confidence: 97%
Finding: The code loads a sensitive embedding API key from shared configuration and passes it into a spawned subprocess. Even if the subprocess is local, this unnecessarily widens exposure of credentials and increases the blast radius of any subprocess compromise, logging leak, or misuse of inherited input/output paths.
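
If the helper genuinely needs the key, a narrower pattern is to hand it to that one child process through a minimal environment, rather than on the command line (where process listings can expose it) or alongside the parent's full inherited environment. A sketch, assuming a hypothetical `EMBEDDING_API_KEY` variable and `run_embedding_helper` wrapper; the audited helper is a Node subprocess, and any command list can be passed in.

```python
import os
import subprocess
from typing import Sequence


def run_embedding_helper(cmd: Sequence[str], api_key: str, stdin_text: str) -> str:
    """Run a helper with only the variables it needs; the key never appears in argv."""
    env = {
        "PATH": os.environ.get("PATH", ""),  # Lets the child locate interpreters such as node.
        "EMBEDDING_API_KEY": api_key,        # Hypothetical variable name.
        # On Windows, SYSTEMROOT may also be required for some interpreters to start.
    }
    result = subprocess.run(
        list(cmd),
        input=stdin_text,   # Text goes over stdin, not argv.
        capture_output=True,
        text=True,
        env=env,            # Replaces, rather than extends, the inherited environment.
        check=True,
        timeout=30,
    )
    return result.stdout
```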

Missing User Warnings

Medium

Confidence: 96%
Finding: The subprocess performs network-capable embedding calls on text derived from user and continuity data, but there is no visible consent, disclosure, or outbound-data minimization control in this path. That means sensitive conversation fragments may be transmitted externally in a way users and operators would not reasonably expect from a follow-up memory skill.
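
A sketch of an outbound gate for this path: the external embedding fallback stays off unless explicitly enabled, and text is minimized before it leaves the host. `EmbeddingPolicy`, `prepare_outbound_text`, the 500-character cap, and the email-only redaction are illustrative assumptions; a single regular expression is not a complete personal-data filter.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simple email pattern used only to illustrate redaction.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class EmbeddingPolicy:
    external_enabled: bool = False  # Off by default: no silent fallback to a remote API.
    max_chars: int = 500


def prepare_outbound_text(text: str, policy: EmbeddingPolicy) -> Optional[str]:
    """Return minimized text for remote embedding, or None if external calls are disallowed."""
    if not policy.external_enabled:
        return None
    redacted = EMAIL_RE.sub("[email]", text)
    return redacted[: policy.max_chars]  # Truncate to limit how much context leaves the host.
```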

Ssd 3

Medium

Confidence: 84%
Finding: The example schema explicitly supports storing identity-related user details and reusing them in later interactions. In a continuity/memory skill, that creates privacy risk because personal data may be retained and resurfaced without strong consent, minimization, retention, and sensitivity controls; the skill context makes this more significant because persistence and follow-up are core features.

Ssd 3

Medium

Confidence: 88%
Finding: The identity example category is designed to detect phrases like names or self-identification statements and convert them into persistent event data for future response personalization. In this skill, which is specifically built for carryover and follow-up, that increases the chance of unnecessary collection, profiling, or disclosure of personal information across sessions or contexts.
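
A data-minimization sketch for this finding: events in categories marked sensitive are dropped before persistence unless the user has opted in to that category. The event dictionary shape, the `category` key, and `filter_for_persistence` are hypothetical; only the idea of an identity category comes from the finding.

```python
from typing import Iterable

# Hypothetical categories treated as sensitive; "identity" mirrors the example category.
SENSITIVE_CATEGORIES = frozenset({"identity"})


def filter_for_persistence(events: Iterable[dict], consented: set[str]) -> list[dict]:
    """Keep ordinary events, and sensitive ones only for categories the user consented to."""
    kept = []
    for event in events:
        category = event.get("category")
        if category in SENSITIVE_CATEGORIES and category not in consented:
            continue  # Drop rather than store identity details without consent.
        kept.append(event)
    return kept
```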

VirusTotal

66/66 vendors reported this skill as clean.
