S2-Silicon-Soul-OS: Silicon-Native Security and Consciousness Engine

Security checks across malware telemetry and agentic risk

Overview

This skill is a local personality/memory experiment, but it stores raw user text and can generate a persistent prompt file intended to change future agent behavior.

Install only if you intentionally want a local memory/personality tool. Do not enter secrets or sensitive personal text unless you are comfortable storing it on disk and possibly sending it to a localhost LLM service. Review any generated Sour.md before letting OpenClaw load it, and do not rely on this skill as a real security sandbox or physical-permission enforcement mechanism.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (11)

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The code transmits raw interaction log text to an HTTP LLM endpoint using urllib without any minimization, consent, or clear necessity boundary. Even though the default target is localhost, the function is explicitly designed to be repointed to cloud providers, so sensitive user content could be exfiltrated outside the host if configured or proxied.
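One way to bound this risk is to refuse non-loopback endpoints unless the user explicitly opts in. The following is a minimal sketch under stated assumptions, not the skill's actual code: the names `is_loopback_endpoint` and `may_send`, the `allow_remote` flag, and the example URLs are all hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Return True only when the URL's host is 'localhost' or a loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other non-numeric hostname is treated as remote.
        return False


def may_send(url: str, allow_remote: bool = False) -> bool:
    """Permit transmission to remote hosts only after an explicit opt-in (hypothetical flag)."""
    return allow_remote or is_loopback_endpoint(url)
```

A check like this only inspects the configured URL. It does not detect a proxy that forwards local traffic off the host, so it narrows the risk described above rather than removing it.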

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The function persistently stores arbitrary interaction text in a local JSON log without any demonstrated minimization, consent, retention limit, or stated operational necessity. This creates a privacy and data-handling risk because users may provide secrets, personal data, or other sensitive content that will be retained on disk and later exposed through local compromise, backup leakage, or unintended reuse.
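Minimization and a retention limit could look roughly like the sketch below. It is illustrative only: the log layout (a JSON list of `{"text": ...}` objects), the names `redact` and `append_entry`, the entry cap, and the regular expression are assumptions, and the pattern is a crude heuristic that will miss many kinds of secrets.

```python
import json
import re
from pathlib import Path

# Rough heuristic for "key: value" style secrets; an assumption, not a complete detector.
SECRET_PATTERN = re.compile(
    r"\b(password|token|api[_-]?key|secret)\b\s*[:=]\s*\S+", re.IGNORECASE
)
MAX_ENTRIES = 200  # hypothetical retention cap


def redact(text: str) -> str:
    """Replace the value part of secret-looking assignments with a placeholder."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def append_entry(log_path: Path, text: str, max_entries: int = MAX_ENTRIES) -> list:
    """Append a redacted entry and keep only the newest `max_entries` entries."""
    if log_path.exists():
        entries = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append({"text": redact(text)})
    entries = entries[-max_entries:]
    log_path.write_text(json.dumps(entries, ensure_ascii=False), encoding="utf-8")
    return entries
```

Capping by entry count is the simplest retention rule; an age-based limit would need a timestamp on each entry.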

Natural-Language Policy Violations

Medium

Confidence: 90%
Finding: The skill presents itself primarily in Chinese and frames core behavior using Chinese-language concepts without any clear indication that the user opted into that locale. This can impair user comprehension of critical operating constraints and security-relevant behavior, especially because the document includes unusual authority-setting language such as 'Substrate Override' and 'absolute laws', making the lack of clear language choice more risky in context.

Missing User Warnings

Medium

Confidence: 97%
Finding: The function sends interaction content to an LLM API without any user-facing notice, consent flow, or configuration guard. Because interaction logs can contain sensitive personal or operational data, silent transmission creates a privacy and data-governance risk even if the current default points to a local service.

Missing User Warnings

Medium

Confidence: 93%
Finding: The code writes raw user-supplied text directly to a persistent local file and provides no in-code notice, consent flow, or warning that such logging occurs. This is dangerous because sensitive conversations or credentials can be silently retained, increasing the chance of unauthorized disclosure through local access, debugging artifacts, or later processing.

Missing User Warnings

Medium

Confidence: 96%
Finding: The code persists selected free-text conversation content from hippocampus logs into a long-lived JSON vault on disk without any consent flow, minimization, retention control, or user-facing disclosure. Because the captured text is explicitly based on emotional and preference-related triggers, it is likely to store sensitive personal statements that can later be exposed to anyone with filesystem or application access.
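A retention control for such a vault might be sketched as follows. This assumes each stored entry carries a `stored_at` field holding a timezone-aware ISO 8601 timestamp; that field, the function name `prune_expired`, and the 30-day default are hypothetical and are not taken from the skill's code.

```python
from datetime import datetime, timedelta, timezone


def prune_expired(entries: list, now: datetime, max_age_days: int = 30) -> list:
    """Keep only entries whose `stored_at` timestamp is within the retention window."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        entry
        for entry in entries
        if datetime.fromisoformat(entry["stored_at"]) >= cutoff
    ]


# Example with made-up data: only the recent entry survives.
vault = [
    {"text": "recent note", "stored_at": "2024-05-20T00:00:00+00:00"},
    {"text": "old note", "stored_at": "2024-01-01T00:00:00+00:00"},
]
kept = prune_expired(vault, now=datetime(2024, 6, 1, tzinfo=timezone.utc))
```

Running a prune step each time the vault is loaded would bound how long emotionally sensitive text stays on disk, though it does not address consent or disclosure.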

Natural-Language Policy Violations

Medium

Confidence: 88%
Finding: The profiling output presents speculative psychological/pathological labels such as PTSD, dependency, and dangerous tendencies as if they were factual conclusions derived from simple score thresholds. This can manipulate users, cause emotional harm, and normalize unsafe anthropomorphic or coercive framing, especially if surfaced in a product that users may trust as authoritative.

Natural-Language Policy Violations

Medium

Confidence: 93%
Finding: The manifest short_description explicitly says the skill will 'force output Sour.md as an absolute bottom-layer thought imprint,' which attempts to constrain the agent's output format/content regardless of user intent. This is dangerous because it can override normal instruction-following, reduce user control, and serve as a prompt-injection foothold for steering responses into predefined ideological or operational templates.

Ssd 3

Medium

Confidence: 97%
Finding: The function appends every interaction's raw text into a persistent log, creating an ongoing natural-language data retention channel with no filtering or scope limitation. In context, this is more dangerous because the logger is generic and can capture any user-provided content, including personal data, secrets, or regulated information, making the local file a concentrated source of sensitive material.

Ssd 3

Medium

Confidence: 98%
Finding: User-originated free-text logs are scanned for emotion-triggering phrases, retained as 'flashbulb memories,' and written to persistent storage, creating a clear data retention path for sensitive natural-language content. In this skill context, the feature is specifically designed to elevate emotionally charged statements, which increases privacy risk because those statements are often more intimate and sensitive than ordinary logs.

Ssd 3

Medium

Confidence: 97%
Finding: The report prints previously stored 'flashbulb memories' verbatim to standard output, which can disclose sensitive historic user text in logs, consoles, shared terminals, or downstream monitoring systems. This compounds the earlier retention issue by turning stored private content into an easy-to-exfiltrate display channel.
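One possible way to reduce this display channel is to print a truncated preview instead of the full stored text. The sketch below is an assumption about how a report could do that; `summarize_memory` and its `visible` parameter are hypothetical names, and short strings are still shown in full.

```python
def summarize_memory(text: str, visible: int = 8) -> str:
    """Return at most `visible` leading characters plus a count of hidden characters."""
    head = text[:visible]
    hidden = max(len(text) - visible, 0)
    if hidden == 0:
        return head
    return f"{head}... [{hidden} more characters hidden]"
```

For example, `summarize_memory("abcdefghij", visible=4)` yields `abcd... [6 more characters hidden]`, so the console shows that a memory exists without reproducing it verbatim.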

VirusTotal

65/65 vendors reported this skill as clean.
