SoulForge

Security checks across malware telemetry and agentic risk

Overview

SoulForge appears local and purpose-aligned, but it stores a persistent behavioral profile and can change the agent’s long-term SOUL.md behavior with under-scoped controls.

Install only if you are comfortable with local cross-session behavioral profiling. Disable automatic observation if you want manual-only use, avoid --auto-accept, review proposed SOUL.md diffs carefully, and periodically inspect or delete memory/observations.json and backups.
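
The periodic inspection recommended above can be sketched as a short script. This is a minimal sketch that assumes memory/observations.json is an ordinary JSON file; its schema is not documented here, so the helper (the hypothetical summarize_observations) reports only the file size and the shape of the top-level structure, not the contents.

```python
import json
from pathlib import Path


def summarize_observations(path):
    """Summarize a JSON file's size and top-level shape without echoing its contents.

    Assumption: the file is plain JSON. The real schema used by the skill is unknown.
    """
    p = Path(path)
    if not p.exists():
        return {"exists": False}

    data = json.loads(p.read_text(encoding="utf-8"))

    if isinstance(data, dict):
        # Count entries under each top-level key; scalars count as 1.
        shape = {
            key: (len(value) if isinstance(value, (list, dict)) else 1)
            for key, value in data.items()
        }
    elif isinstance(data, list):
        shape = {"<list>": len(data)}
    else:
        shape = {"<scalar>": 1}

    return {"exists": True, "bytes": p.stat().st_size, "shape": shape}
```

Calling summarize_observations("memory/observations.json") from the skill's working directory gives a quick view of how much has accumulated. Deleting the file and its backups remains a manual step.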

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (11)

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The file-level documentation promises that writing to SOUL.md requires explicit approval, but the --auto-accept path bypasses interactive confirmation and writes high-confidence proposals directly, without the user approving them. In an agent skill context, this weakens a stated safety boundary and can lead to unintended modification of user-controlled prompt/configuration files if the flag is invoked by another tool, wrapper, or automation layer.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The security manifest states that SOUL.md is written only with explicit user approval, but the implementation allows automatic modification via --auto-accept. A false or overstated manifest is itself security-relevant because downstream agents, reviewers, or orchestration systems may trust the declared write constraints and permit execution under assumptions that are not actually enforced.
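
For illustration, the approval boundary described in the two findings above could be enforced along the following lines. This is a sketch under stated assumptions, not SoulForge's actual code: propose_write and the confirm callback are hypothetical names, and the function returns the text to write instead of touching SOUL.md itself. The point is that there is no code path that applies a change unless the caller's confirmation step explicitly returns True.

```python
import difflib


def propose_write(current_text, proposed_text, confirm):
    """Return (text_to_keep, changed). Applies the proposal only on explicit approval.

    `confirm` is a hypothetical callback that receives the unified diff and
    must return exactly True to approve; any other return value is a refusal.
    """
    diff = "".join(
        difflib.unified_diff(
            current_text.splitlines(keepends=True),
            proposed_text.splitlines(keepends=True),
            fromfile="SOUL.md (current)",
            tofile="SOUL.md (proposed)",
        )
    )

    if not diff:
        # Nothing to change, so nothing to approve.
        return current_text, False

    if confirm(diff) is True:
        return proposed_text, True

    return current_text, False
```

An interactive caller might pass a confirm function that prints the diff and reads a yes/no answer. An automation layer that cannot ask the user would have to pass a callback that declines, which keeps the declared write constraint and the implemented one aligned.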

Vague Triggers

Medium

Confidence: 90%
Finding: The README advertises very broad natural-language triggers such as "Update my soul," "Run soulforge," and "Soul check," which could overlap with ordinary user conversation and cause unintended activation. In a skill that performs persistent observation and proposes edits to identity-related files, accidental invocation increases the chance of privacy-impacting collection or confusing state changes.

Missing User Warnings

Medium

Confidence: 92%
Finding: The README describes a passive background mode that accumulates behavioral signals across sessions, but it does not present a prominent up-front privacy warning, retention policy, or explicit consent flow in the documentation shown. Because the data concerns user behavior, recurring phrases, and values inferred over time, users may enable it without understanding the sensitivity or persistence of what is being collected.

Vague Triggers

Medium

Confidence: 91%
Finding: The description embeds broad natural-language triggers such as 'reflect on me' and 'what patterns do you notice', plus an automatic run every 10 sessions. In an agent environment, vague trigger conditions can cause unintended invocation during ordinary conversation, leading the skill to inspect session history or propose identity-file changes when the user did not clearly intend that action.

Vague Triggers

Medium

Confidence: 94%
Finding: The trigger list contains many ambiguous phrases without scope guards, and it is paired with automatic execution. Because these phrases are common reflective language, the skill may activate from conversational text not meant as a command, causing unnecessary analysis of historical user content and possibly surfacing sensitive inferences.
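
One possible scope guard for the trigger findings above is to activate only when the whole message is a trigger phrase, not when the phrase merely appears inside a longer sentence. The sketch below uses three phrases quoted from the README. The matching rule and the name is_explicit_trigger are assumptions for illustration, not the skill's actual behavior.

```python
import re

# Phrases quoted in the README; stored lowercase with punctuation removed.
TRIGGERS = {"update my soul", "run soulforge", "soul check"}


def is_explicit_trigger(message):
    """True only if the entire message, once normalized, equals a trigger phrase."""
    # Drop punctuation, trim, lowercase, and collapse runs of whitespace.
    normalized = re.sub(r"[^\w\s]", "", message).strip().lower()
    normalized = re.sub(r"\s+", " ", normalized)
    return normalized in TRIGGERS
```

With this rule, "Run soulforge." activates the skill, while "I did a soul check on my priorities yesterday" does not, because the phrase is only part of a longer conversational sentence.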

Ssd 3

Medium

Confidence: 94%
Finding: The README explicitly states that the skill watches the user across sessions and tracks real decisions, recurring phrases, values, and blind spots to evolve a persistent profile. Even if processing is local, this creates a durable behavioral dossier that can surface sensitive personal information through summaries, prompts, backups, or later misuse by other local components.

Ssd 3

Medium

Confidence: 95%
Finding: Describing Observe mode as passive and running in the background indicates ongoing capture of session-derived behavioral data without a strong indication of per-session consent or active visibility. Background collection is especially risky because users may forget it is enabled while discussing sensitive matters, leading to unintended long-term storage and later exposure in reflections or derived files.

Ssd 3

Medium

Confidence: 93%
Finding: The tracked signals include vocabulary, tone, decisiveness, topic gravity, and aspiration gaps, all of which can reveal sensitive traits, beliefs, mental state, or recurring private concerns when stored and summarized over time. Such profiling increases privacy risk even without network exfiltration because local summaries, logs, backups, or unintended downstream use can still leak intimate behavioral information.

Ssd 3

Medium

Confidence: 93%
Finding: This section explicitly instructs the system to observe recurring phrases, topics, emotional register, decisions, and other cross-session behavioral signals, then summarize them back to the user. Even without external exfiltration, this creates a mechanism for collecting and re-presenting sensitive personal data and inferred traits that may exceed user expectations or reveal intimate patterns in generated outputs.

Ssd 3

Medium

Confidence: 95%
Finding: The skill describes persistent logging of vocabulary, topic gravity, decision style, tone fingerprint, blindspots, and time-based behavior, all tied to a long-lived identity artifact and backups. This amounts to a durable behavioral profile across sessions, which is sensitive by nature and increases privacy risk if exposed, misused, or surfaced inappropriately in future outputs.

VirusTotal

64/64 vendors flagged this skill as clean.
