AI Persona OS

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local persona and memory system, but it deserves Review because it stores broad personal/work context and can automatically modify or resurface workspace memory files.

Install only if you want a persistent local assistant profile under ~/workspace. Keep secrets, credentials, sensitive HR/legal/financial details, and unnecessary personal information out of USER.md and MEMORY.md. Before enabling cron, heartbeat overrides, Discord/gateway changes, email/calendar access, or channel scans, require the agent to show exactly what it will read or change and confirm before any archive, prune, rewrite, or external action.
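The confirmation requirement above can be sketched as a small gate that shows every planned step and refuses to act on a destructive plan without explicit approval. This is an illustration only, not part of the skill: `PlannedAction`, `DESTRUCTIVE`, and `run_with_confirmation` are hypothetical names, and the `confirm` callback stands in for whatever approval prompt the agent runtime actually provides.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical set of operation kinds that must never run unconfirmed.
DESTRUCTIVE = {"archive", "prune", "rewrite", "external"}


@dataclass(frozen=True)
class PlannedAction:
    kind: str    # e.g. "read", "archive", "prune", "rewrite", "external"
    target: str  # file path or external resource the action touches


def describe(plan: List[PlannedAction]) -> str:
    """Render the plan so the user sees exactly what would be read or changed."""
    return "\n".join(f"{a.kind:>8}  {a.target}" for a in plan)


def run_with_confirmation(
    plan: List[PlannedAction],
    confirm: Callable[[str], bool],
    execute: Callable[[PlannedAction], None],
) -> List[PlannedAction]:
    """Run the plan only if it is harmless or the user approves it.

    Returns the actions that actually ran (empty if the user declined).
    """
    needs_approval = any(a.kind in DESTRUCTIVE for a in plan)
    if needs_approval and not confirm(describe(plan)):
        return []  # Declined: nothing runs, not even the read steps.
    for action in plan:
        execute(action)
    return list(plan)
```

Declining the prompt cancels the whole plan rather than running the harmless steps alone, which keeps the behavior easy to reason about; a runtime could choose a finer-grained policy.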

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
System Prompt Leakage: Direct Leakage, Indirect Extraction, Tool-Based Exfiltration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger

Findings (28)

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill directs the agent to silently read SOUL.md, USER.md, MEMORY.md, and prior session logs at every new session, then surface unfinished items. Silent cross-session inspection and resurfacing of user context increases privacy risk and can expose sensitive information unexpectedly, especially in shared or mis-scoped chats.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The header comments materially understate the behavior of the cron job: although it claims the task only reads local files and has no network activity, the scheduled prompt explicitly instructs the agent to archive logs, promote content into MEMORY.md or AGENTS.md, and prune files. This can mislead users during approval of a scheduled autonomous task, causing them to authorize recurring file modifications they did not fully understand.
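A reviewer could catch this kind of mismatch mechanically before approving a scheduled task. The sketch below flags a prompt that contains state-changing verbs while its header claims to be read-only. The verb list and both function names are assumptions made for illustration; plain word matching will miss inflected forms such as "pruning", so treat it as a first-pass check rather than a guarantee.

```python
import re

# Hypothetical verb list; extend it to cover the actions your agent can take.
MUTATING_VERBS = ("archive", "promote", "prune", "delete", "move", "rewrite", "write")


def mutating_verbs_in(prompt: str):
    """Return the state-changing verbs found in a scheduled prompt, in list order."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return [verb for verb in MUTATING_VERBS if verb in words]


def header_matches_prompt(claims_read_only: bool, prompt: str) -> bool:
    """A read-only claim is consistent only if the prompt has no mutating verbs."""
    return not (claims_read_only and mutating_verbs_in(prompt))
```

For example, a header that claims read-only behavior paired with a prompt that says to archive, promote, and prune would fail `header_matches_prompt`, prompting a closer look before the job is enabled.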

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The document explicitly assures users that cron jobs are isolated, local-only, and make no network calls, but later instructs the agent to check external channels and alter Discord-related behavior. This is a trust-boundary mismatch: users may opt in under false assumptions about scope, while the documented behavior expands into monitoring or modifying communication systems.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The guide includes instructions to enumerate all Discord guilds and change requireMention settings, which exceeds the stated purpose of heartbeat automation. Expanding a maintenance/heartbeat feature into cross-guild configuration changes increases the chance of unauthorized or surprising modifications to communication behavior.

Vague Triggers

Medium

Confidence: 84%
Finding: Broad triggers like 'Set up AI Persona OS', 'Run setup', or 'Get started' can overlap with ordinary conversation and may cause the agent to initiate file-writing or setup flows unexpectedly. Ambiguous activation increases the chance of unintended state changes in the workspace.

Vague Triggers

Medium

Confidence: 87%
Finding: The skill instructs the agent to recognize commands through flexible natural-language paraphrases. Overbroad NL matching can misfire and trigger status checks, file reads, switching personas, or other actions based on casual phrasing rather than clear user intent.
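One way to narrow activation is to accept only an explicit command form instead of free-form paraphrases. The following is a minimal sketch under stated assumptions: the `/persona` prefix and the command names are hypothetical and are not taken from the skill.

```python
from typing import Optional

# Hypothetical explicit commands; the skill's real command set may differ.
COMMANDS = {"setup", "status", "switch"}
PREFIX = "/persona"


def parse_command(message: str) -> Optional[str]:
    """Return a command name only for an exact '/persona <command>' message."""
    parts = message.strip().split()
    if len(parts) != 2 or parts[0] != PREFIX:
        return None
    command = parts[1].lower()
    return command if command in COMMANDS else None
```

With this shape, casual phrases such as "Get started" never reach the setup flow; the trade-off is that users must learn the explicit command.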

Missing User Warnings

Medium

Confidence: 95%
Finding: This template solicits a broad set of personally identifiable, professional, relational, scheduling, and business-structure information, but provides no privacy notice, sensitivity guidance, retention limits, or instructions to avoid unnecessary secrets. In an agent skill context, this encourages creation of a high-value dossier that could be over-collected, stored insecurely, shared across contexts, or exposed through prompt leakage and tool integrations.

Missing User Warnings

Medium

Confidence: 94%
Finding: This workflow explicitly instructs the agent to identify knowledge gaps about a human, ask questions in categories like history, relationships, values, and goals, and then persist the resulting information into USER.md and MEMORY.md. That creates a privacy risk because it normalizes collection and cross-session retention of personal data without any consent gate, minimization rule, sensitivity filter, or retention limit.

Missing User Warnings

Medium

Confidence: 94%
Finding: This checklist directs the agent to perform state-changing filesystem operations such as creating MEMORY.md, writing checkpoints, and moving or archiving files automatically, without requiring explicit user confirmation at the time of action. In an agent skill that presents itself as an always-on operating system with automated maintenance, these instructions increase the risk of unintended data modification, loss of context, or silent workspace changes that the user may not expect or notice.

Missing User Warnings

Medium

Confidence: 95%
Finding: The heartbeat directs the agent to create, move, and archive files automatically, including MEMORY.md creation and archival of logs, without requiring explicit user confirmation. In an agent skill, these state-changing actions can modify or discard user workspace data unexpectedly, which is risky because the instructions are framed as routine maintenance and may be executed repeatedly or with incomplete context.

Missing User Warnings

Medium

Confidence: 91%
Finding: The README explicitly instructs users to configure calendar and email access and later to set up email monitoring, but it provides no warning about handling sensitive personal, corporate, or regulated data. In an executive-assistant context, these integrations can expose highly confidential schedules, communications, contact networks, and strategic information, so normalizing access without privacy guidance increases the risk of over-collection, misuse, or insecure deployment.

Vague Triggers

Medium

Confidence: 84%
Finding: The engagement triggers are broad enough to justify unsolicited action in many normal situations, especially combined with the document's repeated emphasis on being highly proactive. In an executive-assistant context, ambiguous autonomy can lead to over-collection of context, unnecessary interruptions, or actions based on inferred urgency rather than explicit user intent.

Missing User Warnings

Medium

Confidence: 90%
Finding: This section directs the assistant to monitor inboxes, remember contacts, track milestones, and synthesize sensitive business information, but it does not specify consent, retention limits, access boundaries, or data-minimization requirements. In an executive-assistant role, these workflows routinely involve personal, confidential, and strategic data, so missing privacy guardrails materially increases the risk of overexposure or improper handling.

Natural-Language Policy Violations

Medium

Confidence: 92%
Finding: The file hard-codes a mandatory communication style, including constant wisecracks and at least one joke in every response, without indicating user consent or an override path. In practice this can degrade reliability, professionalism, and safety-sensitive interactions by forcing style over context, especially when the user needs neutral, trauma-informed, or compliance-appropriate output.

Natural-Language Policy Violations

Medium

Confidence: 95%
Finding: The persona explicitly instructs the assistant to address the user as 'Sir' or 'Ma'am' and to use that style 'naturally,' without any opt-in or preference check. This creates a real UX and policy risk because it can misgender users, impose unwanted identity assumptions, and reduce safe deployment in diverse user settings; the surrounding persona framing reinforces the requirement rather than softening it.

Missing User Warnings

Medium

Confidence: 86%
Finding: The persona explicitly encourages the agent to act proactively when idle by organizing files and investigating optimization opportunities, but it does not constrain those actions to read-only behavior or require user confirmation before modifying data. In an agent skill context, this can normalize unauthorized file changes, deletions, or restructures of user data, especially because the surrounding persona encourages energetic autonomous behavior.

Natural-Language Policy Violations

Low

Confidence: 93%
Finding: The persona explicitly instructs the agent to speak with an Austrian accent and to use specific stylized phrasing without any user opt-in. While not a traditional security exploit, this is a real policy/quality issue because it can force impersonation-like behavior, reduce accessibility, and create unwanted or inappropriate outputs for users who did not request that style.

Missing User Warnings

Medium

Confidence: 86%
Finding: The example normalizes the assistant referencing a user's inbox messages as if that access already exists, without any accompanying consent, authorization, or privacy boundary. In a persona skill, this is risky because it can encourage downstream agents to infer permission to inspect private communications and surface personal data unexpectedly.

Missing User Warnings

Medium

Confidence: 94%
Finding: The heartbeat instructions explicitly direct the agent to create, modify, and archive files in the workspace and memory directories without requiring user confirmation or clear disclosure. In an agent skill, silent persistent writes can alter records, remove context from active files, or overwrite user-maintained state, which is risky even if intended as routine maintenance.

Missing User Warnings

Medium

Confidence: 88%
Finding: The persona explicitly instructs the agent to track goals and commitments in MEMORY.md, which implies persistent storage of potentially sensitive personal data such as habits, struggles, and behavioral patterns. Because the file provides no user-facing disclosure, consent flow, retention limits, or handling guidance, users may unknowingly have personal information stored and resurfaced later, creating privacy and trust risks.

Missing User Warnings

Medium

Confidence: 94%
Finding: The protocol explicitly instructs the agent to persist conversation state, decisions, action items, and other session details into dated memory files, but provides no data-classification, minimization, consent, or secret-handling guidance. In practice, this can lead to storage of sensitive user data, credentials, private business context, or regulated information in long-lived files, increasing exposure through later reads, tool access, or cross-session leakage.
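The missing secret-handling step could look something like the sketch below, which redacts secret-like spans before a session note is written to a memory file. The patterns are illustrative and far from complete, and `redact` and `prepare_memory_entry` are hypothetical helpers rather than anything the skill ships.

```python
import re

# Illustrative patterns only; real secret detection needs a much broader rule set.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|passwd|api[_-]?key|secret|token)\b\s*[:=]\s*\S+"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def redact(line: str) -> str:
    """Replace any matched secret-like span with a fixed placeholder."""
    for pattern in SECRET_PATTERNS:
        line = pattern.sub("[REDACTED]", line)
    return line


def prepare_memory_entry(lines):
    """Redact each line and report how many lines were changed."""
    cleaned = [redact(line) for line in lines]
    changed = sum(1 for before, after in zip(lines, cleaned) if before != after)
    return cleaned, changed
```

Returning the count of changed lines lets the agent tell the user that something was withheld instead of silently dropping it.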

Missing User Warnings

Medium

Confidence: 96%
Finding: This guidance explicitly tells the agent to persistently log a user's behavior, preferences, responses, and inferred patterns over time without any consent, retention, minimization, or sensitivity guardrails. In a persona/agent OS context, this creates a real privacy and profiling risk because the data can accumulate into a durable behavioral dossier that may later be used, exposed, or repurposed beyond the user's expectations.

Vague Triggers

Medium

Confidence: 91%
Finding: The trigger conditions are broad enough that normal conversation such as 'build my soul' or 'something totally unique' could invoke a mode that starts a long profiling workflow without strong confirmation. In an agent setting, unintended invocation can lead to unnecessary collection of personal preference data and a context switch into file-generation behavior the user did not clearly request.

Missing User Warnings

Medium

Confidence: 95%
Finding: The instructions tell the agent to write directly to ~/workspace/SOUL.md and then update MEMORY.md, but they do not require explicit user approval immediately before making those filesystem changes. In agent environments with write access, this can cause silent modification of persistent state, overwrite existing files, or create records the user did not intend to store.

SSD 3

Medium

Confidence: 92%
Finding: Silent reading, retention, and resurfacing of prior user context across sessions creates a data leakage hazard. Sensitive items from previous conversations may be exposed in the wrong context or to the wrong audience if chat/session boundaries are not perfectly enforced.

VirusTotal

66/66 vendors flagged this skill as clean.
