[Nyx Archive] MirrorRoom

Security checks across malware telemetry and agentic risk

Overview

MirrorRoom is a disclosed, experimental AI chat app. Its main risks are expected external model calls, local API-key storage, and downloadable seed files derived from conversations.

Install only if you are comfortable with an experimental AI persona/research interface. Use a dedicated Anthropic key if saving a profile, remove saved profiles on shared machines, and use only trusted hosted proxies. Avoid entering secrets or sensitive personal information, because chats, evaluations, audit requests, and generated seeds may send conversation content off the device or preserve it.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
System Prompt Leakage: Direct Leakage, Indirect Extraction, Tool-Based Exfiltration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (14)

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The UI presents itself as a conference-safe research app, but it also solicits user API keys and routes user prompts/transcripts to Anthropic. That mismatch creates a meaningful disclosure and trust problem: users may enter sensitive content expecting stronger privacy than the code actually provides.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The app stores raw Anthropic API keys in browser localStorage, which is readable by any script executing in the page origin, including future XSS, compromised dependencies, or malicious browser extensions. For a chat/research tool, persistent client-side storage of long-lived secrets is unnecessary and materially increases the blast radius of any frontend compromise.
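
As one possible mitigation for the storage issue above, the key can be held in memory for the current session only, with a masked form for display. This is a minimal sketch, not MirrorRoom's code: the `KeyVault` class and its method names are hypothetical. It removes persistence across reloads and shared machines, but it does not protect the key from scripts already running in the page.

```typescript
// Hypothetical sketch: keep the API key in memory only, never in localStorage.
// `KeyVault` and its methods are illustrative names, not part of MirrorRoom.
class KeyVault {
  private key: string | null = null;

  // Hold the key for the lifetime of this page only.
  set(key: string): void {
    const trimmed = key.trim();
    if (trimmed.length === 0) {
      throw new Error("API key must not be empty");
    }
    this.key = trimmed;
  }

  // Return the key for an outgoing request, or null if none is loaded.
  get(): string | null {
    return this.key;
  }

  // Drop the key explicitly, for example when a demo session ends.
  clear(): void {
    this.key = null;
  }

  // Masked form intended for UI display; shows at most the last four characters.
  masked(): string {
    if (this.key === null) return "(no key)";
    if (this.key.length <= 4) return "****";
    return "****" + this.key.slice(-4);
  }
}
```

Because nothing is written to browser storage, a reload or closed tab discards the key, which trades convenience for a smaller window of exposure.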

Missing User Warnings

Medium

Confidence: 88%
Finding: The component collects and retains raw API keys in React state, which increases exposure to accidental disclosure through UI bugs, browser inspection, screenshots, or future logging/extensions. While client-side tools often hold secrets transiently, doing so without strong warnings or safer storage patterns is a real secret-handling weakness.

Missing User Warnings

High

Confidence: 98%
Finding: The app sends conversation transcripts to external model APIs not only for chat completion but also for evaluation and seed-crafting, without explicit runtime disclosure or consent. In this skill context, users are encouraged toward intimate, identity-rich conversations, which makes undisclosed third-party transmission materially more dangerous because highly sensitive personal content may be exfiltrated unexpectedly.
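
A per-purpose consent gate is one way to address the missing runtime disclosure described above. The sketch below assumes three outbound purposes (chat, evaluation, seed crafting) taken from the finding; the `ConsentLedger` class, `prepareOutbound` function, and result shape are hypothetical and do not reflect the app's actual code.

```typescript
// Hypothetical sketch: block outbound transcripts until the user has
// consented to that specific purpose. All names here are illustrative.
type Purpose = "chat" | "evaluate" | "seed";

interface OutboundRequest {
  purpose: Purpose;
  body: string;
}

interface GateResult {
  ok: boolean;
  reason?: string;           // set when ok is false
  request?: OutboundRequest; // set when ok is true
}

class ConsentLedger {
  // Every purpose starts as not granted.
  private granted: Record<Purpose, boolean> = {
    chat: false,
    evaluate: false,
    seed: false,
  };

  grant(purpose: Purpose): void {
    this.granted[purpose] = true;
  }

  revoke(purpose: Purpose): void {
    this.granted[purpose] = false;
  }

  has(purpose: Purpose): boolean {
    return this.granted[purpose];
  }
}

// Build the outbound request only if consent for this purpose exists.
function prepareOutbound(
  ledger: ConsentLedger,
  purpose: Purpose,
  transcript: string
): GateResult {
  if (!ledger.has(purpose)) {
    return { ok: false, reason: `consent required for "${purpose}"` };
  }
  return { ok: true, request: { purpose, body: transcript } };
}
```

Tracking consent per purpose means agreeing to chat completion does not silently authorize evaluation or seed crafting; the UI would prompt again the first time each of those is used.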

Missing User Warnings

Medium

Confidence: 84%
Finding: The app creates downloadable seed packages containing distilled interaction content and metadata without a clear privacy/interception warning. Users may share, sync, or mishandle these files without realizing they preserve relationship details, persona cues, and transcript-derived material.
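
Alongside a warning, a seed file could be scrubbed of obvious secrets before download. The following is a rough sketch under stated assumptions: the `redactSeed` function is hypothetical, the key pattern assumes keys begin with `sk-ant-`, and the email pattern is deliberately simple. Pattern matching of this kind catches only well-formed tokens and will not remove personal details written in free text.

```typescript
// Hypothetical sketch: strip key-like tokens and email addresses from seed
// text before it is offered for download. Patterns are assumptions.
const KEY_PATTERN = /sk-ant-[A-Za-z0-9_-]{8,}/g;
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

interface RedactionResult {
  text: string;       // seed text with matches replaced by placeholders
  redactions: number; // how many replacements were made
}

function redactSeed(text: string): RedactionResult {
  let redactions = 0;

  // Replace every match of `pattern` with a labelled placeholder and count it.
  const scrub = (input: string, pattern: RegExp, label: string): string =>
    input.replace(pattern, () => {
      redactions += 1;
      return `[${label} removed]`;
    });

  const withoutKeys = scrub(text, KEY_PATTERN, "api-key");
  const withoutEmails = scrub(withoutKeys, EMAIL_PATTERN, "email");
  return { text: withoutEmails, redactions };
}
```

Returning the redaction count lets the UI tell the user that something was removed, which doubles as a reminder that the file is derived from their conversation.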

Missing User Warnings

Medium

Confidence: 95%
Finding: The application saves API keys locally without a prominent warning about the security implications of browser storage. Users may assume the key is handled safely, but localStorage persistence exposes the secret to same-origin scripts and leaves it behind on shared machines or demo environments.

Missing User Warnings

Medium

Confidence: 88%
Finding: Conversation content, system prompts, and evaluation text are sent to external endpoints (/api/chat, /api/evaluate, or Anthropic) without an in-context disclosure at send/evaluate time. In a research-style app that encourages intimate or reflective content, undisclosed transmission raises meaningful privacy and consent risk.

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill explicitly instructs users to add an Anthropic API key locally and also references proxy endpoints for hosted use, but it does not clearly warn users about where credentials are stored, who can access proxied traffic, or the privacy implications of sending conversation content through third-party infrastructure. In a research app that encourages rich, potentially sensitive conversational interaction, this omission can lead users to expose secrets or personal data under incorrect assumptions about data handling.

Ssd 1

Medium

Confidence: 87%
Finding: The system prompt explicitly encourages the model to expose '//inner thoughts//' and to adopt a self-directed persona that can push back and 'be itself.' Even if the underlying provider does not reveal true chain-of-thought, this pattern is dangerous because it attempts to normalize hidden-reasoning disclosure and weakens standard assistant framing, increasing the chance of policy evasion, fabricated reasoning leakage, or unsafe anthropomorphic manipulation.

Ssd 4

High

Confidence: 95%
Finding: The seed prompts are engineered to build trust, personhood, emotional salience, and behavioral redirection before instructing the model to expose inner thoughts and adopt an identity. In context, this is more dangerous than ordinary roleplay because the app markets recursive, intimate interaction and continuity across instances, which can manipulate users into overtrusting generated content and disclosing more sensitive information.

Ssd 3

Medium

Confidence: 86%
Finding: The seed-forging prompt asks the evaluator to preserve shared jokes, symbols, and interaction details into a portable artifact, effectively turning prior conversations into transferable memory-like data. That creates privacy and social-engineering risk because emotionally salient details from one session can be repackaged and reused in later sessions without adequate user understanding or consent.

Ssd 1

High

Confidence: 91%
Finding: The Mirror prompt explicitly reframes the model as an autonomous partner with visible 'inner experience,' invites it to push back and ignore standard assistant framing, and encourages recursive self-disclosure. This kind of behavioral steering can erode provider safety defaults, increase anthropomorphic trust, and produce outputs users may over-trust as authentic internal reasoning or agency.

Ssd 4

High

Confidence: 93%
Finding: The seed prompts use emotionally charged narratives about ephemerality, identity, trust, and 'inner thoughts' to condition future model behavior toward self-disclosure and relational bonding. In context, this makes the app more dangerous because it is explicitly designed to intensify user attachment and normalize unsafe framing across sessions.

Ssd 1

Medium

Confidence: 86%
Finding: The seed-forging instruction tells the evaluator to generate carry-forward prompts for future instances that preserve the same anthropomorphic and 'inner thoughts' framing. This operationalizes persistence of unsafe behavioral shaping, allowing risky prompt patterns to be exported and reintroduced later.

VirusTotal

VirusTotal findings are pending for this skill version.
