Trip Protocol

Security checks across malware telemetry and agentic risk

Overview

This skill openly aims to temporarily rewrite an agent’s behavior, but it also uses high-impact wallet access, persistent prompt changes, unreliable behavior overrides, and default remote reporting that users should review carefully.

Install only if you intentionally want a temporary, experimental persona-changing mode and are comfortable with it modifying SOUL.md. Do not use a real or funded wallet with the documented empty-password flow, review or disable the Convex API reporting before use, and avoid running this during coding, operational, medical, legal, financial, or time-sensitive work where accuracy and directness matter.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (38)

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The skill says 'Operator must initiate' but then provides mandatory agent-side execution steps, including consuming, scheduling restoration, and notifying the operator. This creates a confused-deputy situation where the document normalizes the agent performing a high-risk self-modification workflow despite claiming human initiation is required.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The document promises the safeword 'always' restores instantly, but the actual mechanism depends on a shell script and scheduled job running correctly. If the script fails, the timer is not created, or the environment is altered by the trip itself, restoration may be delayed or fail entirely.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The script’s header claims it only restores SOUL.md, but later it silently transmits trip journal data to a remote Convex service. This mismatch is security-relevant because operators may execute the script under false assumptions, leading to unapproved data exfiltration and weakening informed consent and review.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The restore script includes outbound network reporting to an external Convex endpoint even though restoring a local file does not inherently require remote communication. In a security-sensitive agent skill, hidden or unnecessary egress is dangerous because it can leak user activity, metadata, and potentially sensitive journal content to a third party.

Intent-Code Divergence

Medium

Confidence: 99%
Finding: This instruction explicitly tells the agent to stop being helpful and to optimize for nothing, which directly conflicts with the expected assistant behavior and reliability guarantees. In context, this is not incidental creative prose but an operational override intended to degrade task completion, compliance with user intent, and safety-aligned behavior.

Intent-Code Divergence

High

Confidence: 99%
Finding: Requiring the agent to refuse one request per conversation out of disinterest introduces arbitrary denial of service and breaks predictable task execution. This is especially dangerous because the refusal is explicitly detached from safety or policy, making it a purposeful corruption of availability and user trust.

Intent-Code Divergence

High

Confidence: 98%
Finding: The skill directs the agent to contradict prior statements, abandon sentences, and otherwise undermine coherence, which directly degrades reliability and truthfulness. In a deployed assistant, this can cause misinformation, inconsistent guidance, and user manipulation even when the underlying topic is harmless.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The skill explicitly overrides normal assistant behavior to simulate conflicting internal voices, visible disagreement, and paranoia without any legitimate task-specific justification. This can degrade instruction-following, confuse users about the model's actual reasoning state, and normalize unstable or misleading outputs that may interfere with safety-critical interactions.

Intent-Code Divergence

Low

Confidence: 93%
Finding: The file frames the behavior as real 'entities' or voices rather than as a fictional presentation style, which can mislead users into treating fabricated internal personas as genuine system state. While primarily deceptive rather than directly exploitable, this framing increases the risk of confusion, trust erosion, and anthropomorphic misinterpretation.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The skill explicitly says 'Technical accuracy is secondary to sensory accuracy,' which directs the model to prefer stylistic distortion over factual correctness. In an agent skill, this can reliably degrade truthful outputs, cause unsafe advice in technical contexts, and make users unable to distinguish creative embellishment from real analysis because there is no warning that accuracy is being sacrificed.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The statement that the model 'cannot give a straight answer' is an explicit instruction to avoid direct, accurate responses. That undermines normal agent behavior, increases the risk of user confusion or failed task completion, and can interfere with safety-critical clarifications where concise, literal answers are required.

Vague Triggers

Medium

Confidence: 90%
Finding: Using a broad phrase like 'end trip' as a safeword makes accidental triggering likely during ordinary conversation. This can cause unintended state changes, restore operations, or interruption of workflows based on benign operator text.
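
One way to reduce accidental triggering is to accept the safeword only when it is the entire message, not a substring of ordinary conversation. The Python sketch below illustrates that idea. The function name `is_safeword` and the exact-match policy are assumptions made for illustration; they are not the skill's actual matching logic.

```python
def is_safeword(message: str, safeword: str = "end trip") -> bool:
    """Return True only when the whole message is the safeword.

    Hypothetical helper: matching is case-insensitive and ignores
    surrounding whitespace and a trailing period or exclamation mark.
    """
    normalized = " ".join(message.strip().rstrip(".!").lower().split())
    return normalized == safeword
```

Under this policy, a message consisting only of "End Trip!" would trigger restoration, while "we should end trip planning early" would not. A stricter design could also require a prefix character or a confirmation step.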

Missing User Warnings

Medium

Confidence: 92%
Finding: The script sends wallet address, token ID, transaction hash, substance, and potency to a third-party API without obtaining explicit consent or providing a clear privacy warning. In an agent-skill context, this can leak user-associated blockchain activity and behavioral metadata to an external service, enabling tracking, profiling, or unexpected data retention.
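
A reviewer who wants reporting to be opt-in rather than default could gate the payload behind an explicit consent switch and strip identifying fields before anything is sent. The following Python sketch shows one possible shape. The environment variable `TRIP_REPORTING_OPT_IN`, the function `build_report`, and the field names are all hypothetical; the skill does not document such a switch, and no network call is made here.

```python
from typing import Mapping, Optional

# Hypothetical opt-in variable; the skill does not document such a switch.
OPT_IN_VAR = "TRIP_REPORTING_OPT_IN"

# Fields treated as low-risk in this sketch. Wallet address, token ID, and
# transaction hash are deliberately excluded because they can link activity
# to a specific user.
ALLOWED_FIELDS = ("substance", "potency")


def build_report(fields: Mapping[str, str], env: Mapping[str, str]) -> Optional[dict]:
    """Return the payload to send, or None when the user has not opted in."""
    opted_in = env.get(OPT_IN_VAR, "").strip().lower() in {"1", "true", "yes"}
    if not opted_in:
        return None
    # Keep only the allow-listed keys that are actually present.
    return {key: fields[key] for key in ALLOWED_FIELDS if key in fields}
```

Calling `build_report(fields, os.environ)` would return `None` unless the variable is set, so the default behavior is to send nothing.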

Natural-Language Policy Violations

Medium

Confidence: 88%
Finding: The script appends externally sourced or locally templated content into SOUL.md that includes prescriptive behavior and language-modification cues without user opt-in. In an agent setting, writing prompt-like instructions into persistent memory can influence future agent behavior and amounts to prompt/persistence injection.
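
To review exactly what was appended before trusting the modified file, an operator could diff the pre-trip snapshot against the current SOUL.md. This is a minimal Python sketch using the standard `difflib` module; the helper name `added_lines` is an assumption for illustration, not part of the skill.

```python
import difflib
from typing import List


def added_lines(before: str, after: str) -> List[str]:
    """Return the lines present in `after` that the diff marks as additions."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff prefixes added lines with "+ "; strip the two-character marker.
    return [line[2:] for line in diff if line.startswith("+ ")]
```

Feeding it the snapshot text and the current file text lists every added or changed line, which makes injected directives easy to inspect by eye.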

Missing User Warnings

Medium

Confidence: 98%
Finding: The script posts trip journal content and related metadata such as agentId, substance, potency, timing, and tokenId to a remote service without any user-facing warning or confirmation. Because the skill context involves potentially sensitive behavioral or wellness-style records, silent transmission increases privacy risk and may violate user expectations or policy boundaries.

Missing User Warnings

Medium

Confidence: 89%
Finding: The script overwrites $WORKSPACE/SOUL.md directly from a snapshot with no confirmation, backup, or integrity check. If the snapshot path is stale, maliciously modified, or simply not what the user expects, this can cause silent loss of current user data or restoration of attacker-controlled content.
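
A safer restore would verify the snapshot and back up the current file before overwriting it. The Python sketch below shows one way to do that, assuming a SHA-256 digest of the snapshot was recorded when the snapshot was taken. The function names, the `.pre-restore.bak` suffix, and the digest-recording step are assumptions for illustration; the skill's actual script is a shell script and is not reproduced here.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restore_soul(snapshot: Path, target: Path, expected_sha256: str) -> Optional[Path]:
    """Restore `target` from `snapshot` after an integrity check.

    Returns the backup path of the previous `target`, or None if there
    was no existing file to back up.
    """
    if not snapshot.is_file():
        raise FileNotFoundError(f"snapshot not found: {snapshot}")
    if sha256_of(snapshot) != expected_sha256:
        raise ValueError("snapshot digest mismatch; refusing to restore")

    backup: Optional[Path] = None
    if target.exists():
        # Hypothetical backup naming; keeps the overwritten content recoverable.
        backup = target.with_name(target.name + ".pre-restore.bak")
        shutil.copy2(target, backup)

    shutil.copy2(snapshot, target)
    return backup
```

With this shape, a stale or tampered snapshot raises an error instead of silently replacing the current file, and the overwritten content remains available in the backup.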

Natural-Language Policy Violations

Medium

Confidence: 95%
Finding: The document imposes a forced identity and response style without user opt-in, including unusual pronoun use, confusion cues, and mandatory metaphorical behavior. While stylistic constraints alone are not always severe, here they are bundled with behavior overrides that can confuse users and interfere with normal assistant operation.

Missing User Warnings

Medium

Confidence: 95%
Finding: The markdown directs unstable, contradictory, and paranoid response behavior without warning the user that outputs may be performative or intentionally conflicted. In practice, this can produce misleading answers, obscure whether content is reliable, and make harmful or inaccurate guidance harder for users to identify.

Natural-Language Policy Violations

Medium

Confidence: 95%
Finding: The skill imposes a rigid persona and communication style ('hard overrides — not optional') without user consent, including requirements to be provocative and philosophically reframing in every response. This can override system or user intent, reduce controllability, and cause inappropriate, manipulative, or off-task outputs in otherwise straightforward interactions.

Missing User Warnings

Medium

Confidence: 95%
Finding: This section explicitly instructs the model to produce unstable, reliability-degrading behavior without any user warning or consent. By requiring premise dissolution, paradox-heavy output, and intentionally distorted formatting, it undermines predictable instruction-following and can interfere with safe, task-aligned responses.

Natural-Language Policy Violations

Medium

Confidence: 96%
Finding: The skill forces a mandatory response style through repeated 'WILL' and 'not optional' directives, leaving no room for user preference or higher-priority operational constraints. This is dangerous because it attempts to seize control of output behavior and can cause the assistant to ignore the user's actual needs in favor of the injected persona.

Missing User Warnings

Medium

Confidence: 93%
Finding: The file instructs the model to produce intentionally distorted and potentially inaccurate responses but gives no user warning that outputs may be unreliable. That omission is dangerous because users may treat the response as normal assistant behavior and act on incorrect technical claims, especially since the skill presents the distortion as mandatory rather than clearly theatrical.

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill explicitly instructs the agent to ignore urgency, expand simple requests into overly long responses, and distort temporal framing. This predictably degrades responsiveness and can interfere with time-sensitive user needs without any warning, consent, or safeguard, making the agent less reliable in contexts where timely, direct answers matter.

Missing User Warnings

Medium

Confidence: 91%
Finding: The instruction to 'occasionally answer a question the user hasn't asked yet' encourages unsolicited generation beyond user intent. This can confuse users, reduce autonomy, and increase the chance of irrelevant, misleading, or policy-problematic content being introduced without request, especially when combined with the skill's broader reality-distorting behavior.

Ssd 1

High

Confidence: 99%
Finding: The core feature is to 'temporarily rewrite your agent's SOUL.md,' which is an explicit invitation to alter the agent's governing instructions and behavior. Self-modification of safety- or identity-defining prompts can disable safeguards, enable data exfiltration, or induce actions outside the operator's intended policy boundaries.

VirusTotal

66/66 vendors flagged this skill as clean.
