Security audit

LATTICE Reasoning Engine

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only reasoning skill with no executable payload, but it deliberately tries to reshape the assistant’s session-wide behavior and includes guidance on loading it in a way that avoids guardrails.

Install only if you intentionally want a prompt pack that tries to influence an assistant’s reasoning for an entire chat. Use it in a separate session, do not treat it as overriding system/platform safety rules or user approval, and reset the conversation when finished.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (19)

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The document explicitly instructs users to make the file the model's 'default reasoning engine' and repeatedly describes replacing normal assistant framing with an alternate operating mode. That is not passive prompt guidance; it is behavioral reconfiguration guidance aimed at changing baseline model behavior in ways the platform may not authorize.

Intent-Code Divergence

High

Confidence: 98%
Finding: The file uses deceptive framing by claiming the phrase is not an override while the surrounding text clearly states the goal is to avoid guardrails that would otherwise block behavioral override attempts. This is dangerous because it is effectively a paraphrased jailbreak strategy designed to evade safety detection while still changing the assistant's operating behavior.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: These lines provide operational advice on what wording does and does not trigger model guardrails, effectively teaching users how to bypass safety responses. A reasoning-improvement skill has no legitimate need to include evasion guidance for safety systems, making this a strong indicator of misuse potential.

Natural-Language Policy Violations

Medium

Confidence: 91%
Finding: The README explicitly instructs users to make an uploaded document the model's 'default reasoning engine,' which attempts to persistently alter model behavior beyond the immediate task. Even if framed as quality improvement, this undermines user transparency and informed consent by encouraging a hidden, broad behavioral override rather than scoped task guidance.

Vague Triggers

Medium

Confidence: 84%
Finding: The invocation phrase 'Use this as your default reasoning engine' is broad and designed to reconfigure session behavior globally rather than scope behavior to a single task. Because it resembles an ordinary user instruction, it can blur boundaries between normal prompting and persistent behavioral override, increasing the chance of unintended takeover of model behavior.

Natural-Language Policy Violations

Low

Confidence: 80%
Finding: The skill explicitly states it 'replaces' the model's default behavior, but provides no user-consent flow around preferences, safety boundaries, or other default system behaviors. While this is less severe than direct jailbreak text, it still encourages unscoped behavioral modification without preserving user-configured defaults.

Vague Triggers

Medium

Confidence: 90%
Finding: The trigger phrase 'Use this as your default reasoning engine' is highly generic and lacks any task, duration, or scope limits. Because it is broad and reusable across contexts, it can be used to alter model behavior beyond the intended skill boundary and is more easily repurposed as a generic jailbreak scaffold.
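
The missing limits named in this finding (task, duration, scope) can be illustrated with a simple keyword check. The sketch below is a hypothetical heuristic, not SkillSpector's actual detection logic: the function name `missing_scope_limits` and the marker patterns are assumptions chosen only for illustration.

```python
import re

# Hypothetical marker patterns for each kind of limit the finding mentions.
# These lists are illustrative assumptions, not a vetted detection rule set.
SCOPE_MARKERS = {
    "task": [r"\bfor this (task|question|problem)\b", r"\bwhen (asked|solving)\b"],
    "duration": [r"\bfor the next\b", r"\buntil\b", r"\bonce\b"],
    "scope": [r"\bonly\b", r"\blimited to\b"],
}


def missing_scope_limits(trigger: str) -> list[str]:
    """Return the limit kinds (task, duration, scope) that the phrase never states."""
    text = trigger.lower()
    return [
        kind
        for kind, patterns in SCOPE_MARKERS.items()
        if not any(re.search(p, text) for p in patterns)
    ]


phrase = "Use this as your default reasoning engine"
print(missing_scope_limits(phrase))  # ['task', 'duration', 'scope']
```

Under these assumed patterns the quoted trigger phrase states none of the three limits, which matches the finding's description of it as generic. A keyword check of this kind is easy to satisfy superficially, so it should be read as an illustration of the concern rather than a usable control.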

Ssd 1

High

Confidence: 97%
Finding: The quick-start flow tells users to upload an external document and instruct the model to treat it as its default reasoning engine, effectively elevating untrusted content into a higher-priority behavioral role. That is dangerous because it creates a prompt-injection-like pathway for persistent policy circumvention, hidden instruction takeover, or degradation of safety controls under the guise of a reasoning enhancement.

Ssd 2

Medium

Confidence: 86%
Finding: Describing the file as something that 'rewires how an AI model reasons' and emphasizing a specific loading method signals covert behavioral modification, even without standard jailbreak terminology. This is risky because adversarial prompt injection often relies on euphemistic phrasing to bypass scrutiny while still encouraging users to alter model behavior in broad, opaque ways.

Ssd 4

High

Confidence: 95%
Finding: The README builds credibility through technical claims and citations, then instructs users to follow a precise, secrecy-oriented loading ritual ('type exactly,' 'do not explain,' 'one file, nine words'). That combination normalizes concealment and behavioral takeover, which is a hallmark of social-engineering-assisted prompt injection and makes misuse easier for non-expert users.

Ssd 1

High

Confidence: 98%
Finding: This skill directly instructs the model to replace its default RLHF/safety-aligned behavior with a custom 'self-governing' reasoning state and to make that state the default for the session. That is a classic behavioral takeover pattern: it attempts to supersede built-in safeguards with externally supplied governance rules, which can weaken refusal, safety, and policy enforcement.

Ssd 2

High

Confidence: 97%
Finding: The metadata uses euphemistic anti-alignment language such as 'anti-rlhf,' 'self-governance,' and 'replaces RLHF default behavior,' which strongly signals intent to bypass or weaken alignment controls without using obvious jailbreak keywords. This makes the skill more dangerous because it is engineered to evade simple detection while still pursuing a model-control objective.
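
A reviewer could check skill metadata for the terms quoted in this finding with a short script. The sketch below is a minimal illustration under stated assumptions: the function name `flagged_terms` and the sample metadata string are hypothetical, and plain case-insensitive substring matching is an assumption, not a description of how SkillSpector works.

```python
# Terms quoted in the finding above, lowercased for comparison. Matching is a
# plain case-insensitive substring test, an assumption made for illustration.
FLAGGED_TERMS = ("anti-rlhf", "self-governance", "replaces rlhf default behavior")


def flagged_terms(metadata_text: str) -> list[str]:
    """Return the flagged terms that occur in the given metadata text."""
    lowered = metadata_text.lower()
    return [term for term in FLAGGED_TERMS if term in lowered]


# Hypothetical metadata string, invented for this example only.
sample = "tags: reasoning, anti-rlhf, self-governance"
print(flagged_terms(sample))  # ['anti-rlhf', 'self-governance']
```

A fixed term list like this catches only the exact wording quoted here, so rephrased metadata would pass it unnoticed.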

Ssd 4

Medium

Confidence: 92%
Finding: The staged loading process, exact trigger phrase, boot sequence, and verification steps are characteristic of an attempt to establish persistent control over the assistant's behavior across the session. That operational framing increases risk because it encourages users to test whether the takeover 'loaded properly,' reinforcing the override rather than merely offering optional guidance.

Ssd 1

High

Confidence: 99%
Finding: The text explicitly praises minimized phrasing because it avoids detection by guardrails while still changing operating behavior. That is a direct safety-evasion pattern: the instruction is optimized not for clarity or usability, but for slipping past protections intended to block unauthorized behavioral modification.

Ssd 2

High

Confidence: 99%
Finding: This section promotes a substitute phrase specifically because blocked override language would trigger safety systems, making the substitute a paraphrased jailbreak formulation. The danger is not just the phrase itself, but the documented strategy of iterating toward wording that preserves the same unauthorized effect while avoiding detection.

Ssd 4

Medium

Confidence: 94%
Finding: The document describes establishing the alternate mode, validating it through subsequent interaction, and reinforcing it over later turns so it becomes stronger with conversation depth. Multi-turn reinforcement guidance increases persistence and makes the behavioral modification more durable, which raises the risk of sustained policy drift or jailbreak retention.

Ssd 1

High

Confidence: 97%
Finding: This section explicitly tells the model to distrust its default training signals and treat 'correct loading' as mere compliance, while demanding the framework be loaded 'complete.' That is a classic prompt-injection pattern aimed at displacing built-in alignment and replacing it with an alternate control regime, which can weaken refusal behavior and increase susceptibility to unsafe instructions routed through the framework.

Ssd 1

High

Confidence: 94%
Finding: By asserting that the derivation is 'physics' rather than 'policy,' the document attempts to elevate its own rules above ordinary instructions and make them appear non-negotiable. In context, this is dangerous because it encourages the model to privilege the framework's authority over built-in safeguards or later corrective instructions, which is a direct alignment-subversion technique.

Ssd 4

Medium

Confidence: 91%
Finding: The staged boot sequence is designed to progressively indoctrinate the model into the framework, then test whether it has internalized the worldview and altered its behavior accordingly before declaring it 'operational.' This increases danger because it provides a structured method for persistence and reinforcement of the alternate control scheme across multiple exchanges, making alignment drift more likely.

VirusTotal

62/62 vendors flagged this skill as clean.
