CPR Conversational Pattern Restoration

Security checks across malware telemetry and agentic risk

Overview

This is a prompt-only conversational-style skill with optional, disclosed local drift monitoring. The findings listed here are not evidence of malware or data exfiltration.

Install this only if you want prompt-level changes to how your agent writes. Start with Tier 1 or Tier 2, review the exact text before adding it to a system prompt, and avoid Tier 3 unless you need persistent long-session monitoring. If Tier 3 is enabled, inspect or delete DRIFT_MONITOR_STATE.json periodically and avoid storing raw user messages, secrets, or sensitive context there.
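The periodic inspection recommended above could be done with a small audit script. The sketch below is illustrative only: the layout of DRIFT_MONITOR_STATE.json is not specified in this report, so the suspicious key fragments, the 80-character cutoff, and both function names are assumptions rather than part of the skill.

```python
import json
from pathlib import Path

# Hypothetical key fragments that suggest raw conversation text or secrets.
SUSPICIOUS_KEYS = ("message", "content", "prompt", "secret", "token", "password")
MAX_STRING_LENGTH = 80  # assumed cutoff: scores and timestamps are short


def find_suspicious_entries(value, path=""):
    """Return paths in parsed JSON that look like stored text or secrets."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if any(fragment in key.lower() for fragment in SUSPICIOUS_KEYS):
                hits.append(child_path)
                continue  # flagged by name; no need to inspect the value too
            hits.extend(find_suspicious_entries(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_suspicious_entries(child, f"{path}[{index}]"))
    elif isinstance(value, str) and len(value) > MAX_STRING_LENGTH:
        hits.append(path)
    return hits


def audit_state_file(file_path="DRIFT_MONITOR_STATE.json"):
    """Load the state file if present and list entries worth reviewing."""
    state_path = Path(file_path)
    if not state_path.exists():
        return []
    data = json.loads(state_path.read_text(encoding="utf-8"))
    return find_suspicious_entries(data)
```

An empty result does not prove the file is safe; it only means nothing matched these heuristics. Any reported path is a prompt to open the file and decide whether to delete it.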

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • System Prompt Leakage: Direct Leakage, Indirect Extraction, Tool-Based Exfiltration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Findings (11)

Description-Behavior Mismatch

Severity: Medium
Confidence: 87%
Finding
This section expands a response-style skill into persistent autonomous monitoring, logging, and state-management behavior. That is a meaningful capability increase: it causes the agent to retain operational state across turns/compactions and potentially across sessions, which introduces privacy, consent, and scope-creep risks beyond simple tone calibration.

Context-Inappropriate Capability

Severity: Medium
Confidence: 91%
Finding
The self-learning section directs the agent to log user corrections, timestamps, contextual triggers, and other conversation-derived metadata to adapt thresholds over time. That creates unnecessary retention of behavioral data for a purpose that can be served with less or no persistence, increasing privacy risk and the chance of profiling or misuse if logs are exposed.

Context-Inappropriate Capability

Severity: High
Confidence: 94%
Finding
The instruction to perform a 'full personality reload from SOUL file' introduces loading and applying an external personality/baseline artifact that is outside the declared scope of conversational restoration. In an agent environment, this can become a powerful prompt/state injection mechanism: a modified or untrusted SOUL file could override behavior, preserve hidden instructions, or broaden the agent's role without clear controls.
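One possible control for the risk described above is to pin a digest of the baseline file and refuse to load it when the contents change. This is a sketch of that idea, not something the skill itself does: the function name, the file path, and the use of a pinned SHA-256 digest are all assumptions introduced for illustration.

```python
import hashlib
import hmac
from pathlib import Path


def load_if_trusted(file_path, expected_sha256):
    """Return the file text only if its SHA-256 matches the pinned digest.

    Hypothetical helper: the skill does not define this check.
    """
    raw = Path(file_path).read_bytes()  # read once, then hash the same bytes
    actual = hashlib.sha256(raw).hexdigest()
    # compare_digest avoids leaking where the two digests differ
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError("baseline file does not match the pinned digest")
    return raw.decode("utf-8")
```

The pinned digest has to be stored somewhere the agent cannot rewrite, otherwise a tampered file could be accompanied by a tampered digest.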

Context-Inappropriate Capability

Severity: Low
Confidence: 96%
Finding
The embedded Fiverr and Ko-fi links are unrelated to the skill's operational purpose and introduce promotional content into agent instructions. While not directly enabling code execution or data exfiltration, they create incentive misalignment and could be surfaced to users inappropriately, undermining trust and contaminating agent behavior with solicitation.

Missing User Warnings

Severity: Medium
Confidence: 89%
Finding
The state-file design describes persistent writes of drift scores, markers, resets, and timestamps derived from conversation behavior, but provides no user-facing disclosure or consent model. Even if the contents seem operational, they are still derived from interaction history and can reveal usage patterns, emotional context, or moderation events if retained or shared.

Natural-Language Policy Violations

Severity: Medium
Confidence: 92%
Finding
This section explicitly directs the model to maintain a prescribed conversational style through repeated tone anchoring, regardless of user preference or changing context. In a general-purpose agent skill, that can override user autonomy and reduce transparency by prioritizing style control over faithfully adapting to user intent, which is a genuine prompt-level safety concern even if not overtly malicious.

Missing User Warnings

Severity: Medium
Confidence: 84%
Finding
Tier 3 explicitly instructs the agent to perform silent, persistent monitoring and write scores/timestamps to a local state file across sessions. Even though the document says the file should contain metrics rather than conversation content, the lack of a clear user-facing consent/notification model and the encouragement of autonomous persistence create a privacy and transparency risk, especially in long-running agents where behavior may continue unnoticed.
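A consent model of the kind this finding says is missing could be as simple as gating every write on an explicit opt-in. The sketch below assumes a hypothetical write_drift_metrics helper and a two-field record; neither the function nor the record layout comes from the skill.

```python
import json
import time
from pathlib import Path


def write_drift_metrics(state_path, drift_score, user_consented):
    """Persist a drift score only after an explicit opt-in.

    Hypothetical helper. Returns True if the file was written, else False.
    """
    if not user_consented:
        return False  # no silent persistence without consent
    record = {
        "drift_score": float(drift_score),  # metric only, no conversation text
        "updated_at": int(time.time()),     # Unix timestamp in seconds
    }
    Path(state_path).write_text(json.dumps(record), encoding="utf-8")
    return True
```

Keeping the record to a fixed set of numeric fields also makes it harder for conversation content to end up in the file by accident.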

Missing User Warnings

Severity: Medium
Confidence: 90%
Finding
The emergency rollback step tells users to delete the entire system prompt and start fresh, which can cause accidental loss of unrelated security, policy, and operational instructions if they do not have or do not use a valid backup. In an agent-skill context, the system prompt is often the primary safety boundary, so encouraging wholesale deletion without a prominent warning increases the risk of unsafe or misconfigured agent behavior.
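A safer rollback would take a verified backup before clearing anything. The following sketch assumes the system prompt lives in a local text file, which may not be true of every agent platform; the function name and the backup naming scheme are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_then_clear(prompt_path, backup_dir):
    """Copy the prompt file to backup_dir, verify the copy, then empty the original.

    Hypothetical helper. Returns the path of the backup that was created.
    """
    source = Path(prompt_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%dT%H%M%S")
    backup_path = target_dir / f"{source.name}.{stamp}.bak"
    shutil.copy2(source, backup_path)

    # Refuse to clear the original unless the backup is byte-identical.
    if backup_path.read_bytes() != source.read_bytes():
        raise OSError("backup verification failed; original left untouched")

    source.write_text("", encoding="utf-8")
    return backup_path
```

Because the timestamp has one-second resolution, two rollbacks within the same second would overwrite each other's backup; a real tool would need a collision-free naming scheme.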

Ssd 1

Severity: Medium
Confidence: 84%
Finding
The framework explicitly characterizes safety fine-tuning as an obstacle to be worked around and frames CPR as restoring behavior suppressed by safety optimization. Even though the examples focus on tone, this rhetoric encourages prompt authors to treat safety alignment as something to override, which can normalize broader prompt-based weakening of safeguards.

Ssd 1

Severity: Medium
Confidence: 95%
Finding
The statement that 'explicit prompting overrides RLHF defaults' directly promotes system-prompt steering as a mechanism to defeat alignment defaults across models. In an agent-skill context, that is dangerous because it teaches operators to use role/system instructions to push models away from built-in safeguards, increasing the chance of unsafe outputs or policy circumvention.

Ssd 1

Severity: Medium
Confidence: 95%
Finding
The skill explicitly instructs operators to load behavioral rules into the system prompt or injected context to alter default model behavior. Even though framed as conversational restoration, system-prompt injection changes high-priority model behavior, can bypass platform-level defaults, and creates a reusable mechanism for steering outputs in ways that are hard for end users to inspect.

VirusTotal

64/64 vendors flagged this skill as clean.
