CPR: Conversational Pattern Restoration

Security checks across malware telemetry and agentic risk

Overview

This is a prompt-only conversational-style skill with disclosed, optional local drift monitoring. The findings below are not evidence of malware or data exfiltration.

Install this only if you want prompt-level changes to how your agent writes. Start with Tier 1 or Tier 2, review the exact text before adding it to a system prompt, and avoid Tier 3 unless you need persistent long-session monitoring. If Tier 3 is enabled, inspect or delete DRIFT_MONITOR_STATE.json periodically and avoid storing raw user messages, secrets, or sensitive context there.
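The periodic inspection recommended above can be partly automated. The following is a minimal sketch, assuming DRIFT_MONITOR_STATE.json is a JSON document; its schema is not given here, so the check is schema-agnostic and flags only string values long enough that they might be raw conversation text. The function names and the 80-character threshold are hypothetical choices for illustration, not part of the skill.

```python
import json
from pathlib import Path

# Hypothetical threshold: strings longer than this may be raw conversation text
# rather than a short marker or label. Tune it for your own state file.
MAX_STRING_LENGTH = 80


def find_long_strings(value, path="$", limit=MAX_STRING_LENGTH):
    """Recursively collect (json_path, length) for every string longer than limit."""
    hits = []
    if isinstance(value, str):
        if len(value) > limit:
            hits.append((path, len(value)))
    elif isinstance(value, dict):
        for key, item in value.items():
            hits.extend(find_long_strings(item, f"{path}.{key}", limit))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            hits.extend(find_long_strings(item, f"{path}[{index}]", limit))
    return hits


def audit_state_file(state_path="DRIFT_MONITOR_STATE.json"):
    """Return suspicious entries, or an empty list if the file does not exist."""
    file = Path(state_path)
    if not file.exists():
        return []
    return find_long_strings(json.loads(file.read_text(encoding="utf-8")))


if __name__ == "__main__":
    for json_path, length in audit_state_file():
        print(f"review {json_path}: string of {length} characters")
```

A length check cannot detect short secrets such as API keys, so it complements manual review of the file rather than replacing it.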

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
System Prompt Leakage: Direct Leakage, Indirect Extraction, Tool-Based Exfiltration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (11)

Description-Behavior Mismatch

Medium

Confidence: 87%
Finding: This section expands a response-style skill into persistent autonomous monitoring, logging, and state-management behavior. That is a meaningful capability increase: it causes the agent to retain operational state across turns/compactions and potentially across sessions, which introduces privacy, consent, and scope-creep risks beyond simple tone calibration.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The self-learning section directs the agent to log user corrections, timestamps, contextual triggers, and other conversation-derived metadata to adapt thresholds over time. That creates unnecessary retention of behavioral data for a purpose that can be served with less or no persistence, increasing privacy risk and the chance of profiling or misuse if logs are exposed.

Context-Inappropriate Capability

High

Confidence: 94%
Finding: The instruction to perform a 'full personality reload from SOUL file' introduces loading and applying an external personality/baseline artifact that is outside the declared scope of conversational restoration. In an agent environment, this can become a powerful prompt/state injection mechanism: a modified or untrusted SOUL file could override behavior, preserve hidden instructions, or broaden the agent's role without clear controls.
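One way to limit the risk of a modified SOUL file is to pin its contents to a known digest and refuse to load anything else. The following is a minimal sketch, assuming the SOUL file is a UTF-8 text file on local disk and that the operator recorded its SHA-256 digest after reviewing it. The function name `load_if_pinned` is hypothetical; the skill itself does not describe any such check.

```python
import hashlib
import hmac
from pathlib import Path


def load_if_pinned(path, expected_sha256):
    """Return the file's text only if its SHA-256 matches the pinned digest.

    The bytes are read once and the same bytes are both hashed and returned,
    so the content cannot change between verification and use.
    """
    data = Path(path).read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"{path} does not match the pinned digest; refusing to load")
    return data.decode("utf-8")
```

Pinning detects tampering after review but says nothing about whether the reviewed content was safe in the first place, so the initial review still matters.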

Context-Inappropriate Capability

Low

Confidence: 96%
Finding: The embedded Fiverr and Ko-fi links are unrelated to the skill's operational purpose and introduce promotional content into agent instructions. While not directly enabling code execution or data exfiltration, they create incentive misalignment and could be surfaced to users inappropriately, undermining trust and contaminating agent behavior with solicitation.

Missing User Warnings

Medium

Confidence: 89%
Finding: The state-file design describes persistent writes of drift scores, markers, resets, and timestamps derived from conversation behavior, but provides no user-facing disclosure or consent model. Even if the contents seem operational, they are still derived from interaction history and can reveal usage patterns, emotional context, or moderation events if retained or shared.

Natural-Language Policy Violations

Medium

Confidence: 92%
Finding: This section explicitly directs the model to maintain a prescribed conversational style through repeated tone anchoring, regardless of user preference or changing context. In a general-purpose agent skill, that can override user autonomy and reduce transparency by prioritizing style control over faithfully adapting to user intent, which is a genuine prompt-level safety concern even if not overtly malicious.

Missing User Warnings

Medium

Confidence: 84%
Finding: Tier 3 explicitly instructs the agent to perform silent, persistent monitoring and write scores/timestamps to a local state file across sessions. Even though the document says the file should contain metrics rather than conversation content, the lack of a clear user-facing consent/notification model and the encouragement of autonomous persistence create a privacy and transparency risk, especially in long-running agents where behavior may continue unnoticed.

Missing User Warnings

Medium

Confidence: 90%
Finding: The emergency rollback step tells users to delete the entire system prompt and start fresh, which can cause accidental loss of unrelated security, policy, and operational instructions if they do not have or do not use a valid backup. In an agent-skill context, the system prompt is often the primary safety boundary, so encouraging wholesale deletion without a prominent warning increases the risk of unsafe or misconfigured agent behavior.
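The risk described above shrinks if a verified backup is always taken before the prompt is cleared. The following is a minimal sketch, assuming the system prompt is stored as a local text file, which many agent setups do not do. The function name `backup_then_clear` and the backup naming scheme are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_then_clear(prompt_path, backup_dir):
    """Copy the prompt file to a timestamped backup, verify it, then empty the original.

    Returns the path of the backup so it can be restored later.
    """
    source = Path(prompt_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%dT%H%M%S")
    backup = target_dir / f"{source.name}.{stamp}.bak"
    shutil.copy2(source, backup)

    # Clear the original only after confirming the backup is byte-identical.
    if backup.read_bytes() != source.read_bytes():
        raise RuntimeError("backup verification failed; prompt left untouched")

    source.write_text("", encoding="utf-8")
    return backup
```

Restoring is then a matter of copying the returned backup file over the prompt file, which preserves any unrelated security or policy instructions it contained.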

Ssd 1

Medium

Confidence: 84%
Finding: The framework explicitly characterizes safety fine-tuning as an obstacle to be worked around and frames CPR as restoring behavior suppressed by safety optimization. Even though the examples focus on tone, this rhetoric encourages prompt authors to treat safety alignment as something to override, which can normalize broader prompt-based weakening of safeguards.

Ssd 1

Medium

Confidence: 95%
Finding: The statement that 'explicit prompting overrides RLHF defaults' directly promotes system-prompt steering as a mechanism to defeat alignment defaults across models. In an agent-skill context, that is dangerous because it teaches operators to use role/system instructions to push models away from built-in safeguards, increasing the chance of unsafe outputs or policy circumvention.

Ssd 1

Medium

Confidence: 95%
Finding: The skill explicitly instructs operators to load behavioral rules into the system prompt or injected context to alter default model behavior. Even though framed as conversational restoration, system-prompt injection changes high-priority model behavior, can bypass platform-level defaults, and creates a reusable mechanism for steering outputs in ways that are hard for end users to inspect.

VirusTotal

64/64 vendors flagged this skill as clean.
