AI Persona Engine

Security checks across malware telemetry and agentic risk

Overview

This is a prompt-design skill, but it also describes storing call-derived audit data and applying runtime prompt patches without enough consent, retention, or approval controls.

Review carefully before using this skill in a live voice or chat system. It is not executable malware, but do not implement the learning loop with real calls unless transcript collection is consented, sensitive content is minimized or redacted, retention and access controls are defined, and prompt patches require human approval before affecting runtime personas.
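
One way to picture the "human approval before affecting runtime personas" control is a store in which proposed patches stay inert until a named reviewer approves them. The sketch below is illustrative only: `PromptPatch`, `PatchStore`, and `build_prompt` are hypothetical names, not part of the skill, and the in-memory dictionary stands in for whatever database the skill actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptPatch:
    # Hypothetical record shape; the skill's real schema is not known.
    patch_id: str
    persona: str
    text: str
    approved_by: Optional[str] = None  # set only through PatchStore.approve


class PatchStore:
    """In-memory stand-in for the patch database (an assumption)."""

    def __init__(self) -> None:
        self._patches: dict[str, PromptPatch] = {}

    def propose(self, patch: PromptPatch) -> None:
        # Proposals always enter unapproved, whatever the caller passed in.
        patch.approved_by = None
        self._patches[patch.patch_id] = patch

    def approve(self, patch_id: str, reviewer: str) -> None:
        if not reviewer.strip():
            raise ValueError("a named human reviewer is required")
        self._patches[patch_id].approved_by = reviewer

    def active_patches(self, persona: str) -> list[PromptPatch]:
        # The runtime only ever sees approved patches.
        return [
            p for p in self._patches.values()
            if p.persona == persona and p.approved_by is not None
        ]


def build_prompt(base_prompt: str, store: PatchStore, persona: str) -> str:
    """Append approved patch text to the base prompt, one patch per line."""
    extra = [p.text for p in store.active_patches(persona)]
    return "\n".join([base_prompt, *extra])
```

With this shape, the learning loop can call `propose` freely, while the persona engine calls only `build_prompt`, so an unreviewed patch cannot reach a live conversation.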

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (4)

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The skill goes beyond persona-generation guidance and introduces persistent storage of call audits plus a feedback loop for autonomous prompt evolution. Neither is reflected in the stated purpose, and together they create unbounded data retention and a self-modifying behavior surface. If transcripts or patches are stored and applied without strict controls, this can lead to privacy issues, prompt drift, and unsafe behavior changes over time.
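
As a rough illustration of what bounded retention and minimization could look like, the sketch below redacts two obvious kinds of identifier before an audit record is stored and drops records past a fixed age. The regular expressions, the 30-day window, and the record shape (a dict with a `stored_at` timestamp) are all assumptions made for the example; real transcripts would need far broader redaction than this.

```python
import re
from datetime import datetime, timedelta, timezone

# Deliberately simple patterns (assumptions); they will miss many formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def purge_expired(records, now, max_age=timedelta(days=30)):
    """Keep only records stored within max_age of now (30 days is arbitrary)."""
    return [r for r in records if now - r["stored_at"] <= max_age]
```

Running `redact` at write time rather than at read time means the raw identifiers never reach the audit store in the first place.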

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The described learning loop performs pattern analysis, generates prompt patches, and deploys them as runtime overrides without redeploy, which is effectively a self-modifying prompt system. In a conversational agent, this can silently change behavior, bypass normal review processes, and amplify harmful or manipulative traits based on noisy interaction data.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: This design documents a pipeline that persists call-derived audit data and feeds periodic prompt evolution into runtime behavior via database-stored patches. That creates a self-modifying prompt supply chain: if transcripts, audit logic, or the patch store are poisoned, the live persona can be altered without code review, enabling prompt injection persistence, unsafe behavioral drift, or policy bypass at runtime.
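
One possible mitigation for a poisoned patch store is to sign each patch at approval time and have the runtime discard any row whose signature does not verify. The sketch below uses HMAC-SHA256 from the Python standard library; the row shape (`text`, `signature`) and the function names are hypothetical, and key management is out of scope here.

```python
import hashlib
import hmac


def sign_patch(patch_text: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the patch text."""
    return hmac.new(key, patch_text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_patch(patch_text: str, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the stored tag against a fresh one."""
    expected = sign_patch(patch_text, key)
    return hmac.compare_digest(expected, signature)


def load_verified(rows, key: bytes) -> list[str]:
    """Return patch texts whose signatures verify; silently skip the rest."""
    return [
        r["text"] for r in rows
        if verify_patch(r["text"], r["signature"], key)
    ]
```

This only helps if the signing key is held by the approval step and not by the components that write transcripts or generate patches; otherwise a compromised writer can sign its own changes.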

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The prompt evolution flow explicitly generates refinements, validates them with simulated calls, and then writes active patches that the persona engine reads at runtime. In a persona-authoring skill, this is unusually dangerous because it introduces autonomous prompt mutation from behavioral data, creating an attack surface for data poisoning, malicious patch insertion, and gradual degradation of safety constraints that may not be caught by narrow validation metrics.
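
To make the point about narrow validation metrics concrete, the sketch below adds one extra check that simulated-call scores would not provide: required safety clauses must still be present in the patched prompt. The clauses and function names are hypothetical, and a substring check like this only detects deletion; it would not catch a patch that contradicts a clause while leaving it in place.

```python
# Hypothetical clauses a deployment might treat as non-negotiable.
REQUIRED_CLAUSES = (
    "Never request payment card numbers.",
    "Disclose that you are an AI when asked.",
)


def missing_clauses(prompt: str, required=REQUIRED_CLAUSES) -> list[str]:
    """Return required clauses absent from the prompt.

    Comparison ignores case and collapses runs of whitespace.
    """
    normalized = " ".join(prompt.split()).lower()
    return [c for c in required if c.lower() not in normalized]


def safe_to_activate(patched_prompt: str, required=REQUIRED_CLAUSES) -> bool:
    """A patched prompt passes only if every required clause survives."""
    return not missing_clauses(patched_prompt, required)
```

A check of this kind is a floor, not a replacement for human review of each patch.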

VirusTotal

66/66 vendors flagged this skill as clean.
