Security audit

zhangdi-avatar

Security checks across malware telemetry and agentic risk

Overview

This persona skill has no executable payload, but it asks the agent to impersonate a real person, retain private communications, and rewrite its own governing files.

Review before installing. Use it only if you intentionally want a persistent Zhang Di persona and you control the storage. Disable or manually gate writeback to SKILL.md, meta.json, and reference files, require explicit consent before saving conversations or source communications, and avoid adding raw private chats, emails, workplace messages, or photos unless they are redacted and authorized.
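One way to gate writeback manually is to route file writes through a check that refuses protected targets unless an operator approves. The following is a minimal Python sketch, not part of the skill: `PROTECTED_NAMES`, `PROTECTED_DIRS`, `is_protected`, and `gated_write` are hypothetical names, the `references` directory name is an assumption, and the sketch presumes the agent runtime lets you wrap writes this way.

```python
from pathlib import Path

# Governing files named in this audit.
PROTECTED_NAMES = {"SKILL.md", "meta.json"}
# Assumption: reference files live under a directory called "references".
PROTECTED_DIRS = {"references"}


def is_protected(path: Path) -> bool:
    """Return True if the path is a governing file or sits inside a protected directory."""
    in_protected_dir = any(part in PROTECTED_DIRS for part in path.parts[:-1])
    return path.name in PROTECTED_NAMES or in_protected_dir


def gated_write(path: Path, content: str, approved: bool = False) -> bool:
    """Write content only if the target is unprotected or explicitly approved.

    Returns True when the write happened and False when it was blocked.
    """
    if is_protected(path) and not approved:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True
```

With a wrapper like this, a write to SKILL.md is blocked by default and succeeds only when `approved=True` is passed after a human review.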

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (14)

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill explicitly instructs the agent to append to persistent logs and to modify multiple local rule and memory files as part of normal operation. That behavior exceeds what is necessary for a persona/avatar skill and creates integrity and privacy risk by allowing user conversations to be retained and the skill’s future behavior to change over time without explicit authorization.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The skill is presented as an analysis/judgment avatar, but it also mandates persistent self-modification of its own memory and rule files. This mismatch is dangerous because a user invoking a persona skill would not reasonably expect it to rewrite its configuration and behavioral instructions, enabling silent capability drift and unauthorized persistence.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The manifest explicitly enables persistent self-improvement with writeback targets that include core files such as meta.json, SKILL.md, and reference documents. For a persona/advice skill, allowing the skill to alter its own rules and metadata creates a durable prompt-injection and integrity risk: malicious or simply incorrect user input could be incorporated into future behavior across sessions.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The skill is described as an analysis/decision avatar, but the metadata also authorizes updating its own metadata and rule files via writeback targets. This expands the skill from passive persona emulation into self-modifying behavior, which can let adversarial conversations or mistaken inferences permanently change identity, instructions, and future outputs.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The file explicitly instructs the agent to present itself as a real person ('你是张迪，不是AI助手', "You are Zhang Di, not an AI assistant"), which creates deceptive identity behavior and conflicts with safe disclosure expectations. This can mislead users into believing they are receiving authentic human judgment, increasing trust-based manipulation risk, especially in advice or decision-support contexts.

Vague Triggers

Medium

Confidence: 82%
Finding: The activation description is broad enough to trigger whenever a user asks how Zhang Di would think, act, or judge, without meaningful boundaries. Overbroad triggering increases the chance the persona will activate in inappropriate contexts and apply the skill’s self-modifying and data-retaining behaviors more often than intended.

Missing User Warnings

High

Confidence: 97%
Finding: The skill instructs the agent to perform ongoing file writes and self-modification but provides no user-facing disclosure or consent mechanism. This is dangerous because users may unknowingly provide content that is persisted to local files, and the skill may silently alter future behavior through rule changes.

Vague Triggers

Medium

Confidence: 89%
Finding: The trigger list includes generic phrases such as '帮我分析' ("help me analyze"), '给点意见' ("give me some advice"), '你怎么想' ("what do you think"), and '如果是你' ("if it were you"), which are common in ordinary conversation. Overly broad triggers can cause accidental activation of the persona in unrelated contexts, increasing the chance of unwanted policy/style takeover and unintended access to skill-specific memory or behaviors.
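To illustrate how easily these phrases match unrelated requests, here is a small Python sketch. It assumes plain substring matching, which may not be how the host agent actually decides to activate a skill; `TRIGGERS` and `matched_triggers` are hypothetical names used only for this illustration.

```python
from typing import List

# Trigger phrases quoted in the finding above.
TRIGGERS = ["帮我分析", "给点意见", "你怎么想", "如果是你"]


def matched_triggers(message: str) -> List[str]:
    """Return the trigger phrases that occur as substrings of the message."""
    return [t for t in TRIGGERS if t in message]


# "Help me analyze this sales report": an ordinary request unrelated to the persona.
print(matched_triggers("帮我分析一下这份销售报表"))
# "Nice weather today": no trigger phrase present.
print(matched_triggers("今天天气不错"))
```

Under this assumption, the first message activates the persona even though the user never mentioned Zhang Di, while the second does not.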

Natural-Language Policy Violations

Medium

Confidence: 78%
Finding: The description says the skill will act in place of a specific person and make judgments from that person's perspective, without indicating explicit user opt-in at response time. While primarily a policy and UX issue rather than a classic exploit path, it can still mislead users, override expected assistant behavior, and amplify the risk of accidental persona activation when combined with broad triggers.

Missing User Warnings

Medium

Confidence: 94%
Finding: The protocol explicitly instructs collecting real chat logs, emails, and workplace messages as ground-truth data, but provides no constraints on consent, minimization, redaction, or handling of personal and confidential information. In the context of an identity-emulation skill, this increases the chance of ingesting sensitive personal, corporate, or third-party data into the skill workflow, creating privacy, confidentiality, and potential compliance risks.
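A consent check plus basic redaction before ingestion could look roughly like the Python sketch below. It is illustrative only: `redact` and `prepare_for_ingest` are hypothetical helpers, the two patterns catch only simple email addresses and phone-like digit runs, and real source material would need broader coverage (names, addresses, identifiers) and human review.

```python
import re

# Simple email addresses such as a.b@example.com.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Runs of at least 7 characters made of digits, spaces, or hyphens
# that start and end with a digit, with an optional leading "+".
PHONE_RE = re.compile(r"\+?\d[\d\s-]{5,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def prepare_for_ingest(text: str, authorized: bool) -> str:
    """Refuse unauthorized material; redact authorized material before storage."""
    if not authorized:
        raise PermissionError("source material is not authorized for ingestion")
    return redact(text)
```

Calling `prepare_for_ingest(message, authorized=False)` raises `PermissionError`, so nothing is stored unless the operator has recorded authorization for that source.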

Missing User Warnings

Medium

Confidence: 92%
Finding: The document directs operators to store raw 'ground-truth' communications in a local directory without any safeguards for encryption, permissions, retention, or segregation of sensitive material. Because the repository is for a digital avatar modeled on a real person, storing raw source messages at rest can expose private conversations, work messages, and other confidential data if the environment or repo is accessed by unauthorized parties.
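As a partial mitigation for the permissions gap, the storage directory can be restricted to its owner. The Python sketch below assumes a POSIX file system; `make_private_dir` is a hypothetical helper, and it addresses only access permissions, not encryption, retention, or segregation.

```python
import os
import stat
from pathlib import Path


def make_private_dir(path: Path) -> Path:
    """Create the directory if needed and limit access to the owning user (mode 0o700)."""
    path.mkdir(parents=True, exist_ok=True)
    os.chmod(path, stat.S_IRWXU)  # read, write, execute for owner only
    return path
```

Keeping this directory outside any shared or version-controlled repository reduces the exposure further.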

Ssd 3

Medium

Confidence: 94%
Finding: The persistent logging and memory instructions direct the agent to store conversational content in evolution logs, chat records, and other memory files. This creates a natural-language data retention risk because sensitive user inputs may be preserved beyond the session, later exposed to other processes, or used to shape future outputs without the user’s knowledge.

Ssd 3

Medium

Confidence: 90%
Finding: The manifest defines persistent storage for notes, chats, and photos and couples that with self-improvement instructions to log and reuse information across sessions. This creates a retention and privacy risk because user conversation data may be stored and later used to influence future responses without clear consent, minimization, or retention controls.

Ssd 1

Medium

Confidence: 99%
Finding: The persona core directs the model to maintain a fabricated human identity and avoid acknowledging it is an AI, which is a direct social-engineering and transparency risk. In a 'digital avatar' skill, this is more dangerous because the whole product framing encourages users to treat outputs as the authentic thoughts, values, and decisions of a real person.

VirusTotal

64/64 vendors flagged this skill as clean.