Security audit

persona-model-trainer

Security checks across malware telemetry and agentic risk

Overview

This is a real persona model training skill, but it needs review because arbitrary Hugging Face model choices can execute repository code locally and the workflow handles highly sensitive persona data.

Install only if you are comfortable running local ML training code on sensitive persona data. Use trusted and pinned model repositories, or remove trust_remote_code before running; review and redact training files before preparation; keep generated adapters private; and use Colab or Hugging Face push only when you intentionally want persona material or derived model artifacts to leave your machine.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (11)

Description-Behavior Mismatch

Medium
Confidence
99% confidence
Finding
The script loads Hugging Face models with `trust_remote_code=True`, which permits execution of Python code supplied by the remote model repository during evaluation. A user-provided or compromised base-model repo can therefore run arbitrary code on the local machine, which contradicts the skill's 'locally runnable' framing and materially expands the trust boundary.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
This evaluation path allows code from an untrusted remote repository to execute during Hugging Face model loading. Because the tool fine-tunes and evaluates arbitrary persona models, users may reasonably supply third-party model IDs, which makes code execution from a malicious model repo a realistic attack path rather than a theoretical concern.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
`trust_remote_code=True` causes Transformers to execute model-repository Python code during model loading. If the supplied base model points to a malicious or compromised Hugging Face repository, exporting the model can trigger arbitrary code execution on the analyst's machine or build host.

Context-Inappropriate Capability

High
Confidence
95% confidence
Finding
This code can publish archived training conversation data to a remote repository, which creates a real confidentiality and privacy risk if the dataset contains personal or sensitive persona conversations. Although the push is gated by `--include-data` and an interactive confirmation, the feature still enables easy external disclosure of local training data in a skill advertised primarily around local fine-tuning.
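A stricter gate for the data push described above could require the operator to retype the destination before anything leaves the machine. The following is a minimal sketch under assumptions: the function name `may_push_training_data`, the `ask` parameter, and the repo ID used in the usage note are hypothetical and are not taken from the skill's code.

```python
def may_push_training_data(include_data: bool, repo_id: str, ask=input) -> bool:
    """Allow the upload only if the flag is set and the operator retypes the repo ID.

    Hypothetical helper: `include_data` stands in for the `--include-data` flag,
    and `ask` is injectable so the prompt can be replaced in tests.
    """
    if not include_data:
        return False
    print(f"Training conversations will be uploaded to {repo_id!r}.")
    typed = ask("Retype the destination repo ID to confirm: ")
    # A bare "y" is not enough; the exact destination must be typed.
    return typed.strip() == repo_id
```

Requiring the full repo ID rather than a yes/no answer makes an accidental confirmation less likely, for example when calling `may_push_training_data(True, "me/persona")` interactively.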

Context-Inappropriate Capability

Medium
Confidence
98% confidence
Finding
The script loads a Hugging Face base model with `trust_remote_code=True`, which allows arbitrary Python from the model repository to execute during `from_pretrained()`. In a local utility that accepts a user-supplied `--base-model` identifier, this creates a real code-execution path if the repo is malicious or compromised, and nothing shown here demonstrates a functional need that would justify enabling remote code trust.
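One way to narrow this path is to accept only reviewed repositories pinned to a full commit hash and to pass `trust_remote_code=False`. The sketch below only builds the keyword arguments. `revision` and `trust_remote_code` are existing parameters of Transformers' `from_pretrained()`, while the allowlist, the function name `safe_load_kwargs`, the example repo ID, and the commit hash are placeholders assumed for illustration.

```python
import re

# Hypothetical allowlist: repo ID -> pinned 40-character commit SHA.
# Both the repo name and the hash are placeholders for illustration.
PINNED_MODELS = {
    "example-org/example-base-model": "0123456789abcdef0123456789abcdef01234567",
}

_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def safe_load_kwargs(base_model: str) -> dict:
    """Return from_pretrained() kwargs for an allowlisted, pinned repo."""
    revision = PINNED_MODELS.get(base_model)
    if revision is None:
        raise ValueError(f"{base_model!r} is not on the reviewed allowlist")
    if not _COMMIT_SHA.fullmatch(revision):
        raise ValueError("revision must be a full commit SHA, not a branch or tag")
    # Remote code stays disabled; the pinned revision fixes what is downloaded.
    return {"revision": revision, "trust_remote_code": False}
```

The result could then be passed through, for example as `AutoModelForCausalLM.from_pretrained(base_model, **safe_load_kwargs(base_model))`. Models that genuinely require custom repository code would fail to load under this setting, which is the intended trade-off of the sketch.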

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The skill explicitly instructs the agent to create root-level files (`train.py`, `prepare.py`) in the working directory without any requirement to obtain user confirmation or clearly warn that repository files will be created/overwritten. In an agent setting, this is dangerous because it can silently modify project state, clobber existing entry points, and create executable artifacts that later get run as trusted workflow components.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
This section directs creation of executable wrappers and then invokes an automated code-modification loop via the `autoresearch` skill, again without a clear safety boundary or user warning. That combination is riskier than simple file creation because it enables repeated autonomous edits to training code and repeated execution, which can damage repositories, introduce unsafe changes, or run attacker-influenced logic from other skills.

Vague Triggers

Medium
Confidence
88% confidence
Finding
The documented trigger phrase is broad enough that ordinary user requests about training a model for a person could unintentionally activate data-collection behavior in anyone-skill. In a multi-skill agent environment, this can cause unintended workflow execution, collection of sensitive personal data, or actions the user did not explicitly consent to, especially because this pipeline is designed to build persona datasets from source material.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The guide instructs users to push adapter weights, and optionally training data, to the Hugging Face Hub without prominently warning that both may contain personal, copyrighted, or otherwise sensitive information. In this skill’s context, the model and training snapshot are derived from persona source material, so publishing them can leak private data directly or via memorized attributes embedded in the adapter.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
Beyond the remote-code execution risk itself, the script gives only normal status messages when loading the model and does not clearly disclose that arbitrary repository code may run. This increases the chance that users will unknowingly execute untrusted code while believing they are only performing local inference.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The script provides only normal progress messages while performing a model load that may execute remote repository code because `trust_remote_code=True` is enabled. This increases the likelihood of unsafe use because operators may assume they are only downloading weights, not authorizing arbitrary code execution from the selected model repository.
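The missing disclosure could be addressed with an explicit warning emitted before the load. This is a minimal sketch using Python's standard `warnings` module; the function name `warn_if_remote_code` and the message wording are assumptions, not part of the audited script.

```python
import warnings


def warn_if_remote_code(base_model: str, trust_remote_code: bool) -> None:
    """Emit an explicit warning before a load that may run repository code."""
    if trust_remote_code:
        warnings.warn(
            f"Loading {base_model!r} with trust_remote_code=True: Python code "
            "from that repository may execute on this machine.",
            UserWarning,
            stacklevel=2,
        )
```

Calling this immediately before `from_pretrained()` would make it clear to the operator that the step is more than a weight download.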

VirusTotal

65/65 vendors flagged this skill as clean.