
Security audit

melo-tts-metadata-creator

Security checks across malware telemetry and agentic risk

Overview

The skill appears to do the advertised MeloTTS metadata work, but it also makes broad Python environment changes and downloads packages at runtime without clear enough user control.

Review before installing. Use it only in an isolated environment where changing Python packages, creating a shared virtualenv, downloading large ML packages/models, reading local audio/text directories, and writing transcript/output files are acceptable. There is no artifact-backed evidence of credential theft, exfiltration, or destructive intent, but the environment mutation and persistence are significant enough to warrant the Review bucket.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (25)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
"""Fix the setuptools version specifically for legacy projects (setup.py files that use pkg_resources)"""
    logger.info("🔧 Fixing setuptools version (for compatibility with builds of old GitHub packages)...")
    try:
        subprocess.check_call([
            sys.executable, "-m", "pip", "install",
            "--quiet", "--force-reinstall", "setuptools<=81.2.0", "wheel"
        ])
Confidence
94% confidence
Finding
subprocess.check_call([ sys.executable, "-m", "pip", "install", "--quiet", "--force-reinstall", "setuptools<=81.2.0", "wheel" ])

Dynamic import via __import__()

Medium
Category
Dangerous Code Execution
Content
# Step 1: try an import check (fastest)
    try:
        parts = import_name.split('.')
        mod = __import__(parts[0])
        for part in parts[1:]:
            mod = getattr(mod, part)
        if sub_import:
Confidence
76% confidence
Finding
mod = __import__(parts[0])
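
Importing a package runs its top-level code, so an availability check built on `__import__` executes whatever is installed under that name. A lower-risk alternative is to look up the module spec instead. The following is a minimal sketch, not the skill's code: `is_top_level_available` is a hypothetical helper, and it deliberately checks only the top-level package, because resolving a dotted name with `find_spec` imports the parent packages.

```python
import importlib.util


def is_top_level_available(import_name: str) -> bool:
    """Return True if the top-level package can be located, without importing it."""
    top_level = import_name.split(".")[0]
    if not top_level:
        return False
    # find_spec locates the module on the import path; for a top-level name
    # it does not execute the module's code.
    return importlib.util.find_spec(top_level) is not None
```

This answers only "is the package present?"; it does not verify that a submodule or attribute exists, which the flagged code does by walking `getattr`.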

subprocess module call

Medium
Category
Dangerous Code Execution
Content
cmd.extend(["-i", "https://pypi.tuna.tsinghua.edu.cn/simple"])

    try:
        subprocess.check_call(cmd)
        logger.info(f"✅ {spec} installed/upgraded successfully!")
        
    except subprocess.CalledProcessError as e:
Confidence
98% confidence
Finding
subprocess.check_call(cmd)

Dynamic import via __import__()

Medium
Category
Dangerous Code Execution
Content
# ==================== 1. Check whether it is installed and whether the version satisfies the requirement ====================
    try:
        __import__(import_name)
        
        # Try to get the current version
        try:
Confidence
72% confidence
Finding
__import__(import_name)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
logger.warning(f"🔧 Installing {install_str} ...")

    try:
        subprocess.check_call([
            sys.executable, "-m", "pip", "install",
            install_str,
            "-i", "https://pypi.tuna.tsinghua.edu.cn/simple",
Confidence
95% confidence
Finding
subprocess.check_call([ sys.executable, "-m", "pip", "install", install_str, "-i", "https://pypi.tuna.tsinghua.edu.cn/simple", "--quiet"

subprocess module call

Medium
Category
Dangerous Code Execution
Content
sys.executable, "-m", "pip", "install",
                        "--upgrade", fallback_zip, "--quiet"
                    ]
                    subprocess.check_call(cmd_fallback)
                    logger.info(f"✅ Installed successfully from local package {fallback_zip}!")
                    return
                except subprocess.CalledProcessError as e2:
Confidence
93% confidence
Finding
subprocess.check_call(cmd_fallback)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
logger.info("Virtual environment created successfully")

        logger.info("Upgrading pip...")
        subprocess.check_call([str(venv_python), "-m", "pip", "install", "--upgrade", "pip"])

    # ==================== Check whether PyTorch GPU is already installed ====================
    if Path(venv_python).exists() and is_torch_gpu_installed(venv_python):
Confidence
92% confidence
Finding
subprocess.check_call([str(venv_python), "-m", "pip", "install", "--upgrade", "pip"])

subprocess module call

Medium
Category
Dangerous Code Execution
Content
# Install PyTorch
        logger.info("Installing PyTorch (~2-3 GB, please be patient)...")
        subprocess.check_call([
            str(venv_python), "-m", "pip", "install", "torch", "torchvision", "torchaudio",
            "--index-url", index_url
        ])
Confidence
97% confidence
Finding
subprocess.check_call([ str(venv_python), "-m", "pip", "install", "torch", "torchvision", "torchaudio", "--index-url", index_url ])

subprocess module call

Medium
Category
Dangerous Code Execution
Content
logger.info("Installing audio-separator CPU build + librosa...")
            subprocess.check_call([str(venv_python), "-m", "pip", "install", "audio-separator[cpu]", "librosa"])

        subprocess.check_call([str(venv_python), "-m", "pip", "install", "pydub"])
        subprocess.check_call([str(venv_python), "-m", "pip", "install", "huggingface-hub[tqdm]"])
        
        logger.info("✅ Virtual environment and all dependencies installed!")
Confidence
90% confidence
Finding
subprocess.check_call([str(venv_python), "-m", "pip", "install", "pydub"])

subprocess module call

Medium
Category
Dangerous Code Execution
Content
subprocess.check_call([str(venv_python), "-m", "pip", "install", "audio-separator[cpu]", "librosa"])

        subprocess.check_call([str(venv_python), "-m", "pip", "install", "pydub"])
        subprocess.check_call([str(venv_python), "-m", "pip", "install", "huggingface-hub[tqdm]"])
        
        logger.info("✅ Virtual environment and all dependencies installed!")
Confidence
90% confidence
Finding
subprocess.check_call([str(venv_python), "-m", "pip", "install", "huggingface-hub[tqdm]"])

subprocess module call

Medium
Category
Dangerous Code Execution
Content
# Install audio-separator + librosa (as you mentioned)
        if use_gpu:
            logger.info("Installing audio-separator GPU build + librosa...")
            subprocess.check_call([str(venv_python), "-m", "pip", "install", "audio-separator[gpu]", "librosa"])
        else:
            logger.info("Installing audio-separator CPU build + librosa...")
            subprocess.check_call([str(venv_python), "-m", "pip", "install", "audio-separator[cpu]", "librosa"])
Confidence
92% confidence
Finding
subprocess.check_call([str(venv_python), "-m", "pip", "install", "audio-separator[gpu]", "librosa"])

subprocess module call

Medium
Category
Dangerous Code Execution
Content
subprocess.check_call([str(venv_python), "-m", "pip", "install", "audio-separator[gpu]", "librosa"])
        else:
            logger.info("Installing audio-separator CPU build + librosa...")
            subprocess.check_call([str(venv_python), "-m", "pip", "install", "audio-separator[cpu]", "librosa"])

        subprocess.check_call([str(venv_python), "-m", "pip", "install", "pydub"])
        subprocess.check_call([str(venv_python), "-m", "pip", "install", "huggingface-hub[tqdm]"])
Confidence
91% confidence
Finding
subprocess.check_call([str(venv_python), "-m", "pip", "install", "audio-separator[cpu]", "librosa"])

Lp3

Medium
Category
MCP Least Privilege
Confidence
95% confidence
Finding
The skill declares no permissions while its documented behavior requires shell execution and filesystem read/write access. This creates a trust and review gap: callers may invoke a skill believing it is low-risk when it can execute commands and modify local files.

Tp4

High
Category
MCP Tool Poisoning
Confidence
98% confidence
Finding
The declared purpose is narrow metadata generation, but the described behavior includes environment creation, hardware probing, package installation, dependency downgrades, network downloads, and local model management. This materially expands the attack surface and system impact beyond user expectations, enabling unintended code execution paths and potentially unsafe supply-chain activity.

Description-Behavior Mismatch

High
Confidence
97% confidence
Finding
The file's actual behavior is a general-purpose package installer, not MeloTTS metadata generation. This mismatch is dangerous because it hides powerful environment-modifying and code-fetching behavior behind an unrelated skill description, reducing operator scrutiny and increasing the chance of abuse.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
The code supports installing arbitrary packages from PyPI, git URLs, local zip files, and wheel files, which effectively grants a generic code acquisition and execution mechanism. In the context of a metadata-file generator, this capability is unjustified and dramatically more dangerous because package installation can execute attacker-controlled code during build or import.
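
One way to narrow this capability is to accept only index package names from a fixed allowlist and to reject URLs, VCS references, and local archives outright. The sketch below is illustrative and not taken from the skill: `ALLOWED`, `normalize`, and `is_allowed` are hypothetical names, and the allowlist is assumed to be the set of packages named in the findings on this page.

```python
import re

# Assumption: only the packages named in the findings above are acceptable.
ALLOWED = {
    "pip", "setuptools", "wheel",
    "torch", "torchvision", "torchaudio",
    "audio-separator", "librosa", "pydub", "huggingface-hub",
}

# A bare distribution name, optional extras, optional version constraint.
# URLs, "git+..." references, "name @ url" forms and paths do not match.
_SPEC = re.compile(
    r"([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)"  # distribution name
    r"(?:\[[^\]]*\])?"                               # optional extras
    r"(?:\s*(?:==|<=|>=|~=|!=|<|>).*)?"              # optional constraint
)


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_allowed(spec: str) -> bool:
    """Return True only for an allowlisted index package specifier."""
    match = _SPEC.fullmatch(spec.strip())
    return match is not None and normalize(match.group(1)) in ALLOWED
```

Under these assumptions, `is_allowed("audio-separator[cpu]")` passes, while a git URL, a local zip path, or an unlisted package name is refused before pip is ever invoked.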

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
The module changes the Python environment as soon as it is imported by forcibly reinstalling setuptools and wheel. Import-time side effects are especially risky because they occur implicitly, may trigger network access and package execution without user intent, and can destabilize the host environment.
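
A common way to remove import-time side effects is to move the environment change into a function that runs only when the caller opts in. The following is a minimal sketch under that assumption, not the skill's implementation: `build_setuptools_fix_command` and `fix_setuptools` are hypothetical names, and the command is the one quoted in the first subprocess finding on this page.

```python
import subprocess
import sys


def build_setuptools_fix_command(python: str = sys.executable) -> list:
    """Build (but do not run) the pip command quoted in the subprocess finding."""
    return [
        python, "-m", "pip", "install",
        "--quiet", "--force-reinstall", "setuptools<=81.2.0", "wheel",
    ]


def fix_setuptools(confirm: bool = False, runner=subprocess.check_call) -> bool:
    """Run the reinstall only when the caller opts in explicitly.

    Importing this module has no side effects; nothing is executed until
    this function is called with confirm=True.
    """
    if not confirm:
        return False
    runner(build_setuptools_fix_command())
    return True
```

The `runner` parameter exists so that the behavior can be exercised in tests or a dry run without invoking pip.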

Intent-Code Divergence

Medium
Confidence
90% confidence
Finding
The header claims the skill is part of melo-tts-metadata-creator but documents an intelligent package installation tool. This capability misrepresentation increases security risk because reviewers and users may trust the file as benign dataset tooling while it actually performs dependency installation and environment mutation.

Description-Behavior Mismatch

High
Confidence
98% confidence
Finding
The file's behavior is materially broader than the skill manifest: it performs environment bootstrapping, hardware discovery, and package management rather than just metadata generation or optional transcription. Capability overreach is dangerous because it conditions users to grant more system access than the advertised task requires, enabling unexpected code execution and host modification.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
This section performs subprocess execution, GPU probing, and multiple package installations that are not justified by a metadata-file generator. In-context, that mismatch makes the behavior more dangerous because users invoking a simple dataset utility would not reasonably expect network downloads and system-level environment changes.

Intent-Code Divergence

Medium
Confidence
87% confidence
Finding
The file is presented as part of the metadata creator skill, but its own description centers on a bootstrapper that enforces Python versions, detects GPUs, and installs dependencies. This discrepancy undermines transparency and makes it harder for users and reviewers to understand the actual trust and execution model.

Context-Inappropriate Capability

Medium
Confidence
96% confidence
Finding
The script installs Python packages at runtime via ensure_package.pip before doing its core work. In a skill context, this creates a supply-chain and code-execution surface because package resolution/download occurs dynamically and may execute installer code or fetch unpinned dependencies from external sources, which is broader capability than a simple metadata generator should need.
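
One mitigation for the unpinned-dependency part of this finding is to refuse any requirement that is not pinned to an exact version. The sketch below is an assumption-laden illustration rather than the skill's code: `is_exactly_pinned` is a hypothetical helper and the version numbers in the usage note are placeholders. Pinning alone does not verify package integrity; pip's hash-checking mode is the stronger control.

```python
import re

_PINNED = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"  # distribution name
    r"(?:\[[A-Za-z0-9,._-]+\])?"                   # optional extras
    r"==[0-9][A-Za-z0-9.!+_-]*"                    # exact version, no wildcard
)


def is_exactly_pinned(spec: str) -> bool:
    """Return True only for 'name==version' (extras allowed, wildcards not)."""
    return _PINNED.fullmatch(spec.strip()) is not None
```

With this check, a placeholder spec such as `pydub==0.25.1` is accepted, while `pydub`, `setuptools<=81.2.0`, and `torch==2.*` are rejected because they let the resolver choose the version at install time.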

Vague Triggers

Medium
Confidence
92% confidence
Finding
The trigger examples are broad, natural-language phrases like '帮我生成 MeloTTS 的 metadata.list' ("Generate a MeloTTS metadata.list for me") and '用 Whisper 为这些音频转录并生成 MeloTTS 训练文件' ("Use Whisper to transcribe these audio files and generate MeloTTS training files"), which can match ordinary user requests and cause the skill to activate unexpectedly. Because the skill can recurse through directories, generate output files, and optionally download models/transcribe audio, accidental invocation can lead to unintended filesystem actions and processing of sensitive local data.

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The README advertises automatic Whisper transcription, model download into ./models/, skipping failures, and metadata generation, but does not prominently warn users that local audio may be read recursively, new models may be fetched, and output files may be written or overwritten. In an agent context, insufficient disclosure increases the risk of unexpected processing of private voice data and unanticipated filesystem/network side effects.
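
The disclosure this finding asks for could take the form of a dry-run plan shown to the user before anything is read or written. The following is a minimal sketch of that idea, not the skill's behavior: `plan_run` is a hypothetical helper, and the set of audio suffixes is an assumption.

```python
from pathlib import Path

# Assumption: the suffixes treated as audio; the skill's real list is not shown.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def plan_run(audio_dir, output_file):
    """Describe what a run would touch, without reading audio or writing output."""
    root = Path(audio_dir)
    out = Path(output_file)
    # Recursive listing only; file contents are never opened here.
    audio_files = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
    return {
        "audio_files": audio_files,
        "output_file": out,
        "would_overwrite": out.exists(),
    }
```

A caller could print the file list and the `would_overwrite` flag, then proceed only after explicit confirmation.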

Missing User Warnings

Medium
Confidence
86% confidence
Finding
The skill advertises automatic transcription and local model download without warning users about privacy implications, bandwidth/storage use, or system changes. Audio content may be sensitive, and automatic downloads/installations can surprise users and affect host stability or policy compliance.

VirusTotal

67/67 vendors flagged this skill as clean.
