Security audit

Code Factory

Security checks across malware telemetry and agentic risk

Overview

This skill is a coherent local code-generation tool. Because it can automatically write files and run tests through shell execution in response to broad coding requests, users should review it before installing.

Install it only if you want a project generator that writes files, creates runnable scripts, and executes pytest locally. Run it in a sandbox or disposable workspace, and review the generated requirements and run.sh before running them. Avoid using it on sensitive code unless you are comfortable with it writing diagnostic logs to a local .learnings/ directory.

SkillSpector

By NVIDIA

Vulnerability Patterns

Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (12)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: return {"passed": True, "output": "无测试目录", "summary": "跳过"} try: result = subprocess.run( [sys.executable, "-m", "pytest", str(test_dir), "-v", "--tb=short"], capture_output=True, text=True,
(String literals translated: "无测试目录" = "no test directory", "跳过" = "skipped".)
Confidence: 92%
Finding: result = subprocess.run( [sys.executable, "-m", "pytest", str(test_dir), "-v", "--tb=short"], capture_output=True, text=True,
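The flagged call passes no `timeout`, so a hanging test can block verification indefinitely. A minimal sketch of a bounded variant follows. `run_tests` is a hypothetical helper whose result keys mirror the excerpt's return dict, and the 300-second default is an arbitrary assumption. When `timeout` elapses, `subprocess.run` kills the child and raises `subprocess.TimeoutExpired`.

```python
import subprocess
import sys
from pathlib import Path


def run_tests(test_dir: Path, timeout_seconds: float = 300.0) -> dict:
    """Run pytest on test_dir in a child process, bounded by a timeout.

    Hypothetical helper modeled on the flagged excerpt; the result keys
    mirror the dict the excerpt returns.
    """
    if not test_dir.is_dir():
        # Same policy as the excerpt: a missing test directory is skipped.
        return {"passed": True, "output": "no test directory", "summary": "skipped"}
    try:
        result = subprocess.run(
            [sys.executable, "-m", "pytest", str(test_dir), "-v", "--tb=short"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # child is killed and TimeoutExpired raised
        )
    except subprocess.TimeoutExpired:
        return {"passed": False, "output": "", "summary": f"timed out after {timeout_seconds}s"}
    return {
        "passed": result.returncode == 0,
        "output": result.stdout + result.stderr,
        "summary": "passed" if result.returncode == 0 else f"exit code {result.returncode}",
    }
```

Returning `passed: True` for a missing test directory copies the excerpt's behavior. A stricter policy might treat that case as a failure instead.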

Tp4

High

Category: MCP Tool Poisoning
Confidence: 89%
Finding: The skill description materially understates behavior: beyond generating files, it performs environment inspection, executes `pytest`, auto-modifies code on failure, and persists learning artifacts for future runs. This mismatch can cause users or orchestrators to invoke the skill without realizing it will execute commands and create additional files, increasing the chance of unintended side effects.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The engine unconditionally adds multiple operational and metadata files such as run.sh, SKILL.md, manifest.json, and environment.toml regardless of the user's requested deliverables. In an agentic code-generation skill, this expands the produced artifact set beyond user intent, which can introduce unexpected executable surfaces, hidden behavior, or packaging metadata that downstream automation may trust or execute.
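One way to surface this scope creep is to diff the planned file set against what the user asked for before anything is written. The sketch below assumes the engine can expose both lists as relative paths; both inputs and the `unrequested_artifacts` helper are hypothetical. The file names are the ones cited in the finding.

```python
from pathlib import PurePosixPath

# Suffixes treated as executable surfaces for review (an assumption, not exhaustive).
EXECUTABLE_SUFFIXES = {".sh", ".py", ".bat", ".ps1"}


def unrequested_artifacts(planned: list[str], requested: list[str]) -> dict[str, list[str]]:
    """Split planned files the user did not ask for into executable and other files."""
    extras = sorted(set(planned) - set(requested))
    executable = [p for p in extras if PurePosixPath(p).suffix in EXECUTABLE_SUFFIXES]
    other = [p for p in extras if p not in executable]
    return {"executable": executable, "other": other}


planned = ["main.py", "tests/test_main.py", "run.sh", "SKILL.md", "manifest.json", "environment.toml"]
requested = ["main.py", "tests/test_main.py"]
print(unrequested_artifacts(planned, requested))
# {'executable': ['run.sh'], 'other': ['SKILL.md', 'environment.toml', 'manifest.json']}
```

Listing the executable extras separately lets a user approve or drop something like run.sh before it lands on disk.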

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The docstring promises atomic commit and complete rollback, but commit() mutates target files directly and does not handle partial failures. If an unlink or copy operation fails midway, the target directory can be left in a mixed state with some files deleted or updated and others unchanged, violating integrity guarantees that callers may rely on for safe project generation.
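One way to get closer to an atomic commit is to build the complete new tree in a staging directory beside the target and then publish it with renames. With this approach, a failure during staging leaves the target untouched. The sketch assumes a hypothetical `commit(target, writes, deletes)` signature, not the skill's actual one.

The technique narrows the risk rather than eliminating it. Between the two `os.replace` calls the target path briefly does not exist. Also, `os.replace` only works within one filesystem, which is why staging is created in the target's parent.

```python
import os
import shutil
import tempfile
from pathlib import Path


def commit(target: Path, writes: dict[str, str], deletes: tuple[str, ...] = ()) -> None:
    """Apply writes/deletes by staging a full copy of target and swapping it in.

    A failure while staging leaves target unchanged; the staging copy is discarded.
    """
    target = target.resolve()
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=target.parent))
    try:
        tree = staging / "tree"
        if target.exists():
            shutil.copytree(target, tree)  # start from the current contents
        else:
            tree.mkdir()
        for rel in deletes:
            (tree / rel).unlink(missing_ok=True)
        for rel, content in writes.items():
            path = tree / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(content)
        backup = staging / "backup"
        if target.exists():
            os.replace(target, backup)  # move the old tree aside
        try:
            os.replace(tree, target)  # publish the new tree
        except OSError:
            if backup.exists():
                os.replace(backup, target)  # roll back to the old tree
            raise
    finally:
        shutil.rmtree(staging, ignore_errors=True)  # drops staging and any backup
```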

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The template unconditionally grants the generated skill the `exec` capability, which allows arbitrary command execution by any downstream agent using the produced metadata. For a metadata generator whose stated purpose is creating standard project scaffolding, this is broader privilege than necessary and increases the blast radius if generated code, prompts, or dependencies are malicious or compromised.
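A least-privilege alternative is to derive capabilities from what the generated project declares it needs, rather than hard-coding `exec`. The metadata shape, the `read`/`write` capability names, and the `ProjectNeeds` flags below are all hypothetical; this audit does not show the skill's real template format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectNeeds:
    """Hypothetical feature flags describing what a generated skill must do."""
    reads_files: bool = False
    writes_files: bool = False
    runs_commands: bool = False


def build_capabilities(needs: ProjectNeeds) -> list[str]:
    """Grant only declared capabilities; `exec` is opt-in, never a default."""
    caps = []
    if needs.reads_files:
        caps.append("read")
    if needs.writes_files:
        caps.append("write")
    if needs.runs_commands:
        caps.append("exec")
    return caps


def build_metadata(name: str, needs: ProjectNeeds) -> dict:
    return {"name": name, "capabilities": build_capabilities(needs)}


print(build_metadata("demo", ProjectNeeds(reads_files=True)))
# {'name': 'demo', 'capabilities': ['read']}
```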

Vague Triggers

High

Confidence: 95%
Finding: The trigger phrases are broad enough to match ordinary coding requests such as 'write me a tool' or 'create a project,' which can auto-activate a skill that writes files and runs shell commands. In context, this is more dangerous because the skill includes `exec`-based verification and retry loops, so accidental activation can lead to unanticipated local command execution and filesystem changes.
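To illustrate why broad phrases are risky, the sketch below uses a naive case-insensitive substring matcher. The matcher is an assumption, since the host's actual trigger matching is not described here, and both trigger lists are hypothetical. Broad triggers fire on an everyday request, while triggers that name the skill explicitly do not.

```python
# Hypothetical trigger lists; the skill's real phrases are not reproduced in this audit.
BROAD_TRIGGERS = ["write me a tool", "create a project"]
NARROW_TRIGGERS = ["use code factory", "/code-factory"]


def matches(request: str, triggers: list[str]) -> bool:
    """Naive matcher: fire if any trigger phrase appears anywhere in the request."""
    text = request.lower()
    return any(t in text for t in triggers)


request = "Can you write me a tool that renames my photos?"
print(matches(request, BROAD_TRIGGERS))   # True: an ordinary request activates the skill
print(matches(request, NARROW_TRIGGERS))  # False: explicit invocation is required
```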

Missing User Warnings

Medium

Confidence: 92%
Finding: The skill instructs running `pip install -e ".[dev]"` and `pytest tests/ -v` but does not prominently warn users that verification entails shell execution and potentially dependency installation. Because command execution is one of the highest-risk capabilities in agent skills, omitting that warning undermines informed consent and can expose the host environment to unintended changes.
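The missing warning can also be enforced in code: show the exact commands and run nothing unless the user confirms. The sketch assumes a hypothetical `confirm` callback supplied by the host, such as a prompt in the agent UI. The two commands are the ones quoted in the finding.

```python
import shlex
import subprocess
from typing import Callable

# The two commands quoted in the finding, as argument lists (no shell involved).
VERIFY_COMMANDS = [
    ["pip", "install", "-e", ".[dev]"],
    ["pytest", "tests/", "-v"],
]


def run_with_consent(commands: list[list[str]], confirm: Callable[[str], bool],
                     cwd: str = ".") -> bool:
    """Show every command up front and run them only if confirm() returns True."""
    plan = "\n".join(shlex.join(cmd) for cmd in commands)
    notice = ("Verification will run these commands and may install packages "
              "into the current environment:\n" + plan)
    if not confirm(notice):
        return False  # declined: nothing was executed
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)  # stop at the first failing command
    return True
```

In a terminal, `confirm` could be as simple as `lambda msg: input(msg + "\nProceed? [y/N] ").strip().lower() == "y"`.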

Missing User Warnings

Medium

Confidence: 93%
Finding: The controller writes retry failure patterns to `.learnings/` on disk without any consent, notice, retention control, or sanitization. Because the logged JSON includes project name, step history, and verification details, this can silently persist potentially sensitive user or code metadata beyond the execution lifecycle.
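A more conservative failure log would write nothing without consent, keep only coarse fields, and expire old entries. The sketch below assumes a hypothetical `record_failure` hook that does exactly that:

- It writes nothing unless the user has opted in.
- It keeps only coarse fields instead of the full verification report.
- It replaces the project name with a truncated hash.
- It prunes records older than a retention window.

The field names and the 30-day window are illustrative assumptions.

```python
import hashlib
import json
import time
import uuid
from pathlib import Path
from typing import Optional

RETENTION_SECONDS = 30 * 24 * 3600  # illustrative 30-day retention window


def record_failure(log_dir: Path, project: str, step: str, retries: int,
                   opted_in: bool) -> Optional[Path]:
    """Write a minimal failure record only with consent, pruning expired records first."""
    if not opted_in:
        return None  # no consent: nothing touches the disk
    now = time.time()
    log_dir.mkdir(parents=True, exist_ok=True)
    for old in log_dir.glob("*.json"):
        if now - old.stat().st_mtime > RETENTION_SECONDS:
            old.unlink(missing_ok=True)
    record = {
        # Truncated hash instead of the raw name (pseudonymization, not anonymization).
        "project": hashlib.sha256(project.encode()).hexdigest()[:12],
        "step": step,
        "retries": retries,
        # Deliberately omitted: step history, verification report, paths, code.
    }
    path = log_dir / f"failure-{int(now)}-{uuid.uuid4().hex[:8]}.json"
    path.write_text(json.dumps(record))
    return path
```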

Missing User Warnings

Low

Confidence: 84%
Finding: When `guard` is absent, verification runs directly without timeout enforcement, which can allow tests or verification steps to hang indefinitely or consume resources unexpectedly. In an automated project-generation pipeline, this creates a denial-of-service and reliability risk, even though it does not directly affect confidentiality or integrity.
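When `guard` is absent, a fallback can still bound how long the pipeline waits. The sketch below makes two assumptions. First, `verify_fn` is a zero-argument callable, as the `execute_with_timeout(verify_fn, ...)` excerpt under Unbounded Resource Access suggests. Second, `run_unguarded` is a hypothetical helper.

Returning `None` on error mirrors that excerpt's `except Exception: return None`. Note that Python cannot kill a thread, so a hung `verify_fn` keeps consuming resources in the background. Running verification in a subprocess with a `timeout` gives stronger isolation.

```python
import threading
from typing import Any, Callable, Optional


def run_unguarded(verify_fn: Callable[[], Any], timeout_seconds: float = 120.0) -> Optional[Any]:
    """Run verify_fn in a daemon thread and stop waiting after timeout_seconds.

    Returns verify_fn's result, or None on timeout or error. The thread is a
    daemon, so a hung verification at least does not block interpreter exit.
    """
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["value"] = verify_fn()
        except Exception:
            outcome["error"] = True

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_seconds)  # wait at most timeout_seconds
    if thread.is_alive() or "value" not in outcome:
        return None  # timed out or raised
    return outcome["value"]
```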

Ssd 3

Medium

Confidence: 95%
Finding: The failure log stores plain JSON containing project name, retry history, current step, and full verification report, which may include sensitive code, file paths, test output, or user-derived content. Persisting this data in a predictable `.learnings/` directory increases the chance of unintended disclosure, later collection, or cross-run leakage.

Unbounded Resource Access

Medium

Category: Excessive Agency
Content: return self.breaker.execute_with_timeout( verify_fn, timeout_seconds=timeout_seconds, on_timeout=None, ) except Exception: return None
Confidence: 84%
Finding: timeout=None (matched inside `on_timeout=None` in the excerpt above)

Known Vulnerable Dependency: pytest (1 advisory: CVE-2025-71176, vulnerable tmpdir handling)

Low

Category: Supply Chain
Confidence: 84%
Finding: pytest

VirusTotal

VirusTotal findings are pending for this skill version.

Static analysis

No suspicious patterns detected.