Security audit

Skill2Team

Security checks across malware telemetry and agentic risk

Overview

Skill2Team is an agent-team design and Codex packaging skill. Its configuration changes are disclosed, expected for its purpose, and guarded by manifests and smoke-test requirements.

Install only if you want a skill that can generate and help register Codex multi-agent packages. Before using package-end registration prompts or helper scripts, inspect the generated manifest and .codex/config.toml changes, avoid bundling secrets or private local files, and run smoke tests before trusting a generated team as runnable.
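One way to inspect the .codex/config.toml changes before accepting them is to diff the current file text against the proposed text. The following is a minimal sketch using only the Python standard library; the function name `preview_config_change` and the sample TOML contents are hypothetical and do not come from the skill.

```python
import difflib


def preview_config_change(before: str, after: str,
                          path: str = ".codex/config.toml") -> str:
    """Return a unified diff between two versions of a config file's text."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{path} (current)",
        tofile=f"{path} (proposed)",
    )
    return "".join(diff)


# Hypothetical contents, for illustration only; the real keys written by
# the helper are not listed in this report.
current = "[features]\nexample_flag = false\n"
proposed = (
    "[features]\nexample_flag = true\n\n"
    "[agents.example]\nconfig_file = \"agents/example.toml\"\n"
)
print(preview_config_change(current, proposed))
```

An empty result means the proposed text is identical to the current file, so nothing would change.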

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (11)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill advertises operational capabilities that imply file reading, file writing, and shell-script execution, but it does not declare permissions or constraints for those actions. This creates a transparency and policy-enforcement gap: users or hosting platforms may treat it as a low-risk design-only skill while it can actually modify local Codex configuration, package files, and invoke helper scripts.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The stated purpose presents the skill as a team-design tool, but the body authorizes materially stronger behavior including generating deployment artifacts, modifying `.codex` configuration/manifests, registering agents, bundling local resources, and running validation scripts. That mismatch is dangerous because users may consent to architecture assistance without realizing the skill can perform system-affecting actions and package potentially sensitive local content into distributable artifacts.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The description claims broad research, fact-verification, report-writing, quality-checking, and iterative revision capabilities that exceed the stated purpose of designing profile-based agent teams. This mismatch can cause the skill to be invoked in unintended contexts and may lead downstream systems or users to over-trust the skill with tasks outside its intended scope, increasing the risk of unsafe delegation or prompt routing errors.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The contract explicitly requires design outputs to include prompts for API-service runner construction, API-run role simulation, and framework conversions (Hermes/OpenClaw), which expands the skill from team design/package into downstream execution and cross-runtime enablement. In a packaging skill, embedding operational follow-up guidance increases the chance that generated artifacts are reused to stand up execution paths outside the declared runtime boundary, weakening scope control and policy consistency.

Description-Behavior Mismatch

Medium

Confidence: 84%
Finding: The final reply rule mandates non-Codex follow-up prompts even though the skill states it does not expose delivery modes beyond design/package and the package contract says package-end prompts are Codex-only. This inconsistency can mislead downstream agents or users into treating cross-runtime continuation paths as supported outputs, enabling unauthorized or poorly controlled deployment patterns.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The helper edits the live Codex project configuration by forcing feature flags, adjusting agent execution settings, and inserting agent config entries into .codex/config.toml. That is a real integrity-affecting behavior beyond a design-only skill description, and if a user runs it in an unexpected workspace it can silently alter runtime behavior for other agents and workflows.
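Because the helper edits a live configuration file, a user may want a copy of the original before running it. The sketch below shows one possible backup-and-restore step; `backup_file` and `restore_file` are hypothetical names, the `.bak` suffix is an assumption, and the "helper edit" is simulated rather than taken from the actual script.

```python
import shutil
import tempfile
from pathlib import Path


def backup_file(path: Path) -> Path:
    """Copy `path` to a sibling file with a .bak suffix and return it."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    return backup


def restore_file(path: Path, backup: Path) -> None:
    """Overwrite `path` with the contents of its backup."""
    shutil.copy2(backup, path)


# Demonstration in a throwaway directory so no real workspace is touched.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / ".codex" / "config.toml"
    config.parent.mkdir()
    config.write_text("[features]\n")

    backup = backup_file(config)
    # Simulated edit standing in for whatever the helper would write.
    config.write_text("[features]\nexample_flag = true\n")
    restore_file(config, backup)
    restored_text = config.read_text()

print(repr(restored_text))
```

Keeping the backup next to the original makes it easy to compare the two files after the helper has run.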

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The unregister path removes installed TOML agent artifacts and registry files from the Codex workspace, which is destructive behavior. Although the code includes safeguards such as root checks and ownership markers, a user invoking replacement or unregister operations can still lose local agent artifacts or configuration state, especially when manifests are stale or the helper is run in the wrong project root.
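The safeguards mentioned in this finding (root checks and ownership markers) could look roughly like the following. This is an illustrative sketch, not the helper's actual code: the marker text, the first-line convention, and the function name `safe_to_remove` are all assumptions.

```python
import tempfile
from pathlib import Path

# Hypothetical ownership marker; the real marker format is not given in this report.
OWNERSHIP_MARKER = "# managed-by: example-team"


def safe_to_remove(target: Path, project_root: Path,
                   marker: str = OWNERSHIP_MARKER) -> bool:
    """Allow removal only for marked files inside <project_root>/.codex."""
    codex_dir = (project_root / ".codex").resolve()
    resolved = target.resolve()
    if not resolved.is_relative_to(codex_dir):  # root check (Python 3.9+)
        return False
    if not resolved.is_file():
        return False
    first_line = resolved.read_text(encoding="utf-8").splitlines()[:1]
    return first_line == [marker]  # ownership check


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "project"
    agents = root / ".codex" / "agents"
    agents.mkdir(parents=True)

    owned = agents / "owned.toml"
    owned.write_text(OWNERSHIP_MARKER + "\nname = 'owned'\n", encoding="utf-8")
    foreign = agents / "foreign.toml"
    foreign.write_text("name = 'foreign'\n", encoding="utf-8")
    outside = Path(tmp) / "elsewhere.toml"
    outside.write_text(OWNERSHIP_MARKER + "\n", encoding="utf-8")

    results = {p.name: safe_to_remove(p, root) for p in (owned, foreign, outside)}

print(results)
```

A check of this kind does not help when the helper is pointed at the wrong project root, which is why the finding still recommends caution.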

Missing User Warnings

Medium

Confidence: 95%
Finding: The prompt instructs the agent to copy and merge agent definitions and configuration files directly into the current project, which modifies local project state and trust boundaries without any requirement for user confirmation, preview, or rollback. In this skill context, those files can influence agent behavior and runtime configuration, so silent merging could introduce unreviewed capabilities or alter existing automation in ways the user did not explicitly approve.
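A preview-then-confirm merge, the control this finding says is missing, might be sketched as follows. The functions `plan_merge` and `merge_with_confirmation`, the `confirm` callback, and the file names in the demonstration are hypothetical; nothing here reflects how the skill itself copies files.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def plan_merge(src: Path, dst: Path) -> dict[str, list[str]]:
    """List which files a merge would create and which it would overwrite."""
    plan: dict[str, list[str]] = {"create": [], "overwrite": []}
    for f in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = f.relative_to(src)
        key = "overwrite" if (dst / rel).exists() else "create"
        plan[key].append(rel.as_posix())
    return plan


def merge_with_confirmation(src: Path, dst: Path,
                            confirm: Callable[[dict], bool]) -> bool:
    """Copy files from src into dst only if `confirm` approves the plan."""
    plan = plan_merge(src, dst)
    if not confirm(plan):
        return False  # nothing is written when the user declines
    for rel in plan["create"] + plan["overwrite"]:
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, target)
    return True


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "package", Path(tmp) / "project"
    (src / "agents").mkdir(parents=True)
    dst.mkdir()
    (src / "agents" / "lead.toml").write_text("name = 'lead'\n")
    (src / "config.toml").write_text("new\n")
    (dst / "config.toml").write_text("old\n")

    preview = plan_merge(src, dst)
    declined = merge_with_confirmation(src, dst, confirm=lambda plan: False)
    text_after_decline = (dst / "config.toml").read_text()
    accepted = merge_with_confirmation(src, dst, confirm=lambda plan: True)
    text_after_accept = (dst / "config.toml").read_text()

print(preview)
```

In interactive use, `confirm` could print the plan and wait for an explicit yes from the user before any file is written.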

Missing User Warnings

Medium

Confidence: 96%
Finding: The prompt directs the system to start the registered entry agent with the user's source task, which initiates automated behavior without an explicit warning, approval checkpoint, or clear execution boundary. Because this skill is specifically for assembling and launching profile-based agent teams, auto-starting an entry agent can trigger downstream fanout, tool use, and delegated actions, making the operational risk higher than a generic prompt instruction.
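An explicit approval checkpoint before the entry agent starts could be as small as the sketch below. It does not model Codex itself: `start_entry_agent`, `approve`, and `launch` are hypothetical names, and `launch` stands in for whatever mechanism actually starts the agent.

```python
from typing import Callable


def start_entry_agent(task: str,
                      launch: Callable[[str], None],
                      approve: Callable[[str], bool]) -> bool:
    """Run `launch(task)` only after `approve(task)` returns True."""
    if not approve(task):
        return False  # no agent is started without approval
    launch(task)
    return True


# Demonstration with stand-in callbacks that record what was launched.
launched: list[str] = []

blocked = start_entry_agent(
    "summarize repo", launch=launched.append, approve=lambda task: False
)
allowed = start_entry_agent(
    "summarize repo", launch=launched.append, approve=lambda task: True
)
print(blocked, allowed, launched)  # False True ['summarize repo']
```

Placing the check at the single launch point keeps any downstream fanout behind one user decision.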

Vague Triggers

Medium

Confidence: 86%
Finding: The description uses broad, generic language such as searching, verifying, analyzing, writing, and supporting revisions without clear trigger constraints or activation boundaries. In an agent ecosystem, this ambiguity can cause over-broad matching and accidental invocation for many unrelated tasks, which increases the likelihood of misuse, privilege overreach, or unsafe composition with other tools and skills.

Natural-Language Policy Violations

Medium

Confidence: 94%
Finding: The skill hard-codes a vendor-specific default by stating that OpenAI Codex is the default runner and requiring `codex` for deployable artifacts. This can steer users into a specific platform without explicit opt-in, reduce portability, and cause downstream workflows or packaging decisions to assume one provider even when the user's security, compliance, or cost requirements differ.

VirusTotal

65/65 vendors flagged this skill as clean.

Static analysis

Detected: suspicious.dynamic_code_execution

Dynamic code execution detected.

Critical

Code: suspicious.dynamic_code_execution
Location: scripts/ensure_codex_meta_team.py:75
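The report does not say which construct triggered this rule or how the scanner detects it. For readers who want to review the flagged script themselves, the sketch below shows one simple way to locate common dynamic-execution calls with Python's `ast` module. The set of names checked is an assumption and is narrower than a real scanner's rule is likely to be.

```python
import ast

# Built-in calls commonly associated with dynamic code execution (assumed list).
DYNAMIC_CALLS = {"eval", "exec", "compile", "__import__"}


def find_dynamic_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, name) for each direct call to a flagged built-in."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DYNAMIC_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)


# Illustrative input only; this is not the content of the flagged script.
sample = "x = 1\nresult = eval('x + 1')\nprint(result)\n"
print(find_dynamic_calls(sample))  # [(2, 'eval')]
```

To check the real file, read its text and pass it to `find_dynamic_calls`, then compare the reported line numbers with the location given above.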