Security audit

Embodied-OS - AI Robot Control System

Security checks across malware telemetry and agentic risk

Overview

This robot-control skill is not clearly malicious, but it grants high-impact physical robot authority and bundles unrelated video-generation instructions with extra installs and API use.

Review before installing. Use this only in simulation or a supervised, bounded robot workspace until you have verified the installed packages, hardware limits, emergency stop behavior, and provider data flows. Remove or ignore the bundled old video-generator files, use dedicated API keys with spend limits, and require explicit confirmation before any real robot movement, long-running monitoring, repository install, or external API submission.
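
The confirmation requirement above can be sketched as a small gate in front of the executor. This is an illustrative sketch only: the action kinds, the `Action` type, and the `run_gated` helper are hypothetical names chosen here, not part of the audited skill.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds, mirroring the four cases named in the recommendation.
HIGH_IMPACT = {"robot_move", "long_running_monitor", "repo_install", "external_api_submit"}


@dataclass(frozen=True)
class Action:
    kind: str          # e.g. "robot_move" or "simulate"
    description: str   # human-readable summary shown to the operator


def needs_confirmation(action: Action) -> bool:
    """Return True for any action kind listed as high impact."""
    return action.kind in HIGH_IMPACT


def run_gated(
    action: Action,
    execute: Callable[[Action], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run `execute` only if the action is low impact or the operator confirms it."""
    if needs_confirmation(action) and not confirm(action.description):
        return "blocked"
    return execute(action)
```

In practice `confirm` would prompt a human operator, so the default on no answer is that the action does not run.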

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (10)

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The file's behavior is materially unrelated to the parent skill metadata for embodied robot control, instead describing a general-purpose video generation workflow. This kind of scope mismatch is dangerous because it can cause an agent or reviewer to trust and invoke capabilities outside the declared domain, increasing the chance of unexpected code execution, repository cloning, and API use under a misleading label.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The skill instructs the agent to clone an external repository, install dependencies, set credentials, and run local shell/Node commands. In a skill ecosystem, these actions create a supply-chain and arbitrary code execution risk, especially when they are not tightly justified by the surrounding embodied-robot-control context and could be triggered by ordinary user requests.
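
One common mitigation for the supply-chain risk described here is to pin the fetched code to a known digest and refuse to install anything that does not match. A minimal sketch, assuming the reviewer has already recorded a trusted SHA-256 digest for the archive; the function names are hypothetical and the audited skill does not do this.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()


def is_pinned(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of `data` with the pinned digest in constant time."""
    return hmac.compare_digest(sha256_hex(data), expected_hex.strip().lower())
```

An installer built on this would read the downloaded archive as bytes, call `is_pinned`, and stop before running any install step when the check fails.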

Vague Triggers

Medium

Confidence: 92%
Finding: The auto-trigger keywords are broad enough to match many ordinary conversations about videos, causing the skill to activate in situations where the user may not have intended repository access, dependency installation, or script execution. Overbroad activation increases the risk that unsafe actions occur with weak user intent validation.

Vague Triggers

Medium

Confidence: 90%
Finding: The condition that text merely 'looks like a video script' is ambiguous and subjective, which can cause false activations. In this skill, false activation is more dangerous because the documented workflow escalates quickly from interpretation to shell commands and external code execution.

Vague Triggers

Medium

Confidence: 94%
Finding: The auto-trigger criteria are broad enough to activate on normal discussion about videos, scripts, or related tooling, which can cause the agent to invoke this skill when the user did not clearly request execution. In this skill, activation can lead to cloning repos, installing packages, and sending user-provided script content to external APIs, so accidental triggering creates meaningful privacy and safety risk.

Missing User Warnings

Medium

Confidence: 92%
Finding: The skill directs use of OpenAI TTS and Whisper and instructs users to configure an API key, but it does not clearly warn at the point of use that user scripts and audio-derived content may be transmitted to third-party services. Because this skill is designed to process arbitrary user-provided content, missing disclosure and consent can expose sensitive or proprietary data to external providers.
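
A point-of-use disclosure of the kind this finding asks for could look roughly like the following. It is a sketch under stated assumptions: `disclosure_text` and `ConsentLedger` are hypothetical helpers, and the provider and data-kind labels are free-form strings chosen by the caller.

```python
from typing import Iterable, Set, Tuple


def disclosure_text(provider: str, data_kinds: Iterable[str]) -> str:
    """Build the prompt shown to the user before anything leaves the machine."""
    kinds = ", ".join(sorted(data_kinds))
    return f"This step will send {kinds} to {provider}. Continue? [y/N]"


class ConsentLedger:
    """Tracks which (provider, data kind) pairs the user has approved."""

    def __init__(self) -> None:
        self._granted: Set[Tuple[str, str]] = set()

    def grant(self, provider: str, data_kind: str) -> None:
        self._granted.add((provider, data_kind))

    def allows(self, provider: str, data_kinds: Iterable[str]) -> bool:
        # Every data kind must be approved for this provider;
        # an empty list is trivially allowed because nothing would be sent.
        return all((provider, kind) in self._granted for kind in data_kinds)
```

Consent is recorded per provider and per data kind, so approving script text for one service does not silently cover audio or a different service.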

Vague Triggers

Medium

Confidence: 94%
Finding: The skill defines very broad automatic trigger conditions such as generic video-generation-related queries and any text that merely looks like a script. This can cause unintended invocation of the skill in normal conversation, leading the agent to run local shell commands, access external services, and incur API usage or side effects without sufficiently explicit user intent.
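
One way to narrow activation is to separate "the topic came up" from "the user asked for execution". The sketch below is illustrative only: the `/video` command, the keyword set, and the `decide` function are assumptions made for this example and do not describe how the skill or its host actually routes messages.

```python
import re
from enum import Enum


class Decision(Enum):
    IGNORE = "ignore"   # unrelated message
    ASK = "ask"         # topic mentioned: ask the user before doing anything
    RUN = "run"         # explicit command: the user clearly requested execution


EXPLICIT_COMMAND = "/video"                    # hypothetical opt-in command
TOPIC_WORDS = {"video", "videos", "script"}    # hypothetical topic keywords


def decide(message: str) -> Decision:
    """Run only on an explicit command; a keyword match alone yields a question."""
    text = message.strip().lower()
    if text == EXPLICIT_COMMAND or text.startswith(EXPLICIT_COMMAND + " "):
        return Decision.RUN
    words = set(re.findall(r"[a-z]+", text))
    if words & TOPIC_WORDS:
        return Decision.ASK
    return Decision.IGNORE
```

With this split, a message such as "What makes a good video?" produces a clarifying question and never reaches shell commands or network calls on its own.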

Vague Triggers

Medium

Confidence: 92%
Finding: The examples state that common phrases like '做个视频' ("make a video") and '我想要一个关于AI的短视频' ("I want a short video about AI") must use this skill, even though such requests may be underspecified and may not include enough detail for safe execution. This increases the risk of the agent prematurely invoking an operational pipeline, running shell commands, and making network/API calls based on vague prompts.

Vague Triggers

Medium

Confidence: 93%
Finding: The activation criteria are overly broad for a skill that can control physical robots, so the skill may be invoked during general robotics discussions rather than only when the user explicitly intends robot actuation. In this context, unintended invocation is more dangerous than a typical software skill because it can route natural-language requests into planning or action pathways that affect real-world systems.

Missing User Warnings

High

Confidence: 97%
Finding: The skill advertises natural-language control of physical robots without prominent warnings about motion hazards, property damage, or injury risk. Because the skill targets embodied systems and presents high-level autonomous actions, omission of explicit safety warnings and operational limits materially increases the chance of unsafe use or overtrust by users.
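
Operational limits of the kind this finding says are missing are often enforced as a pre-motion check against a bounded workspace and a speed cap. A minimal sketch, assuming Cartesian targets in metres and speed in metres per second; the `Limits` type and `violations` function are hypothetical, and software checks like this complement a hardware emergency stop rather than replace it.

```python
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[float, float]


@dataclass(frozen=True)
class Limits:
    x: Range           # allowed (min, max) along each axis, in metres
    y: Range
    z: Range
    max_speed: float   # metres per second


def violations(target: Tuple[float, float, float], speed: float, limits: Limits) -> List[str]:
    """Return a list of limit violations; an empty list means the move is within bounds."""
    problems: List[str] = []
    for name, value, (lo, hi) in zip("xyz", target, (limits.x, limits.y, limits.z)):
        if not lo <= value <= hi:
            problems.append(f"{name}={value} outside [{lo}, {hi}]")
    if not 0 < speed <= limits.max_speed:
        problems.append(f"speed={speed} outside (0, {limits.max_speed}]")
    return problems
```

A controller using this would refuse to send any command for which `violations` returns a non-empty list and would show the reasons to the operator.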

VirusTotal

64/64 vendors flagged this skill as clean.
