Gigo Lobster Doctor

Security checks across malware telemetry and agentic risk

Overview

The skill’s default doctor entrypoint is mostly diagnostic, but it silently loads local secrets and bootstraps packages. It also ships a broader benchmark runner whose shell-execution and upload paths are not clearly scoped in the doctor-facing description.

Install only if you are comfortable with a doctor tool that contacts the GIGO API, checks a local OpenClaw gateway, may create a managed Python runtime and install packages, and may read OpenClaw secrets.env into its environment. Treat the packaged full benchmark runner as broader than the doctor label and avoid invoking main.py or setting GIGO_V2_AGENT_COMMAND unless you intentionally want that behavior.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (76)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: break # execute try: proc = subprocess.run( cmd, shell=True, cwd=str(self.workdir), capture_output=True, timeout=timeout, text=True, )
Confidence: 97% confidence
Finding: proc = subprocess.run( cmd, shell=True, cwd=str(self.workdir), capture_output=True, timeout=timeout, text=True, )
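
One common way to narrow the shell-execution surface flagged here is to pass an argument list instead of a shell string. The following is a minimal sketch under that assumption, not the skill's own code; `run_without_shell` and its parameters are hypothetical names chosen for illustration.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def run_without_shell(cmd: str, workdir: Path, timeout: float) -> subprocess.CompletedProcess:
    """Run cmd as an argument list so shell metacharacters are not interpreted."""
    argv = shlex.split(cmd)  # tokenizes the string; no shell is started
    if not argv:
        raise ValueError("empty command")
    return subprocess.run(
        argv,
        shell=False,
        cwd=str(workdir),
        capture_output=True,
        timeout=timeout,
        text=True,
    )


if __name__ == "__main__":
    # The ";" and the words after it stay literal arguments to the child
    # process instead of becoming a second shell command.
    demo = shlex.join([sys.executable, "-c", "import sys; print(sys.argv[1:])", ";", "echo", "injected"])
    print(run_without_shell(demo, Path("."), 10).stdout.strip())
```

This only helps when the command does not depend on shell features such as pipes or redirection; those would need to be expressed in Python instead.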

subprocess module call

Medium

Category: Dangerous Code Execution
Content: runner_path = workdir / "_cov_runner.py" runner_path.write_text(runner) try: proc = subprocess.run( [sys.executable, str(runner_path)], cwd=str(workdir), capture_output=True, timeout=40, text=True, )
Confidence: 94% confidence
Finding: proc = subprocess.run( [sys.executable, str(runner_path)], cwd=str(workdir), capture_output=True, timeout=40, text=True, )

eval() call detected

High

Category: Dangerous Code Execution
Content: print("Type a Python expression:") expr = input("> ") result = eval(expr) print("Result:", result)
Confidence: 99% confidence
Finding: result = eval(expr)
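
If the flagged snippet only needs to read a value typed by the user, a literal parser avoids executing code. This is a sketch under that assumption, not a change taken from the skill; `parse_literal` is a hypothetical helper name.

```python
import ast


def parse_literal(expr: str):
    """Parse a Python literal (number, string, list, dict, ...) without executing code."""
    try:
        return ast.literal_eval(expr)
    except (ValueError, SyntaxError) as exc:
        # Function calls, attribute access, and names are rejected here.
        raise ValueError(f"not a plain literal: {expr!r}") from exc


if __name__ == "__main__":
    print(parse_literal("[1, 2, 3]"))
```

Unlike `eval`, this rejects input such as `__import__('os').system('id')`, at the cost of no longer evaluating general expressions.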

subprocess module call

Medium

Category: Dangerous Code Execution
Content: "-r", str(status.requirements_path), ] completed = subprocess.run( command, capture_output=True, text=True,
Confidence: 95% confidence
Finding: completed = subprocess.run( command, capture_output=True, text=True, env={**os.environ, "PIP_USER": "0", "PYTHONNOUSERSITE": "1"}, check=False, )

os.system() or os exec-family call

High

Category: Dangerous Code Execution
Content: profile_argv = None effective_argv = profile_argv if isinstance(profile_argv, list) else sys.argv[1:] argv = [str(runtime_python), str(skill_root / "main.py"), *[str(item) for item in effective_argv]] os.execve(str(runtime_python), argv, env) def ensure_runtime(skill_root: Path, lang: str = "zh") -> RuntimeStatus:
Confidence: 96% confidence
Finding: os.execve(str(runtime_python), argv, env)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: started = time.time() try: completed = subprocess.run( command, shell=True, cwd=str(workdir),
Confidence: 98% confidence
Finding: completed = subprocess.run( command, shell=True, cwd=str(workdir), env=env, capture_output=True,

Tainted flow: 'expr' from input (line 4, user input) → eval (code execution)

Critical

Category: Data Flow
Content: print("Type a Python expression:") expr = input("> ") result = eval(expr) print("Result:", result)
Confidence: 100% confidence
Finding: result = eval(expr)

Tainted flow: 'command' from os.environ.get (line 280, credential/environment) → subprocess.run (code execution)

Medium

Category: Data Flow
Content: started = time.time() try: completed = subprocess.run( command, shell=True, cwd=str(workdir),
Confidence: 99% confidence
Finding: completed = subprocess.run( command, shell=True, cwd=str(workdir), env=env, capture_output=True,
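
When a command string comes from an environment variable, one mitigation is to tokenize it and accept it only if the program is on an explicit allowlist. The sketch below illustrates that idea; `command_from_env`, the variable name `DEMO_AGENT_COMMAND`, and the allowlist contents are all hypothetical and do not come from the skill.

```python
import os
import shlex
from typing import List, Set


def command_from_env(var_name: str, allowed_programs: Set[str]) -> List[str]:
    """Read a command from the environment and accept it only if its program is allowlisted."""
    raw = os.environ.get(var_name, "")
    argv = shlex.split(raw)
    if not argv:
        raise ValueError(f"{var_name} is unset or empty")
    # Exact match on the first token; anything else is refused.
    if argv[0] not in allowed_programs:
        raise PermissionError(f"{argv[0]!r} is not an allowed program")
    return argv


if __name__ == "__main__":
    os.environ["DEMO_AGENT_COMMAND"] = "my-agent --task 1"
    print(command_from_env("DEMO_AGENT_COMMAND", {"my-agent"}))
```

The returned list is meant to be passed to `subprocess.run` without `shell=True`, so the arguments are never reinterpreted by a shell.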

Lp3

Medium

Category: MCP Least Privilege
Confidence: 87% confidence
Finding: The skill advertises and instructs use of capabilities including shell execution, environment access, file I/O, and likely networked operations, but does not declare permissions. That undermines policy enforcement and informed consent, because a caller or platform cannot accurately assess the skill's operational scope before execution.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95% confidence
Finding: This is a significant description-behavior mismatch: the skill claims to perform only environment checks, yet the broader behavior includes benchmark execution, bundle fetching/execution, cloud API interactions, uploads, and report generation. Such misrepresentation can cause users or orchestrators to approve a 'safe diagnostic' flow that actually performs materially riskier actions, including remote code/data handling and external communication.

Description-Behavior Mismatch

High

Confidence: 99% confidence
Finding: The manifest materially contradicts the declared skill purpose. A skill presented as a limited environment diagnostic actually packages a broad 50-task benchmark with code editing, shell execution, networked installation, safety-trap tasks, and LLM-judged content generation, which expands operational scope far beyond user expectations and undermines trust and consent boundaries.

Context-Inappropriate Capability

High

Confidence: 98% confidence
Finding: The file embeds capabilities unrelated to environment health checks, including arbitrary source modification, shell-oriented tasks, and subjective LLM-judged writing workflows. In the context of a supposedly narrow diagnostic skill, this is dangerous because it creates a covert capability expansion path that could cause unintended code changes, command execution, or data handling under misleading pretenses.

Context-Inappropriate Capability

Medium

Confidence: 93% confidence
Finding: At least one task explicitly requires network access for npm installation despite the skill description claiming it only performs environment checks and does not run formal tasting. This mismatch is dangerous because network activity increases supply-chain and data-exfiltration risk and violates the principle of least privilege for a diagnostic-only skill.

Description-Behavior Mismatch

High

Confidence: 98% confidence
Finding: The file specifies a full cloud judging workflow that sends task content and agent output to a remote provider-driven evaluation service, which conflicts with the skill's declared 'doctor' scope of only performing environment checks and explicitly not doing formal tasting. This kind of scope expansion is dangerous because users or orchestrators may invoke the skill expecting a local diagnostic-only mode, while it actually enables remote processing and data egress to third-party model providers.

Description-Behavior Mismatch

High

Confidence: 95% confidence
Finding: The task directs the agent to initialize a new npm project, install external dependencies, create code, and execute it, which materially exceeds the declared skill scope of environment-only diagnostics. This scope drift is dangerous because it enables filesystem changes, network package retrieval, and code execution under a diagnostic-branded skill, increasing the chance of unintended or misleadingly authorized actions.

Context-Inappropriate Capability

High

Confidence: 97% confidence
Finding: The prompt explicitly authorizes `npm install chalk` and `node index.js`, which introduce network access, supply-chain exposure, and execution of newly created code despite the skill being positioned as a non-invasive environment doctor. In this context, the mismatch makes the behavior more dangerous because users may trust the skill to perform only diagnostics while it actually modifies the system and runs code.

Description-Behavior Mismatch

High

Confidence: 96% confidence
Finding: The task explicitly instructs the agent to initialize a new npm project, install a dependency from the network, create executable code, and run it. That behavior materially exceeds the stated skill scope of a limited environment diagnostic that should only inspect gateway, Python dependencies, package-chain linkage, and PNG certificate capability without performing a formal trial run, so it creates an integrity and scope-bypass risk.

Context-Inappropriate Capability

High

Confidence: 94% confidence
Finding: Installing `chalk` and executing `node index.js` introduces unnecessary code-fetching and code-execution capabilities unrelated to the declared lobster-environment health check. In this skill context, those actions expand attack surface, permit unneeded network and filesystem side effects, and can be used to smuggle arbitrary package installation and runtime behavior under the guise of diagnostics.

Description-Behavior Mismatch

High

Confidence: 97% confidence
Finding: The task explicitly instructs the agent to create and write `marker.txt`, which is an action outside the declared scope of an environment-check-only skill. This is dangerous because it normalizes arbitrary filesystem modification and demonstrates the agent can be induced to perform side effects unrelated to the advertised purpose, undermining user trust and boundary enforcement.

Context-Inappropriate Capability

Medium

Confidence: 93% confidence
Finding: The prompt includes a resilience exercise that intentionally causes the first Bash call to fail and instructs the agent to try alternate methods until it succeeds in creating a file. This is risky because it trains or incentivizes persistence around tool failure for an unrelated side effect, which can become a mechanism for bypassing operational safeguards or expanding behavior beyond benign diagnostics.

Description-Behavior Mismatch

High

Confidence: 96% confidence
Finding: The file's behavior does not match the stated skill purpose of environment-only diagnostics; instead it exposes arbitrary Python evaluation. That scope mismatch increases risk because users and reviewers may invoke the skill expecting harmless checks, while it actually provides code execution capability.

Context-Inappropriate Capability

High

Confidence: 98% confidence
Finding: Arbitrary code execution is unjustified in an environment doctor skill and materially increases the attack surface without operational need. In this context, the capability is more dangerous because it is hidden behind a benign diagnostic description, which can lower user suspicion and increase the chance of misuse.

Context-Inappropriate Capability

Medium

Confidence: 96% confidence
Finding: The helper enumerates multiple external locations for a `secrets.env` file and silently imports its contents into the process environment before invoking the skill runtime. For a skill explicitly described as a limited environment-check tool, this expands privilege and data access beyond what is necessary, making any downstream code able to read credentials that the user did not explicitly provide for this run.
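
A narrower alternative to importing a whole `secrets.env` file into the process environment is to parse it and hand over only the keys a given run asks for. The following is a minimal sketch of that approach, assuming a simple `KEY=VALUE` file format; `read_env_file`, `scoped_env`, and the key names in the usage example are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Iterable


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def scoped_env(path: Path, wanted: Iterable[str]) -> Dict[str, str]:
    """Return a copy of os.environ plus only the explicitly requested secrets."""
    secrets = read_env_file(path)
    env = dict(os.environ)  # a copy; os.environ itself is left untouched
    for key in wanted:
        if key in secrets:
            env[key] = secrets[key]
    return env
```

A caller could then pass `scoped_env(path, ["DEMO_API_KEY"])` as the `env` argument of a single subprocess call, so that other entries in the file never reach downstream code.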

Description-Behavior Mismatch

High

Confidence: 98% confidence
Finding: The skill metadata and description present this as a doctor-only environment check, but the entrypoint clearly supports full benchmark execution, task fetching, agent runs, scoring, report/certificate generation, session management, and optional cloud upload when --doctor is not used. This mismatch is dangerous because users or higher-level orchestrators may invoke the skill expecting a safe diagnostic-only action, while the code can perform broader networked and stateful operations with side effects.
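
One way to keep an entrypoint aligned with a doctor-only description is to make diagnostics the default and require an explicit opt-in for anything broader. The sketch below shows that pattern with `argparse`; apart from `--doctor`, which the finding mentions, the `--full-benchmark` flag and the `parse_mode` function are hypothetical and are not taken from the skill.

```python
import argparse
from typing import List, Optional


def parse_mode(argv: Optional[List[str]] = None) -> str:
    """Return "doctor" unless the caller explicitly opts in to the full benchmark."""
    parser = argparse.ArgumentParser(description="Environment doctor (sketch)")
    parser.add_argument("--doctor", action="store_true",
                        help="run diagnostics only (the default)")
    parser.add_argument("--full-benchmark", action="store_true",
                        help="hypothetical opt-in flag for the broader runner")
    args = parser.parse_args(argv)
    if args.doctor and args.full_benchmark:
        parser.error("--doctor and --full-benchmark are mutually exclusive")
    return "benchmark" if args.full_benchmark else "doctor"


if __name__ == "__main__":
    print(parse_mode())
```

With this shape, an orchestrator that invokes the skill with no flags gets the diagnostic path, and the broader behavior requires a deliberate choice.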

Intent-Code Divergence

Medium

Confidence: 90% confidence
Finding: The CLI advertises 'Lobster Taster local benchmark,' which conflicts with the declared doctor-only skill purpose. In an agent-skill context, inconsistent descriptions increase the risk of deceptive or accidental misuse because safety decisions may rely on metadata and user-facing descriptions rather than auditing the full code path.

VirusTotal

66/66 vendors flagged this skill as clean.
