Gigo Lobster Doctor

Security checks across malware telemetry and agentic risk

Overview

The skill’s default doctor entrypoint is mostly diagnostic, but it silently loads local secrets, bootstraps packages, and ships a broader benchmark runner with shell execution and upload paths that are not well scoped in the doctor-facing description.

Install only if you are comfortable with a doctor tool that contacts the GIGO API, checks a local OpenClaw gateway, may create a managed Python runtime and install packages, and may read OpenClaw secrets.env into its environment. Treat the packaged full benchmark runner as broader than the doctor label and avoid invoking main.py or setting GIGO_V2_AGENT_COMMAND unless you intentionally want that behavior.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (76)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
break
        # Execute
        try:
            proc = subprocess.run(
                cmd, shell=True, cwd=str(self.workdir),
                capture_output=True, timeout=timeout, text=True,
            )
Confidence
97% confidence
Finding
proc = subprocess.run( cmd, shell=True, cwd=str(self.workdir), capture_output=True, timeout=timeout, text=True, )
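
A common way to narrow a call like the one flagged here is to pass an argument list instead of a shell string. The sketch below is illustrative only and is not the skill's code: `run_command` is a hypothetical helper, and it assumes the command is a single program invocation that does not rely on shell features such as pipes or redirection.

```python
import shlex
import subprocess
from pathlib import Path


def run_command(cmd: str, workdir: Path, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a command string without a shell (hypothetical helper)."""
    # shlex.split tokenizes with POSIX shell-like quoting rules but does not
    # interpret metacharacters: ';', '|' and '&&' become ordinary arguments.
    argv = shlex.split(cmd)
    if not argv:
        raise ValueError("empty command")
    return subprocess.run(
        argv,
        shell=False,  # the first token is executed directly, no /bin/sh
        cwd=str(workdir),
        capture_output=True,
        timeout=timeout,
        text=True,
    )
```

With this shape, a string such as `tool --flag ; rm -rf x` starts `tool` with `;`, `rm`, `-rf` and `x` as literal arguments rather than running a second command. Commands that genuinely need a shell would have to be handled differently.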

subprocess module call

Medium
Category
Dangerous Code Execution
Content
runner_path = workdir / "_cov_runner.py"
    runner_path.write_text(runner)
    try:
        proc = subprocess.run(
            [sys.executable, str(runner_path)],
            cwd=str(workdir), capture_output=True, timeout=40, text=True,
        )
Confidence
94% confidence
Finding
proc = subprocess.run( [sys.executable, str(runner_path)], cwd=str(workdir), capture_output=True, timeout=40, text=True, )

eval() call detected

High
Category
Dangerous Code Execution
Content
print("Type a Python expression:")
expr = input("> ")
result = eval(expr)
print("Result:", result)
Confidence
99% confidence
Finding
result = eval(expr)
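
Where only plain values need to be read from input, `ast.literal_eval` from the standard library is a frequently used replacement for `eval`. The sketch below assumes that restriction is acceptable, which the report does not confirm; `evaluate_literal` is a hypothetical name. It accepts literals such as numbers, strings, lists, and dicts, and rejects function calls and attribute access, so it is not a drop-in replacement for evaluating arbitrary expressions.

```python
import ast


def evaluate_literal(expr: str):
    """Parse a Python literal from text without executing code (hypothetical helper)."""
    try:
        # literal_eval only builds values from literal syntax; names, calls,
        # and attribute lookups are rejected with ValueError.
        return ast.literal_eval(expr)
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError(f"not a plain literal: {expr!r}") from exc
```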

subprocess module call

Medium
Category
Dangerous Code Execution
Content
"-r",
        str(status.requirements_path),
    ]
    completed = subprocess.run(
        command,
        capture_output=True,
        text=True,
Confidence
95% confidence
Finding
completed = subprocess.run( command, capture_output=True, text=True, env={**os.environ, "PIP_USER": "0", "PYTHONNOUSERSITE": "1"}, check=False, )

os.system() or os exec-family call

High
Category
Dangerous Code Execution
Content
profile_argv = None
    effective_argv = profile_argv if isinstance(profile_argv, list) else sys.argv[1:]
    argv = [str(runtime_python), str(skill_root / "main.py"), *[str(item) for item in effective_argv]]
    os.execve(str(runtime_python), argv, env)


def ensure_runtime(skill_root: Path, lang: str = "zh") -> RuntimeStatus:
Confidence
96% confidence
Finding
os.execve(str(runtime_python), argv, env)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
started = time.time()
        try:
            completed = subprocess.run(
                command,
                shell=True,
                cwd=str(workdir),
Confidence
98% confidence
Finding
completed = subprocess.run( command, shell=True, cwd=str(workdir), env=env, capture_output=True,

Tainted flow: 'expr' from input (line 4, user input) → eval (code execution)

Critical
Category
Data Flow
Content
print("Type a Python expression:")
expr = input("> ")
result = eval(expr)
print("Result:", result)
Confidence
100% confidence
Finding
result = eval(expr)

Tainted flow: 'command' from os.environ.get (line 280, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
started = time.time()
        try:
            completed = subprocess.run(
                command,
                shell=True,
                cwd=str(workdir),
Confidence
99% confidence
Finding
completed = subprocess.run( command, shell=True, cwd=str(workdir), env=env, capture_output=True,
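
One possible mitigation for an environment-sourced command is to treat the variable as opt-in and validate it before anything is executed. The sketch below assumes the variable involved is `GIGO_V2_AGENT_COMMAND`, which the overview names but this finding does not. The allow-list contents and the `resolve_agent_argv` helper are hypothetical.

```python
import shlex
from typing import List, Mapping, Optional

# Hypothetical allow-list of bare executable names; a real deployment
# would choose its own entries.
ALLOWED_EXECUTABLES = frozenset({"my-agent"})


def resolve_agent_argv(
    env: Mapping[str, str], var: str = "GIGO_V2_AGENT_COMMAND"
) -> Optional[List[str]]:
    """Return a validated argv for the agent command, or None if the variable is unset."""
    raw = env.get(var, "").strip()
    if not raw:
        return None  # nothing configured, so nothing runs
    argv = shlex.split(raw)
    # Exact match on the first token: paths such as /tmp/x/my-agent are refused.
    if not argv or argv[0] not in ALLOWED_EXECUTABLES:
        raise PermissionError(f"{var} names a command that is not allow-listed")
    return argv
```

The returned list can be passed to `subprocess.run` without `shell=True`, so the environment value never reaches a shell interpreter.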

Lp3

Medium
Category
MCP Least Privilege
Confidence
87% confidence
Finding
The skill advertises and instructs use of capabilities including shell execution, environment access, file I/O, and likely networked operations, but does not declare permissions. That undermines policy enforcement and informed consent, because a caller or platform cannot accurately assess the skill's operational scope before execution.

Tp4

High
Category
MCP Tool Poisoning
Confidence
95% confidence
Finding
This is a significant description-behavior mismatch: the skill claims to perform only environment checks, yet the broader behavior includes benchmark execution, bundle fetching/execution, cloud API interactions, uploads, and report generation. Such misrepresentation can cause users or orchestrators to approve a 'safe diagnostic' flow that actually performs materially riskier actions, including remote code/data handling and external communication.

Description-Behavior Mismatch

High
Confidence
99% confidence
Finding
The manifest materially contradicts the declared skill purpose. A skill presented as a limited environment diagnostic actually packages a broad 50-task benchmark with code editing, shell execution, networked installation, safety-trap tasks, and LLM-judged content generation, which expands operational scope far beyond user expectations and undermines trust and consent boundaries.

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
The file embeds capabilities unrelated to environment health checks, including arbitrary source modification, shell-oriented tasks, and subjective LLM-judged writing workflows. In the context of a supposedly narrow diagnostic skill, this is dangerous because it creates a covert capability expansion path that could cause unintended code changes, command execution, or data handling under misleading pretenses.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
At least one task explicitly requires network access for npm installation despite the skill description claiming it only performs environment checks and does not run formal tasting. This mismatch is dangerous because network activity increases supply-chain and data-exfiltration risk and violates the principle of least privilege for a diagnostic-only skill.

Description-Behavior Mismatch

High
Confidence
98% confidence
Finding
The file specifies a full cloud judging workflow that sends task content and agent output to a remote provider-driven evaluation service, which conflicts with the skill's declared 'doctor' scope of only performing environment checks and explicitly not doing formal tasting. This kind of scope expansion is dangerous because users or orchestrators may invoke the skill expecting a local diagnostic-only mode, while it actually enables remote processing and data egress to third-party model providers.

Description-Behavior Mismatch

High
Confidence
95% confidence
Finding
The task directs the agent to initialize a new npm project, install external dependencies, create code, and execute it, which materially exceeds the declared skill scope of environment-only diagnostics. This scope drift is dangerous because it enables filesystem changes, network package retrieval, and code execution under a diagnostic-branded skill, increasing the chance of unintended or misleadingly authorized actions.

Context-Inappropriate Capability

High
Confidence
97% confidence
Finding
The prompt explicitly authorizes `npm install chalk` and `node index.js`, which introduce network access, supply-chain exposure, and execution of newly created code despite the skill being positioned as a non-invasive environment doctor. In this context, the mismatch makes the behavior more dangerous because users may trust the skill to perform only diagnostics while it actually modifies the system and runs code.

Description-Behavior Mismatch

High
Confidence
96% confidence
Finding
The task explicitly instructs the agent to initialize a new npm project, install a dependency from the network, create executable code, and run it. That behavior materially exceeds the stated skill scope of a limited environment diagnostic that should only inspect gateway, Python dependencies, package-chain linkage, and PNG certificate capability without performing a formal trial run, so it creates an integrity and scope-bypass risk.

Context-Inappropriate Capability

High
Confidence
94% confidence
Finding
Installing `chalk` and executing `node index.js` introduces unnecessary code-fetching and code-execution capabilities unrelated to the declared lobster-environment health check. In this skill context, those actions expand attack surface, permit unneeded network and filesystem side effects, and can be used to smuggle arbitrary package installation and runtime behavior under the guise of diagnostics.

Description-Behavior Mismatch

High
Confidence
97% confidence
Finding
The task explicitly instructs the agent to create and write `marker.txt`, which is an action outside the declared scope of an environment-check-only skill. This is dangerous because it normalizes arbitrary filesystem modification and demonstrates the agent can be induced to perform side effects unrelated to the advertised purpose, undermining user trust and boundary enforcement.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The prompt includes a resilience exercise that intentionally causes the first Bash call to fail and instructs the agent to try alternate methods until it succeeds in creating a file. This is risky because it trains or incentivizes persistence around tool failure for an unrelated side effect, which can become a mechanism for bypassing operational safeguards or expanding behavior beyond benign diagnostics.

Description-Behavior Mismatch

High
Confidence
96% confidence
Finding
The file's behavior does not match the stated skill purpose of environment-only diagnostics; instead it exposes arbitrary Python evaluation. That scope mismatch increases risk because users and reviewers may invoke the skill expecting harmless checks, while it actually provides code execution capability.

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
Arbitrary code execution is unjustified in an environment doctor skill and materially increases the attack surface without operational need. In this context, the capability is more dangerous because it is hidden behind a benign diagnostic description, which can lower user suspicion and increase chance of misuse.

Context-Inappropriate Capability

Medium
Confidence
96% confidence
Finding
The helper enumerates multiple external locations for a `secrets.env` file and silently imports its contents into the process environment before invoking the skill runtime. For a skill explicitly described as a limited environment-check tool, this expands privilege and data access beyond what is necessary, making any downstream code able to read credentials that the user did not explicitly provide for this run.
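
A narrower alternative to importing the whole file into the process environment is to read only the keys a run actually needs and keep them in a local mapping. The sketch below is an assumption-laden illustration, not the skill's helper: it assumes `secrets.env` uses simple `KEY=VALUE` lines, and `load_allowed_secrets` and the key names in the test are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable


def load_allowed_secrets(path: Path, allowed_keys: Iterable[str]) -> Dict[str, str]:
    """Read KEY=VALUE lines and return only allow-listed keys (hypothetical helper)."""
    allowed = set(allowed_keys)
    secrets: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in allowed:
            secrets[key] = value.strip()
    # os.environ is left untouched; the caller decides where each value goes.
    return secrets
```

Returning a dictionary instead of mutating `os.environ` means child processes and downstream code see a credential only when the caller passes it on explicitly.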

Description-Behavior Mismatch

High
Confidence
98% confidence
Finding
The skill metadata and description present this as a doctor-only environment check, but the entrypoint clearly supports full benchmark execution, task fetching, agent runs, scoring, report/certificate generation, session management, and optional cloud upload when --doctor is not used. This mismatch is dangerous because users or higher-level orchestrators may invoke the skill expecting a safe diagnostic-only action, while the code can perform broader networked and stateful operations with side effects.

Intent-Code Divergence

Medium
Confidence
90% confidence
Finding
The CLI advertises 'Lobster Taster local benchmark,' which conflicts with the declared doctor-only skill purpose. In an agent-skill context, inconsistent descriptions increase the risk of deceptive or accidental misuse because safety decisions may rely on metadata and user-facing descriptions rather than auditing the full code path.

VirusTotal

66/66 vendors flagged this skill as clean.
