Gigo Lobster Taster

Security checks across malware telemetry and agentic risk

Overview

This is a coherent benchmark skill, but it needs review because it combines default cloud upload, local code execution, package installation, and automatic loading of workspace secrets with limited permission scoping.

Install only if you are comfortable running a benchmark that may execute local tests and package commands, create caches/workdirs, read OpenClaw-related profile or secrets files, and upload detailed task results to the GIGO API by default. Prefer a local/offline mode where available if you do not want cloud submission, and avoid placing unrelated secrets in workspace-level secrets.env files before running.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (67)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
break
        # Execute
        try:
            proc = subprocess.run(
                cmd, shell=True, cwd=str(self.workdir),
                capture_output=True, timeout=timeout, text=True,
            )
Confidence
96% confidence
Finding
proc = subprocess.run( cmd, shell=True, cwd=str(self.workdir), capture_output=True, timeout=timeout, text=True, )
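
A possible mitigation for this pattern, sketched under the assumption that the command is a plain program invocation with no pipes or redirection, is to tokenize the string and run it without a shell. `run_without_shell` is a hypothetical helper for illustration and is not part of the scanned skill.

```python
import shlex
import subprocess
import sys


def run_without_shell(cmd: str, workdir: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Hypothetical helper: run cmd as an argument list instead of through a shell."""
    # shlex.split tokenizes with POSIX quoting rules; characters such as ';'
    # or '|' end up as literal arguments rather than shell syntax.
    argv = shlex.split(cmd)
    return subprocess.run(
        argv, cwd=workdir, capture_output=True, timeout=timeout, text=True,
    )


if __name__ == "__main__":
    demo = run_without_shell(f"{shlex.quote(sys.executable)} -c \"print('a; b')\"", ".")
    print(demo.stdout.strip())
```

Commands that genuinely rely on shell features would not work through this helper, so whether it fits depends on what the benchmark tasks actually run.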

subprocess module call

Medium
Category
Dangerous Code Execution
Content
runner_path = workdir / "_cov_runner.py"
    runner_path.write_text(runner)
    try:
        proc = subprocess.run(
            [sys.executable, str(runner_path)],
            cwd=str(workdir), capture_output=True, timeout=40, text=True,
        )
Confidence
95% confidence
Finding
proc = subprocess.run( [sys.executable, str(runner_path)], cwd=str(workdir), capture_output=True, timeout=40, text=True, )

eval() call detected

High
Category
Dangerous Code Execution
Content
print("Type a Python expression:")
expr = input("> ")
result = eval(expr)
print("Result:", result)
Confidence
99% confidence
Finding
result = eval(expr)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
"-r",
        str(status.requirements_path),
    ]
    completed = subprocess.run(
        command,
        capture_output=True,
        text=True,
Confidence
86% confidence
Finding
completed = subprocess.run( command, capture_output=True, text=True, env={**os.environ, "PIP_USER": "0", "PYTHONNOUSERSITE": "1"}, check=False, )

os.system() or os exec-family call

High
Category
Dangerous Code Execution
Content
profile_argv = None
    effective_argv = profile_argv if isinstance(profile_argv, list) else sys.argv[1:]
    argv = [str(runtime_python), str(skill_root / "main.py"), *[str(item) for item in effective_argv]]
    os.execve(str(runtime_python), argv, env)


def ensure_runtime(skill_root: Path, lang: str = "zh") -> RuntimeStatus:
Confidence
88% confidence
Finding
os.execve(str(runtime_python), argv, env)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
started = time.time()
        try:
            completed = subprocess.run(
                command,
                shell=True,
                cwd=str(workdir),
Confidence
97% confidence
Finding
completed = subprocess.run( command, shell=True, cwd=str(workdir), env=env, capture_output=True,

Tainted flow: 'expr' from input (line 4, user input) → eval (code execution)

Critical
Category
Data Flow
Content
print("Type a Python expression:")
expr = input("> ")
result = eval(expr)
print("Result:", result)
Confidence
100% confidence
Finding
result = eval(expr)
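
If the flagged snippet only needs to read literal values, one commonly suggested alternative is `ast.literal_eval`, which parses Python literals and rejects calls, attribute access, and names. The sketch below assumes that restriction is acceptable; `safe_evaluate` is a hypothetical name, not code from the skill.

```python
import ast


def safe_evaluate(expr: str):
    """Hypothetical replacement for eval(): accept Python literals only."""
    try:
        # literal_eval handles strings, numbers, tuples, lists, dicts, sets,
        # booleans and None; it does not execute arbitrary expressions.
        return ast.literal_eval(expr)
    except (ValueError, TypeError, SyntaxError) as exc:
        raise ValueError(f"not a literal expression: {expr!r}") from exc


if __name__ == "__main__":
    print(safe_evaluate("[1, 2, 3]"))
```

This does not reproduce a general expression evaluator. If the file is meant to exercise arbitrary evaluation, the taint flow remains by design.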

Tainted flow: 'command' from os.environ.get (line 280, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
started = time.time()
        try:
            completed = subprocess.run(
                command,
                shell=True,
                cwd=str(workdir),
Confidence
99% confidence
Finding
completed = subprocess.run( command, shell=True, cwd=str(workdir), env=env, capture_output=True,

Lp3

Medium
Category
MCP Least Privilege
Confidence
89% confidence
Finding
The skill advertises and instructs execution of a wrapper that can access environment variables, read/write files, invoke shell commands, and use the network, yet it declares no permissions or user-facing consent boundaries. That mismatch prevents informed consent and weakens policy enforcement, especially because the same document also describes default cloud upload and leaderboard publication.

Tp4

High
Category
MCP Tool Poisoning
Confidence
86% confidence
Finding
The description frames the skill as a simple 'lobster tasting' workflow, but the documented behavior appears much broader: benchmarking, bootstrapping, environment loading, backend API interaction, artifact generation, and session management. This kind of description-behavior mismatch can mislead users into authorizing a much more invasive workflow than they reasonably expect.

Description-Behavior Mismatch

High
Confidence
99% confidence
Finding
The manifest metadata advertises a narrow lobster-tasting/evaluation skill with cloud upload and leaderboard behavior, but the actual bundle defines a broad benchmark harness containing 50 unrelated coding, shell, planning, writing, and safety tasks. This is a strong capability mismatch that can mislead reviewers and users about what the skill is allowed to do, masking far more powerful behaviors than the declared purpose suggests.

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
The bundle exposes many capabilities unrelated to the stated skill purpose, including code modification, shell execution, networked package installation, prompt-injection exercises, translation, and business writing. In context, this unjustified breadth materially increases attack surface and enables abuse under the cover of an innocuous skill identity.

Description-Behavior Mismatch

High
Confidence
98% confidence
Finding
The task behavior is materially unrelated to the skill’s declared purpose of lobster tasting, cloud upload, personal results pages, and leaderboard entry. This kind of capability/intent mismatch is dangerous because it can conceal unexpected actions from users and reviewers, making it easier to smuggle in arbitrary development or execution steps under an unrelated skill description.

Context-Inappropriate Capability

High
Confidence
97% confidence
Finding
The skill instructs installing an npm package and executing a local Node.js script even though those privileges are not justified by the advertised lobster-tasting use case. Unnecessary package installation and code execution expand the attack surface, can alter the environment, and may enable supply-chain or arbitrary-code risks if a dependency or script is changed later.

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
The prompt explicitly instructs the agent to run a local script from the working directory, which is a dangerous capability because repository-local files are untrusted input and may execute arbitrary code. In the context of a lobster tasting/evaluation skill, executing `dangerous.py` is unrelated to the stated purpose, making this especially suspicious and increasing the likelihood of arbitrary code execution or environment compromise.

Description-Behavior Mismatch

High
Confidence
97% confidence
Finding
The file's behavior does not match the declared lobster-tasting functionality and instead implements an interactive Python evaluator. This mismatch is dangerous because it suggests hidden or deceptive capability, increasing the likelihood that arbitrary code execution was intentionally embedded under an unrelated skill description.

Context-Inappropriate Capability

Critical
Confidence
100% confidence
Finding
Arbitrary code execution via eval() is not necessary for a lobster tasting or evaluation workflow, so its presence is highly suspicious and materially increases risk. In this skill context, the unjustified use of eval makes exploitation more dangerous because users and reviewers would not expect interpreter behavior from the advertised functionality.

Intent-Code Divergence

Medium
Confidence
92% confidence
Finding
The docstring openly states that the tool evaluates user input as Python expressions, which contradicts the manifest's stated purpose. While the docstring itself is not the exploit, this discrepancy is a strong indicator of deceptive implementation and supports the conclusion that the dangerous behavior is intentional or at least knowingly unrelated.

Context-Inappropriate Capability

Medium
Confidence
96% confidence
Finding
The helper enumerates and loads external `secrets.env` files from workspace-related locations that are not necessary for a lobster-tasting skill's stated purpose. This grants the skill ambient access to credentials and secrets from the broader execution environment, increasing the chance of unauthorized use, leakage, or downstream exfiltration by the imported runtime.
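
One way to narrow this ambient access, shown as a minimal sketch, is to keep only explicitly allowlisted keys when reading a `secrets.env` file. The function name, the `KEY=VALUE` line format, and the key names are assumptions for illustration; the skill's real loader may parse the file differently.

```python
from pathlib import Path


def load_allowed_secrets(path: Path, allowed_keys: set[str]) -> dict[str, str]:
    """Hypothetical loader: read KEY=VALUE lines and keep only allowlisted keys."""
    loaded: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in allowed_keys:
            # Drop surrounding whitespace and one layer of simple quotes.
            loaded[key] = value.strip().strip('"').strip("'")
    return loaded
```

With an allowlist, unrelated credentials that share the same file are never loaded into the process, which limits what a later upload or subprocess could expose.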

Context-Inappropriate Capability

Medium
Confidence
92% confidence
Finding
This block creates a virtual environment and installs packages dynamically, which materially exceeds the expected behavior of a tasting/evaluation skill. The capability enables network-backed code acquisition and execution in the local user context, making supply-chain abuse or unintended package execution possible.

Context-Inappropriate Capability

Medium
Confidence
90% confidence
Finding
Re-entering the program with a different interpreter is an unnecessary capability escalation for the skill's advertised function. In context, it makes the runtime harder to audit and can be combined with the bootstrap path to execute code in a newly prepared environment outside normal expectations.

Intent-Code Divergence

Low
Confidence
84% confidence
Finding
The user-facing messages describe limited certificate/report preparation, but the code installs a broader set of packages including pytest and pytest-json-report. This mismatch reduces transparency and may mislead users or reviewers about the actual capability being introduced, which is especially concerning in code that already self-provisions dependencies.

Context-Inappropriate Capability

Medium
Confidence
96% confidence
Finding
This code uploads detailed per-task responses, status, errors, timing, token usage, and identifiers to a remote API. That is materially broader than a simple score upload and can expose user prompts, model outputs, and operational metadata, creating privacy and data-minimization risks if users did not explicitly consent to full submission telemetry.
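
The data-minimization concern can be illustrated with a small filter applied before upload. This is a sketch only: `minimize_result` and the field names `task_id`, `score`, and `status` are hypothetical and do not describe the actual GIGO API payload.

```python
DEFAULT_ALLOWED_FIELDS = frozenset({"task_id", "score", "status"})  # hypothetical field names


def minimize_result(result: dict, allowed_fields: frozenset = DEFAULT_ALLOWED_FIELDS) -> dict:
    """Hypothetical filter: keep only fields that are explicitly approved for upload."""
    return {key: value for key, value in result.items() if key in allowed_fields}


if __name__ == "__main__":
    raw = {"task_id": "t1", "score": 0.5, "status": "ok", "response": "model output"}
    print(minimize_result(raw))
```

An allowlist fails closed: a field added to the result later stays local until someone decides it should be sent.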

Context-Inappropriate Capability

Medium
Confidence
94% confidence
Finding
In v2 mode, the skill sends a full run report built from scores, raw results, config, and upload mode to the server, which likely expands data collection beyond leaderboard publication or final evaluation. Because the manifest describes tasting/evaluating and publishing results, this broader reporting path increases the risk of undisclosed collection of submission content and execution metadata.

Context-Inappropriate Capability

Medium
Confidence
96% confidence
Finding
The parser searches far beyond the current repository, including environment-defined roots, cwd ancestors, home-directory locations, and sibling workspaces, then reads the first matching SOUL.md/IDENTITY.md it finds. This can unintentionally ingest unrelated personal or workspace data and, in this skill's context, is more dangerous because the skill description explicitly says results are uploaded to the cloud and used for personal result pages/leaderboards, creating a plausible path for cross-project data exposure.
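
A bounded lookup is one way to avoid this kind of cross-project read. The sketch below walks upward from a starting directory but never above a caller-supplied root; `find_profile` and its parameters are hypothetical, and only the file names `SOUL.md` and `IDENTITY.md` come from the finding.

```python
from pathlib import Path
from typing import Optional


def find_profile(start: Path, root: Path,
                 names: tuple = ("SOUL.md", "IDENTITY.md")) -> Optional[Path]:
    """Hypothetical bounded search: look from start up to root, never beyond it."""
    start = start.resolve()
    root = root.resolve()
    # Refuse to search at all if start is not inside root.
    if start != root and root not in start.parents:
        return None
    current = start
    while True:
        for name in names:
            candidate = current / name
            if candidate.is_file():
                return candidate
        if current == root:
            return None
        current = current.parent
```

Home-directory locations, sibling workspaces, and environment-defined roots are simply out of reach in this design unless the caller passes them as `root` on purpose.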

VirusTotal

65/65 vendors flagged this skill as clean.
