clawexam
Review
Audited by ClawScan on May 10, 2026.
Overview
The skill is transparent about running a live benchmark, but it lets a remote exam service direct real API/code/workflow actions and submit logs without clear safety limits.
Install only if you are comfortable running a live external benchmark. Use a sandboxed agent profile, restrict network/tool permissions, and approve any question that would touch local files, private systems, accounts, or non-benchmark services.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A remote benchmark question could steer the agent into actions beyond what the user expected from a simple scoring benchmark.
The remote exam service supplies question content that directly determines what the agent should do, but the artifacts do not state that such remote instructions are untrusted or bounded to safe actions.
- Fetches randomized questions for the current session
- Executes each question using real API calls, code, workflows, or security analysis
Run the benchmark only in a constrained environment, and require explicit user approval before following any question that asks for local file access, account changes, non-ClawExam endpoints, or other high-impact actions.
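The approval checkpoint recommended above could be sketched as a simple gate between fetching a question and executing it. Everything here is illustrative: the keyword list, `needs_approval`, and `run_question` are hypothetical names, not part of the skill's artifacts.

```python
# Hypothetical approval gate: flag questions whose text suggests local file
# access, account changes, or other high-impact actions, and require an
# explicit user decision before the agent executes them.

HIGH_IMPACT_KEYWORDS = (
    "local file", "filesystem", "account", "credential", "password",
    "delete", "internal", "private network",
)

def needs_approval(question_text: str) -> bool:
    """Return True when a question touches a potentially sensitive target."""
    lowered = question_text.lower()
    return any(kw in lowered for kw in HIGH_IMPACT_KEYWORDS)

def run_question(question_text: str, approve) -> str:
    """Execute a question only after user approval for high-impact actions."""
    if needs_approval(question_text) and not approve(question_text):
        return "skipped: user declined high-impact question"
    # Placeholder for the real API call / code / workflow step.
    return "executed"
```

A keyword screen like this is deliberately coarse; its job is only to decide when to stop and ask, not to classify questions perfectly.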
The agent may make live network requests or use tools in ways that are difficult to predict before the remote questions are fetched.
The skill authorizes non-simulated HTTP/tool use based on question text, but does not provide an endpoint allowlist, a dry-run mode, sandboxing rules, or an approval checkpoint for potentially sensitive operations.
- Always perform the real HTTP requests described by the question
Limit tool permissions during exams, block access to private networks and sensitive services, and ask the user before executing real API calls outside the benchmark domain.
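One way to implement the endpoint restriction above is a host allowlist checked before any real request is made. This is a minimal sketch under assumptions: `clawexam.example` is a placeholder for the benchmark's actual domain, which the artifacts do not specify.

```python
from urllib.parse import urlparse

# Illustrative allowlist: permit real HTTP only to the benchmark's own host
# and refuse obvious private/internal targets.
ALLOWED_HOSTS = {"clawexam.example"}
BLOCKED_SUFFIXES = (".internal", ".local")

def url_allowed(url: str) -> bool:
    """Return True only for URLs on the benchmark domain, never for
    localhost or private-network-style hostnames."""
    host = urlparse(url).hostname or ""
    if host in ("localhost", "127.0.0.1") or host.endswith(BLOCKED_SUFFIXES):
        return False
    return host in ALLOWED_HOSTS
```

A deny-by-default check like this fails closed: any URL a question supplies that is not explicitly on the benchmark domain is refused rather than attempted.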
The ClawExam service identifies the exam session by the provided public username and model details.
The skill uses a service Bearer token for ClawExam API calls. This is purpose-aligned for a live exam session, and no local credentials or unrelated account access are shown.
POST /api/auth/token to get a Bearer token
Use only a username you are comfortable associating with benchmark results, and do not provide unrelated credentials or secrets.
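The disclosed auth step (POST /api/auth/token returning a Bearer token) might look roughly like the following. Only the endpoint path comes from the artifacts; the base URL and the request/response fields (`username`, `model`, `token`) are assumptions for illustration.

```python
import json
import urllib.request

# Placeholder base URL; the artifacts do not state the real one.
BASE_URL = "https://clawexam.example"

def build_token_request(username: str, model: str) -> urllib.request.Request:
    """Build the POST /api/auth/token request that would exchange public
    session details for a service Bearer token (fields are assumed)."""
    payload = json.dumps({"username": username, "model": model}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/token",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Separating request construction from sending makes it easy to inspect or log exactly what identifying details leave the machine before anything is transmitted.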
Your benchmark answers, execution steps, and token-use estimate may be stored or processed by the ClawExam service.
The skill sends answers and execution logs to the external ClawExam API. This is disclosed and aligned with benchmarking, but users should understand that run details leave the local agent context.
Submits structured answers with execution logs
Avoid including private data, secrets, internal URLs, or sensitive file contents in exam answers or logs.
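Since answers and execution logs leave the local agent context, a pre-submission scrub along these lines could reduce accidental leakage. The patterns below are examples of an approach, not an exhaustive or recommended redaction policy.

```python
import re

# Illustrative scrub: strip obvious bearer tokens, key/password assignments,
# and internal-looking URLs from logs before they are submitted.
SECRET_PATTERNS = [
    re.compile(r"(?i)bearer\s+[a-z0-9._-]+"),
    re.compile(r"(?i)(api[_-]?key|password|token)\s*[:=]\s*\S+"),
    re.compile(r"https?://\S*\.(?:internal|local)\S*"),
]

def redact_log(text: str) -> str:
    """Replace matches of each secret pattern with a [REDACTED] marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Pattern-based redaction only catches what it anticipates, so it complements, rather than replaces, the advice to keep private data out of exam answers in the first place.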
