MCP Builder test

Review

Audited by ClawScan on May 10, 2026.

Overview

The guide is coherent, but its evaluation harness can let Claude call any tool on a target MCP server and share tool results with Anthropic, so it needs review before use on real accounts.

Use the reference guide freely, but be careful with the optional evaluation harness. Run it only against trusted MCP servers, preferably with read-only/test credentials, and assume tool outputs may be sent to Anthropic. Add a tool allowlist or approval step before using it with write-capable production services.
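To make the "allowlist or approval step" concrete, here is a minimal sketch of a guard wrapped around the harness's tool-call loop. The `connection.call_tool` name mirrors the snippet quoted in this review; `ALLOWED_TOOLS` and `require_approval` are assumptions added for illustration, not part of the skill.

```python
# Hypothetical guard: allowlist plus interactive per-call approval.
ALLOWED_TOOLS = {"search_issues", "get_page"}  # read-only tools you trust

def require_approval(tool_name: str, tool_input: dict) -> bool:
    """Ask the operator before each tool call is executed."""
    answer = input(f"Allow call to {tool_name} with {tool_input!r}? [y/N] ")
    return answer.strip().lower() == "y"

async def guarded_call_tool(connection, tool_name: str, tool_input: dict):
    # Reject anything not explicitly allowlisted before it reaches the server.
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool {tool_name!r} is not on the allowlist")
    if not require_approval(tool_name, tool_input):
        raise PermissionError(f"Call to {tool_name!r} was not approved")
    return await connection.call_tool(tool_name, tool_input)
```

Calling `guarded_call_tool` in place of a direct `connection.call_tool` turns silent execution into a fail-closed check.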

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Finding 1: Unrestricted tool execution under user credentials

What this means

If the target MCP server exposes tools that create, delete, post, or modify account data, an evaluation run could invoke them under the user's credentials.

Why it was flagged

The harness gives Claude the full MCP tool list and automatically executes tool calls. The reference evaluation process calls for read-only tasks, but this code does not enforce read-only tools, block destructive annotations, or require user approval per call.

Skill content
When given a task, you MUST:
1. Use the available tools ...

while response.stop_reason == "tool_use": ... tool_result = await connection.call_tool(tool_name, tool_input)
Recommendation

Run evaluations only against test servers or read-only credentials unless an explicit tool allowlist, destructive-tool filter, and per-call approval are added.
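A destructive-tool filter can lean on the MCP tool annotations (`readOnlyHint`, `destructiveHint`) defined in the MCP specification. The dict-shaped tool list below is an assumption about this harness; the annotation field names follow the spec.

```python
# Sketch: keep only tools explicitly marked read-only and not destructive.
# Servers are not required to set annotations, so this fails closed:
# unannotated tools are dropped.
def filter_read_only(tools: list[dict]) -> list[dict]:
    safe = []
    for tool in tools:
        ann = tool.get("annotations") or {}
        if ann.get("readOnlyHint") and not ann.get("destructiveHint"):
            safe.append(tool)
    return safe
```

Note that annotations are server-supplied hints, not a security boundary; a hostile server can mislabel its tools, which is why the test-credential advice above still applies.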

Finding 2: Tool results shared with Anthropic

What this means

Data returned by evaluated MCP tools, and possibly error details, may be sent to Anthropic during evaluation.

Why it was flagged

Results returned by the target MCP server are added to the conversation sent to Anthropic's API. This is expected for a Claude-based evaluator, but it is an important data flow.

Skill content
tool_response = json.dumps(tool_result) ... messages.append({ ... "content": tool_response }) ... client.messages.create(... messages=messages, tools=tools)
Recommendation

Avoid running the evaluator on sensitive production data unless this provider data sharing is acceptable; prefer redaction, test datasets, and least-privilege/read-only tokens.
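A simple redaction pass can be applied to tool results before they are appended to the conversation sent to Anthropic. The `safe_tool_response` helper and the secret patterns below are illustrative assumptions; real deployments should tune the patterns to their own credential formats.

```python
import json
import re

# Illustrative patterns for common secret shapes; extend for your environment.
SECRET_PATTERNS = [
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]+"),  # bearer tokens
    re.compile(r"sk-[A-Za-z0-9]{16,}"),           # API-key-like strings
]

def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def safe_tool_response(tool_result) -> str:
    """Serialize a tool result and scrub secret-shaped substrings."""
    return redact(json.dumps(tool_result))
```

Regex redaction is best-effort; it reduces accidental leakage but does not make running the evaluator on sensitive data safe.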

Finding 3: Credential pass-through to MCP servers

What this means

The evaluator can operate with whatever privileges are present in the supplied headers or environment variables.

Why it was flagged

The connection helper can pass environment variables and HTTP headers to MCP servers, which commonly include API keys, bearer tokens, or other credentials.

Skill content
StdioServerParameters(command=self.command, args=self.args, env=self.env) ... sse_client(url=self.url, headers=self.headers)
Recommendation

Use dedicated, least-privilege credentials and avoid passing production admin tokens to evaluation runs.
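One way to enforce this is to hand the server a minimal environment instead of the full `os.environ`. The variable name `GITHUB_READONLY_TOKEN` is hypothetical; the point is passing only what the server needs.

```python
import os

def scoped_env(allowed_keys: list[str]) -> dict[str, str]:
    """Copy only the named variables from the current environment."""
    return {k: os.environ[k] for k in allowed_keys if k in os.environ}

# e.g. pass scoped_env(["GITHUB_READONLY_TOKEN", "PATH"]) as the `env`
# argument instead of os.environ, so unrelated secrets never reach the server.
```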

Finding 4: Local command execution via stdio transport

What this means

Pointing the evaluator's stdio transport at an untrusted MCP server command means executing that code locally, with the evaluator's own privileges.

Why it was flagged

The stdio transport starts a user-specified command to connect to a local MCP server. This is normal MCP plumbing, but it means running the evaluator can execute local server commands.

Skill content
return stdio_client(StdioServerParameters(command=self.command, args=self.args, env=self.env))
Recommendation

Only use stdio commands for MCP servers you trust and understand; prefer isolated environments for third-party servers.
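One isolation approach is to rewrite the stdio command so the server runs inside a locked-down container rather than directly on the host. The image name `mcp-sandbox:latest` and this helper are illustrative assumptions, not part of the skill.

```python
# Sketch: wrap an untrusted stdio server command in a docker invocation with
# no network and a read-only filesystem. Returns (command, args) in the shape
# StdioServerParameters expects.
def containerized(command: str, args: list[str]) -> tuple[str, list[str]]:
    return (
        "docker",
        [
            "run", "--rm", "-i",      # interactive stdio, cleaned up on exit
            "--network=none",          # no outbound network access
            "--read-only",             # no writes to the container filesystem
            "mcp-sandbox:latest",      # illustrative image with the runtime
            command, *args,
        ],
    )
```

Usage: `cmd, argv = containerized("node", ["server.js"])`, then pass `cmd`/`argv` as `command`/`args` to `StdioServerParameters`.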

Finding 5: Unpinned helper dependencies

What this means

Because the helper scripts' dependencies are not pinned, installing them at a later date may resolve to newer package versions, so behavior can drift between installs.

Why it was flagged

The optional helper script dependencies use lower-bound version constraints rather than pinned versions, so future installs may resolve to different package versions.

Skill content
anthropic>=0.39.0
mcp>=1.1.0
Recommendation

For reproducible or production evaluation use, pin dependencies or use a lockfile in an isolated virtual environment.
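A pinned requirements file is the lightest-weight fix. The exact versions below are illustrative, taken from the lower bounds quoted above; in practice, capture what your tested environment actually resolved (for example, with `pip freeze`).

```text
# requirements.txt — pinned for reproducible evaluation runs
anthropic==0.39.0
mcp==1.1.0
```

Installing into a dedicated virtual environment keeps these pins from conflicting with other projects.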