Arxiv Agentic Verifier
v1.0.0
Actively verifies Python/JS code correctness by generating targeted test cases that expose logic flaws based on problem constraints.
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The code implements an agentic verifier that generates LLM-driven test cases and executes candidate Python/JS code, which is consistent with the skill name and description. However, the registry metadata declares no required environment variables or primary credential, while SKILL.md and index.js clearly require an OPENAI_API_KEY (and package.json depends on the openai client). This missing declaration is an inconsistency that matters for permission review.
Instruction Scope
Runtime instructions and the code will: (1) send the problem description and candidate code to the OpenAI API for test-generation, and (2) write candidate code to disk and run it with child_process.execSync (python3 or node). Both actions are expected for this tool, but they carry clear privacy and host-safety implications. The SKILL.md warns about sandboxing for code execution but does not explicitly warn that candidate code/problem text will be transmitted to OpenAI (possible sensitive-data exposure).
Install Mechanism
There is no install spec (instruction-only at registry level), but the package includes package.json/package-lock.json and depends on 'openai' and 'axios'. That means an npm install would pull third-party packages; absence of an explicit install step in metadata is a packaging/manifest inconsistency but not an immediate red flag (no remote arbitrary download URLs are present).
Credentials
The only credential the code needs is an OpenAI API key, which is proportionate to the stated LLM-based test-generation purpose. However the registry metadata fails to declare OPENAI_API_KEY or a primaryEnv, creating a mismatch between what the skill actually requires and what is advertised. The skill will transmit user-provided problem text and candidate code to the OpenAI API — this is expected but should be declared explicitly so users can judge data-leak risk.
Persistence & Privilege
The skill does not request always:true, does not attempt to modify other skills or system-wide settings, and only writes temporary files under its own directory (temp_exec) and removes them. It executes candidate code locally via execSync, which is expected for this functionality but increases the need for sandboxing.
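The write-then-execute pattern described above can be sketched with a time limit and a reduced environment. This is a minimal illustration, not the skill's actual index.js: the name runCandidate and its options are hypothetical, and it uses spawnSync instead of execSync so that stdin, the timeout, and the child's environment are explicit. It limits run time and keeps OPENAI_API_KEY out of the child process, but it is not a substitute for a container or VM.

```javascript
const { spawnSync } = require('node:child_process');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: not taken from the skill's source.
// Defaults run JavaScript with the current Node binary; pass
// { interpreter: 'python3', ext: '.py' } for Python candidates.
function runCandidate(code, stdin, options = {}) {
  const { interpreter = process.execPath, ext = '.js', timeoutMs = 2000 } = options;
  // A fresh temp directory per run, removed in the finally block.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'verifier-'));
  const file = path.join(dir, 'candidate' + ext);
  try {
    fs.writeFileSync(file, code);
    const res = spawnSync(interpreter, [file], {
      input: stdin,
      encoding: 'utf8',
      timeout: timeoutMs,
      // Pass only PATH so secrets such as OPENAI_API_KEY do not reach the child.
      env: { PATH: process.env.PATH },
    });
    return {
      stdout: res.stdout || '',
      stderr: res.stderr || '',
      status: res.status,
      timedOut: Boolean(res.error && res.error.code === 'ETIMEDOUT'),
    };
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Usage: a JavaScript candidate that sums two integers read from stdin.
const candidate =
  "const [a, b] = require('fs').readFileSync(0, 'utf8').trim().split(/\\s+/).map(Number);" +
  'console.log(a + b);';
console.log(runCandidate(candidate, '2 3\n').stdout.trim());
```

The candidate still runs with the caller's file-system and network access, so the sandboxing advice continues to apply.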
What to consider before installing
This skill implements exactly what it claims (LLM-driven test generation plus executing candidate code), but there are three things to consider before installing:
- Metadata mismatch: The registry metadata does not list OPENAI_API_KEY, but the code and SKILL.md require it and the skill will call OpenAI. Expect to provide an API key if you want real LLM behavior. Ask the publisher to update the metadata to declare this requirement.
- Data exposure: The skill sends the problem description and the candidate code to OpenAI. If those inputs contain sensitive information (proprietary code, secrets, or private problem statements), they will be transmitted to the OpenAI service. Do not use this skill with sensitive inputs unless you accept this data flow.
- Execution risk: The skill writes arbitrary candidate code to disk and executes it with python3/node via execSync. Run it only in a restricted/sandboxed environment (container or VM) and never on a machine with sensitive access. The SKILL.md warns about sandboxing, but you should enforce it.
Practical steps: verify package.json and node_modules before running; run npm install in an isolated environment; ask the author to update the registry metadata to declare OPENAI_API_KEY; prefer running tests in a mock mode (no API key) or inside a runtime sandbox; and review network traffic to confirm that only the OpenAI API endpoint is contacted.
Like a lobster shell, security has layers: review code before you run it.
latest
ArXiv Agentic Verifier
Source Paper: Scaling Agentic Verifier for Competitive Coding (ID: 4a4c4dae6a5145ebc4d62eb2d64b0f0f)
Type: Code Verification / Test Generation
Description
This skill implements an "Agentic Verifier" that actively reasons about code correctness by generating targeted, "discriminative" test cases. Instead of random sampling, it analyzes the problem constraints and code logic to find edge cases or logic flaws.
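One way to read "discriminative" is that a test input is useful only when it separates a flawed candidate from correct behavior. The sketch below illustrates that idea under an assumption the description does not state: that a trusted reference implementation is available for comparison. The function names are hypothetical and are not part of the skill's API.

```javascript
// Hypothetical helper: keeps only the inputs on which the candidate
// disagrees with a trusted reference (or crashes).
function findDiscriminativeInputs(candidate, reference, inputs) {
  return inputs.filter((input) => {
    let got;
    try {
      got = candidate(...input);
    } catch (err) {
      return true; // a crash also exposes a flaw
    }
    return got !== reference(...input);
  });
}

// Flawed candidate: seeding the maximum with 0 breaks all-negative inputs.
const buggyMax = (a, b) => Math.max(0, a, b);
const referenceMax = (a, b) => Math.max(a, b);

const inputs = [[1, 2], [5, 3], [-4, -7], [0, 0]];
const discriminative = findDiscriminativeInputs(buggyMax, referenceMax, inputs);
console.log(discriminative);
```

Of the four inputs, only the all-negative pair is reported, because it is the only one on which the two functions disagree. Random sampling over mostly positive values would be unlikely to hit it, which is the motivation for targeting constraints and edge cases.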
Features
- Analyze Code: Understands Python/JS code logic.
- Generate Tests: Creates specific inputs to break the code.
- Execute & Verify: Runs the code against generated tests (sandbox recommended for production).
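The three features above can be read as a generate, execute, compare loop. The following is a rough sketch of that loop with the LLM call replaced by a stub, so it runs without an API key. Everything here is an assumption for illustration: stubGenerateTests, verifyWith, and the report shape are hypothetical, and the candidate is a plain function rather than source code run in a subprocess.

```javascript
// Hypothetical stand-in for the LLM call: returns fixed tests, needs no API key.
async function stubGenerateTests(problem) {
  return [
    { input: '1 2', expected: '3' },
    { input: '-5 5', expected: '0' },
    { input: '2000000000 2000000000', expected: '4000000000' },
  ];
}

// Generate tests, run the candidate on each input, and collect mismatches.
// `candidate` is a plain function here so the sketch runs in-process.
async function verifyWith(generateTests, candidate, problem) {
  const tests = await generateTests(problem);
  const failures = [];
  for (const test of tests) {
    let actual;
    try {
      actual = String(candidate(test.input)).trim();
    } catch (err) {
      actual = `error: ${err.message}`;
    }
    if (actual !== test.expected) failures.push({ ...test, actual });
  }
  return { passed: failures.length === 0, total: tests.length, failures };
}

// Candidate with a deliberate flaw: `| 0` truncates the sum to 32 bits.
const truncatingSum = (input) => {
  const [a, b] = input.trim().split(/\s+/).map(Number);
  return (a + b) | 0;
};

verifyWith(stubGenerateTests, truncatingSum, 'Given two integers A and B, output their sum.')
  .then((report) => console.log(report));
```

Passing the generator in as a parameter is a design choice that makes it straightforward to swap the stub for a real LLM-backed generator, or to keep the stub for offline runs.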
Usage
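const AgenticVerifier = require('./index');

// The API key is read from the environment; see Configuration.
const verifier = new AgenticVerifier(process.env.OPENAI_API_KEY);

const problem = "Given two integers A and B, output their sum.";
// Candidate solution under test. It calls input() twice, so it fails when A and B share one line.
const code = "print(int(input().split()[0]) + int(input().split()[1]))";

// verify() returns a promise; 'python' selects the interpreter for the candidate.
verifier.verify(problem, code, 'python')
  .then(result => console.log(result))
  .catch(err => console.error(err));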
Configuration
- OPENAI_API_KEY: Required for LLM reasoning.
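Because the key is required, it can help to fail fast with a clear message before constructing the verifier. A minimal sketch follows; requireApiKey is a hypothetical helper, and the skill's own handling of a missing key may differ.

```javascript
// Hypothetical helper: returns the trimmed key or throws a descriptive error.
function requireApiKey(env = process.env) {
  const key = (env.OPENAI_API_KEY || '').trim();
  if (!key) {
    throw new Error('OPENAI_API_KEY is not set; export it before constructing the verifier.');
  }
  return key;
}

// Usage: check an explicit environment object instead of process.env.
try {
  requireApiKey({});
} catch (err) {
  console.error(err.message);
}
```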
Security Warning
This skill executes code provided to it. Use in a restricted environment or sandbox.
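One possible way to follow this advice is to run each candidate inside a throwaway container with no network access and capped resources. The sketch below only builds the argument list for docker run. It assumes Docker is installed, and the image name, mount point, and limits are illustrative choices, not something the skill provides.

```javascript
const path = require('node:path');

// Hypothetical helper: builds `docker run` arguments for one Python candidate.
// Assumption: hostDir contains the candidate file and can be mounted read-only.
function dockerArgs(hostDir, fileName, { image = 'python:3.12-slim', memory = '256m' } = {}) {
  return [
    'run', '--rm', '-i',          // remove the container afterwards, keep stdin open
    '--network', 'none',          // no network access
    '--memory', memory,           // cap memory
    '--cpus', '1',                // cap CPU
    '--pids-limit', '64',         // limit the number of processes
    '--read-only',                // read-only root file system
    '-v', `${path.resolve(hostDir)}:/work:ro`,
    image,
    'python3', `/work/${fileName}`,
  ];
}

console.log(['docker', ...dockerArgs('./temp_exec', 'candidate.py')].join(' '));
```

The resulting array can be passed to child_process.spawnSync('docker', args, { input, timeout }) so that the test input arrives on stdin and the run is still time-limited.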
