suspicious.dangerous_exec
- Location: packages/cli/src/commands/__tests__/eval.test.ts:44
- Finding: Shell command execution detected (child_process).
Advisory audited by static analysis on May 10, 2026.
Detected: suspicious.dangerous_exec, suspicious.dynamic_code_execution, suspicious.env_credential_access
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A skill you evaluate may run local code under the evaluator's permissions.
The framework launches configured skill subprocesses. This fits the stated evaluation purpose, but it means evaluated skills can execute code on the local machine unless containment is strong.
const child = spawn(finalExec, finalArgs, {

Use Docker or another strong sandbox for untrusted skills, avoid running with sensitive environment variables, and review skill entrypoints before evaluation.
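One containment step the recommendation implies can be sketched as follows: spawn the evaluated skill with an allowlisted environment rather than inheriting the evaluator's full `process.env`. This is a hypothetical hardening sketch, not the framework's actual code; `finalExec` and `finalArgs` are borrowed from the quoted snippet, and the allowlist contents are assumptions.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Assumed allowlist: only these variables reach the skill subprocess.
const SAFE_ENV_KEYS = ["PATH", "HOME", "LANG"];

function sanitizedEnv(): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of SAFE_ENV_KEYS) {
    const value = process.env[key];
    if (value !== undefined) env[key] = value;
  }
  return env;
}

// finalExec and finalArgs come from the skill configuration,
// as in the spawn call quoted in the finding above.
function launchSkill(finalExec: string, finalArgs: string[]): ChildProcess {
  return spawn(finalExec, finalArgs, {
    env: sanitizedEnv(), // tokens in the parent environment do not leak in
    stdio: ["ignore", "pipe", "pipe"],
  });
}
```

Environment scrubbing limits credential exfiltration but does not contain code execution itself; a sandbox such as Docker is still needed for untrusted skills.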
A mistaken or malicious HTTP skill configuration could cause a local secret to be used with a configured network endpoint.
The HTTP adapter reads a token from an environment variable chosen at runtime. In a framework that evaluates externally supplied skill configurations, this needs explicit allowlisting/approval; the registry metadata declares no env vars or primary credential.
const token = process.env[envKey];
Document credential use in metadata, restrict which environment variables may be read, require explicit user approval before sending bearer tokens, and run evaluations in a clean environment.
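The allowlisting step recommended above can be sketched roughly as follows. This is a hypothetical illustration, not the adapter's real code; the variable name in the allowlist is an assumption, since the registry metadata declares no env vars.

```typescript
// Assumed allowlist of variables a skill config may name as a credential
// source; "EVAL_SKILLS_HTTP_TOKEN" is illustrative, not declared anywhere.
const ALLOWED_CREDENTIAL_VARS = new Set<string>(["EVAL_SKILLS_HTTP_TOKEN"]);

function readBearerToken(envKey: string): string | undefined {
  if (!ALLOWED_CREDENTIAL_VARS.has(envKey)) {
    // Reject runtime-chosen names so a skill config cannot point at
    // arbitrary secrets such as cloud provider keys.
    throw new Error(`env var "${envKey}" is not allowlisted for credential use`);
  }
  return process.env[envKey];
}
```

An explicit user-approval prompt before the first send of the token would complement the allowlist.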
The example calculator should be treated as a demo, not as a hardened parser for untrusted input.
The bundled calculator example applies Python `eval` to an expression whose characters pass an allowlist check. The allowlist reduces the risk of arbitrary code execution, but `eval` remains a sensitive pattern and can still be fed computationally expensive expressions.
if all(c in allowed for c in expression):
    result = eval(expression)

Replace `eval` with a safe expression parser, or impose strict complexity and resource limits if this example is used beyond tests.
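A safe replacement for the `eval` call could look like the sketch below: an `ast` walker that only permits numeric literals and basic arithmetic, rejecting everything else. This is one possible substitute, not the project's code.

```python
import ast
import operator

# Map permitted AST operator types to their implementations.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_eval(expression: str):
    """Evaluate a basic arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Accept only int/float literals (bool is an int subclass, so exclude it).
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)) \
                and not isinstance(node.value, bool):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression node: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))
```

Unlike a character allowlist, this rejects calls, attribute access, and names structurally, so `__import__('os')` fails even though its characters might pass a loose filter. A depth or length cap would still be needed against expensive expressions.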
Installing from source may execute dependency/build scripts from the cloned project and its package ecosystem.
Manual installation runs code from an external Git repository and package-manager build steps. This is normal for a source-installed CLI, but users should verify provenance.
git clone https://github.com/isLinXu/eval-skills.git
cd eval-skills
pnpm install && pnpm build
Verify the repository, prefer pinned/locked installs, and review dependency changes before running build or install commands.
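A pinned variant of the install, as recommended above, might look like this sketch. The inspection step and lockfile flag are general pnpm/Git practice, not instructions published by the project.

```shell
git clone https://github.com/isLinXu/eval-skills.git
cd eval-skills
# Inspect what was actually checked out before running any build scripts.
git log -1 --format='%H %s'
# Fail instead of silently updating dependencies past the committed lockfile.
pnpm install --frozen-lockfile
pnpm build
```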
Evaluation inputs, outputs, and reports may remain on disk after the run.
Evaluation results are persisted to a local SQLite database, `./eval-skills.db` by default, when the `--store` option is used. This is purpose-aligned but can retain evaluated task data on disk.
`--store <path>` | SQLite database path for persistent result storage | `./eval-skills.db`
Choose storage paths intentionally, avoid evaluating sensitive data unless needed, and delete or protect the database/reports when appropriate.
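Choosing the path intentionally and cleaning up afterward can be sketched as below. Only the `--store` flag is documented; the surrounding commands and the evaluator invocation are illustrative placeholders.

```shell
# Keep the results database out of the working directory and make it disposable.
STORE_DIR=$(mktemp -d)
STORE_DB="$STORE_DIR/eval-skills.db"

# ... invoke the evaluator here, passing: --store "$STORE_DB" ...

# After reviewing the report, delete the persisted task data.
rm -f "$STORE_DB"
rmdir "$STORE_DIR"
```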
Private task data or skill outputs may be shared with an LLM provider when LLM judging is enabled.
The LLM-judge scorer requires an LLM API key, implying that evaluation content may be sent to an LLM-backed service. This is expected for LLM judging but matters for private benchmarks or outputs.
### LLM Judge ... Requires `EVAL_SKILLS_LLM_KEY` environment variable.
Use LLM judging only with data you are allowed to send externally, and confirm which provider/key is configured.