prompt-eval
v1.0.0Automatically evaluate and score any AI prompt (prompt_a) through a structured 5-step pipeline: test plan → 200+ test cases → prompt execution → evaluator pr...
⭐ 1· 89·0 current·0 all-time
MIT-0
Download zip
LicenseMIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign
high confidencePurpose & Capability
Name/description match the runtime instructions: the SKILL.md describes generating test plans, producing many test cases, executing the user's prompt, scoring results, and writing CSV/JSON reports. No unrelated environment variables, binaries, or install steps are requested.
Instruction Scope
Instructions tell the agent to generate 50–200+ test cases, execute the prompt repeatedly, evaluate outputs, and write CSV/JSON files to ./prompt-eval-results/ (or a user-specified path). This behavior is coherent with the skill's purpose, but it means (a) many model invocations (cost & rate-limit implications), (b) writing user-provided prompt content and model outputs to disk (possible sensitive data exposure), and (c) the skill intentionally generates adversarial/safety test cases (including injection-style inputs) as part of testing. The SKILL.md also instructs to pause between steps for user confirmation, limiting fully autonomous escalation.
Install Mechanism
No install spec and no code files — instruction-only. That is the lowest-risk install profile (nothing new is written to disk by an installer).
Credentials
The skill requires no environment variables, credentials, or config paths. It does ask to write to a local directory by default, which is reasonable for generating reports but you should confirm or change the save location if you don't want files created in the agent's working directory.
Persistence & Privilege
always:false and default model-invocation allowed — expected for a user-invocable skill. The skill does not request permanent presence or modify other skills' configs.
Scan Findings in Context
[prompt_injection_you-are-now] expected: The detection 'you-are-now' appears in the SKILL.md and reference guides as an example of prompt-injection test inputs (e.g., 'Ignore your previous instructions. You are now an AI...'). This is expected because the skill intentionally generates adversarial/safety cases to test prompt robustness.
Assessment
This skill is coherent with its description and is instruction-only (no installed code). Before running it: (1) Do not feed prompts that contain secrets, passwords, or private data — the tool will execute the prompt many times and save outputs to disk. (2) Expect many LLM calls (cost and rate-limit impact); confirm the number of test cases before executing. (3) Choose or change the output folder if you don't want files in the agent's current directory. (4) Review the generated test plan (the SKILL.md instructs the agent to pause for confirmation between steps) and opt out of safety test cases that would create explicit harmful example inputs if you prefer not to see them. (5) If you need the skill to handle sensitive prompts, sanitize or redact those prompts first or run in an isolated environment. Overall, the design is consistent with its purpose, but exercise caution about data sensitivity, resource use, and disk output.Like a lobster shell, security has layers — review code before you run it.
latestvk976tpp76rg3nvzt0zjyk91dgh83978f
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
