Singleshot Prompt Testing
Test and optimize prompts for cost, token use, and performance, generating detailed reports from single-shot queries across multiple providers and models.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 3 · 1.4k · 0 current installs · 0 all-time installs
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
The skill is named and documented as a prompt cost/testing/optimization helper and all instructions show usage of a singleshot CLI for generating token/cost reports. The declared requirements (API keys for providers) match the described multi-provider testing capability and nothing extraneous (e.g., cloud admin creds) is requested.
Instruction Scope
SKILL.md instructs the agent to run the singleshot CLI, generate reports, cat/grep/diff report files, and optionally point the CLI at providers via environment variables. These actions are within the stated purpose. One noteworthy point: the documentation allows configuring OPENAI_BASE_URL (a custom endpoint) and other provider endpoints, which can redirect model requests to arbitrary servers. This is a normal feature for alternate endpoints, but it increases risk if you point keys at untrusted endpoints.
Install Mechanism
The published skill is instruction-only and contains no automated install spec. The docs recommend installing a third-party CLI via Homebrew tap (vincentzhangz/singleshot) or cargo. That is consistent with a CLI-based skill; however, installing from a third-party tap or crate is an explicit user action and you should audit the upstream repo before installing.
Credentials
The skill recommends supplying provider API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY), which are directly relevant to calling model providers. No unrelated secrets or system credentials are requested. Caution: OPENAI_BASE_URL and similar endpoint variables can be used to route requests (and therefore your keys and data) to nonstandard endpoints; only set them to trusted URLs.
Persistence & Privilege
The skill does not request always:true and is user-invocable only. It is instruction-only and does not install persistent hooks or modify other skills or global agent settings. It therefore requests no elevated persistence or privileges.
Assessment
This skill appears coherent and implements what it says: a wrapper/workflow for a third-party singleshot CLI that measures tokens, costs, and latency. Before installing or running the CLI yourself:
1. Inspect the upstream repository (https://github.com/vincentzhangz/singleshot) and the Homebrew tap to confirm code provenance.
2. Only provide API keys for providers you trust, and avoid setting OPENAI_BASE_URL or other custom endpoints to unknown servers (they could receive your requests and keys).
3. Prefer local/no-key options (e.g., Ollama) for early testing.
4. Consider using scoped or short-lived keys if supported, and do not paste keys into public files.
If you want a deeper review, provide the upstream repo or the installed binary source and I can look for network calls, telemetry, or unexpected behavior.
Like a lobster shell, security has layers: review code before you run it.
Current version: v0.1.0
SKILL.md
Singleshot Prompt Testing & Optimization Skill
Description
Prompt cost testing with single shot
Installation
brew tap vincentzhangz/singleshot
brew install singleshot
Or: cargo install singleshot
When to Use
- Testing new prompts before openclaw implementation
- Benchmarking prompt variations for token efficiency
- Comparing model performance and costs
- Validating prompt outputs before production
Core Commands
Always use -d (detail) and -r (report) flags for efficiency analysis:
# Basic test with full metrics
singleshot chat -p "Your prompt" -P openai -d -r report.md
# Test with config file
singleshot chat -l config.md -d -r report.md
# Compare providers
singleshot chat -p "Test" -P openai -m gpt-4o-mini -d -r openai.md
singleshot chat -p "Test" -P anthropic -m claude-sonnet-4-20250514 -d -r anthropic.md
# Batch test variations
for config in *.md; do
singleshot chat -l "$config" -d -r "report-${config%.md}.md"
done
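A batch run like the one above leaves one report file per config. A small post-processing step can collect the key lines into a single summary table. This is a sketch, assuming the report lines look like the examples under "Report Metrics" below; the two demo files and their numbers are made up for illustration.

```shell
# Create two mock reports in the assumed format (illustrative values only)
cat > demo-a.md << 'EOF'
- Total Tokens: 425
- Total Cost: $0.00014475
EOF

cat > demo-b.md << 'EOF'
- Total Tokens: 310
- Total Cost: $0.00009900
EOF

# Summarize: one row per report, pulling the last field of each metric line
printf '%-12s %-8s %s\n' "report" "tokens" "cost"
for f in demo-*.md; do
  tokens=$(grep 'Total Tokens' "$f" | awk '{print $NF}')
  cost=$(grep 'Total Cost' "$f" | awk '{print $NF}')
  printf '%-12s %-8s %s\n' "$f" "$tokens" "$cost"
done
```

With real batch output, the same loop would glob `report-*.md` instead of the demo files.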
Report Analysis Workflow
1. Generate Baseline
singleshot chat -p "Your prompt" -P openai -d -r baseline.md
cat baseline.md
2. Optimize & Compare
# Create optimized version, test, and compare
cat > optimized.md << 'EOF'
---provider---
openai
---model---
gpt-4o-mini
---max_tokens---
200
---system---
Expert. Be concise.
---prompt---
Your optimized prompt
EOF
singleshot chat -l optimized.md -d -r optimized-report.md
# Compare metrics
echo "Baseline:" && grep -E "(Tokens|Cost)" baseline.md
echo "Optimized:" && grep -E "(Tokens|Cost)" optimized-report.md
Report Metrics
Reports contain:
## Token Usage
- Input Tokens: 245
- Output Tokens: 180
- Total Tokens: 425
## Cost (estimated)
- Input Cost: $0.00003675
- Output Cost: $0.000108
- Total Cost: $0.00014475
## Timing
- Time to First Token: 0.45s
- Total Time: 1.23s
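The cost lines above can be sanity-checked against each other: Input Cost plus Output Cost should equal Total Cost. A minimal awk sketch, assuming the report format shown above (the sample file reuses the example values):

```shell
# Write the sample cost block from above to a file
cat > sample.md << 'EOF'
- Input Cost: $0.00003675
- Output Cost: $0.000108
- Total Cost: $0.00014475
EOF

# Split each line on '$' so the numeric part lands in field 2,
# then compare the sum of the parts with the reported total
awk -F'$' '
/Input Cost/  { in_c  = $2 }
/Output Cost/ { out_c = $2 }
/Total Cost/  { tot_c = $2 }
END { printf "computed=%.8f reported=%.8f\n", in_c + out_c, tot_c }
' sample.md
# → computed=0.00014475 reported=0.00014475
```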
Optimization Strategies
- Test with cheaper models first:
singleshot chat -p "Test" -P openai -m gpt-4o-mini -d -r report.md
- Reduce tokens:
  - Shorten system prompts
  - Use --max-tokens to limit output
  - Add "be concise" to system prompt
- Test locally (free):
singleshot chat -p "Test" -P ollama -m llama3.2 -d -r report.md
Example: Full Optimization
# Step 1: Baseline (verbose)
singleshot chat \
-p "How do I write a Rust function to add two numbers?" \
-s "You are an expert Rust programmer with 10 years experience" \
-P openai -d -r v1.md
# Step 2: Read metrics
cat v1.md
# Expected: ~130 input tokens, ~400 output tokens
# Step 3: Optimized version
singleshot chat \
-p "Rust function: add(a: i32, b: i32) -> i32" \
-s "Rust expert. Code only." \
-P openai --max-tokens 100 -d -r v2.md
# Step 4: Compare
echo "=== COMPARISON ==="
grep "Total Cost" v1.md v2.md
grep "Total Tokens" v1.md v2.md
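Step 4 can go one step further and turn the two Total Cost lines into a percentage saving. A sketch, assuming the report format shown under "Report Metrics"; the v1/v2 demo files and their costs here are invented, not real run output:

```shell
# Mock reports standing in for the v1.md/v2.md produced in Steps 1 and 3
cat > v1-demo.md << 'EOF'
- Total Cost: $0.00052000
EOF
cat > v2-demo.md << 'EOF'
- Total Cost: $0.00011000
EOF

# Extract the numeric cost (field 2 when splitting on '$')
old=$(awk -F'$' '/Total Cost/ {print $2}' v1-demo.md)
new=$(awk -F'$' '/Total Cost/ {print $2}' v2-demo.md)

# Percentage saved by the optimized version
awk -v o="$old" -v n="$new" 'BEGIN { printf "saving: %.1f%%\n", (o - n) / o * 100 }'
# → saving: 78.8%
```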
Quick Reference
# Test with full details
singleshot chat -p "prompt" -P openai -d -r report.md
# Extract metrics
grep -E "(Input|Output|Total)" report.md
# Compare reports
diff report1.md report2.md
# Vision test
singleshot chat -p "Describe" -i image.jpg -P openai -d -r report.md
# List models
singleshot models -P openai
# Test connection
singleshot ping -P openai
Environment Variables
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."
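Before a test run it can help to check which of the keys above are actually set, so a missing export fails fast rather than mid-run. A sketch using only POSIX shell; the variable names match the exports above, and the check itself is illustrative rather than part of the CLI:

```shell
# Report which provider keys are present in the environment
for var in OPENAI_API_KEY ANTHROPIC_API_KEY OPENROUTER_API_KEY; do
  # Indirect lookup: read the value of the variable named in $var
  eval "val=\${$var:-}"
  if [ -n "$val" ]; then
    echo "set:     $var"
  else
    echo "missing: $var"
  fi
done
```

Only the providers whose keys are set need to be passed to -P; Ollama needs no key at all.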
Best Practices
- Always use -d for detailed token metrics
- Always use -r to save reports
- Always cat reports to analyze metrics
- Test variations and compare costs
- Set --max-tokens to control costs
- Use gpt-4o-mini for testing (cheaper)
Troubleshooting
- No metrics: ensure the -d flag is used
- No report file: ensure the -r flag is used
- High costs: switch to gpt-4o-mini or Ollama
- Connection issues: run singleshot ping -P <provider>
Files
5 total
