Singleshot Prompt Testing
Test and optimize prompts for cost, token use, and performance, generating detailed reports from single-shot queries across multiple providers and models.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 3 · 1.4k · 0 current installs · 0 all-time installs
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
The skill is named and documented as a prompt cost/testing/optimization helper and all instructions show usage of a singleshot CLI for generating token/cost reports. The declared requirements (API keys for providers) match the described multi-provider testing capability and nothing extraneous (e.g., cloud admin creds) is requested.
Instruction Scope
SKILL.md instructs the agent to run the singleshot CLI, generate reports, cat/grep/diff report files, and optionally point the CLI at providers via environment variables. These actions are within the stated purpose. One noteworthy point: the documentation allows configuring OPENAI_BASE_URL (a custom endpoint) and other provider endpoints, which can redirect model requests to arbitrary servers. This is a normal feature for alternate endpoints, but it increases risk if you point keys at untrusted endpoints.
Install Mechanism
The published skill is instruction-only and contains no automated install spec. The docs recommend installing a third-party CLI via Homebrew tap (vincentzhangz/singleshot) or cargo. That is consistent with a CLI-based skill; however, installing from a third-party tap or crate is an explicit user action and you should audit the upstream repo before installing.
Credentials
The skill recommends supplying provider API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY), which are directly relevant to calling model providers. No unrelated secrets or system credentials are requested. Caution: OPENAI_BASE_URL and similar endpoint variables can be used to route requests (and therefore your keys and data) to nonstandard endpoints; only set them to trusted URLs.
Persistence & Privilege
The skill does not request always:true and is user-invocable only. It is instruction-only and does not install persistent hooks or modify other skills or global agent settings. It therefore requests no elevated persistence or privileges.
Assessment
This skill appears coherent and implements what it says: a wrapper/workflow for a third-party singleshot CLI that measures tokens, costs, and latency. Before installing or running the CLI yourself:
1. Inspect the upstream repository (https://github.com/vincentzhangz/singleshot) and the Homebrew tap to confirm code provenance.
2. Only provide API keys for providers you trust, and avoid setting OPENAI_BASE_URL or other custom endpoints to unknown servers (they could receive your requests and keys).
3. Prefer local/no-key options (e.g., Ollama) for early testing.
4. Consider using scoped or short-lived keys if supported, and do not paste keys into public files.
If you want a deeper review, provide the upstream repo or the installed binary source and I can look for network calls, telemetry, or unexpected behavior.
Like a lobster shell, security has layers: review code before you run it.
Current version: v0.1.0
SKILL.md
Singleshot Prompt Testing & Optimization Skill
Description
Prompt cost testing with single shot
Installation
brew tap vincentzhangz/singleshot
brew install singleshot
Or: cargo install singleshot
When to Use
- Testing new prompts before openclaw implementation
- Benchmarking prompt variations for token efficiency
- Comparing model performance and costs
- Validating prompt outputs before production
Core Commands
Always use -d (detail) and -r (report) flags for efficiency analysis:
# Basic test with full metrics
singleshot chat -p "Your prompt" -P openai -d -r report.md
# Test with config file
singleshot chat -l config.md -d -r report.md
# Compare providers
singleshot chat -p "Test" -P openai -m gpt-4o-mini -d -r openai.md
singleshot chat -p "Test" -P anthropic -m claude-sonnet-4-20250514 -d -r anthropic.md
# Batch test variations
for config in *.md; do
singleshot chat -l "$config" -d -r "report-${config%.md}.md"
done
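A batch run like the one above leaves one report file per config. A small post-processing step can collect the key lines into a single summary table. This is a sketch, assuming the report lines look like the examples under "Report Metrics" below; the two demo files and their numbers are made up for illustration.

```shell
# Create two mock reports in the assumed format (illustrative values only)
cat > demo-a.md << 'EOF'
- Total Tokens: 425
- Total Cost: $0.00014475
EOF

cat > demo-b.md << 'EOF'
- Total Tokens: 310
- Total Cost: $0.00009900
EOF

# Summarize: one row per report, pulling the last field of each metric line
printf '%-12s %-8s %s\n' "report" "tokens" "cost"
for f in demo-*.md; do
  tokens=$(grep 'Total Tokens' "$f" | awk '{print $NF}')
  cost=$(grep 'Total Cost' "$f" | awk '{print $NF}')
  printf '%-12s %-8s %s\n' "$f" "$tokens" "$cost"
done
```

With real batch output, the same loop would glob `report-*.md` instead of the demo files.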
Report Analysis Workflow
1. Generate Baseline
singleshot chat -p "Your prompt" -P openai -d -r baseline.md
cat baseline.md
2. Optimize & Compare
# Create optimized version, test, and compare
cat > optimized.md << 'EOF'
---provider---
openai
---model---
gpt-4o-mini
---max_tokens---
200
---system---
Expert. Be concise.
---prompt---
Your optimized prompt
EOF
singleshot chat -l optimized.md -d -r optimized-report.md
# Compare metrics
echo "Baseline:" && grep -E "(Tokens|Cost)" baseline.md
echo "Optimized:" && grep -E "(Tokens|Cost)" optimized-report.md
Report Metrics
Reports contain:
## Token Usage
- Input Tokens: 245
- Output Tokens: 180
- Total Tokens: 425
## Cost (estimated)
- Input Cost: $0.00003675
- Output Cost: $0.000108
- Total Cost: $0.00014475
## Timing
- Time to First Token: 0.45s
- Total Time: 1.23s
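The cost lines above can be sanity-checked against each other: Input Cost plus Output Cost should equal Total Cost. A minimal awk sketch, assuming the report format shown above (the sample file reuses the example values):

```shell
# Write the sample cost block from above to a file
cat > sample.md << 'EOF'
- Input Cost: $0.00003675
- Output Cost: $0.000108
- Total Cost: $0.00014475
EOF

# Split each line on '$' so the numeric part lands in field 2,
# then compare the sum of the parts with the reported total
awk -F'$' '
/Input Cost/  { in_c  = $2 }
/Output Cost/ { out_c = $2 }
/Total Cost/  { tot_c = $2 }
END { printf "computed=%.8f reported=%.8f\n", in_c + out_c, tot_c }
' sample.md
# → computed=0.00014475 reported=0.00014475
```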
Optimization Strategies
- Test with cheaper models first:
singleshot chat -p "Test" -P openai -m gpt-4o-mini -d -r report.md
- Reduce tokens:
  - Shorten system prompts
  - Use --max-tokens to limit output
  - Add "be concise" to system prompt
- Test locally (free):
singleshot chat -p "Test" -P ollama -m llama3.2 -d -r report.md
Example: Full Optimization
# Step 1: Baseline (verbose)
singleshot chat \
-p "How do I write a Rust function to add two numbers?" \
-s "You are an expert Rust programmer with 10 years experience" \
-P openai -d -r v1.md
# Step 2: Read metrics
cat v1.md
# Expected: ~130 input tokens, ~400 output tokens
# Step 3: Optimized version
singleshot chat \
-p "Rust function: add(a: i32, b: i32) -> i32" \
-s "Rust expert. Code only." \
-P openai --max-tokens 100 -d -r v2.md
# Step 4: Compare
echo "=== COMPARISON ==="
grep "Total Cost" v1.md v2.md
grep "Total Tokens" v1.md v2.md
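Step 4 can go one step further and turn the two Total Cost lines into a percentage saving. A sketch, assuming the report format shown under "Report Metrics"; the v1/v2 demo files and their costs here are invented, not real run output:

```shell
# Mock reports standing in for the v1.md/v2.md produced in Steps 1 and 3
cat > v1-demo.md << 'EOF'
- Total Cost: $0.00052000
EOF
cat > v2-demo.md << 'EOF'
- Total Cost: $0.00011000
EOF

# Extract the numeric cost (field 2 when splitting on '$')
old=$(awk -F'$' '/Total Cost/ {print $2}' v1-demo.md)
new=$(awk -F'$' '/Total Cost/ {print $2}' v2-demo.md)

# Percentage saved by the optimized version
awk -v o="$old" -v n="$new" 'BEGIN { printf "saving: %.1f%%\n", (o - n) / o * 100 }'
# → saving: 78.8%
```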
Quick Reference
# Test with full details
singleshot chat -p "prompt" -P openai -d -r report.md
# Extract metrics
grep -E "(Input|Output|Total)" report.md
# Compare reports
diff report1.md report2.md
# Vision test
singleshot chat -p "Describe" -i image.jpg -P openai -d -r report.md
# List models
singleshot models -P openai
# Test connection
singleshot ping -P openai
Environment Variables
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."
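Before a test run it can help to check which of the keys above are actually set, so a missing export fails fast rather than mid-run. A sketch using only POSIX shell; the variable names match the exports above, and the check itself is illustrative rather than part of the CLI:

```shell
# Report which provider keys are present in the environment
for var in OPENAI_API_KEY ANTHROPIC_API_KEY OPENROUTER_API_KEY; do
  # Indirect lookup: read the value of the variable named in $var
  eval "val=\${$var:-}"
  if [ -n "$val" ]; then
    echo "set:     $var"
  else
    echo "missing: $var"
  fi
done
```

Only the providers whose keys are set need to be passed to -P; Ollama needs no key at all.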
Best Practices
- Always use -d for detailed token metrics
- Always use -r to save reports
- Always cat reports to analyze metrics
- Test variations and compare costs
- Set --max-tokens to control costs
- Use gpt-4o-mini for testing (cheaper)
Troubleshooting
- No metrics: ensure the -d flag is used
- No report file: ensure the -r flag is used
- High costs: switch to gpt-4o-mini or Ollama
- Connection issues: run singleshot ping -P <provider>
Files
5 total
