Install

```
openclaw skills install api-benchmark
```

This skill benchmarks token generation speed across multiple LLM API providers. It measures TTFT (Time To First Token), tokens-per-second throughput, and total generation time, and it requires an OpenCLAW config with provider definitions.

Use this skill when you need to:

- Compare the performance of different API providers or models
- Test API connectivity

Usage
```
python3 main.py --targets                      # list available targets
python3 main.py run --label <target-label>     # benchmark one target
python3 main.py run --all                      # benchmark every target
python3 main.py check --label <target-label>   # check that one target is reachable
python3 main.py check --all                    # check every target
```
Options

- `-l, --label`: Specific target label to benchmark
- `-a, --all`: Run on all available targets
- `-r, --repeat`: Number of runs per prompt level (default: 1)
- `-c, --category`: Run a specific prompt category (can be repeated: `-c short -c medium`). Options: `short`, `medium`, `long`
- `-q, --quiet`: Quiet mode; suppress progress output
- `--timeout N`: Request timeout in seconds (default: 120)
- `--table`: Output as a formatted table (default: JSON)

Configuration

The tool reads its configuration from `~/.openclaw/openclaw.json`. Targets are defined in the `models.providers` section with:
- `baseUrl`: API base URL
- `apiKey`: Authentication key (or `${ENV_VAR}` to read it from an environment variable)
- `api`: API format (`anthropic-messages`, `openai-completions`, `openai-responses`)
- `models`: List of model configurations

Security Note: Instead of hardcoding API keys in the config file, use environment variable placeholders: `"apiKey": "${ANTHROPIC_API_KEY}"` reads the key from the `ANTHROPIC_API_KEY` environment variable.

Example provider config:
```json
{
  "models": {
    "providers": {
      "my-provider": {
        "baseUrl": "https://api.example.com",
        "apiKey": "sk-xxx",
        "api": "openai-completions",
        "models": [
          { "id": "model-name", "api": "openai-completions" }
        ]
      }
    }
  }
}
```
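Exactly how the tool expands `${ENV_VAR}` placeholders is internal to the skill, but the behavior can be pictured with a short sketch. Everything below (the `resolve_placeholders` helper and its strict-lookup behavior) is illustrative, not the skill's actual code:

```python
import os
import re

# Matches values that are exactly "${SOME_VAR}".
_PLACEHOLDER = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")

def resolve_placeholders(value):
    """Recursively replace "${VAR}" strings with os.environ["VAR"]."""
    if isinstance(value, dict):
        return {key: resolve_placeholders(val) for key, val in value.items()}
    if isinstance(value, list):
        return [resolve_placeholders(item) for item in value]
    if isinstance(value, str):
        match = _PLACEHOLDER.match(value)
        if match:
            # KeyError on an unset variable is clearer than sending "${...}" as a key.
            return os.environ[match.group(1)]
    return value
```

With something like this in place, a config loaded via `json.load` can be passed through `resolve_placeholders` before any request is built.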
Note: Token counts are reported by the API when available. If the API doesn't return token counts, they are estimated at 4 characters per token.
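The 4-characters-per-token fallback, and the throughput derived from it, can be written out explicitly. A minimal sketch; the exact definition of throughput (whether it excludes TTFT) is an assumption here, not documented behavior:

```python
import math

def estimate_tokens(text: str) -> int:
    # Fallback from the note above: roughly 4 characters per token.
    return max(1, math.ceil(len(text) / 4))

def tokens_per_second(completion_tokens: int, total_s: float, ttft_s: float) -> float:
    # Assumption: throughput counts only the generation phase, i.e. the
    # time after the first token arrives. The skill may define it differently.
    generation_s = total_s - ttft_s
    return completion_tokens / generation_s if generation_s > 0 else 0.0
```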
Examples

```
# Check if a specific target is reachable
python3 main.py check --label my-provider

# Benchmark a single target, three runs per prompt level
python3 main.py run --label my-provider --repeat 3

# Compare all targets and print a formatted table
python3 main.py run --all --table
```
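For context, TTFT is conventionally measured as the delay from sending the request to receiving the first streamed content chunk. The sketch below assumes an OpenAI-compatible `/v1/chat/completions` SSE stream and uses the third-party `requests` package; it illustrates the measurement technique, not the skill's internal code:

```python
import json
import time

import requests  # third-party: pip install requests

def measure_ttft(base_url: str, api_key: str, model: str, prompt: str, timeout: int = 120):
    """Return (ttft_seconds, total_seconds, text) for one streamed completion."""
    start = time.monotonic()
    ttft = None
    chunks = []
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        },
        stream=True,
        timeout=timeout,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        # SSE lines look like: b'data: {...}' or b'data: [DONE]'
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        delta = choices[0].get("delta", {}).get("content") if choices else None
        if delta:
            if ttft is None:
                ttft = time.monotonic() - start  # first content token observed
            chunks.append(delta)
    total = time.monotonic() - start
    return ttft, total, "".join(chunks)
```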