API Failover
v1.0.1

Detect AI API/provider/model failures and route requests to healthy fallback providers or downgraded models. Use when creating or maintaining automatic failover…
Create or improve a lightweight failover layer for AI APIs.
Goals
Build systems that:
- detect unavailable or degraded providers/models
- classify failures before retrying blindly
- switch to a safe fallback chain
- avoid hammering broken endpoints
- recover back to preferred providers after cooldown
Workflow
- Identify the call path.
- Classify failure modes.
- Define a fallback policy.
- Add health memory.
- Implement guarded retries.
- Emit observable logs.
- Validate with forced-failure tests.
Use the detailed rules below and the bundled scripts instead of re-inventing routing logic each time.
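The workflow above reduces to a small routing loop. A minimal sketch, assuming hypothetical `providers` (ordered list of callables) and `classify` (exception to category) helpers — the bundled scripts implement their own version of this:

```python
import time

def call_with_fallback(request, providers, classify, max_retries=2):
    """Try each provider in order; retry transient failures briefly,
    fail fast on non-transient ones, skip quota-exhausted providers."""
    last_error = None
    for provider in providers:
        for attempt in range(max_retries):
            try:
                return provider(request)
            except Exception as exc:
                category = classify(exc)
                last_error = exc
                if category in ("AUTH_ERROR", "BAD_REQUEST"):
                    raise  # fail fast; retrying won't fix the request
                if category == "QUOTA_EXCEEDED":
                    break  # move straight to the next provider
                time.sleep(0.5 * (attempt + 1))  # brief linear backoff
    raise RuntimeError("all providers failed") from last_error
```

In a real deployment the inner loop would also consult the circuit breaker before each attempt, so open providers are skipped without a network call.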
Practical defaults
Error classes
Use these normalized categories:
- `AUTH_ERROR`
- `BAD_REQUEST`
- `RATE_LIMIT`
- `TIMEOUT`
- `SERVER_ERROR`
- `NETWORK_ERROR`
- `MODEL_UNAVAILABLE`
- `QUOTA_EXCEEDED`
- `UNKNOWN_TRANSIENT`
Suggested routing behavior
- `AUTH_ERROR`, `BAD_REQUEST`: fail fast; do not retry other providers unless config explicitly maps to another credential set.
- `RATE_LIMIT`: short backoff, then fallback.
- `TIMEOUT`, `SERVER_ERROR`, `NETWORK_ERROR`, `MODEL_UNAVAILABLE`, `UNKNOWN_TRANSIENT`: retry briefly, then fallback.
- `QUOTA_EXCEEDED`: mark provider unavailable for a longer cooldown and fall back immediately.
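One way to derive these categories from transport-level signals. This is a sketch: the status-to-category mapping below is an assumption, not the bundled scripts' exact logic, and real SDK exceptions carry richer information worth inspecting:

```python
def classify_status(status_code, timed_out=False, network_error=False):
    """Map an HTTP status (or transport failure) to a normalized category."""
    if timed_out:
        return "TIMEOUT"
    if network_error:
        return "NETWORK_ERROR"
    if status_code in (401, 403):
        return "AUTH_ERROR"
    if status_code == 400:
        return "BAD_REQUEST"
    if status_code == 404:
        return "MODEL_UNAVAILABLE"  # assumption: 404 means unknown model/route
    if status_code == 429:
        return "RATE_LIMIT"  # may also signal QUOTA_EXCEEDED; check the body
    if 500 <= status_code < 600:
        return "SERVER_ERROR"
    return "UNKNOWN_TRANSIENT"
```

Note that many providers use 429 for both rate limiting and quota exhaustion; distinguishing them usually requires parsing the error body.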
Circuit breaker defaults
Start with:
- open after `3` consecutive transient failures
- cooldown `60-180s`
- half-open with `1` probe
- close after `1-2` successful probes
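These defaults map to a small per-provider breaker. A sketch that closes after a single successful probe (the simpler end of the range above) and takes an injectable clock for testability:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive transient failures; after
    `cooldown` seconds allow one half-open probe; close on success."""

    def __init__(self, threshold=3, cooldown=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means closed (or half-open probing)

    def allow_request(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Half-open: permit one probe; one more failure re-opens.
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

Keep one breaker instance per provider (or per provider+model pair) so one broken endpoint does not shadow healthy ones.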
Configuration pattern
Keep policy in config, not hard-coded logic.
Recommended shape:
- provider registry
- task profiles with ordered fallback chains
- retry policy
- circuit-breaker policy
- per-provider overrides
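A config with that shape might look like the following. All names, URLs, and numbers here are illustrative assumptions; see `references/config-example.yaml` for the bundled version:

```yaml
providers:            # provider registry
  openai-main:
    base_url: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
  local-llm:
    base_url: http://localhost:11434/v1

profiles:             # task profiles with ordered fallback chains
  default:
    chain: [openai-main, local-llm]
  critical:
    chain: [openai-main]

retry:                # retry policy
  max_attempts: 2
  backoff_seconds: [0.5, 2.0]

circuit_breaker:      # circuit-breaker policy
  failure_threshold: 3
  cooldown_seconds: 120

overrides:            # per-provider overrides
  local-llm:
    retry: {max_attempts: 1}
```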
Design guidance
- Prefer fewer, well-understood providers over large fallback chains.
- Keep the fallback chain semantically compatible when possible.
- Separate "best quality" from "must return something" behavior.
- Keep downgrade rules explicit; avoid large, silent capability drops on critical tasks.
- For tool-using agents, treat provider switching as a reliability event and report it when user-visible quality may change.
Semi-automatic deployment model
Use this skill to discover the environment, generate a production-ish config, run a local HTTP failover proxy, and verify health.
Do not claim full autonomous takeover unless the environment-specific integration is actually completed.
References
Read these only when needed:
- `references/config-example.yaml` for a compact policy example
- `references/config-realworld-example.yaml` for a more practical multi-provider template
- `references/config-production.yaml` for a ready-to-edit production template
- `references/test-scenarios.md` for failure-injection and validation cases
- `references/realworld-notes.md` for local proxy deployment and environment-variable setup
- `references/api-failover.service` for a user-systemd service example
Bundled scripts
`scripts/discover_env.py`
Inspect the current environment.
`scripts/generate_config.py`
Generate a production-ish YAML config from simple defaults.
`scripts/failover_proxy.py`
Run a minimal CLI failover call path.
`scripts/http_proxy.py`
Expose a single local OpenAI-compatible entrypoint.
Endpoints:
- `POST /v1/chat/completions`
- `GET /health`
Optional request header:
`X-Failover-Profile: cheap|default|critical|local-first`
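For example, a request routed through the `cheap` profile. The port and model name are assumptions; match them to your proxy config:

```python
import json
import urllib.request

# Assumed proxy address; http_proxy.py's actual port may differ.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "gpt-4o-mini",  # hypothetical name; the proxy routes via the chain
    "messages": [{"role": "user", "content": "ping"}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "X-Failover-Profile": "cheap",  # pick the 'cheap' fallback chain
    },
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment once the proxy is running
```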
scripts/selfcheck.py
Validate that the local proxy is reachable and can process a minimal chat request.
scripts/bootstrap_failover.py
Run the semi-automatic bootstrap flow:
- discover environment
- generate config
- optionally start the proxy
- run self-check
- print next actions
Example:
```
python3 scripts/bootstrap_failover.py \
  --default-model custom-ai-td-ee/gpt-5.4 \
  --start-proxy
```
Keep these scripts small and inspectable. Extend them instead of turning SKILL.md into code-heavy instructions.