Install
openclaw skills install ollama-proxy
Ollama proxy: one endpoint that routes to multiple Ollama instances. It is a drop-in replacement for localhost:11434 with the same Ollama API and the same model names, except that the Ollama proxy routes each request to the best device. It auto-discovers Ollama nodes, scores them on 7 signals, and retries on failure. Works with Open WebUI, LangChain, and Aider.
You have Ollama running on multiple machines. Instead of hardcoding IPs and manually picking which Ollama instance to hit, point everything at the Ollama proxy, which routes to the best available device automatically.
Before: App → http://macmini:11434 (one Ollama instance, hope it's not busy)
After: App → http://ollama-proxy:11435 (Ollama proxy picks the best machine)
pip install ollama-herd # PyPI: https://pypi.org/project/ollama-herd/
On one machine (the Ollama proxy):
herd # starts the Ollama proxy on port 11435
On every machine running Ollama:
herd-node # discovers the Ollama proxy automatically on your network
Now point your apps at http://ollama-proxy:11435 instead of http://localhost:11434. The Ollama API, the model names, and streaming all stay the same; the only change is that the Ollama proxy decides which machine serves each request.
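If an app keeps its Ollama URLs in configuration, the switch amounts to replacing the host and port while leaving the path alone. A minimal sketch of that rewrite, assuming the proxy is reachable as `ollama-proxy:11435` as in the examples here; the helper name `to_proxy_url` is made up for illustration and is not part of Ollama Herd:

```python
from urllib.parse import urlsplit, urlunsplit

# Proxy address as used in this document; adjust to your own hostname.
PROXY_NETLOC = "ollama-proxy:11435"


def to_proxy_url(direct_url: str, proxy_netloc: str = PROXY_NETLOC) -> str:
    """Replace host:port with the proxy's, keeping scheme, path, and query."""
    parts = urlsplit(direct_url)
    return urlunsplit(
        (parts.scheme, proxy_netloc, parts.path, parts.query, parts.fragment)
    )


# The API path is untouched, so existing client code keeps working.
print(to_proxy_url("http://localhost:11434/api/chat"))
```

Because only the network location changes, the same helper covers both the native `/api/...` paths and the OpenAI-compatible `/v1` path.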
Every Ollama API endpoint works through the Ollama proxy:
# Chat via Ollama proxy (same as direct Ollama)
curl http://ollama-proxy:11435/api/chat -d '{
"model": "llama3.3:70b",
"messages": [{"role": "user", "content": "Hello via Ollama proxy"}]
}'
# Generate via Ollama proxy (same as direct Ollama)
curl http://ollama-proxy:11435/api/generate -d '{
"model": "qwen3:32b",
"prompt": "Explain quantum computing via Ollama proxy"
}'
# List models via Ollama proxy (aggregated from all Ollama nodes)
curl http://ollama-proxy:11435/api/tags
# List loaded models via Ollama proxy (across all Ollama nodes)
curl http://ollama-proxy:11435/api/ps
# Pull a model via Ollama proxy (auto-selects best node)
curl -N http://ollama-proxy:11435/api/pull -d '{"name": "codestral"}'
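The chat call above can also be made from Python with only the standard library. This is a sketch rather than an official client: the function names are illustrative, the proxy address is the one assumed throughout this document, and `"stream": false` is set so that Ollama's `/api/chat` returns a single JSON object instead of a stream of lines.

```python
import json
import urllib.request

# Proxy address as used in this document; adjust to your own hostname.
PROXY_BASE = "http://ollama-proxy:11435"


def build_chat_request(model: str, prompt: str, base_url: str = PROXY_BASE):
    """Build the POST request for /api/chat without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON response rather than a stream
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, base_url: str = PROXY_BASE, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read())
    return body["message"]["content"]
```

With a running proxy, `chat("llama3.3:70b", "Hello via Ollama proxy")` would return the reply as a string. Passing `base_url="http://localhost:11434"` sends the identical request straight to a single Ollama instance.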
The Ollama proxy also exposes an OpenAI-compatible endpoint — same models, no code changes:
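from openai import OpenAI

# Point at the Ollama proxy instead of direct Ollama
ollama_proxy_client = OpenAI(base_url="http://ollama-proxy:11435/v1", api_key="not-needed")
# With stream=True the call returns an iterator of chunks, not a single response
ollama_proxy_response = ollama_proxy_client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Hello via Ollama proxy"}],
    stream=True,
)
# Print tokens as they arrive; delta.content can be None on some chunks
for chunk in ollama_proxy_response:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)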
| Feature | Direct Ollama | Ollama Proxy (Herd) |
|---|---|---|
| Multiple machines | Manual IP switching | Ollama proxy routes automatically |
| Load balancing | None | Ollama proxy scores on 7 signals |
| Failover | None | Ollama proxy auto-retries on next node |
| Model discovery | Per-machine Ollama | Ollama proxy aggregates fleet-wide |
| Queue management | None | Ollama proxy manages per-node:model queues |
| Dashboard | None | Ollama proxy provides real-time web UI |
| Health checks | None | Ollama proxy runs 15 automated checks |
| Request tracing | None | Ollama proxy logs to SQLite trace store |
| Image generation | None | Ollama proxy routes mflux + DiffusionKit |
| Speech-to-text | None | Ollama proxy routes Qwen3-ASR |
Just change the Ollama URL to the Ollama proxy — no other configuration needed:
| Tool | Before (direct Ollama) | After (Ollama proxy) |
|---|---|---|
| Open WebUI | http://localhost:11434 | http://ollama-proxy:11435 |
| Aider | --openai-api-base http://localhost:11434/v1 | --openai-api-base http://ollama-proxy:11435/v1 |
| Continue.dev | Ollama at localhost | Ollama proxy at ollama-proxy:11435 |
| LangChain | Ollama(base_url="http://localhost:11434") | Ollama(base_url="http://ollama-proxy:11435") |
| LiteLLM | ollama/llama3.3:70b | ollama/llama3.3:70b with api_base="http://ollama-proxy:11435" |
| CrewAI | OPENAI_API_BASE=http://localhost:11434/v1 | OPENAI_API_BASE=http://ollama-proxy:11435/v1 |
When a request arrives at the Ollama proxy, it scores every Ollama node that has the requested model on 7 signals.
The highest-scoring Ollama node wins. If it fails, the Ollama proxy retries on the next best node automatically.
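The select-then-retry behaviour described above can be sketched in a few lines. This is an illustration under assumptions, not Ollama Herd's implementation: each node carries one hypothetical precomputed `score` standing in for the 7 signals (which are not modelled here), and `send` is a caller-supplied function that raises `ConnectionError` when a node fails.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    score: float            # hypothetical combined score; higher is better
    models: FrozenSet[str]  # models available on this node


def rank_candidates(nodes: List[Node], model: str) -> List[Node]:
    """Keep only nodes that have the model, best score first."""
    eligible = [node for node in nodes if model in node.models]
    return sorted(eligible, key=lambda node: node.score, reverse=True)


def route_with_failover(
    nodes: List[Node], model: str, send: Callable[[Node], str]
) -> Tuple[str, str]:
    """Try nodes in score order; move on to the next one when a send fails."""
    last_error = None
    for node in rank_candidates(nodes, model):
        try:
            return node.name, send(node)
        except ConnectionError as exc:
            last_error = exc  # remember the failure, then try the next node
    raise RuntimeError(f"no node could serve {model!r}") from last_error


# Demo: the best-scoring node is down, so the request lands on the runner-up.
fleet = [
    Node("studio", 0.9, frozenset({"llama3.3:70b"})),
    Node("macmini", 0.6, frozenset({"llama3.3:70b", "qwen3:32b"})),
]


def flaky_send(node: Node) -> str:
    if node.name == "studio":
        raise ConnectionError("studio is unreachable")
    return f"served by {node.name}"


print(route_with_failover(fleet, "llama3.3:70b", flaky_send))
```

Filtering by model before ranking means a node that lacks the requested model is never tried, however high its score.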
Ollama proxy dashboard at http://ollama-proxy:11435/dashboard — see every Ollama node, every model, every queue in real time.
# Ollama proxy fleet overview
curl -s http://ollama-proxy:11435/fleet/status | python3 -m json.tool
# Ollama proxy health checks
curl -s http://ollama-proxy:11435/dashboard/api/health | python3 -m json.tool
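The same two endpoints can be polled from a script. The sketch below uses only the standard library and makes no assumption about the shape of the JSON the proxy returns; it just re-indents the payload, much as `python3 -m json.tool` does. The function names and the default timeout are illustrative.

```python
import json
import urllib.request

# Proxy address as used in this document; adjust to your own hostname.
PROXY_BASE = "http://ollama-proxy:11435"


def pretty(raw: bytes) -> str:
    """Re-indent a JSON payload for reading."""
    return json.dumps(json.loads(raw), indent=4)


def fetch_pretty(path: str, base_url: str = PROXY_BASE, timeout: float = 10.0) -> str:
    """GET a JSON endpoint on the proxy and return it pretty-printed."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=timeout) as response:
        return pretty(response.read())
```

With a running proxy, `print(fetch_pretty("/fleet/status"))` and `print(fetch_pretty("/dashboard/api/health"))` mirror the two curl commands above.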
Ollama Herd (the Ollama proxy) is open source (MIT), and contributions are welcome.
CLAUDE.md gives AI agents full Ollama proxy context. The codebase is async Python with 444 tests.
~/.fleet-manager/.