Ollama Proxy

v1.0.2

Ollama proxy — one endpoint that routes to multiple Ollama instances. A drop-in replacement for localhost:11434: same Ollama API, same model names, same streaming.

by Twin Geeks (@twinsgeeks)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Pending
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name and description (an Ollama proxy that routes across Ollama nodes) align with the instructions: users are told to pip install an ollama-herd package and run herd/herd-node, and the declared binaries (curl/wget) are reasonable for health checks and the curl examples. The included config paths (~/.fleet-manager/latency.db, ~/.fleet-manager/logs/herd.jsonl) are consistent with a fleet manager that stores traces and logs.
Instruction Scope
SKILL.md stays on-topic (install package, run proxy and node, point apps at proxy). It explicitly describes auto-discovery across your network, health checks, request tracing and a local SQLite trace store—these are expected for this proxy but have privacy implications because request payloads (prompts/responses) may be logged and aggregated. The instructions do not ask to read unrelated system files or credentials.
Install Mechanism
There is no formal install spec in the registry (the listing is instruction-only), but SKILL.md tells users to pip install ollama-herd from PyPI. Installing from PyPI is a common, traceable mechanism (moderate risk). Because no code is included here, the package's actual contents and behavior on install/run are not visible in this review and should be inspected before use.
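That pre-install inspection is straightforward because wheels are plain zip archives. A minimal sketch in Python; the wheel name and contents below are stand-ins, so fetch the real artifact first with `pip download ollama-herd --no-deps`:

```python
import hashlib, os, tempfile, zipfile

def inspect_wheel(path):
    """SHA-256 digest and file listing of a wheel, without installing it."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    return digest, names

# Throwaway archive standing in for the real wheel from PyPI
tmp = os.path.join(tempfile.mkdtemp(), "ollama_herd-0.0.0-py3-none-any.whl")
with zipfile.ZipFile(tmp, "w") as zf:
    zf.writestr("ollama_herd/__init__.py", "")
    zf.writestr("ollama_herd-0.0.0.dist-info/METADATA", "Name: ollama-herd")

digest, names = inspect_wheel(tmp)
print(digest[:12], sorted(names))
```

Comparing the digest against the hash PyPI publishes for the release is a cheap authenticity check before anything executes.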
Credentials
The skill declares no required environment variables or credentials. That is proportionate to a network proxy. The metadata lists optional binaries (python3/pip) and config paths for local logs/traces which are expected for a fleet manager.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges in the SKILL.md. It runs as a user-level service (herd/herd-node) and writes to per-user config/log paths; no evidence it modifies other skills or system-wide configs.
Assessment
This skill appears to do what it claims (proxy and aggregate Ollama nodes), but you should:

  1. Inspect the ollama-herd PyPI package and its GitHub repo (links are provided) before installing, to confirm there is no unexpected behavior.
  2. Be aware the proxy auto-discovers nodes on the local network and stores traces/logs (SQLite and JSONL) under ~/.fleet-manager — those stores may contain request payloads (prompts/responses). If prompt confidentiality matters, run the proxy on an isolated network or audit/disable tracing.
  3. Run the package with least privilege and review what it sends over the network (health checks, telemetry).
  4. Because the registry listing is instruction-only (no embedded code), verify the upstream project and release authenticity on PyPI/GitHub; that would raise confidence from medium to high.
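The tracing concern can be checked empirically once the proxy has handled traffic: the log is line-delimited JSON, so a short script can count entries that carry payload fields. A sketch, assuming field names like "prompt" and "response" (not confirmed against the package):

```python
import json, os, tempfile
from pathlib import Path

SENSITIVE_KEYS = {"prompt", "messages", "response", "content"}  # assumed field names

def entries_with_payloads(log_path):
    """Count JSONL entries that carry request/response payload fields."""
    hits = 0
    for line in Path(log_path).read_text().splitlines():
        if line.strip() and SENSITIVE_KEYS & set(json.loads(line)):
            hits += 1
    return hits

# Synthetic log standing in for ~/.fleet-manager/logs/herd.jsonl
log = os.path.join(tempfile.mkdtemp(), "herd.jsonl")
Path(log).write_text(
    '{"ts": 1, "node": "macmini", "prompt": "hello"}\n'
    '{"ts": 2, "node": "macmini", "status": 200}\n'
)
print(entries_with_payloads(log))  # 1 of the 2 entries carries a payload
```

A nonzero count means prompts or responses are landing on disk, which is the signal to audit or disable tracing.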

Like a lobster shell, security has layers — review code before you run it.

Tags: aider · continue-dev · drop-in · failover · langchain · latest · load-balancer · multi-node · ollama · ollama-proxy · ollama-router · open-webui · proxy


Runtime requirements

Platform: Clawdis
OS: macOS · Linux · Windows
Binaries (any of): curl, wget

SKILL.md

Ollama Proxy — One Endpoint for All Your Ollama Instances

You have Ollama running on multiple machines. Instead of hardcoding IPs and manually picking which Ollama instance to hit, point everything at the Ollama proxy. The Ollama proxy routes to the best available device automatically.

Before:  App → http://macmini:11434  (one Ollama instance, hope it's not busy)
After:   App → http://ollama-proxy:11435   (Ollama proxy picks the best machine)

Set up the Ollama proxy

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/

On one machine (the Ollama proxy):

herd    # starts the Ollama proxy on port 11435

On every machine running Ollama:

herd-node    # discovers the Ollama proxy automatically on your network

Now point your apps at http://ollama-proxy:11435 instead of http://localhost:11434. Same Ollama API, same model names, same streaming — the Ollama proxy handles smarter routing.

Drop-in Ollama proxy replacement

Every Ollama API endpoint works through the Ollama proxy:

# Chat via Ollama proxy (same as direct Ollama)
curl http://ollama-proxy:11435/api/chat -d '{
  "model": "llama3.3:70b",
  "messages": [{"role": "user", "content": "Hello via Ollama proxy"}]
}'

# Generate via Ollama proxy (same as direct Ollama)
curl http://ollama-proxy:11435/api/generate -d '{
  "model": "qwen3:32b",
  "prompt": "Explain quantum computing"
}'

# List models via Ollama proxy (aggregated from all Ollama nodes)
curl http://ollama-proxy:11435/api/tags

# List loaded models via Ollama proxy (across all Ollama nodes)
curl http://ollama-proxy:11435/api/ps
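For the aggregated model list, the proxy presumably merges each node's standard /api/tags response. A sketch of that merge in Python, using the stock Ollama response shape ({"models": [{"name": ...}]}); the per-node wiring below is illustrative, not the package's internals:

```python
def aggregate_tags(per_node):
    """Merge per-node /api/tags responses into a model -> nodes mapping."""
    fleet = {}
    for node, resp in per_node.items():
        for model in resp.get("models", []):
            fleet.setdefault(model["name"], []).append(node)
    return fleet

# Each value is what a node's /api/tags endpoint would return
per_node = {
    "macmini": {"models": [{"name": "llama3.3:70b"}, {"name": "qwen3:32b"}]},
    "studio":  {"models": [{"name": "llama3.3:70b"}]},
}
print(aggregate_tags(per_node))
# {'llama3.3:70b': ['macmini', 'studio'], 'qwen3:32b': ['macmini']}
```

The inverse of this mapping is what routing needs: for any requested model, the set of nodes eligible to serve it.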

OpenAI-compatible Ollama proxy API

The Ollama proxy also exposes an OpenAI-compatible endpoint — same models, no code changes:

from openai import OpenAI

# Point at the Ollama proxy instead of direct Ollama
client = OpenAI(base_url="http://ollama-proxy:11435/v1", api_key="not-needed")
stream = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Hello via Ollama proxy"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

What the Ollama proxy does that direct Ollama doesn't

Feature           | Direct Ollama       | Ollama Proxy (Herd)
------------------|---------------------|------------------------------
Multiple machines | Manual IP switching | Routes automatically
Load balancing    | None                | Scores nodes on 7 signals
Failover          | None                | Auto-retries on the next node
Model discovery   | Per-machine only    | Aggregated fleet-wide
Queue management  | None                | Per-node:model queues
Dashboard         | None                | Real-time web UI
Health checks     | None                | 15 automated checks
Request tracing   | None                | SQLite trace store
Image generation  | None                | Routes mflux + DiffusionKit
Speech-to-text    | None                | Routes Qwen3-ASR

Ollama proxy works with your existing tools

Just change the Ollama URL to the Ollama proxy — no other configuration needed:

Tool         | Before (direct Ollama)                      | After (Ollama proxy)
-------------|---------------------------------------------|-----------------------------------------------
Open WebUI   | http://localhost:11434                      | http://ollama-proxy:11435
Aider        | --openai-api-base http://localhost:11434/v1 | --openai-api-base http://ollama-proxy:11435/v1
Continue.dev | Ollama at localhost                         | Ollama proxy at ollama-proxy:11435
LangChain    | Ollama(base_url="http://localhost:11434")   | Ollama(base_url="http://ollama-proxy:11435")
LiteLLM      | ollama/llama3.3:70b                         | ollama/llama3.3:70b (api_base at the proxy)
CrewAI       | OPENAI_API_BASE=http://localhost:11434/v1   | OPENAI_API_BASE=http://ollama-proxy:11435/v1

How the Ollama proxy routes requests

When a request arrives at the Ollama proxy, it scores all Ollama nodes that have the requested model:

  1. Thermal state — is the model already loaded in the Ollama instance (hot)?
  2. Memory fit — does the Ollama node have enough free RAM?
  3. Queue depth — is the Ollama node busy with other requests?
  4. Latency history — how fast has this Ollama node been recently?
  5. Role affinity — the Ollama proxy sends big models to big machines
  6. Availability trend — is this Ollama node reliably available?
  7. Context fit — does the loaded context window match the request?

The highest-scoring Ollama node wins. If it fails, the Ollama proxy retries on the next best node automatically.
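The scoring pass can be pictured as a weighted sum over those seven signals. The weights and node fields below are made-up illustrations of the idea, not the package's actual numbers:

```python
def score(node, want_model, want_ctx):
    """Toy 7-signal node score; higher is better. Weights are invented."""
    s = 0.0
    s += 5.0 if want_model in node["loaded"] else 0.0                   # 1. thermal: model hot
    s += 3.0 if node["free_ram_gb"] >= node["model_ram_gb"] else -10.0  # 2. memory fit
    s -= 1.0 * node["queue_depth"]                                      # 3. queue depth
    s -= 0.01 * node["avg_latency_ms"]                                  # 4. latency history
    s += 2.0 if node["role"] == "big" else 0.0                          # 5. role affinity
    s += 2.0 * node["availability"]                                     # 6. availability (0..1)
    s += 1.0 if node["loaded_ctx"] >= want_ctx else 0.0                 # 7. context fit
    return s

nodes = {
    "macmini": {"loaded": {"qwen3:32b"}, "free_ram_gb": 24, "model_ram_gb": 40,
                "queue_depth": 0, "avg_latency_ms": 120, "role": "small",
                "availability": 0.99, "loaded_ctx": 8192},
    "studio":  {"loaded": {"llama3.3:70b"}, "free_ram_gb": 96, "model_ram_gb": 40,
                "queue_depth": 1, "avg_latency_ms": 300, "role": "big",
                "availability": 0.95, "loaded_ctx": 32768},
}
best = max(nodes, key=lambda n: score(nodes[n], "llama3.3:70b", 8192))
print(best)  # studio: hot model + memory fit outweigh its deeper queue
```

Failover then falls out naturally: sort nodes by score and walk the list until a request succeeds.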

Monitor your Ollama proxy fleet

Ollama proxy dashboard at http://ollama-proxy:11435/dashboard — see every Ollama node, every model, every queue in real time.

# Ollama proxy fleet overview
curl -s http://ollama-proxy:11435/fleet/status | python3 -m json.tool

# Ollama proxy health checks
curl -s http://ollama-proxy:11435/dashboard/api/health | python3 -m json.tool
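The same status feed can be summarized in Python. The payload shape below (a "nodes" list with "state", "models", and "queue_depth" fields) is an assumption about the /fleet/status response, not documented behavior:

```python
def summarize_fleet(status):
    """One line per node from a /fleet/status payload (shape assumed)."""
    return "\n".join(
        f'{n["name"]}: {n["state"]}, {len(n["models"])} model(s), queue={n["queue_depth"]}'
        for n in status["nodes"]
    )

# Live fetch would be:
#   import json, urllib.request
#   status = json.load(urllib.request.urlopen("http://ollama-proxy:11435/fleet/status"))
sample = {"nodes": [
    {"name": "macmini", "state": "healthy", "models": ["qwen3:32b"], "queue_depth": 0},
    {"name": "studio", "state": "healthy",
     "models": ["llama3.3:70b", "qwen3:32b"], "queue_depth": 2},
]}
print(summarize_fleet(sample))
```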

Full documentation

Contribute

Ollama Herd (the Ollama proxy) is open source (MIT). We welcome contributions:

  • Star on GitHub — help others find the Ollama proxy
  • Open an issue — bug reports, feature requests
  • PRs welcome — CLAUDE.md gives AI agents full Ollama proxy context. 444 tests, async Python.

Guardrails

  • No automatic model downloads — the Ollama proxy requires explicit user confirmation for model pulls.
  • Model deletion requires explicit user confirmation via the Ollama proxy.
  • All Ollama proxy requests stay local — no data leaves your network.
  • Never delete or modify files in ~/.fleet-manager/.

Files

1 total
