Install
```bash
openclaw skills install self-hosted-ai
```

Self-hosted AI — run your own LLM inference, image generation, speech-to-text, and embeddings. No cloud APIs, no SaaS subscriptions, no data leaving your network. A self-hosted alternative to OpenAI, DALL-E, Whisper API, and cloud embedding services. Routes across macOS, Linux, and Windows machines.

Stop paying per token. Stop sending data to cloud APIs. Run self-hosted LLMs, self-hosted image generation, self-hosted speech-to-text, and self-hosted embeddings on your own hardware. One self-hosted router makes all your devices act like one system.
| Cloud service | Self-hosted replacement | How |
|---|---|---|
| OpenAI API | Self-hosted Llama 3.3, Qwen 3.5, DeepSeek-R1 via Ollama | Same OpenAI SDK, swap the base URL |
| DALL-E / Midjourney | Self-hosted Stable Diffusion 3, Flux via mflux/DiffusionKit | POST /api/generate-image |
| Whisper API | Self-hosted Qwen3-ASR via MLX | POST /api/transcribe |
| OpenAI Embeddings | Self-hosted nomic-embed-text, mxbai-embed via Ollama | POST /api/embed |
Same APIs. Same quality. Zero per-request costs. All data stays on your self-hosted machines.
```bash
pip install ollama-herd   # Self-hosted AI router from PyPI
herd                      # start the self-hosted router
herd-node                 # run on each self-hosted machine — auto-discovers the router
```
No Docker. No Kubernetes. No config files. Self-hosted devices find each other automatically on your local network.
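To confirm that machines on your network have actually joined, one option is to query the router's fleet status endpoint (the same endpoint used in the monitoring section below). A minimal sketch using the requests library; it simply prints whatever the router reports, since the exact response fields aren't documented here:

```python
import requests

# Ask the self-hosted router which nodes it has discovered.
# The response structure depends on the router; pretty-print it to inspect.
fleet = requests.get("http://localhost:11435/fleet/status", timeout=5)
fleet.raise_for_status()
print(fleet.json())
```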
Drop-in self-hosted replacement for the OpenAI SDK:
```python
from openai import OpenAI

# Self-hosted inference client — replaces OpenAI cloud
self_hosted_client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

self_hosted_response = self_hosted_client.chat.completions.create(
    model="llama3.3:70b",  # self-hosted model, no cloud dependency
    messages=[{"role": "user", "content": "Analyze this contract for risks"}],
    stream=True,
)

for chunk in self_hosted_response:
    print(chunk.choices[0].delta.content or "", end="")
```
Or call the chat API directly with curl:

```bash
curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [{"role": "user", "content": "Explain self-hosted AI advantages over cloud APIs"}],
  "stream": false
}'
```
Self-hosted replacement for DALL-E and Midjourney:
```bash
# Install self-hosted image backends on any node
uv tool install mflux          # Self-hosted Flux models (~7s)
uv tool install diffusionkit   # Self-hosted Stable Diffusion 3/3.5

# Generate on your self-hosted fleet
curl -o self_hosted_output.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model": "z-image-turbo", "prompt": "self-hosted AI generating product mockup", "width": 1024, "height": 1024}'
```
Self-hosted replacement for Whisper API:
```bash
curl http://localhost:11435/api/transcribe \
  -F "file=@self_hosted_meeting.wav" \
  -F "model=qwen3-asr"
```
All self-hosted transcription stays on your network. No audio data sent to cloud services.
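From Python, the same multipart upload looks like this. A sketch only; the response format isn't specified here, so it just prints the JSON to inspect:

```python
import requests

# Transcribe a local audio file on your own hardware via the self-hosted router.
with open("self_hosted_meeting.wav", "rb") as audio:
    resp = requests.post(
        "http://localhost:11435/api/transcribe",
        files={"file": audio},
        data={"model": "qwen3-asr"},
        timeout=600,
    )
resp.raise_for_status()
print(resp.json())  # exact fields depend on the router; print to inspect
```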
Self-hosted replacement for OpenAI's embedding API:
```bash
curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "self-hosted document embedding for private RAG pipelines"}'
```
| Service | Cloud cost | Self-hosted cost |
|---|---|---|
| GPT-4o (1M tokens/month) | ~$15-30/month | $0 (self-hosted hardware you own) |
| DALL-E (1000 images/month) | ~$40/month | $0 (self-hosted image gen) |
| Whisper API (10 hours audio/month) | ~$6/month | $0 (self-hosted transcription) |
| OpenAI embeddings (1M tokens/month) | ~$0.10/month | $0 (self-hosted embeddings) |
| Total | ~$60+/month | $0/month self-hosted |
After hardware investment, every self-hosted request is free forever. No rate limits, no usage caps, no surprise bills.
The self-hosted router scores each device on 7 signals and picks the best one for every request. Multiple self-hosted machines share the load automatically.
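The specific signals aren't listed here, so the following is purely illustrative: a hypothetical weighted-score routing decision with made-up signal names and weights, not the router's actual scoring code.

```python
# Illustrative only: hypothetical signals and weights, not the router's real logic.
WEIGHTS = {"free_memory": 0.4, "recent_latency": 0.3, "queue_depth": 0.2, "model_loaded": 0.1}

def score(device: dict) -> float:
    # Higher is better; each signal is assumed to be pre-normalized to 0..1.
    return sum(WEIGHTS[name] * device.get(name, 0.0) for name in WEIGHTS)

devices = [
    {"name": "mac-studio", "free_memory": 0.9, "recent_latency": 0.7, "queue_depth": 0.8, "model_loaded": 1.0},
    {"name": "linux-box", "free_memory": 0.5, "recent_latency": 0.9, "queue_depth": 0.4, "model_loaded": 0.0},
]
best = max(devices, key=score)
print("Route request to:", best["name"])
```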
```bash
# Self-hosted fleet overview
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Self-hosted health checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

# Self-hosted model recommendations for your hardware
curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool
```
The self-hosted dashboard at http://localhost:11435/dashboard gives visual monitoring of your entire fleet.
Ollama Herd is open source (MIT). Self-hosted AI for everyone.
CLAUDE.md gives full self-hosted context. 444 tests. Local state lives in ~/.fleet-manager/.