Install
openclaw skills install windows-ollama

Run Ollama on Windows with fleet routing and load balancing across multiple Windows PCs. Windows Ollama Herd turns several Windows PCs running Ollama into one smart endpoint: your gaming desktop, your work laptop, and your old tower all serve AI requests through a single Windows Ollama URL. It works with Llama, Qwen, DeepSeek, Phi, and Mistral models, routes local inference across Windows machines with NVIDIA RTX GPUs, and adds health monitoring and a real-time dashboard.
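To make the "one smart endpoint" idea concrete, here is a toy sketch of how a router *could* pick a node for each request. It is not Ollama Herd's actual routing logic. The `Node` fields and the rule (skip unhealthy nodes, prefer a node that already has the model loaded, then the least busy one) are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of one Windows PC in the fleet."""

    name: str
    loaded_models: set = field(default_factory=set)
    in_flight: int = 0  # requests this node is currently serving
    healthy: bool = True


def pick_node(nodes, model):
    """Toy rule, not Ollama Herd's algorithm: ignore unhealthy nodes,
    prefer nodes that already have `model` loaded, then the least busy."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy nodes in the fleet")
    # Tuples compare element by element; False < True, so nodes that already
    # hold the model sort first, then fewer in-flight requests win.
    return min(candidates, key=lambda n: (model not in n.loaded_models, n.in_flight, n.name))


fleet = [
    Node("gaming-desktop", {"llama3.3:70b"}, in_flight=2),
    Node("work-laptop", {"phi4-mini"}),
    Node("old-tower", healthy=False),
]
print(pick_node(fleet, "llama3.3:70b").name)  # gaming-desktop
print(pick_node(fleet, "phi4-mini").name)     # work-laptop
print(pick_node(fleet, "qwen3.5:14b").name)   # work-laptop (no node has it; least busy wins)
```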
Download Ollama from ollama.ai and run the installer. Ollama runs natively on Windows with NVIDIA GPU support. Then install Ollama Herd:
pip install ollama-herd
On one Windows PC (your router):
herd # starts Windows Ollama router on port 11435
herd-node # registers this Windows PC
On every other Windows PC:
herd-node # auto-discovers the Windows Ollama router
mDNS issues on Windows? Corporate networks often block mDNS, so auto-discovery can fail. Point the node at the router explicitly:
herd-node --router-url http://router-ip:11435
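If a node still cannot reach the router, a plain TCP connection test separates a network or firewall problem from a Herd problem. The following is a minimal standard-library sketch. `router_reachable` is a hypothetical helper, and `192.168.1.50` is a placeholder address. A successful connection only shows that something is listening on that port. The same check against `http://localhost:11435` works on the router PC itself.

```python
import socket
from urllib.parse import urlparse


def router_reachable(router_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the router's host and port succeeds."""
    parsed = urlparse(router_url)
    host = parsed.hostname
    port = parsed.port or 11435  # fall back to the Herd router's default port
    if host is None:
        raise ValueError(f"no host in {router_url!r}")
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


# Usage from another Windows PC (placeholder IP; not executed here):
#   print(router_reachable("http://192.168.1.50:11435"))
```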
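Check the fleet from the router PC (curl.exe ships with Windows 10 and 11; in Windows PowerShell 5.1, plain curl is an alias for Invoke-WebRequest):
curl.exe http://localhost:11435/fleet/status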
You should see all your Windows Ollama nodes listed.
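Point any OpenAI client at the router (pip install openai):
from openai import OpenAI

# Your Windows Ollama fleet: the router speaks the OpenAI API, so no real key is needed
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Write a PowerShell script to monitor GPU usage"}],
    stream=True,
)
# Print tokens as they arrive; some chunks carry no text, hence `or ""`
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
print()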
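With `stream=True`, an OpenAI-compatible endpoint sends server-sent events. These are `data: {json}` lines ending with `data: [DONE]`, and each chunk carries a `choices[0].delta.content` fragment. The sketch below reassembles the reply from such lines using only the standard library. It assumes the Herd router passes through the standard OpenAI streaming format, and the sample lines are hand-written, not captured output.

```python
import json


def collect_stream_text(lines):
    """Join the delta.content fragments from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            # Some chunks carry no text, mirroring the `or ""` in the client loop.
            parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)


sample = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Get-Counter"}}]}',
    'data: {"choices": [{"delta": {"content": " example"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Get-Counter example
```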
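The Ollama-native API works through the router too (PowerShell):
Invoke-RestMethod -Uri http://localhost:11435/api/chat -Method Post -ContentType "application/json" -Body '{
  "model": "qwen3.5:32b",
  "messages": [{"role": "user", "content": "Explain Windows GPU drivers"}],
  "stream": false
}'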
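Here is the same non-streaming chat call from Python, using only the standard library. With `"stream": false`, Ollama's native `/api/chat` returns a single JSON object with the reply text in `message.content`. This sketch assumes the Herd router returns that shape unchanged. `build_chat_request` and `reply_text` are illustrative helper names.

```python
import json
import urllib.request

ROUTER = "http://localhost:11435"  # the Herd router from the setup steps


def build_chat_request(model, prompt, base_url=ROUTER):
    """Build a POST request for Ollama's native /api/chat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of newline-delimited chunks
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reply_text(response_json):
    """Extract the assistant's reply from a non-streaming /api/chat response."""
    return response_json["message"]["content"]


# Usage against a running router (not executed here):
#   req = build_chat_request("qwen3.5:32b", "Explain Windows GPU drivers")
#   with urllib.request.urlopen(req, timeout=300) as resp:
#       print(reply_text(json.load(resp)))
```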
The OpenAI-compatible endpoint from PowerShell:
Invoke-RestMethod -Uri http://localhost:11435/v1/chat/completions `
  -Method Post -ContentType "application/json" `
  -Body '{"model": "phi4", "messages": [{"role": "user", "content": "Hello from Windows"}]}'
Keep models loaded in GPU memory on Windows:
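# Windows environment variables (PowerShell), stored for the current user
# Keep models in memory indefinitely instead of unloading them after 5 idle minutes
[System.Environment]::SetEnvironmentVariable("OLLAMA_KEEP_ALIVE", "-1", "User")
# Limit on how many models may be loaded at once
[System.Environment]::SetEnvironmentVariable("OLLAMA_MAX_LOADED_MODELS", "-1", "User")
# Number of requests each loaded model serves in parallel
[System.Environment]::SetEnvironmentVariable("OLLAMA_NUM_PARALLEL", "2", "User")
# Quit Ollama from its system tray icon, then start it again from the Start menu
# so it picks up the new values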
Verify Windows Ollama settings:
[System.Environment]::GetEnvironmentVariable("OLLAMA_KEEP_ALIVE", "User")
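`SetEnvironmentVariable(..., "User")` stores the value for processes started afterwards. Programs that were already running, including an open terminal, keep their old environment, which is why Ollama has to be restarted. The sketch below reports how a freshly started process sees the three variables. `ollama_env_report` is a hypothetical helper. It reads the current process environment, not the registry.

```python
import os

# The three settings configured above; "unset" means the variable is absent,
# so Ollama falls back to its built-in default.
OLLAMA_SETTINGS = ("OLLAMA_KEEP_ALIVE", "OLLAMA_MAX_LOADED_MODELS", "OLLAMA_NUM_PARALLEL")


def ollama_env_report(env=None):
    """Map each Ollama setting to its value in `env` (default: this process)."""
    env = os.environ if env is None else env
    return {name: env.get(name, "unset") for name in OLLAMA_SETTINGS}


if __name__ == "__main__":
    # Run from a terminal opened *after* setting the variables.
    for name, value in ollama_env_report().items():
        print(f"{name} = {value}")
```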
| Windows PC | GPU | Suggested Windows Ollama models |
|---|---|---|
| Gaming desktop (RTX 4090) | 24 GB VRAM | llama3.3:70b (partly offloaded to system RAM), qwen3.5:32b, deepseek-r1:32b |
| Gaming desktop (RTX 4080) | 16 GB VRAM | qwen3.5:14b, phi4, codestral |
| Work laptop (RTX 4060) | 8 GB VRAM | phi4-mini, gemma3:4b, llama3.2:3b |
| Office desktop (no GPU) | CPU only | phi4-mini, gemma3:1b (slower but works) |
Windows Ollama works with or without an NVIDIA GPU. CPU inference is slower but functional.
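The table can also be read as a lookup from available VRAM to a starting set of models. This rough sketch encodes only the rows above (24, 16, and 8 GB, plus CPU only). The tier thresholds and the `suggest_models` helper are illustrative. Whether a model actually fits also depends on its quantization and the context length you run.

```python
# Rows from the table above, largest VRAM first: (minimum VRAM in GB, models).
SUGGESTIONS = [
    (24, ["llama3.3:70b", "qwen3.5:32b", "deepseek-r1:32b"]),
    (16, ["qwen3.5:14b", "phi4", "codestral"]),
    (8, ["phi4-mini", "gemma3:4b", "llama3.2:3b"]),
    (0, ["phi4-mini", "gemma3:1b"]),  # CPU only: slower but works
]


def suggest_models(vram_gb):
    """Return the models for the largest VRAM tier that fits vram_gb (0 = no GPU)."""
    for min_vram, models in SUGGESTIONS:
        if vram_gb >= min_vram:
            return list(models)  # copy so callers cannot mutate the table
    return list(SUGGESTIONS[-1][1])


print(suggest_models(24))  # RTX 4090 tier
print(suggest_models(12))  # falls back to the 8 GB tier
print(suggest_models(0))   # CPU only
```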
Allow Ollama Herd through Windows Firewall (run from an elevated Administrator prompt):
netsh advfirewall firewall add rule name="Ollama Herd" dir=in action=allow protocol=tcp localport=11435
Check that both services are listening:
netstat -ano | findstr :11435 # Windows Ollama router
netstat -ano | findstr :11434 # Ollama itself
# Check Windows Ollama fleet health
curl.exe -s http://localhost:11435/dashboard/api/health | python -m json.tool
# Windows Ollama fleet status
curl.exe -s http://localhost:11435/fleet/status | python -m json.tool
# Models currently loaded on Windows Ollama nodes
curl.exe -s http://localhost:11435/api/ps | python -m json.tool
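To watch the fleet without re-running the commands above, a small poller can fetch the same endpoints on an interval and pretty-print them the way `json.tool` does. This is a minimal standard-library sketch. The endpoint paths come from the commands above. The 10-second interval and the `fetch_json`/`poll` names are assumptions. It does not rely on any particular JSON fields in the responses.

```python
import json
import time
import urllib.request

ROUTER = "http://localhost:11435"  # Herd router from the setup steps
ENDPOINTS = ("/dashboard/api/health", "/fleet/status", "/api/ps")


def fetch_json(path, base_url=ROUTER, opener=urllib.request.urlopen):
    """GET base_url + path and decode the JSON body."""
    with opener(base_url + path, timeout=10) as resp:
        return json.load(resp)


def poll(rounds=1, interval=10.0, opener=urllib.request.urlopen):
    """Print each endpoint `rounds` times, sleeping `interval` seconds between rounds."""
    for i in range(rounds):
        for path in ENDPOINTS:
            try:
                data = fetch_json(path, opener=opener)
            except OSError as exc:  # URLError and socket timeouts subclass OSError
                print(f"{path}: unreachable ({exc})")
                continue
            print(path)
            print(json.dumps(data, indent=4))  # same layout as python -m json.tool
        if i < rounds - 1:
            time.sleep(interval)


# Usage against a running router (not executed here):
#   poll(rounds=6)  # six snapshots, ten seconds apart
```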
Open http://localhost:11435/dashboard in a browser for live monitoring of the Windows Ollama fleet.
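Image generation through the router (PowerShell):
Invoke-RestMethod -Uri http://localhost:11435/api/generate-image -Method Post `
  -ContentType "application/json" `
  -Body '{"model": "z-image-turbo", "prompt": "Windows desktop wallpaper", "width": 1024, "height": 1024}'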
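Embeddings through the router (PowerShell):
Invoke-RestMethod -Uri http://localhost:11435/api/embed -Method Post `
  -ContentType "application/json" `
  -Body '{"model": "nomic-embed-text", "input": "Windows Ollama local inference"}'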
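Embeddings are mostly useful for comparing texts. Ollama's native `/api/embed` accepts a string or a list of strings as `input` and returns one vector per input under `embeddings`. This sketch assumes the Herd router passes that response through unchanged. `embed` and `cosine_similarity` are plain illustrative helpers, not part of any Ollama package.

```python
import json
import math
import urllib.request

ROUTER = "http://localhost:11435"  # Herd router from the setup steps


def embed(texts, model="nomic-embed-text", base_url=ROUTER):
    """Return one embedding vector per input text via /api/embed."""
    inputs = [texts] if isinstance(texts, str) else list(texts)
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=json.dumps({"model": model, "input": inputs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["embeddings"]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Usage against a running router (not executed here):
#   a, b = embed(["Windows Ollama local inference", "Running LLMs on a Windows PC"])
#   print(cosine_similarity(a, b))
```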
Ollama Herd is open source (MIT). Windows Ollama users welcome:
~/.fleet-manager/.