Install
openclaw skills install windows-ai
Windows AI: run local AI on Windows with LLM inference, image generation, and embeddings. A Windows AI server for Llama, Qwen, DeepSeek, Phi, and Mistral. Turn Windows PCs into a Windows AI cluster. No cloud APIs, no subscriptions: Windows AI runs entirely on your hardware. Windows AI local inference. Local Windows AI with no cloud dependencies.
Run AI entirely on Windows. No cloud APIs, no subscriptions, no data leaving your network. Windows AI via Ollama Herd routes LLM requests across your Windows machines: your gaming PC, your work desktop, your laptop. One Windows AI endpoint serves them all.
# Install Windows AI router
pip install ollama-herd
# Start Windows AI on your main PC
herd # Windows AI router on port 11435
herd-node # register this Windows AI node
# On other Windows PCs
herd-node # joins the Windows AI cluster automatically
Windows Firewall: allow inbound TCP traffic on port 11435:
netsh advfirewall firewall add rule name="Windows AI" dir=in action=allow protocol=tcp localport=11435
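Once the firewall rule is in place, it can help to confirm from another PC that the router port is actually reachable. The following is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper name, and `"localhost"` is a placeholder for the address of the PC running `herd`.

```python
import socket


def is_port_open(host: str, port: int = 11435, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    # Placeholder host: replace with the router PC's name or IP address.
    host = "localhost"
    print(f"{host}:11435 reachable: {is_port_open(host)}")
```

A `False` result only says the TCP connection failed; it does not distinguish a missing firewall rule from a router that is not running.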
from openai import OpenAI
# Your Windows AI endpoint
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")
# Windows AI routes to the best available GPU
response = client.chat.completions.create(
    model="qwen3.5:32b",
    messages=[{"role": "user", "content": "Explain local AI vs cloud AI for Windows users"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
# Windows AI code generation
response = client.chat.completions.create(
    model="codestral",
    messages=[{"role": "user", "content": "Write a C# Windows service that monitors GPU temperature"}],
)
print(response.choices[0].message.content)
# Windows AI chat
curl http://localhost:11435/api/chat -d '{
"model": "llama3.3:70b",
"messages": [{"role": "user", "content": "Hello from Windows AI"}],
"stream": false
}'
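The same `/api/chat` call can be made from Python without extra packages. This is a sketch rather than an official client: `build_chat_request` and `chat` are hypothetical helper names, and the response parsing assumes the router returns Ollama's usual non-streaming shape, `{"message": {"content": ...}}`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11435"  # router address used in the examples above


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /api/chat with the same JSON body as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=300) as resp:
        body = json.load(resp)
    # Assumption: non-streaming replies carry the text under message.content.
    return body["message"]["content"]
```

Calling `chat("llama3.3:70b", "Hello from Windows AI")` would send the same request as the curl command, provided the router is running and the model is available on some node.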
| Windows PC | GPU | RAM | Best Windows AI models |
|---|---|---|---|
| Gaming desktop | RTX 4090 (24GB) | 32GB+ | llama3.3:70b, qwen3.5:32b — full quality Windows AI |
| Gaming desktop | RTX 4080 (16GB) | 16GB+ | phi4, codestral, qwen3.5:14b |
| Work laptop | RTX 4060 (8GB) | 16GB | phi4-mini, gemma3:4b — fast Windows AI |
| Office desktop | Intel/AMD (no GPU) | 16GB | phi4-mini, gemma3:1b — CPU Windows AI |
Windows AI works with or without a GPU. NVIDIA GPUs dramatically accelerate inference.
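The table above can be turned into a small lookup for scripts that choose a model per machine. This is only an illustrative sketch: `TIERS` and `suggest_models` are hypothetical names, and treating each row's VRAM figure as a minimum threshold is an assumption, not something the table states.

```python
# (minimum VRAM in GB, suggested models), copied from the table above.
# A threshold of 0 stands for the CPU-only row.
TIERS = [
    (24, ["llama3.3:70b", "qwen3.5:32b"]),
    (16, ["phi4", "codestral", "qwen3.5:14b"]),
    (8, ["phi4-mini", "gemma3:4b"]),
    (0, ["phi4-mini", "gemma3:1b"]),
]


def suggest_models(vram_gb: float) -> list[str]:
    """Return the models from the first tier whose VRAM threshold is met."""
    for min_vram, models in TIERS:
        if vram_gb >= min_vram:
            return models
    # Negative or otherwise invalid input falls back to the CPU-only tier.
    return TIERS[-1][1]


if __name__ == "__main__":
    for vram in (24, 12, 0):
        print(vram, "GB ->", suggest_models(vram))
```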
# Optimize Windows AI performance
[System.Environment]::SetEnvironmentVariable("OLLAMA_KEEP_ALIVE", "-1", "User")
[System.Environment]::SetEnvironmentVariable("OLLAMA_MAX_LOADED_MODELS", "-1", "User")
# Restart Ollama from the Windows system tray
http://localhost:11435/dashboard
Works with any OpenAI-compatible tool on Windows:
http://localhost:11435/v1
curl http://localhost:11435/api/generate-image `
-d '{"model": "z-image-turbo", "prompt": "futuristic Windows desktop", "width": 1024, "height": 1024}'
curl http://localhost:11435/api/embed `
-d '{"model": "nomic-embed-text", "input": "Windows AI local inference embeddings"}'
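Embeddings are usually compared with cosine similarity. The sketch below shows that comparison in plain Python; `cosine_similarity` is a hypothetical helper, and it assumes the vectors have already been read from the response (Ollama's `/api/embed` returns them under an `embeddings` key, which the router is assumed to pass through unchanged).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; report no similarity instead of dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)
```

Values near 1 indicate texts the embedding model considers similar; values near 0 indicate unrelated texts.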
Ollama Herd is open source (MIT). Windows AI enthusiasts welcome:
~/.fleet-manager/.