Install
```bash
openclaw skills install qwen-qwen3-5
```

Qwen 3.5 by Alibaba: run Qwen 3.5 (the latest and most capable Qwen model) across your local device fleet. Also included are Qwen3-Coder for code generation and Qwen3-ASR for speech-to-text. Requests are fleet-routed to the best available machine via Ollama Herd, with zero cloud costs.

Qwen 3.5 is the newest and most capable model in the Qwen family. It rivals GPT-4o and Claude 3.5 Sonnet on reasoning, coding, and multilingual benchmarks, and you can run it locally on your own hardware for free.
| Model | Parameters | Ollama name | Best for |
|---|---|---|---|
| Qwen 3.5 | 72B | qwen3.5 | Frontier reasoning — rivals GPT-4o |
| Qwen 3.5 | 32B | qwen3.5:32b | Strong quality at lower resource cost |
| Qwen 3.5 | 14B | qwen3.5:14b | Good balance for mid-range hardware |
| Qwen 3.5 | 7B | qwen3.5:7b | Fast on low-RAM devices |
| Qwen3-Coder | 32B | qwen3-coder:32b | Code generation — 80+ languages |
| Qwen2.5-Coder | 7B, 32B | qwen2.5-coder:32b | Proven code model |
| Qwen3-ASR | — | qwen3-asr | Speech-to-text transcription |
```bash
pip install ollama-herd   # PyPI: https://pypi.org/project/ollama-herd/
herd                      # start the router (port 11435)
herd-node                 # run on each device; finds the router automatically
```
No models are downloaded during installation. Models are pulled on demand. All pulls require user confirmation.
```python
from openai import OpenAI

# The router exposes an OpenAI-compatible API; the key is ignored.
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# Qwen 3.5 for complex reasoning
response = client.chat.completions.create(
    model="qwen3.5",
    messages=[{"role": "user", "content": "Compare microservices vs monolith architectures"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

# Qwen3-Coder for code generation
response = client.chat.completions.create(
    model="qwen3-coder:32b",
    messages=[{"role": "user", "content": "Write a thread-safe connection pool in Go"}],
)
print(response.choices[0].message.content)
```
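When the streamed reply is needed as a single string rather than printed piece by piece, the loop above can be wrapped in a small helper. The sketch below is one possible shape, not part of Ollama Herd: `collect_stream` and `fake_chunk` are hypothetical names, and `fake_chunk` only imitates the `choices[0].delta.content` layout of a streamed chunk so the helper can be tried without a running router.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip None and empty deltas
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for a streamed chunk, for offline experiments only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

With a live router, the same helper would take the stream directly: `collect_stream(client.chat.completions.create(model="qwen3.5", messages=[...], stream=True))`.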
```bash
# Qwen 3.5 chat
curl http://localhost:11435/api/chat -d '{
  "model": "qwen3.5",
  "messages": [{"role": "user", "content": "Explain attention mechanisms"}],
  "stream": false
}'
```
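The same `/api/chat` call can be made from Python without any third-party package. This is a minimal sketch using only the standard library; `build_chat_request` and `chat` are hypothetical helper names, and the reply layout (`message.content`) is an assumption based on Ollama's own `/api/chat` format, which the router appears to mirror.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:11435"  # router address used throughout this page


def build_chat_request(model: str, prompt: str,
                       base_url: str = ROUTER_URL) -> urllib.request.Request:
    """Build a non-streaming POST to /api/chat with the same body as the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request; requires a running router."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        body = json.load(resp)
    # Assumption: the text sits at body["message"]["content"], as in Ollama's API.
    return body["message"]["content"]
```

Calling `chat("qwen3.5", "Explain attention mechanisms")` with the router up should return the reply text, provided the response follows the assumed layout.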
```bash
# Qwen3-ASR speech-to-text
curl http://localhost:11435/api/transcribe \
  -F "file=@meeting.wav" \
  -F "model=qwen3-asr"
```
Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.
| Device | RAM | Best Qwen model |
|---|---|---|
| Mac Mini (16GB) | 16GB | qwen3.5:7b |
| Mac Mini (32GB) | 32GB | qwen3.5:14b or qwen2.5-coder:32b |
| MacBook Pro (64GB) | 64GB | qwen3.5:32b or qwen3-coder:32b |
| Mac Studio (128GB) | 128GB | qwen3.5 (72B) — full quality |
| Mac Studio (256GB) | 256GB | qwen3.5 + qwen3-coder:32b simultaneously |
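The table can be turned into a simple lookup when scripting a fleet setup. The sketch below encodes only the first model listed for each RAM tier; `pick_qwen_model` and `RAM_TIERS` are hypothetical names, and the thresholds are the example configurations from the table rather than hard requirements.

```python
from typing import Optional

# (minimum RAM in GB, suggested model), copied from the table above, largest first.
RAM_TIERS = [
    (128, "qwen3.5"),      # 72B, full quality
    (64, "qwen3.5:32b"),
    (32, "qwen3.5:14b"),
    (16, "qwen3.5:7b"),
]


def pick_qwen_model(ram_gb: float) -> Optional[str]:
    """Return the largest Qwen 3.5 variant the table suggests for this much RAM."""
    for min_ram, model in RAM_TIERS:
        if ram_gb >= min_ram:
            return model
    return None  # below 16 GB the table gives no recommendation
```

A machine that falls between tiers, such as one with 48 GB, gets the recommendation for the tier below it.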
Other models are available through the same endpoint: Llama 3.3, DeepSeek-V3, DeepSeek-R1, Phi 4, Mistral, Gemma 3, and Codestral.
```bash
# Image generation
curl -o image.png http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "an AI assistant helping with code", "width": 1024, "height": 1024}'

# Embeddings
curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Qwen 3.5 large language model"}'
```
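Embedding vectors are usually compared with cosine similarity. The helper below is a generic sketch of that calculation and does not depend on the router; how the vectors are extracted from the `/api/embed` response is left out, since the response layout is not shown here. `cosine_similarity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

Two texts embedded with the same model can then be ranked by passing their vectors to `cosine_similarity`; values closer to 1.0 indicate closer meaning.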
```bash
# Fleet status
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Health check
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool
```
Dashboard at http://localhost:11435/dashboard.
Ollama Herd is open source (MIT). CLAUDE.md gives AI agents full context. 444 tests, async Python.
~/.fleet-manager/