Install
```bash
openclaw skills install deepseek-deepseek-coder
```

DeepSeek DeepSeek-Coder: run DeepSeek-V3, DeepSeek-R1, and DeepSeek-Coder on your own hardware across your local fleet. The fleet router uses 7-signal scoring to pick the best device for every request. It is cross-platform (macOS, Linux, Windows), needs no cloud API, has zero per-token costs via Ollama Herd, and keeps all data on your machines.
| Model | Parameters | Ollama name | Best for |
|---|---|---|---|
| DeepSeek-V3 | 671B MoE (37B active) | deepseek-v3 | General — matches GPT-4o on most benchmarks |
| DeepSeek-V3.1 | 671B MoE | deepseek-v3.1 | Hybrid thinking/non-thinking modes |
| DeepSeek-V3.2 | 671B MoE | deepseek-v3.2 | Improved reasoning + agent performance |
| DeepSeek-R1 | 1.5B–671B | deepseek-r1 | Reasoning, approaching o3 and Gemini 2.5 Pro |
| DeepSeek-Coder | 1.3B–33B | deepseek-coder | Code generation (87% code, 13% NL training) |
| DeepSeek-Coder-V2 | 236B MoE (21B active) | deepseek-coder-v2 | Code — matches GPT-4 Turbo on code tasks |
```bash
pip install ollama-herd
herd        # start the router (port 11435)
herd-node   # run on each machine
```
Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd
Models are pulled on demand — the router auto-pulls when a request arrives for a model not yet on any node, or you can pull manually via the dashboard. No models are downloaded during installation.
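Because the router runs as its own process, a script may want to confirm it is reachable before sending the first request. The sketch below polls the dashboard URL listed on this page; the function name and retry defaults are illustrative assumptions, not part of Ollama Herd. It bypasses any system HTTP proxy because the router is on localhost.

```python
import time
import urllib.error
import urllib.request

# Talk to localhost directly, ignoring any http_proxy settings in the environment
_NO_PROXY_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_router(url: str = "http://localhost:11435/dashboard",
                    attempts: int = 10, delay: float = 1.0,
                    timeout: float = 2.0) -> bool:
    """Return True once `url` answers with any HTTP response, False if it never does.

    Hypothetical helper: the default URL is the dashboard address from this page.
    """
    for attempt in range(attempts):
        try:
            with _NO_PROXY_OPENER.open(url, timeout=timeout):
                return True  # 2xx/3xx: the router is serving
        except urllib.error.HTTPError:
            return True  # an HTTP error status still proves something is listening
        except OSError:
            # URLError (connection refused, DNS failure) and timeouts both land here
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

Typical use is a guard at the top of a script: `if not wait_for_router(): raise SystemExit("router not reachable on port 11435")`.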
```python
from openai import OpenAI

# Point the OpenAI SDK at the fleet router; no real API key is required
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# DeepSeek-R1 for reasoning (streamed token by token)
response = client.chat.completions.create(
    model="deepseek-r1:70b",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()  # end the streamed output with a newline

# DeepSeek-Coder-V2 for code
response = client.chat.completions.create(
    model="deepseek-coder-v2:16b",
    messages=[{"role": "user", "content": "Write a Redis cache decorator in Python"}],
)
print(response.choices[0].message.content)
```
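DeepSeek-R1 writes its chain of thought inside `<think>...</think>` tags before the final answer. Depending on the Ollama version and settings, that trace may arrive inline in the message content or in a separate field; the sketch below assumes the inline case and splits a finished completion (for example, the streamed chunks joined into one string) into reasoning and answer. `split_reasoning` is a hypothetical helper, not part of the router or the OpenAI SDK.

```python
import re

# Matches each inline reasoning block; DOTALL lets it span multiple lines
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) for a DeepSeek-R1 completion.

    Assumes reasoning is wrapped in <think> tags; text without tags is
    treated entirely as the answer.
    """
    reasoning = "\n".join(part.strip() for part in _THINK_RE.findall(text))
    answer = _THINK_RE.sub("", text).strip()
    return reasoning, answer
```

Keeping the trace separate lets you log it for debugging while showing users only the answer.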
```bash
# DeepSeek-V3 general chat
curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Explain quantum computing"}],
  "stream": false
}'

# DeepSeek-R1 reasoning
curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [{"role": "user", "content": "Solve this step by step: ..."}],
  "stream": false
}'
```
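The same `/api/chat` calls can be made from Python without extra dependencies. This is a sketch using only the standard library; it assumes the router returns Ollama's native non-streaming response shape (`{"message": {"content": ...}}`), and the function names are illustrative.

```python
import json
import urllib.request

HERD_URL = "http://localhost:11435"  # router address from the setup above


def build_chat_request(model: str, prompt: str,
                       base_url: str = HERD_URL) -> urllib.request.Request:
    """Build a non-streaming POST to /api/chat, mirroring the curl examples."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: bytes) -> str:
    """Pull the assistant text out of a non-streaming /api/chat response body."""
    return json.loads(body)["message"]["content"]


def chat(model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the reply; requires a running router."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=timeout) as resp:
        return extract_reply(resp.read())
```

With the router up, `chat("deepseek-v3", "Explain quantum computing")` returns the reply text. The long default timeout is a guess to leave room for large models that must load or be pulled first.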
Cross-platform: the hardware below lists example configurations. Any device (Mac, Linux, or Windows) with equivalent RAM works, and the fleet router runs on all platforms.

DeepSeek offers models at every size. Pick the one that fits your available memory; smaller models work well for most tasks:
| Model | Min RAM | Recommended hardware |
|---|---|---|
| deepseek-r1:1.5b | 4GB | Any Mac |
| deepseek-r1:7b | 8GB | Mac Mini M4 (16GB) |
| deepseek-r1:14b | 12GB | Mac Mini M4 (24GB) |
| deepseek-r1:32b | 24GB | Mac Mini M4 Pro (48GB) |
| deepseek-r1:70b | 48GB | Mac Studio M4 Max (128GB) |
| deepseek-coder-v2:16b | 12GB | Mac Mini M4 (24GB) |
| deepseek-v3 | 256GB+ | Mac Studio M3 Ultra (512GB) |
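The Min RAM column can be turned into a simple lookup. This sketch picks the largest DeepSeek-R1 tag whose minimum RAM fits a machine; the thresholds are copied from the table above and are rough guidance, and the `reserve_gb` knob for memory kept free for the OS is an assumption, not something the table specifies.

```python
from typing import Optional

# Minimum RAM (GB) per DeepSeek-R1 tag, copied from the table above
R1_MIN_RAM_GB = {
    "deepseek-r1:1.5b": 4,
    "deepseek-r1:7b": 8,
    "deepseek-r1:14b": 12,
    "deepseek-r1:32b": 24,
    "deepseek-r1:70b": 48,
}


def largest_r1_for(ram_gb: float, reserve_gb: float = 0.0) -> Optional[str]:
    """Return the biggest R1 tag that fits in ram_gb minus reserve_gb, or None."""
    budget = ram_gb - reserve_gb
    fitting = [(need, tag) for tag, need in R1_MIN_RAM_GB.items() if need <= budget]
    # Tuples compare by RAM first, so max() picks the most demanding tag that fits
    return max(fitting)[1] if fitting else None
```

For a 16GB machine, `largest_r1_for(16)` picks `deepseek-r1:14b` by minimum RAM alone, while `largest_r1_for(16, reserve_gb=6)` falls back to `deepseek-r1:7b`, closer to the table's recommended pairing.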
The fleet router automatically sends requests to the machine where the model is loaded — no manual routing needed.
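To make that preference concrete, here is a toy node picker. It is not Ollama Herd's scorer: this page does not list its seven signals, so the three used here (model already loaded, queue depth, free RAM) and the `Node` fields are hypothetical, chosen only to illustrate "send the request where the model is loaded".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical per-node state; not Ollama Herd's real data model
    name: str
    free_ram_gb: float
    queue_depth: int = 0
    loaded_models: set[str] = field(default_factory=set)


def pick_node(nodes: list[Node], model: str) -> Optional[Node]:
    """Prefer nodes with `model` already loaded, then shorter queues, then more free RAM."""
    if not nodes:
        return None
    # Tuple keys compare left to right, so "model loaded" dominates the other signals
    return max(nodes, key=lambda n: (model in n.loaded_models, -n.queue_depth, n.free_ram_gb))
```

A lexicographic key keeps the toy easy to reason about; a real scorer would more likely weight and combine its signals.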
Other models share the same endpoint: Llama 3.3, Qwen 3.5, Phi 4, Mistral, Gemma 3, and any other Ollama model route through it.
```bash
# Image generation
curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'

# Speech-to-text transcription
curl http://localhost:11435/api/transcribe -F "audio=@recording.wav"

# Text embeddings
curl http://localhost:11435/api/embeddings -d '{"model":"nomic-embed-text","prompt":"query"}'
```
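Embeddings are usually compared with cosine similarity. This sketch sends the same payload as the embeddings curl call and assumes the router returns Ollama's `{"embedding": [...]}` response unchanged; `embed` and `cosine` are illustrative names.

```python
import json
import math
import urllib.request


def embed(text: str, model: str = "nomic-embed-text",
          base_url: str = "http://localhost:11435") -> list[float]:
    """Fetch one embedding via /api/embeddings; requires a running router."""
    req = urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["embedding"]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With the router running, `cosine(embed("query"), embed("a document"))` scores how close the two texts are, from -1 to 1.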
Dashboard: http://localhost:11435/dashboard lets you monitor DeepSeek requests alongside all other models, with per-model latency, token throughput, and health checks.
~/.fleet-manager/ …
… deepseek-r1:7b instead of :70b).
… auto_pull setting.