Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Ollama — Herd Your LLMs Into One Smart Endpoint

v1.0.0

Ollama fleet router — herd your Ollama LLMs into one smart endpoint. Route Llama, Qwen, DeepSeek, Phi, Mistral, and Gemma across multiple devices with 7-sign...

by Twin Geeks (@twinsgeeks)
MIT-0
Security Scan

VirusTotal: Benign (View report →)
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The name and description (Ollama fleet router) match the SKILL.md: it instructs you to pip install a package, run a router plus a per-node agent, and route requests across local Ollama instances. Minor mismatch: the registry's top-level requirements list only curl/wget, while the runtime instructions rely on pip/python and the commands 'herd'/'herd-node' (the SKILL metadata lists python3/pip/sqlite3 only as optional binaries). Requiring a PyPI package and local agents is coherent with the stated purpose, but the Python/pip dependency is not enforced in the manifest.
Instruction Scope
Instructions remain within the router’s scope (start router, call local endpoints, enable features via dashboard endpoints). They also describe auto-pull (automatic model downloads) and reference config paths (~/.fleet-manager/*). The guardrails state not to modify ~/.fleet-manager without user confirmation. Nothing in SKILL.md instructs reading unrelated system files or exfiltrating secrets, but auto-pull will download large model files and the router will access local model state and logs — which is expected but impactful.
Install Mechanism
No install spec in the manifest, but the runtime instructions require 'pip install ollama-herd' from PyPI. Installing a third‑party PyPI package can execute arbitrary code on the host. That is expected for a Python-based router, but it's a medium-risk install action and the skill does not declare an automated, vetted install; the agent or user would run pip at their discretion.
Credentials
The skill declares no credentials and only needs common networking tools (curl/wget) and optionally python/pip/sqlite3. The listed configPaths (~/.fleet-manager/latency.db and logs) are appropriate for a router that tracks latency and logs. No unrelated secrets or external service tokens are requested.
Persistence & Privilege
always:false and no special persistence or modification of other skills is requested. The guardrails explicitly say not to restart or modify the router/node agents or ~/.fleet-manager without confirmation. Autonomous invocation is allowed (default) but not combined with any elevated privileges in the manifest.
What to consider before installing
This skill appears to be a legitimate local Ollama fleet router, but it asks you to pip install a third-party package and will automatically download models to nodes (auto-pull). Before installing or running:

  1. Verify the PyPI package and GitHub repo (publisher, recent commits, open issues).
  2. Be prepared for large model downloads and disk/VRAM usage; confirm you want auto-pull enabled.
  3. Ensure you trust the package source, because pip install runs arbitrary code.
  4. Note the small manifest mismatch: the runtime needs python/pip and the herd/herd-node binaries, which the registry metadata lists only as optional; make sure those are present.

If you need higher assurance, inspect the package source in the repo or install in an isolated environment first.

Like a lobster shell, security has layers — review code before you run it.

Tags: apple-silicon, deepseek, fleet, gemma, inference, latest, llama, llm, load-balancer, mistral, multimodal, ollama, phi, qwen, routing

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Runtime: Clawdis
OS: macOS · Linux
Any bin: curl, wget

SKILL.md

Ollama — Herd Your LLMs Into One Endpoint

You have Ollama running on multiple machines. This skill gives you one endpoint that routes every request to the best available device automatically. No more hardcoding IPs, no more manual load balancing, no more "which machine has that model loaded?"

Setup

pip install ollama-herd
herd              # start the router on port 11435
herd-node         # run on each machine with Ollama

Now point everything at http://localhost:11435 instead of http://localhost:11434. Same Ollama API, same models, smarter routing.

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

Use your Ollama models through the fleet

OpenAI SDK (drop-in)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Ollama API (same as before, different port)

# Chat
curl http://localhost:11435/api/chat -d '{
  "model": "qwen3:235b",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'

# List all models across all machines
curl http://localhost:11435/api/tags

# Models currently in GPU memory
curl http://localhost:11435/api/ps

# Embeddings
curl http://localhost:11435/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "search query"
}'

What the router does

When a request comes in, the router scores every online node on 7 signals:

  1. Thermal — is the model already loaded in GPU memory? (+50 for hot)
  2. Memory fit — how much headroom does the node have?
  3. Queue depth — how many requests are waiting?
  4. Wait time — estimated latency based on history
  5. Role affinity — large models prefer big machines
  6. Availability — is the node reliably available?
  7. Context fit — does the loaded context window fit the request?

The highest-scoring node handles the request. If it fails, the router retries on the next best node automatically.
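The scoring loop above can be sketched in a few lines of Python. This is an illustrative model only, not the package's actual implementation: the `Node` fields, the weights, and the +50 hot bonus (the one number the text does state) are combined here under assumed formulas for the remaining signals.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    has_model_hot: bool   # signal 1: model already resident in GPU memory
    free_vram_gb: float   # signal 2: memory headroom
    queue_depth: int      # signal 3: requests waiting
    est_wait_s: float     # signal 4: latency estimate from history
    online: bool = True   # signal 6: availability

def score(node: Node, needed_vram_gb: float) -> float:
    """Collapse the routing signals into one comparable number (illustrative weights)."""
    if not node.online or node.free_vram_gb < needed_vram_gb:
        return float("-inf")                  # cannot serve this request at all
    s = 50.0 if node.has_model_hot else 0.0   # thermal bonus, as described above
    s += min(node.free_vram_gb - needed_vram_gb, 20.0)  # capped headroom credit
    s -= 5.0 * node.queue_depth               # penalize busy nodes
    s -= node.est_wait_s                      # penalize slow history
    return s

def pick_node(nodes: list[Node], needed_vram_gb: float) -> Node:
    """Return the highest-scoring candidate, mirroring 'the highest-scoring node handles the request'."""
    return max(nodes, key=lambda n: score(n, needed_vram_gb))
```

With weights like these, a node that already has the model hot wins even against an idle node with far more free VRAM, which matches the router's stated preference for avoiding cold loads.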

Supported Ollama models

Any model that runs on Ollama works through the fleet. Popular ones:

| Model | Sizes | Best for |
| --- | --- | --- |
| llama3.3 | 8B, 70B | General purpose |
| qwen3 | 0.6B–235B | Multilingual, reasoning |
| qwen3.5 | 0.8B–397B | Latest generation |
| deepseek-v3 | 671B (37B active) | Matches GPT-4o |
| deepseek-r1 | 1.5B–671B | Reasoning (like o3) |
| phi4 | 14B | Small, fast, capable |
| mistral | 7B | Fast, European languages |
| gemma3 | 1B–27B | Google's open model |
| codestral | 22B | Code generation |
| qwen3-coder | 30B (3.3B active) | Agentic coding |
| nomic-embed-text | 137M | Embeddings for RAG |

Resilience features

  • Auto-retry — re-routes to next best node on failure (before first chunk)
  • VRAM-aware fallback — routes to a loaded model in the same category instead of cold-loading
  • Context protection — prevents num_ctx from triggering expensive model reloads
  • Zombie reaper — cleans up stuck in-flight requests
  • Auto-pull — downloads missing models to the best node automatically
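The auto-retry behavior can be sketched as a loop over the candidates in score order, falling through to the next best node when a request fails before the first chunk arrives. This is a hedged sketch of the idea, not the package's internals; `send` stands in for whatever callable actually forwards the request.

```python
from typing import Callable, Sequence

class AllNodesFailed(Exception):
    """Raised when every candidate node rejected or dropped the request."""

def route_with_retry(nodes: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each node best-first; on failure, re-route to the next candidate.

    `nodes` is assumed to be pre-sorted by routing score; `send` raises on
    failure (node down, OOM, timeout, ...) and returns the response on success.
    """
    errors: dict[str, Exception] = {}
    for node in nodes:
        try:
            return send(node)
        except Exception as exc:
            errors[node] = exc          # remember why this node was skipped
    raise AllNodesFailed(f"all {len(nodes)} nodes failed: {errors}")
```

Note the "before first chunk" caveat from the feature list: once streaming has started, a retry would duplicate partial output, so a real implementation can only fail over while the response is still empty.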

Also available

The same fleet router handles three more workloads:

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'

Enable: curl -X POST .../dashboard/api/settings -d '{"image_generation":true}'

Speech-to-text

curl http://localhost:11435/api/transcribe -F "audio=@recording.wav"

Enable: curl -X POST .../dashboard/api/settings -d '{"transcription":true}'

Embeddings

curl http://localhost:11435/api/embeddings -d '{"model":"nomic-embed-text","prompt":"text"}'

Already enabled — routes through Ollama automatically.

Dashboard

http://localhost:11435/dashboard — 8 tabs: Fleet Overview, Trends, Model Insights, Apps, Benchmarks, Health, Recommendations, Settings. Real-time queue visibility with [TEXT], [IMAGE], [STT], [EMBED] badges.

Request tagging

Track per-project usage:

response = client.chat.completions.create(
    model="llama3.3:70b",
    messages=messages,
    extra_body={"metadata": {"tags": ["my-project", "reasoning"]}},
)

Full documentation

Agent Setup Guide

Guardrails

  • Never restart the router or node agents without user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • Never pull or delete models without user confirmation.
