Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Self-Hosted AI — Self-Hosted LLM, Image Gen, STT Replacing Cloud APIs

Self-hosted AI — run your own LLM inference, image generation, speech-to-text, and embeddings. No cloud APIs, no SaaS subscriptions, no data leaving your net...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
1 · 14 · 0 current installs · 0 all-time installs
by Twin Geeks (@twinsgeeks)
Security Scan
VirusTotal
Benign
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name and description describe a self-hosted router for LLMs/images/STT, and the SKILL.md shows commands (local HTTP endpoints, pip install ollama-herd, uv tool install, curl) consistent with that purpose. However, the metadata claims config paths (~/.fleet-manager/latency.db and logs/herd.jsonl) despite the README stating "No config files", an internal inconsistency worth noting.
Instruction Scope
The instructions tell users to pip install a PyPI package and run herd/herd-node, which creates local services, opens network endpoints, and performs auto-discovery across the LAN. The metadata references local config/log paths. The SKILL.md also uses 'uv tool install' for model backends (mflux/diffusionkit); that command is not explained and likely performs network downloads of models/artifacts. The doc claims "No automatic downloads" yet shows commands that install packages and model backends; this contradiction increases risk because the agent or user could be guided to fetch large binary models from external sources.
Install Mechanism
This is instruction-only (no install spec). The instructions rely on pip (PyPI) and an unexplained 'uv tool install' CLI to fetch model backends. Pip/uv installs are expected for this functionality, but because there's no install spec or verified release host provided in the skill metadata, a user should inspect the referenced PyPI package and the uv tool's behavior before running them. No direct arbitrary URL downloads are embedded in the SKILL.md, but model tool installs imply significant downloads.
Credentials
The skill declares no required environment variables or credentials, which is appropriate. Metadata references configPaths (latency.db, logs/herd.jsonl); reading and writing those local files is plausible for a fleet manager, but the SKILL.md claims "No config files", a mismatch worth verifying. Network access to local hosts/ports is necessary and reasonable for the described functionality.
Persistence & Privilege
The skill is not always-enabled and does not request special platform privileges. It is instruction-only and persists state only if the user runs the recommended installers (pip, herd), which is normal for self-hosted tooling. Autonomous invocation is allowed by default; that alone is not a concern, but combined with the issues above it warrants caution.
What to consider before installing
This skill is plausibly what it claims (a local router for self-hosted models), but check a few red flags before installing:

  • Inspect the PyPI package (ollama-herd) source on GitHub before running pip install. Confirm what the package code does with the network and filesystem.
  • Verify what 'uv tool install' actually does: where it downloads models from, whether downloads are interactive, and what executables it adds. The SKILL.md's claim of "No automatic downloads" conflicts with these install commands.
  • Expect services to open local HTTP endpoints and use LAN auto-discovery; run them in an isolated network segment if you have sensitive data or want to limit exposure.
  • The metadata lists config/log files (~/.fleet-manager/latency.db, logs/herd.jsonl) even though the doc says "No config files"; assume the system will persist logs and metadata locally, and confirm retention/rotation policies.
  • Review the GitHub repo and issues, check release provenance (signed releases, trusted maintainers), and prefer manual, audited model downloads over one-click installs.

If you can't review the package sources and the uv tool's behavior, treat this as higher risk and avoid running the installers on production or sensitive hosts.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
Download zip
Tags: apple-silicon · data-sovereignty · gdpr · hipaa · latest · local-ai · mac-studio · no-cloud · ollama · on-premise · openai-alternative · private · self-hosted · self-hosted-ai · self-hosted-llm

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Server: Clawdis
OS: macOS · Linux
Bin: any of curl, wget

SKILL.md

Self-Hosted AI — Own Your Entire AI Stack

Stop paying per token. Stop sending data to cloud APIs. Run LLMs, image generation, speech-to-text, and embeddings on your own hardware. One router makes all your devices act like one system.

What you're replacing

| Cloud service | Self-hosted replacement | How |
|---|---|---|
| OpenAI API | Llama 3.3, Qwen 3.5, DeepSeek-R1 via Ollama | Same OpenAI SDK, swap the base URL |
| DALL-E / Midjourney | Stable Diffusion 3, Flux via mflux/DiffusionKit | POST /api/generate-image |
| Whisper API | Qwen3-ASR via MLX | POST /api/transcribe |
| OpenAI Embeddings | nomic-embed-text, mxbai-embed via Ollama | POST /api/embed |

Same APIs. Same quality. Zero per-request costs. All data stays on your machines.

Setup

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/
herd                       # start the router
herd-node                  # run on each machine — auto-discovers the router

No Docker. No Kubernetes. No config files. Devices find each other automatically on your local network.

Self-hosted LLM inference

Drop-in replacement for the OpenAI SDK:

from openai import OpenAI

# Before: client = OpenAI(api_key="sk-...")
# After:
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.3:70b",  # or qwen3:32b, deepseek-r1:70b, etc.
    messages=[{"role": "user", "content": "Analyze this contract for risks"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Ollama API

curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [{"role": "user", "content": "Explain this code: ..."}],
  "stream": false
}'
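The same chat call can be made from Python's standard library. A minimal sketch, assuming the router's /api/chat endpoint mirrors Ollama's request and response shape (the helper names build_chat_payload and chat are our own, not part of the package):

```python
import json
import urllib.request

def build_chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body for an Ollama-style /api/chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def chat(prompt: str, model: str = "deepseek-r1:70b",
         base_url: str = "http://localhost:11435") -> str:
    """POST a non-streaming chat request to the local router, return the reply."""
    body = json.dumps(build_chat_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/chat", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Ollama's non-streaming reply nests the text under message.content.
        return json.load(resp)["message"]["content"]
```

Because the router claims OpenAI compatibility as well, either this raw API or the SDK above should work against the same port.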

Self-hosted image generation

Replace DALL-E and Midjourney:

# Install image backends on any node
uv tool install mflux           # Flux models (~7s)
uv tool install diffusionkit    # Stable Diffusion 3/3.5

# Generate
curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model": "z-image-turbo", "prompt": "product mockup on white background", "width": 1024, "height": 1024}'

Self-hosted speech-to-text

Replace Whisper API:

curl http://localhost:11435/api/transcribe \
  -F "file=@meeting-recording.wav" \
  -F "model=qwen3-asr"

Self-hosted embeddings

Replace OpenAI's embedding API:

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "your document text here"}'
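Once embedding vectors come back, ranking documents against a query is a dot product away. A dependency-free sketch of cosine similarity and ranking (pure Python; how you extract the vectors from the endpoint's JSON is up to the response shape, which isn't documented here):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted by similarity to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])
```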

Cost comparison

| Service | Cloud cost | Self-hosted cost |
|---|---|---|
| GPT-4o (1M tokens/month) | ~$15-30/month | $0 (hardware you already own) |
| DALL-E (1000 images/month) | ~$40/month | $0 |
| Whisper API (10 hours audio/month) | ~$6/month | $0 |
| OpenAI embeddings (1M tokens/month) | ~$0.10/month | $0 |
| Total | ~$60+/month | $0/month |

After hardware investment, every request is free forever. No rate limits, no usage caps, no surprise bills.
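The ~$60/month figure is just the sum of the low-end cloud estimates. A quick check, with the figures copied from the table above:

```python
# Low-end monthly cloud estimates from the cost table (USD).
cloud_costs = {
    "GPT-4o (1M tokens)": 15.00,
    "DALL-E (1000 images)": 40.00,
    "Whisper API (10 h audio)": 6.00,
    "OpenAI embeddings (1M tokens)": 0.10,
}

monthly_total = sum(cloud_costs.values())
yearly_total = 12 * monthly_total
print(f"~${monthly_total:.2f}/month, ~${yearly_total:.2f}/year")
# → ~$61.10/month, ~$733.20/year
```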

Self-hosted advantages

  • Data sovereignty — prompts, images, audio, and documents never leave your network
  • No rate limits — your hardware, your throughput
  • No downtime dependency — cloud API outages don't affect you
  • No vendor lock-in — switch models instantly, no migration
  • Compliance-friendly — HIPAA, GDPR, SOC2 — no third-party data processors
  • Predictable costs — hardware depreciates, but never surprises you with a bill

Fleet routing

The router scores each device on 7 signals and picks the best one for every request. Multiple machines share the load automatically.

# Fleet overview
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Health checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

# Model recommendations for your hardware
curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool

Dashboard at http://localhost:11435/dashboard for visual monitoring.
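The routing idea is weighted scoring over per-node signals. The actual seven signals and their weights aren't documented in this SKILL.md, so the sketch below uses hypothetical signal names and weights purely for illustration:

```python
# Hypothetical signals and weights -- the router's real seven signals
# are not documented here; these five are illustrative only.
WEIGHTS = {
    "model_loaded": 0.30,  # target model already resident in memory
    "free_vram": 0.25,     # headroom for the request
    "latency": 0.20,       # recent round-trip speed (1.0 = fastest)
    "queue_depth": 0.15,   # how idle the node is (1.0 = empty queue)
    "uptime": 0.10,        # recent health-check success rate
}

def score(node: dict) -> float:
    """Weighted sum of signals normalized to [0, 1]; higher is better."""
    return sum(WEIGHTS[k] * node.get(k, 0.0) for k in WEIGHTS)

def pick_node(nodes: list[dict]) -> dict:
    """Route the request to the best-scoring node."""
    return max(nodes, key=score)
```

The effect is that a node with the model already loaded and a short queue wins the request, and load spreads across the fleet without manual assignment.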

Full documentation

Contribute

Ollama Herd is open source (MIT). Self-hosted AI for everyone:

  • Star on GitHub — help others escape cloud API lock-in
  • Open an issue — share your self-hosted setup
  • PRs welcome from humans and AI agents. CLAUDE.md gives full context. 412 tests.

Guardrails

  • No automatic downloads — all model pulls require explicit user confirmation.
  • Model deletion requires explicit user confirmation.
  • All requests stay local — no data leaves your network. No telemetry, no analytics, no cloud callbacks.
  • Never delete or modify files in ~/.fleet-manager/.

Files

1 total