Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

GPU Cluster Manager

v1.4.1

Turn your spare GPUs into one inference endpoint. Auto-discovers machines on your network, routes requests to the best available device, learns when your mac...

0 stars · 215 downloads · 2 current · 2 all-time
by Twin Geeks (@twinsgeeks)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for twinsgeeks/gpu-cluster-manager.

Prompt preview: Install & Setup
Install the skill "Gpu Cluster Manager" (twinsgeeks/gpu-cluster-manager) from ClawHub.
Skill page: https://clawhub.ai/twinsgeeks/gpu-cluster-manager
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install gpu-cluster-manager

ClawHub CLI


npx clawhub@latest install gpu-cluster-manager
Security Scan
VirusTotal: stale
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill's name and instructions (install ollama-herd, run herd/herd-node, provide a local OpenAI-compatible endpoint) match the stated purpose. However, the registry metadata declares no requirements while SKILL.md includes its own metadata requiring network tools (curl/wget), optional python/pip, and config paths (~/.fleet-manager/...). That mismatch is incoherent and should be explained by the publisher.
Instruction Scope
Runtime instructions tell the user to pip install a PyPI package and run herd/herd-node; the service auto-discovers via mDNS, auto-pulls models to nodes, and implements 'meeting detection' that pauses inference when camera/mic are active on macOS. These behaviors imply access to network, disk, and system sensors (camera/microphone). SKILL.md does not detail where pulled models come from, what permissions are required/used for meeting detection, or any safeguards for auto-downloads or sensitive sensor access.
Install Mechanism
The skill is instruction-only (no install spec in registry) but tells users to run `pip install ollama-herd` (PyPI). Installing a third-party pip package is a moderate-risk install mechanism because install-time code can execute arbitrary actions. SKILL.md's declared homepage (a GitHub repo) exists in the file but the registry entry lists no homepage — another inconsistency to resolve.
Credentials
No environment variables or external credentials are requested, which is proportionate. However, requested config paths (~/.fleet-manager/latency.db, logs/herd.jsonl) indicate the skill will write local telemetry. The meeting-detection feature implies access to camera/mic state on macOS, which is sensitive; auto-pull will download large models over the network and write them to disk. These resource and privacy implications are not made explicit in the registry metadata.
Persistence & Privilege
The skill is not set to always:true and does not request system-wide config changes in the registry. It will create local files under the user's home (~/.fleet-manager) and requires installing a pip package, which is typical. Autonomous invocation is allowed by default (expected), but combined with network access, model downloads, and sensor access it increases the blast radius; this warrants caution, though it is not an immediate privilege misconfiguration in the manifest.
What to consider before installing
Before installing:

  • Verify the upstream project and PyPI package: check the GitHub repo, package maintainers, and release history, and inspect the package source if possible (pip install can run arbitrary code at install time).
  • Be aware the tool auto-downloads models (heavy disk and network use) and can read system sensor state (macOS camera/mic) for meeting detection; decide whether you want those capabilities.
  • If you proceed, run the package in an isolated environment (VM/container) or on a machine whose disk and network you can afford to expose, restrict network egress if you need to control where models are downloaded from, and review the installed package files for unexpected behavior.
  • Ask the publisher to resolve the registry vs. SKILL.md inconsistencies (declared requirements, homepage) and to document where models are pulled from and what permissions meeting detection uses.
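If you want to look at the package source before pip ever executes its install hooks, one option is to pull the source distribution through PyPI's JSON API and list its files. A minimal stdlib-only sketch, assuming the package publishes an sdist:

import io
import json
import tarfile
from urllib.request import urlopen

# Fetch release metadata from PyPI's JSON API
with urlopen("https://pypi.org/pypi/ollama-herd/json") as resp:
    meta = json.load(resp)

# Find the source distribution for the latest release (may be absent)
sdist = next((u for u in meta["urls"] if u["packagetype"] == "sdist"), None)
if sdist is None:
    raise SystemExit("No sdist published; inspect the wheel instead.")

with urlopen(sdist["url"]) as resp:
    data = resp.read()

# List the archive contents so you can review setup.py / pyproject.toml
with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
    for member in tar.getnames():
        print(member)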

Like a lobster shell, security has layers — review code before you run it.

latest: vk970kmh837b52jc0t62d4dcg5h8457qh
215 downloads · 0 stars · 7 versions · Updated 3w ago
v1.4.1 · MIT-0

GPU Cluster Manager

You are managing a GPU cluster that combines multiple machines into one inference endpoint for running local LLMs via Ollama. The GPU cluster routes every request to the best available device automatically.

What this GPU cluster solves

Your desktop, laptop, and maybe an old Linux box all have GPUs sitting idle most of the time. You want one cluster URL that uses all of them, without Kubernetes, without Docker, without editing config files. Just point your AI apps at the cluster endpoint and let the cluster figure out which machine should handle each request.

This GPU cluster manager does exactly that. Install it, run two commands, and your machines discover each other automatically. The cluster learns when your devices are free, pauses during video calls, and picks the best node for every request based on real-time conditions.

Getting started with the GPU cluster

pip install ollama-herd    # GPU cluster manager from PyPI

On your main GPU cluster machine (the router):

herd    # starts GPU cluster router

On each other GPU cluster machine:

herd-node    # joins the GPU cluster automatically

That's it. The GPU cluster nodes find the router via mDNS. No config files. Your GPU cluster is running.

If mDNS doesn't work on your GPU cluster network: herd-node --router-url http://router-ip:11435
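For the curious, discovery along these lines can be reproduced with the python-zeroconf package. This is a hedged sketch, not the skill's actual code: the "_herd._tcp.local." service type is an assumption.

# Hypothetical sketch: browse mDNS for the herd router.
# The service type below is a guess, not confirmed by the skill docs.
# Requires: pip install zeroconf
import time
from zeroconf import Zeroconf, ServiceBrowser, ServiceListener

SERVICE_TYPE = "_herd._tcp.local."  # assumed service name

class RouterListener(ServiceListener):
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info:
            addrs = info.parsed_addresses()
            print(f"Found router candidate: http://{addrs[0]}:{info.port}")

    def update_service(self, zc, type_, name):
        pass

    def remove_service(self, zc, type_, name):
        pass

zc = Zeroconf()
browser = ServiceBrowser(zc, SERVICE_TYPE, RouterListener())
time.sleep(5)  # give discovery a few seconds to hear announcements
zc.close()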

GPU Cluster Endpoint

Your GPU cluster runs at http://localhost:11435. Point any AI app at the GPU cluster:

from openai import OpenAI

# GPU cluster client: the local endpoint needs no real API key
gpu_cluster_client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")
gpu_cluster_response = gpu_cluster_client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Explain GPU cluster routing for AI inference"}]
)
print(gpu_cluster_response.choices[0].message.content)

Works with: LangChain, CrewAI, AutoGen, LlamaIndex, Aider, Cline, Continue.dev, and any OpenAI-compatible client pointing at the GPU cluster.

GPU Cluster Smart Features

  • GPU cluster auto-discovery — machines find each other via mDNS, no config
  • 7-signal GPU cluster scoring — picks the best machine based on loaded models, memory, queue depth, latency, and more (see the sketch after this list)
  • GPU cluster meeting detection — pauses inference when your camera/mic is active (macOS)
  • GPU cluster capacity learning — learns your weekly patterns (168-hour behavioral model)
  • GPU cluster context protection — prevents models from reloading when apps send different context sizes
  • GPU cluster auto-pull — if you request a model that doesn't exist, it downloads to the best GPU cluster node
  • GPU cluster auto-retry — if a machine hiccups, retries on the next-best GPU cluster node
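The seven signals and their weights are not documented here, so the following is only a sketch of what weighted node scoring of this kind could look like; every signal name and weight below is an assumption, not the skill's actual logic:

# Hypothetical sketch of multi-signal node scoring.
from dataclasses import dataclass

@dataclass
class NodeSnapshot:
    has_model_loaded: bool   # requested model already in VRAM?
    free_vram_gb: float
    total_vram_gb: float
    queue_depth: int
    avg_latency_ms: float
    in_meeting: bool         # e.g. camera/mic active on macOS

def score(node: NodeSnapshot) -> float:
    """Higher is better. Weights are illustrative, not the skill's."""
    s = 0.0
    s += 5.0 if node.has_model_loaded else 0.0      # avoid a cold model load
    s += 2.0 * (node.free_vram_gb / node.total_vram_gb)
    s -= 1.0 * node.queue_depth                     # penalize request backlog
    s -= node.avg_latency_ms / 1000.0               # penalize slow nodes
    s -= 100.0 if node.in_meeting else 0.0          # effectively skip busy nodes
    return s

def pick_node(nodes: list[NodeSnapshot]) -> NodeSnapshot:
    return max(nodes, key=score)

nodes = [
    NodeSnapshot(True, 20.0, 24.0, 0, 800.0, False),
    NodeSnapshot(False, 40.0, 48.0, 2, 400.0, False),
]
print(pick_node(nodes))  # favors the node with the model already loaded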

Check your GPU cluster

GPU cluster status — all machines

curl -s http://localhost:11435/fleet/status | python3 -m json.tool

What models are available on the GPU cluster?

curl -s http://localhost:11435/api/tags | python3 -m json.tool

What's loaded in GPU cluster memory right now?

curl -s http://localhost:11435/api/ps | python3 -m json.tool

How healthy is the GPU cluster?

curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

GPU cluster model recommendations

curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool

Returns GPU cluster recommendations based on your hardware — which models fit, which are too big, and the optimal GPU cluster mix.
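To consume that endpoint from code instead of the shell, a minimal stdlib sketch; the response schema is not documented here, so print it and inspect before relying on any field names:

# Sketch: fetch recommendations with the standard library only.
import json
from urllib.request import urlopen

with urlopen("http://localhost:11435/dashboard/api/recommendations") as resp:
    recs = json.load(resp)
print(json.dumps(recs, indent=2))  # inspect the actual schema first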

GPU cluster recent activity

curl -s "http://localhost:11435/dashboard/api/traces?limit=10" | python3 -m json.tool

GPU cluster usage stats

curl -s http://localhost:11435/dashboard/api/usage | python3 -m json.tool

GPU cluster settings

curl -s http://localhost:11435/dashboard/api/settings | python3 -m json.tool

curl -s -X POST http://localhost:11435/dashboard/api/settings \
  -H "Content-Type: application/json" \
  -d '{"auto_pull": false}'

Manage GPU cluster models

# What's on each GPU cluster node
curl -s http://localhost:11435/dashboard/api/model-management | python3 -m json.tool

# Download a model to a specific GPU cluster node
curl -s -X POST http://localhost:11435/dashboard/api/pull \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.3:70b", "node_id": "gpu-cluster-studio"}'

# Remove a model from a GPU cluster node
curl -s -X POST http://localhost:11435/dashboard/api/delete \
  -H "Content-Type: application/json" \
  -d '{"model": "old-model:7b", "node_id": "gpu-cluster-studio"}'

GPU cluster per-app tracking

curl -s http://localhost:11435/dashboard/api/apps | python3 -m json.tool

Tag your GPU cluster requests to see which apps use the most time:

curl -s http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.3:70b","messages":[{"role":"user","content":"Summarize GPU cluster utilization"}],"metadata":{"tags":["gpu-cluster-app"]}}'

GPU Cluster Dashboard

Open http://localhost:11435/dashboard for a visual GPU cluster overview. Eight tabs: Fleet Overview (live GPU cluster node cards), Trends (charts), Model Insights (performance comparison), Apps (per-app usage), Benchmarks, Health (automated GPU cluster checks), Recommendations (what models to run), Settings.

Try the GPU cluster

# Quick GPU cluster test
curl -s http://localhost:11435/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:3b","messages":[{"role":"user","content":"Hello from the GPU cluster!"}],"stream":false}'

GPU Cluster Troubleshooting

Check what's slow in the GPU cluster

sqlite3 ~/.fleet-manager/latency.db "SELECT model, node_id, AVG(latency_ms)/1000.0 as avg_secs, COUNT(*) as n FROM request_traces WHERE status='completed' GROUP BY node_id, model HAVING n > 5 ORDER BY avg_secs DESC LIMIT 10"

See GPU cluster failures

sqlite3 ~/.fleet-manager/latency.db "SELECT request_id, model, status, error_message, latency_ms/1000.0 as secs FROM request_traces WHERE status='failed' ORDER BY timestamp DESC LIMIT 10"

GPU Cluster Guardrails

  • Never restart or stop the GPU cluster without explicit user confirmation.
  • Never delete or modify files in ~/.fleet-manager/ (contains all your GPU cluster data and logs).
  • Do not pull or delete models on the GPU cluster without user confirmation — downloads can be 10-100+ GB.
  • If a GPU cluster machine shows as offline, report it rather than attempting to SSH into it.

GPU Cluster Failure Handling

  • Connection refused → GPU cluster router may not be running, suggest herd or uv run herd
  • 0 nodes online → suggest starting herd-node on GPU cluster devices
  • mDNS discovery fails → use --router-url http://router-ip:11435
  • GPU cluster requests hang → check for num_ctx in client requests; context protection handles it
  • GPU cluster errors → check ~/.fleet-manager/logs/herd.jsonl
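A small triage script can automate the first two checks in this list. The sketch assumes /fleet/status returns JSON with a "nodes" list whose entries carry an "online" flag; those field names are guesses, so adjust to the real payload:

# Hypothetical triage sketch for the first two failure cases above.
import json
from urllib.error import URLError
from urllib.request import urlopen

try:
    with urlopen("http://localhost:11435/fleet/status", timeout=5) as resp:
        status = json.load(resp)
except URLError:
    print("Connection refused: is the router running? Try `herd`.")
else:
    nodes = status.get("nodes", [])  # assumed field name
    online = [n for n in nodes if n.get("online")]
    if not online:
        print("0 nodes online: start `herd-node` on your other machines.")
    else:
        print(f"{len(online)} node(s) online.")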
