AI DevOps Toolkit

v1.2.0

Operational tooling for teams running local LLM infrastructure. Request tracing with full scoring breakdowns, per-application usage analytics via request tag...

by Twin Geeks (@twinsgeeks)
MIT-0 license · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name, description, metadata (curl/sqlite3, optional python3/pip, config paths under ~/.fleet-manager), and SKILL.md all describe local observability for an Ollama Herd router; the requested artifacts (SQLite DB, JSONL logs, local HTTP endpoints) align with that purpose.
Instruction Scope
The SKILL.md instructs the agent to query a local router (http://localhost:11435), read/query a local SQLite DB (~/.fleet-manager/latency.db) and local logs, and optionally install/use the 'ollama-herd' Python package. There are no instructions to access unrelated system files, external endpoints, or to exfiltrate data.
Install Mechanism
The registry contains no install spec (instruction-only). SKILL.md recommends 'pip install ollama-herd' and running herd/herd-node. Installing a PyPI package has normal supply-chain risk (arbitrary code execution during install) but is proportionate to a Python-based tooling package; verify the package/source before installing in privileged environments.
Credentials
The skill declares no required environment variables or credentials. The config paths it references (~/.fleet-manager/latency.db and logs) are directly related to its stated observability function and are reasonable for the task.
Persistence & Privilege
The skill is not always-enabled and does not request elevated or persistent platform privileges. It does not instruct modifying other skills or system-wide agent settings.
Assessment
This skill appears to do what it claims: local observability for an Ollama Herd router using a local SQLite DB and logs. Before installing or running commands: (1) verify the 'ollama-herd' PyPI package and its GitHub repo to ensure it is trustworthy, (2) install it in a virtual environment or isolated host to limit risk from arbitrary package code, (3) ensure the router endpoint (localhost:11435) is not exposed to untrusted networks, and (4) review/backup any ~/.fleet-manager data you care about since the skill reads those files. If you don't run an Ollama Herd router or don't want tooling accessing ~/.fleet-manager, do not install or run this skill.

Like a lobster shell, security has layers — review code before you run it.

Tags: ai-infrastructure, analytics, devops, fleet-routing, latest, metrics, monitoring, observability, ollama, operations, sre, traces

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

AI DevOps Toolkit — Observability for Local AI Fleets

Tooling for running local LLM inference at production quality. This skill provides the observability, tracing, and health monitoring layer for an Ollama Herd fleet. Every workflow — from request tracing to capacity planning — runs through a single SQLite-backed observability stack.

Prerequisites

pip install ollama-herd
herd              # start the router (exposes the observability endpoints)
herd-node         # start on each monitored node

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

Scope

This toolkit assumes you have an Ollama Herd router running at http://localhost:11435 with one or more node agents reporting in. It focuses on the operational side: are requests succeeding? What's slow? Which apps consume the most tokens? Are nodes healthy? Is capacity adequate?

Observability Stack

Everything in this observability layer is backed by SQLite at ~/.fleet-manager/latency.db. No external databases, no time-series infrastructure. Query traces with standard sqlite3.

~/.fleet-manager/
├── latency.db          # traces, latency history, usage stats
└── logs/
    └── herd.jsonl      # structured logs, daily rotation, 30-day retention
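The queries below assume a request_traces table, but the full schema isn't documented here. As a sketch, you can inspect the DB yourself with Python's stdlib sqlite3, opening it read-only so observability tooling can never write to it:

```python
import sqlite3

def list_tables(db_path: str) -> list[str]:
    """List tables in a SQLite file via a read-only URI connection."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        conn.close()

# Against the real DB (expand ~ first with os.path.expanduser):
#   list_tables("/home/you/.fleet-manager/latency.db")
```

From there, `.schema request_traces` in the sqlite3 shell shows the exact columns the queries in this document rely on.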

Health Checks

Automated fleet health analysis

health=$(curl -s http://localhost:11435/dashboard/api/health)
echo "$health" | python3 -m json.tool

Eleven checks, each returning a severity (info/warning/critical) and a recommendation:

Check                 What it detects
Offline nodes         Nodes that stopped sending heartbeats
Degraded nodes        Nodes reporting errors or high memory pressure
Memory pressure       Nodes approaching memory limits
Underutilized nodes   Healthy nodes not receiving traffic
VRAM fallbacks        Requests rerouted to loaded alternatives to avoid cold loads
Version mismatch      Nodes running different versions than the router
Context protection    num_ctx values stripped or models upgraded to prevent reloads
Zombie reaper         Stuck in-flight requests cleaned up
Model thrashing       Models loading/unloading frequently (memory contention)
Request timeouts      Requests exceeding expected latency thresholds
Error rates           Elevated failure rates per model or per node
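When automating against this endpoint, you usually only care about checks above "info". A minimal sketch of severity filtering — assuming the payload carries a list of checks with name, severity, and recommendation fields (the real shape may differ; inspect the JSON first):

```python
import json

def actionable_checks(health_json: str, levels=("warning", "critical")) -> list[dict]:
    """Return only the health checks whose severity warrants attention."""
    checks = json.loads(health_json).get("checks", [])
    return [c for c in checks if c.get("severity") in levels]

# Example payload shaped like the fields described above (hypothetical):
sample = ('{"checks": [{"name": "offline_nodes", "severity": "critical",'
          ' "recommendation": "restart node agent"},'
          ' {"name": "version_mismatch", "severity": "info"}]}')
for check in actionable_checks(sample):
    print(f"{check['severity'].upper()}: {check['name']} -> {check.get('recommendation', 'n/a')}")
```

Feed it the output of the curl above instead of the sample string in real use.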

Node-level status

fleet_status=$(curl -s http://localhost:11435/fleet/status)
echo "$fleet_status" | python3 -c "
import sys, json
d = json.load(sys.stdin)
print(f\"Fleet: {d['fleet']['nodes_online']}/{d['fleet']['nodes_total']} online, {d['fleet']['requests_active']} active requests\")
for n in d['nodes']:
    mem = n.get('memory', {})
    cpu = n.get('cpu', {})
    print(f\"  {n['node_id']:20s} {n['status']:10s} CPU={cpu.get('utilization_pct',0):.0f}% MEM={mem.get('used_gb',0):.0f}/{mem.get('total_gb',0):.0f}GB pressure={mem.get('pressure','?')}\")
"

Request Tracing

Every routing decision is recorded with full observability context.

Recent traces

traces=$(curl -s "http://localhost:11435/dashboard/api/traces?limit=20")
echo "$traces" | python3 -m json.tool

Each trace includes: request_id, model, original_model (before fallback), node_id, score, scores_breakdown (all 7 signals), status, latency_ms, time_to_first_token_ms, prompt_tokens, completion_tokens, retry_count, fallback_used, tags.
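With those fields in hand, simple fleet-level ratios can be computed client-side. A sketch (not part of the toolkit) that derives the fallback rate from a list of traces shaped like the fields above:

```python
def fallback_rate(traces: list[dict]) -> float:
    """Fraction of requests rerouted away from their originally requested model."""
    if not traces:
        return 0.0
    rerouted = sum(1 for t in traces if t.get("fallback_used"))
    return rerouted / len(traces)

# Two example traces using the documented field names:
sample = [
    {"request_id": "a1", "model": "llama3.3:70b", "fallback_used": 0},
    {"request_id": "a2", "model": "llama3.1:8b",
     "original_model": "llama3.3:70b", "fallback_used": 1},
]
print(f"fallback rate: {fallback_rate(sample):.0%}")  # -> fallback rate: 50%
```

In practice, load the traces endpoint's JSON and pass its list of trace objects in.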

Failure investigation

# Recent failures with error details
sqlite3 ~/.fleet-manager/latency.db "SELECT request_id, model, node_id, error_message, latency_ms/1000.0 as secs, datetime(timestamp, 'unixepoch', 'localtime') as time FROM request_traces WHERE status='failed' ORDER BY timestamp DESC LIMIT 20"

# Retry frequency — which nodes need attention?
sqlite3 ~/.fleet-manager/latency.db "SELECT node_id, SUM(retry_count) as retries, COUNT(*) as total, ROUND(100.0 * SUM(CASE WHEN status='failed' THEN 1 ELSE 0 END) / COUNT(*), 1) as fail_pct FROM request_traces GROUP BY node_id ORDER BY fail_pct DESC"

# Fallback frequency — which models are unreliable?
sqlite3 ~/.fleet-manager/latency.db "SELECT original_model, model as fell_back_to, COUNT(*) as n FROM request_traces WHERE fallback_used=1 GROUP BY original_model, model ORDER BY n DESC"

Latency Analysis

# P50/P75/P99 latency by model
sqlite3 ~/.fleet-manager/latency.db "
WITH ranked AS (
  SELECT model, latency_ms,
    PERCENT_RANK() OVER (PARTITION BY model ORDER BY latency_ms) as pct
  FROM request_traces WHERE status='completed'
)
SELECT model,
  ROUND(MIN(CASE WHEN pct >= 0.5 THEN latency_ms END)/1000.0, 1) as p50_s,
  ROUND(MIN(CASE WHEN pct >= 0.75 THEN latency_ms END)/1000.0, 1) as p75_s,
  ROUND(MIN(CASE WHEN pct >= 0.99 THEN latency_ms END)/1000.0, 1) as p99_s,
  COUNT(*) as n
FROM ranked GROUP BY model HAVING n > 10 ORDER BY p75_s DESC
"
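The PERCENT_RANK() query can be sanity-checked outside SQLite. A sketch using the stdlib statistics module on latencies pulled from the same table (percentile interpolation details differ slightly between methods, so expect close but not always identical values):

```python
import statistics

def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """P50/P75/P99 in seconds, mirroring the SQL query's output columns."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile
    q = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50_s": round(q[49] / 1000.0, 1),
        "p75_s": round(q[74] / 1000.0, 1),
        "p99_s": round(q[98] / 1000.0, 1),
    }

# Hypothetical latency sample in milliseconds:
sample_ms = [800, 1200, 1500, 2100, 2600, 3400, 5200, 9800]
print(latency_percentiles(sample_ms))
```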

# Time-to-first-token (cold load detection)
sqlite3 ~/.fleet-manager/latency.db "SELECT node_id, model, ROUND(AVG(time_to_first_token_ms), 0) as avg_ttft_ms, ROUND(MAX(time_to_first_token_ms), 0) as max_ttft_ms, COUNT(*) as n FROM request_traces WHERE time_to_first_token_ms IS NOT NULL GROUP BY node_id, model HAVING n > 5 ORDER BY avg_ttft_ms DESC"

# Outlier detection — slowest requests
sqlite3 ~/.fleet-manager/latency.db "SELECT request_id, model, node_id, ROUND(latency_ms/1000.0, 1) as secs, prompt_tokens, completion_tokens, retry_count, datetime(timestamp, 'unixepoch', 'localtime') as time FROM request_traces WHERE status='completed' ORDER BY latency_ms DESC LIMIT 10"

Per-Application Analytics

Tag requests to track usage per application, team, or environment.

Request tagging

# Tag via request body
curl -s http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.3:70b","messages":[{"role":"user","content":"Hello"}],"metadata":{"tags":["devops-prod","devops-code-review"]}}'

# Tag via header
curl -s -H "X-Herd-Tags: devops-prod, devops-code-review" \
  http://localhost:11435/v1/chat/completions \
  -d '{"model":"llama3.3:70b","messages":[{"role":"user","content":"Hello"}]}'

Per-tag dashboards

curl -s http://localhost:11435/dashboard/api/apps | python3 -m json.tool
curl -s http://localhost:11435/dashboard/api/apps/daily | python3 -m json.tool

Token consumption by tag

sqlite3 ~/.fleet-manager/latency.db "SELECT j.value as tag, COUNT(*) as requests, SUM(COALESCE(prompt_tokens,0)) as prompt_tok, SUM(COALESCE(completion_tokens,0)) as completion_tok, SUM(COALESCE(prompt_tokens,0)+COALESCE(completion_tokens,0)) as total_tok FROM request_traces, json_each(tags) j WHERE tags IS NOT NULL GROUP BY j.value ORDER BY total_tok DESC"

Traffic Patterns

# Requests per hour (find peak load times)
sqlite3 ~/.fleet-manager/latency.db "SELECT CAST((timestamp % 86400) / 3600 AS INTEGER) as hour_utc, COUNT(*) as requests, ROUND(AVG(latency_ms)/1000.0, 1) as avg_secs FROM request_traces GROUP BY hour_utc ORDER BY hour_utc"

# Daily request volume
sqlite3 ~/.fleet-manager/latency.db "SELECT date(timestamp, 'unixepoch') as day, COUNT(*) as requests, SUM(COALESCE(prompt_tokens,0)+COALESCE(completion_tokens,0)) as tokens FROM request_traces GROUP BY day ORDER BY day DESC LIMIT 14"

Capacity Planning

Model recommendations per node

recommendations=$(curl -s http://localhost:11435/dashboard/api/recommendations)
echo "$recommendations" | python3 -m json.tool

Returns recommendations based on hardware capabilities, current usage, and curated benchmark data. Use them for capacity planning: which models fit on which machines, and what the optimal mix is.
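The exact payload shape isn't documented here; as a sketch, assuming it maps node IDs to ranked model lists (adjust the keys after inspecting the real JSON), you could surface just the headline pick per node:

```python
def top_picks(payload: dict) -> dict[str, str]:
    """Best-ranked model suggestion per node, assuming a {node: [models...]} shape."""
    return {node: models[0] for node, models in payload.items() if models}

# Hypothetical payload:
sample = {
    "workstation-1": ["llama3.3:70b", "qwen2.5:32b"],
    "mini-pc-2": ["llama3.1:8b"],
    "idle-node": [],
}
for node, model in top_picks(sample).items():
    print(f"{node}: {model}")
```

Nodes with no suggestions are skipped rather than reported with an empty value.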

Usage statistics

curl -s http://localhost:11435/dashboard/api/usage | python3 -m json.tool

Configuration

# View all settings
curl -s http://localhost:11435/dashboard/api/settings | python3 -m json.tool

# Toggle runtime settings
curl -s -X POST http://localhost:11435/dashboard/api/settings \
  -H "Content-Type: application/json" \
  -d '{"auto_pull": false}'

Log Analysis

Structured JSONL logs live at ~/.fleet-manager/logs/herd.jsonl:

# Recent errors
grep '"level": "ERROR"' ~/.fleet-manager/logs/herd.jsonl | tail -10 | python3 -m json.tool

# Context protection events
grep "Context protection" ~/.fleet-manager/logs/herd.jsonl | tail -10

# Stream errors
grep "Stream error" ~/.fleet-manager/logs/herd.jsonl | tail -10
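When grep isn't enough, the JSONL format parses line by line. A sketch that tallies records per log level — assuming each line is a JSON object with a "level" field, as the greps above imply:

```python
import json
from collections import Counter

def level_counts(jsonl_lines) -> Counter:
    """Count log records per level, tolerating malformed lines."""
    counts = Counter()
    for line in jsonl_lines:
        try:
            counts[json.loads(line).get("level", "UNKNOWN")] += 1
        except json.JSONDecodeError:
            counts["MALFORMED"] += 1
    return counts

# Hypothetical log lines:
sample = [
    '{"level": "INFO", "msg": "request routed"}',
    '{"level": "ERROR", "msg": "stream error"}',
    'not json',
]
print(level_counts(sample))

# Against the real log:
#   with open(os.path.expanduser("~/.fleet-manager/logs/herd.jsonl")) as f:
#       print(level_counts(f))
```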

Dashboard

Web dashboard at http://localhost:11435/dashboard. Key tabs:

  • Trends — requests/hour, latency, token throughput over 24h–7d
  • Apps — per-tag analytics with daily breakdowns
  • Health — automated health checks with severity and recommendations
  • Model Insights — per-model latency and throughput comparison

Guardrails

  • Never restart services without explicit user confirmation.
  • Never delete or modify ~/.fleet-manager/ contents.
  • Do not pull or delete models without user confirmation.
  • Report issues to the user rather than attempting automated fixes.
  • If the router isn't running, suggest herd or uv run herd.
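To honor that last guardrail before issuing any queries, a minimal stdlib-only reachability check (the port is the one this skill documents; treat any HTTP answer, even an error status, as "running"):

```python
import urllib.request
import urllib.error

def router_running(base_url: str = "http://localhost:11435", timeout: float = 2.0) -> bool:
    """True if anything answers HTTP at the router address."""
    try:
        urllib.request.urlopen(base_url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server answered, just with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or no router

if not router_running():
    print("Router not reachable; suggest starting it with `herd` (or `uv run herd`).")
```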
