Apple Silicon AI

v1.0.2

Apple Silicon AI — run LLMs, image generation, speech-to-text, and embeddings on Mac Studio, Mac Mini, MacBook Pro, and Mac Pro. Turn your Apple Silicon devi...

by Twin Geeks (@twinsgeeks)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description advertise a local Apple Silicon inference fleet, and the SKILL.md asks only for tools and commands that match that purpose (pip install ollama-herd, herd, herd-node, curl examples). The metadata's required binaries (curl/wget) and optional python/pip align with the commands shown.
Instruction Scope
Instructions are focused on installing a fleet manager and starting a router/node agents that auto-discover on the LAN and expose HTTP APIs (port 11435). They do not ask the agent to read unrelated system files or secrets, but they do instruct running network services that will publish hardware and model info on the local network and create local logs/config under ~/.fleet-manager.
Install Mechanism
This is an instruction-only skill (no install spec). However, it explicitly tells users to run `pip install ollama-herd` — meaning installation will fetch third‑party code from PyPI/GitHub. The skill itself doesn't provide or pin the package, so verifying the package origin and integrity is the user's responsibility.
Credentials
No environment variables or unrelated credentials are requested. Declared config paths (~/.fleet-manager/latency.db, ~/.fleet-manager/logs/herd.jsonl) are consistent with a local fleet manager and with the stated functionality.
Persistence & Privilege
The skill does not request elevated privileges nor set always:true. It does instruct the user to run daemons (herd, herd-node) that persist and open local network endpoints; that increases exposure on the LAN and should be considered before enabling.
Assessment
This skill appears internally consistent with its goal of running a local Apple Silicon inference fleet. Before installing or running it:
1. Inspect the upstream project (https://github.com/geeks-accelerator/ollama-herd) and the PyPI package (if present) to confirm origin and review recent commits and maintainers.
2. Install into a virtualenv or on a dedicated machine, not as root.
3. Consider network isolation or firewall rules: the router and nodes auto-discover on the LAN and open port 11435, so limit exposure to your trusted network.
4. Review and monitor files under ~/.fleet-manager for sensitive logs.
5. Prefer pinned package versions and verify checksums when possible.
If you cannot audit the package code, or do not want a service auto-advertising hardware on your LAN, do not install.

Like a lobster shell, security has layers — review code before you run it.

Tags: apple-silicon, embeddings, image-generation, latest, llm, local-ai, m2-ultra, m3-max, m4-max, m4-ultra, mac-mini, mac-pro, mac-studio, macbook-pro, ollama, self-hosted, speech-to-text


Runtime requirements

Agent: Clawdis
OS: macOS
Required binaries: curl, wget

SKILL.md

Apple Silicon AI — Your Macs Are the Cluster

Turn your Mac Studio, Mac Mini, MacBook Pro, or Mac Pro into a local Apple Silicon AI fleet. One endpoint routes LLM inference, image generation, speech-to-text, and embeddings across every Apple Silicon device on your network.

No cloud APIs. No GPU rentals. No Docker. Your Apple Silicon M1/M2/M3/M4 chips with unified memory are already better inference hardware than most cloud instances — you just need software that treats them as an Apple Silicon fleet.

Why Apple Silicon for AI

Apple Silicon unified memory keeps the entire model in one address space — no PCIe bottleneck, no CPU-GPU transfer overhead. A Mac Studio with M4 Ultra and 256GB runs 120B parameter models that would need multiple NVIDIA A100s. That is the Apple Silicon advantage.

Apple Silicon Chip   Unified Memory   LLM Sweet Spot   Image Gen   Notes
M1                   8GB              7B models        Slow        Entry-level Apple Silicon
M1 Pro/Max           32-64GB          14B-32B          Capable     Apple Silicon MacBook Pro
M2 Ultra             192GB            70B-120B         Fast        Apple Silicon Mac Studio/Pro
M3 Max               128GB            70B              Fast        Latest Apple Silicon MacBook Pro
M4 Max               128GB            70B              Fast        Apple Silicon Mac Studio, newest gen
M4 Ultra             256GB            120B+            Very fast   Apple Silicon Mac Studio/Pro, largest models

Apple Silicon Fleet Setup

1. Install on every Apple Silicon Mac

pip install ollama-herd    # Apple Silicon optimized inference router

2. Start the Apple Silicon router (pick one Mac)

herd    # starts Apple Silicon router on port 11435

3. Start the Apple Silicon node agent on every Mac

herd-node    # Apple Silicon node auto-discovers the router

That's it. Apple Silicon nodes discover the router automatically on your local network. No IP addresses to configure, no config files. For explicit connection, use herd-node --router-url http://<router-ip>:11435.

How Apple Silicon routing works

MacBook Pro (M3 Max, 64GB)  ─┐
Mac Mini (M4, 32GB)          ├──→  Apple Silicon Router (:11435)  ←──  Your apps
Mac Studio (M4 Ultra, 256GB) ─┘

The Apple Silicon router scores each device on 7 signals and routes every request to the best available Mac — thermal state, memory fit, queue depth, and more.
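
The scoring logic is not spelled out beyond the signals named above, but the idea can be sketched as follows. This is a hypothetical illustration; the Node fields, the weights, and the score function are assumptions, not ollama-herd's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_mem_gb: float   # unified memory currently available on this Mac
    queue_depth: int     # requests already waiting on this node
    thermal_ok: bool     # False when macOS reports thermal throttling

def score(node: Node, model_mem_gb: float) -> float:
    """Routing score: higher is better, 0 means skip this node."""
    if node.free_mem_gb < model_mem_gb:
        return 0.0                        # model does not fit in memory
    s = node.free_mem_gb - model_mem_gb   # prefer nodes with headroom
    s -= 5.0 * node.queue_depth           # penalize busy nodes
    if not node.thermal_ok:
        s *= 0.5                          # deprioritize throttling Macs
    return max(s, 0.0)

def pick(nodes: list, model_mem_gb: float):
    """Route to the best-scoring node, or None if nothing fits."""
    best = max(nodes, key=lambda n: score(n, model_mem_gb))
    return best if score(best, model_mem_gb) > 0 else None
```

The real router weighs more signals than this (seven, per the description), but the shape is the same: hard-filter on memory fit, then rank by load and thermal headroom.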

Apple Silicon LLM Inference

Run Llama, Qwen, DeepSeek, Phi, Mistral, Gemma, and any Ollama model across your Apple Silicon fleet.

OpenAI-compatible API (Apple Silicon backend)

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3:70b",
    "messages": [{"role": "user", "content": "Explain Apple Silicon unified memory architecture"}]
  }'

Ollama-compatible API

curl http://localhost:11435/api/chat \
  -d '{"model": "qwen3:32b", "messages": [{"role": "user", "content": "Compare Apple Silicon M4 vs M3 for AI inference"}]}'

Apple Silicon Python Client

from openai import OpenAI
# Apple Silicon inference client
apple_silicon_client = OpenAI(base_url="http://localhost:11435/v1", api_key="unused")
apple_silicon_response = apple_silicon_client.chat.completions.create(
    model="deepseek-r1:70b",
    messages=[{"role": "user", "content": "Optimize this function for Apple Silicon"}]
)

Apple Silicon Image Generation (mflux)

Generate images using MLX-native Flux models. Runs natively on Apple Silicon — no CUDA, no cloud.

curl http://localhost:11435/api/generate-image \
  -d '{"prompt": "Apple Silicon Mac Studio rendering AI art, photorealistic", "model": "z-image-turbo", "width": 512, "height": 512}'
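
For scripted use, the same request can be issued from Python with only the standard library. This is a minimal sketch: the JSON fields mirror the curl example, but the response schema assumed here (a base64 string under an "image" key) is an assumption, so check the upstream docs before relying on it:

```python
import base64
import json
import urllib.request

API_URL = "http://localhost:11435/api/generate-image"

def build_request(prompt: str, model: str = "z-image-turbo",
                  size: int = 512) -> urllib.request.Request:
    """Build the JSON POST matching the curl example above."""
    body = json.dumps({"prompt": prompt, "model": model,
                       "width": size, "height": size}).encode()
    return urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"})

def generate(prompt: str, out_path: str) -> None:
    """Send the request and save the returned image to disk."""
    with urllib.request.urlopen(build_request(prompt)) as r:
        resp = json.loads(r.read())
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(resp["image"]))  # "image" field is assumed
```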

Apple Silicon image generation performance:

  • Mac Studio M4 Ultra: ~5s at 512px, ~14s at 1024px
  • MacBook Pro M3 Max: ~7s at 512px, ~18s at 1024px
  • Mac Mini M4: ~12s at 512px, ~30s at 1024px

Apple Silicon Speech-to-Text (Qwen ASR)

Transcribe audio locally on Apple Silicon using Qwen3-ASR via MLX. Meetings, voice notes, podcasts — no cloud, no Whisper API costs.

curl http://localhost:11435/api/transcribe \
  -F "file=@apple_silicon_meeting.wav" \
  -F "model=qwen3-asr"

Supports WAV, MP3, M4A, FLAC. ~2s for a 30-second clip on Apple Silicon M4 Ultra.
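
The same upload can be done from Python without extra dependencies. A minimal sketch using only the standard library: the multipart field names mirror the curl -F flags above, while the response schema is an assumption:

```python
import json
import urllib.request

def build_multipart(model: str, filename: str, audio: bytes,
                    boundary: str) -> bytes:
    """Assemble a multipart/form-data body matching the curl -F fields."""
    head = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode()
    return head + audio + f'\r\n--{boundary}--\r\n'.encode()

def transcribe(path: str, model: str = "qwen3-asr",
               url: str = "http://localhost:11435/api/transcribe") -> dict:
    boundary = "herdboundary1234"
    with open(path, "rb") as f:
        body = build_multipart(model, path, f.read(), boundary)
    req = urllib.request.Request(url, data=body, headers={
        "Content-Type": f"multipart/form-data; boundary={boundary}"})
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())   # response schema is an assumption
```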

Apple Silicon Embeddings

Embed documents across your Apple Silicon fleet using Ollama embedding models (nomic-embed-text, mxbai-embed-large, snowflake-arctic-embed).

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Apple Silicon unified memory architecture for AI inference"}'

Batch thousands of documents across Apple Silicon nodes instead of bottlenecking on one Mac.
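
A batching sketch, assuming /api/embed accepts a list of strings as "input" (as the Ollama embed API does) and returns an "embeddings" array; the batch size and helper names here are illustrative:

```python
import json
import urllib.request

EMBED_URL = "http://localhost:11435/api/embed"

def batches(docs, size=64):
    """Split a document list into fixed-size batches."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def embed_batch(batch, model="nomic-embed-text"):
    """POST one batch; the router fans requests out across the fleet."""
    body = json.dumps({"model": model, "input": batch}).encode()
    req = urllib.request.Request(
        EMBED_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())["embeddings"]  # field name is an assumption
```

Issuing one request per batch, rather than one per document, keeps HTTP overhead low while still letting the router spread the work across nodes.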

Apple Silicon Fleet Monitoring

Dashboard

Open http://localhost:11435/dashboard — see every Apple Silicon Mac in your fleet: models loaded, queue depth, thermal state, memory usage, and health status.

Apple Silicon Fleet Status API

curl http://localhost:11435/fleet/status

Returns every Apple Silicon node with hardware specs, loaded models, image/STT capabilities, and health metrics.
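
A small polling sketch for scripted monitoring; the JSON shape assumed here (a "nodes" list with "name" and "healthy" fields) is hypothetical, so adjust it to the actual /fleet/status response:

```python
import json
import urllib.request

def summarize(status: dict) -> str:
    """One-line health summary from a /fleet/status payload."""
    nodes = status.get("nodes", [])
    up = [n["name"] for n in nodes if n.get("healthy")]
    return f"{len(up)}/{len(nodes)} nodes healthy: {', '.join(up)}"

def fleet_summary(url: str = "http://localhost:11435/fleet/status") -> str:
    with urllib.request.urlopen(url) as r:
        return summarize(json.loads(r.read()))
```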

Apple Silicon Health Checks

curl http://localhost:11435/dashboard/api/health

11 automated checks: offline Apple Silicon nodes, memory pressure, thermal throttling, VRAM fallbacks, error rates, and more.

Recommended Models by Apple Silicon Hardware

Your Apple Silicon Mac   RAM         Recommended models
Mac Mini                 16GB        llama3.2:3b, phi4-mini, nomic-embed-text
Mac Mini                 32GB        qwen3:14b, deepseek-r1:14b, llama3.3:8b
MacBook Pro              36-64GB     qwen3:32b, deepseek-r1:32b, codestral
Mac Studio               128GB       llama3.3:70b, qwen3:72b, deepseek-r1:70b
Mac Studio/Pro           192-256GB   qwen3:110b, deepseek-v3:236b (quantized)

The Apple Silicon router's model recommender analyzes your fleet hardware and suggests the optimal model mix: GET /dashboard/api/model-recommendations.

Full documentation

Guardrails

  • No automatic downloads: Apple Silicon model pulls are always user-initiated and require explicit confirmation. Downloads range from 2GB to 70GB+ depending on model size.
  • Model deletion requires confirmation: Never remove models from Apple Silicon nodes without explicit user approval.
  • All Apple Silicon requests stay local: No data leaves your local network — all inference happens on your Apple Silicon Macs.
  • No API keys: No accounts, no tokens, no cloud dependencies for your Apple Silicon fleet.
  • No external network access: The Apple Silicon router and nodes communicate only on your local network. No telemetry, no cloud callbacks.
  • Minimal local state: The only local files created are ~/.fleet-manager/latency.db (Apple Silicon routing metrics) and ~/.fleet-manager/logs/herd.jsonl (structured logs). Never delete or modify these files without user confirmation.

