Mac Studio AI

v1.0.2

Mac Studio AI — run LLMs, image generation, speech-to-text, and embeddings on your Mac Studio. M2 Ultra (192GB), M3 Ultra (512GB), M4 Max (128GB), and M4 Ult...

by Twin Geeks (@twinsgeeks)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (local Mac Studio LLMs, image gen, STT, embeddings) match the SKILL.md: it instructs installing 'ollama-herd', starting 'herd' and 'herd-node', and calling a local HTTP API on port 11435 for chat, image-gen, transcribe, and embed endpoints. Required bins (curl/wget, optional python/pip) are appropriate for the documented commands.
Instruction Scope
Instructions focus on installing and running a local fleet manager and calling its HTTP endpoints. However, the doc recommends running a local service ('herd') that listens on port 11435, advertises automatic discovery on the LAN, and shows an example with api_key="not-needed", which suggests the default instructions may start an unauthenticated service reachable from the local network. The SKILL.md also tells the user to 'pip install ollama-herd' and run commands that fetch and execute third-party code; that is expected for this purpose, but it is a security-relevant action the user should review before running.
Install Mechanism
The skill itself has no install spec (instruction-only). The SKILL.md references 'pip install ollama-herd' and 'uv tool install ...', which are common package installs from public package managers. There are no direct download URLs or obscure install hosts in the SKILL.md. Because the skill doesn't include an install script, nothing from the skill bundle would be written to disk by installing the skill itself.
Credentials
The skill declares no required environment variables or credentials. The metadata lists expected binaries (curl/wget, optional python/pip) and two config paths under ~/.fleet-manager, which are proportional to a fleet manager's operation. There are no requests for unrelated cloud credentials or broad secrets.
Persistence & Privilege
The `always` flag is false and there is no install spec; the skill does not request persistent platform-level privileges. It does instruct running a persistent local service (herd) on port 11435; this is normal for the described functionality, but it is operational persistence the user must manage.
Assessment
This instruction-only skill is coherent with its stated purpose, but before following the SKILL.md do these checks:

1) Review the upstream project (https://github.com/geeks-accelerator/ollama-herd) and the pip package (ownership, recent releases, release artifacts) so you know what code you'll install.
2) Be cautious about running the herd router as shown: the examples suggest no API key and automatic LAN discovery, which could expose the service to other devices on your network. Enable authentication, bind to localhost, or use a firewall if you do not want LAN access.
3) Install in a virtualenv or an isolated VM/container if you want to limit the blast radius.
4) Inspect or audit the ~/.fleet-manager logs and DB; they may contain activity or metadata.
5) If you intend to expose this service externally, add TLS and authorization; do not rely on the example's api_key="not-needed" in production.

These precautions will reduce risk while using the tool as intended.
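The LAN-exposure concern in point 2 can be checked empirically by probing port 11435 from localhost and from your machine's LAN address. A minimal standard-library sketch; the port number comes from the SKILL.md, and the LAN IP is a placeholder you must substitute for your own:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# If the herd router is bound to localhost only, the LAN-address
# check should fail even while the localhost check succeeds.
print("localhost:", port_open("127.0.0.1", 11435))
# print("LAN:", port_open("192.168.1.50", 11435))  # substitute your LAN IP
```

If both probes succeed, the service is reachable from other devices on your network and the hardening steps above apply.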

Like a lobster shell, security has layers — review code before you run it.

Tags: 120b, 256gb, apple-silicon, image-generation, latest, llm, local-ai, m2-ultra, m3-ultra, m4-max, m4-ultra, mac-studio, ollama, speech-to-text, unified-memory

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Platform: Clawdis (desktop)
OS: macOS (any)
bin: curl, wget

SKILL.md

Mac Studio AI — The Most Powerful Local AI Machine

The Mac Studio is the best hardware for local AI. Mac Studio M4 Ultra with 256GB of unified memory runs 120B+ parameter models. Mac Studio M3 Ultra with 512GB loads frontier models that need 4-8 NVIDIA A100s elsewhere. The Mac Studio runs everything in one memory pool — no PCIe bottleneck.

One Mac Studio is a powerhouse. Multiple Mac Studios become a fleet.

Mac Studio configurations for AI

Config               Chip      Memory     GPU Cores  LLM Sweet Spot
Mac Studio M4 Max    M4 Max    128GB      40         70B models
Mac Studio M4 Ultra  M4 Ultra  256GB      80         120B+ models
Mac Studio M3 Ultra  M3 Ultra  192-512GB  76         236B models
Mac Studio M2 Ultra  M2 Ultra  192GB      76         70B-120B models

Set up your Mac Studio

pip install ollama-herd    # install on your Mac Studio
herd                       # start Mac Studio as the router (port 11435)
herd-node                  # connect additional Mac Studios or other devices

Mac Studios discover each other automatically on your local network.

Add Mac Studio image generation

uv tool install mflux           # Flux models (~5s at 512px on Mac Studio M4 Ultra)
uv tool install diffusionkit    # Stable Diffusion 3/3.5 on Mac Studio

Use your Mac Studio for AI inference

Mac Studio LLM inference — run the biggest models

from openai import OpenAI

# Connect to Mac Studio running Ollama Herd
mac_studio = OpenAI(base_url="http://mac-studio:11435/v1", api_key="not-needed")

# 120B model — runs smoothly on Mac Studio M4 Ultra (256GB unified memory)
response = mac_studio.chat.completions.create(
    model="gpt-oss:120b",  # loaded entirely in Mac Studio unified memory
    messages=[{"role": "user", "content": "How does Mac Studio handle large AI models?"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Mac Studio image generation

# Flux via mflux — ~5s on Mac Studio M4 Ultra
curl -o mac_studio_art.png http://mac-studio:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model": "z-image-turbo", "prompt": "a Mac Studio on a minimalist desk with holographic AI display", "width": 1024, "height": 1024}'

# Stable Diffusion 3 on Mac Studio — ~9s
curl -o mac_studio_sd3.png http://mac-studio:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model": "sd3-medium", "prompt": "Mac Studio M4 Ultra rendering AI art", "width": 1024, "height": 1024, "steps": 20}'
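The same generate-image endpoint can be called from Python using only the standard library. The URL path and JSON fields are taken from the curl examples above; the helper function name is illustrative:

```python
import json
import urllib.request

def build_image_request(base_url: str, model: str, prompt: str,
                        width: int = 1024, height: int = 1024) -> urllib.request.Request:
    """Build a POST request matching the generate-image examples above."""
    payload = {"model": model, "prompt": prompt, "width": width, "height": height}
    return urllib.request.Request(
        f"{base_url}/api/generate-image",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_image_request("http://mac-studio:11435", "z-image-turbo",
                          "a Mac Studio on a minimalist desk")
# with urllib.request.urlopen(req) as resp:  # requires a running herd router
#     open("mac_studio_art.png", "wb").write(resp.read())
```

The actual network call is commented out because it needs a live router on your network.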

Mac Studio speech-to-text

# Transcribe on Mac Studio via Qwen3-ASR
curl http://mac-studio:11435/api/transcribe \
  -F "file=@mac_studio_meeting.wav" \
  -F "model=qwen3-asr"

Mac Studio embeddings

# Generate embeddings on Mac Studio
curl http://mac-studio:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Mac Studio M4 Ultra unified memory AI inference"}'
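Embedding vectors from this endpoint are typically compared by cosine similarity. A dependency-free sketch; the vectors here are toy values, not real nomic-embed-text output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for /api/embed responses:
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```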

Recommended models for Mac Studio

Config                       Recommended models
Mac Studio M4 Max (128GB)    llama3.3:70b, qwen3:72b, deepseek-r1:70b, codestral
Mac Studio M4 Ultra (256GB)  gpt-oss:120b, qwen3:110b, two 70B models simultaneously
Mac Studio M3 Ultra (512GB)  deepseek-v3:236b (quantized), multiple 70B models at once
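A rough rule of thumb behind these pairings: at Q4 quantization a model needs about half a gigabyte per billion parameters for weights, plus headroom for KV cache and the OS. This back-of-envelope helper is illustrative, not part of ollama-herd:

```python
def approx_model_gb(params_billion: float, bits_per_weight: int = 4,
                    overhead_frac: float = 0.2) -> float:
    """Rough memory footprint: weights at the given quantization plus ~20% overhead."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8-bit ~= 1 GB
    return weight_gb * (1 + overhead_frac)

for size in (70, 120, 236):
    print(f"{size}B @ Q4 ~ {approx_model_gb(size):.0f} GB")
```

The estimates line up with the table: a 70B model fits a 128GB machine, 120B fits 256GB, and a quantized 236B model fits 512GB with room to spare.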

Ask the Mac Studio for recommendations: GET http://mac-studio:11435/dashboard/api/recommendations

Multiple Mac Studios as a fleet

Mac Studio #1 (M4 Ultra, 256GB) ──┐
Mac Studio #2 (M4 Max, 128GB)   ──┼──→  Mac Studio Router (:11435)  ←──  Your apps
Mac Mini (32GB)                 ──┘

The Mac Studio router scores each device on 7 signals. Big models route to the Mac Studio with the most memory.

Monitor your Mac Studio

Mac Studio dashboard at http://mac-studio:11435/dashboard — models loaded on each Mac Studio, queue depths, thermal state, memory.

# Mac Studio fleet status
curl -s http://mac-studio:11435/fleet/status | python3 -m json.tool

# Mac Studio health checks
curl -s http://mac-studio:11435/dashboard/api/health | python3 -m json.tool

Example Mac Studio fleet status response:

{
  "fleet": {"nodes_online": 2, "nodes_total": 2},
  "nodes": [
    {"node_id": "Mac-Studio-Ultra", "memory": {"total_gb": 256, "used_gb": 120}},
    {"node_id": "Mac-Studio-Max", "memory": {"total_gb": 128, "used_gb": 85}}
  ]
}
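That response is plain JSON, so a monitoring script can compute free memory per node directly. A sketch over the example payload above; the helper name is illustrative:

```python
status = {
    "fleet": {"nodes_online": 2, "nodes_total": 2},
    "nodes": [
        {"node_id": "Mac-Studio-Ultra", "memory": {"total_gb": 256, "used_gb": 120}},
        {"node_id": "Mac-Studio-Max", "memory": {"total_gb": 128, "used_gb": 85}},
    ],
}

def free_memory_gb(status: dict) -> dict[str, int]:
    """Map node_id -> free unified memory, from a /fleet/status payload."""
    return {
        n["node_id"]: n["memory"]["total_gb"] - n["memory"]["used_gb"]
        for n in status["nodes"]
    }

print(free_memory_gb(status))  # {'Mac-Studio-Ultra': 136, 'Mac-Studio-Max': 43}
```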


Contribute

Ollama Herd is open source (MIT). Built by Mac Studio owners for Mac Studio owners:

  • Star on GitHub — help other Mac Studio users find us
  • Open an issue — share your Mac Studio AI setup
  • PRs welcome. CLAUDE.md gives AI agents full context. 412 tests, async Python.

Guardrails

  • No automatic downloads — Mac Studio model pulls require explicit user confirmation.
  • Model deletion requires explicit user confirmation.
  • All Mac Studio requests stay local — no data leaves your network.
  • Never delete or modify files in ~/.fleet-manager/.

