mlx-local-inference
Use when calling local AI on this Mac — text generation, embeddings, speech-to-text, OCR, or image understanding. LLM/VLM via oMLX gateway at localhost:8000/...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by @bendusy
Security Scan
OpenClaw
Benign (medium confidence)
Purpose & Capability
Name/description (local inference via oMLX and uv) align with the runtime instructions: calls to localhost:8000, uv run invocations, and references to ~/models are consistent with running models locally on macOS.
Instruction Scope
SKILL.md only instructs the agent to call a local HTTP API, run uv to invoke Python model libraries, read model files under ~/models, and use launchctl for the local oMLX service. It does not attempt to read unrelated system files, request unrelated environment variables, or exfiltrate data to remote endpoints.
Install Mechanism
There is no formal install spec in the registry; the SKILL.md recommends installing uv via 'curl -LsSf https://astral.sh/uv/install.sh | sh'. Download-and-exec installer instructions are common for CLIs but are higher risk than package manager installs — verify the source before running. No other installers or remote code downloads are required by the skill itself.
Credentials
The skill requests no environment variables or credentials and only requires the 'uv' binary and an Apple Silicon macOS environment, which is proportionate to local model execution. Model files are referenced under the user's home (~), which is expected.
Persistence & Privilege
The skill is not always-on, does not request elevated platform privileges, and does not modify other skills or system-wide configs beyond invoking a user launchctl command to restart the local oMLX service (which affects only the user's launchd job).
Assessment
This skill appears to do what it says: operate local ML models via a local oMLX gateway and the `uv` runner. Before installing or following the SKILL.md:
1. Verify that the `uv` installer URL (https://astral.sh) is one you trust; avoid running arbitrary `curl | sh`, and prefer a package manager where available.
2. Confirm you have enough disk space and have placed the large model files under ~/models as described.
3. The skill talks only to localhost:8000; make sure oMLX is intentionally running and not exposed to untrusted networks.
4. Restarting via launchctl affects only your user service; it requires normal user permissions and is not a system-wide change.
If you need higher assurance, ask the skill author for a formal install spec or an auditable package (a Homebrew formula or repository), plus cryptographic checksums for any model or installer downloads.
Like a lobster shell, security has layers: review code before you run it.
Current version: v2.2.1
Runtime requirements
- OS: macOS (any)
- Binary: `uv`
SKILL.md
MLX Local Inference Stack
Local AI inference on Apple Silicon. oMLX handles LLM/VLM with continuous batching.
Python libraries handle Embedding/ASR/OCR directly via uv.
Architecture
```
┌─────────────────────────────────────┐
│ oMLX (localhost:8000/v1)            │
│ - LLM (Qwen3.5-35B, etc.)           │
│ - VLM (vision-language models)      │
│ - Continuous batching + SSD cache   │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Python Libraries (via uv run)       │
│ - mlx-lm: Embedding                 │
│ - mlx-vlm: OCR (PaddleOCR-VL)       │
│ - mlx-audio: ASR (Qwen3-ASR)        │
└─────────────────────────────────────┘
```
Models
| Capability | Implementation | Model | Size |
|---|---|---|---|
| 💬 LLM | oMLX API | Qwen3.5-35B-A3B-4bit | ~20 GB |
| 👁️ VLM | oMLX API | Any mlx-vlm model | varies |
| 📐 Embed | mlx-lm (uv) | Qwen3-Embedding-0.6B-4bit-DWQ | ~1 GB |
| 🎤 ASR | mlx-audio (uv) | Qwen3-ASR-1.7B-8bit | ~1.5 GB |
| 👁️ OCR | mlx-vlm (uv) | PaddleOCR-VL-1.5-6bit | ~3.3 GB |
Usage
LLM / Vision-Language (via oMLX API)
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

# Text generation
resp = client.chat.completions.create(
    model="Qwen3.5-35B-A3B-4bit",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```
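The listing advertises VLM (image understanding) through the same OpenAI-compatible endpoint, but no image example is shown. Assuming oMLX accepts the standard OpenAI `image_url` content part with a base64 data URL (not confirmed by this SKILL.md), a request could be sketched as follows; the model name is a placeholder for whichever mlx-vlm model oMLX has loaded:

```python
import base64


def build_image_message(image_path: str, question: str) -> dict:
    """Package a local image as a base64 data URL in OpenAI chat format."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }


if __name__ == "__main__":
    from openai import OpenAI  # same client as the text example above

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")
    resp = client.chat.completions.create(
        # Placeholder model name: substitute the VLM oMLX currently serves.
        model="some-vlm-model",
        messages=[build_image_message("document.jpg", "Describe this image.")],
    )
    print(resp.choices[0].message.content)
```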
Embeddings (via mlx-lm + uv)
Note that mlx-lm has no dedicated embedding API and its models do not expose `last_hidden_state`; one workaround (an assumption, not an official API) is to call the inner transformer, `model.model`, which returns final hidden states without the LM head, then mean-pool:

```bash
uv run --with mlx-lm python -c "
import os
import mlx.core as mx
from mlx_lm import load

model, tokenizer = load(os.path.expanduser('~/models/Qwen3-Embedding-0.6B-4bit-DWQ'))
tokens = tokenizer.encode('text to embed')
# model.model is the transformer without the LM head; mean-pool its output
hidden = model.model(mx.array([tokens]))
embedding = hidden.mean(axis=1)
print(embedding.shape)
"
```
ASR — Speech-to-Text (via mlx-audio + uv)
Important: must run with `--python 3.11` to avoid OpenMP threading issues (SIGSEGV).
```bash
uv run --python 3.11 --with mlx-audio python -m mlx_audio.stt.generate \
  --model ~/models/Qwen3-ASR-1.7B-8bit \
  --audio "audio.wav" \
  --output-path /tmp/asr_result \
  --format txt \
  --language zh \
  --verbose
```
OCR (via mlx-vlm + uv)
Important: the `generate` function's parameter order must be `(model, processor, prompt, image)`.
```bash
cat << 'PY_EOF' > run_ocr.py
import os
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model_path = os.path.expanduser("~/models/PaddleOCR-VL-1.5-6bit")
model, processor = load(model_path)
prompt = apply_chat_template(processor, config=model.config, prompt="OCR:", num_images=1)
output = generate(model, processor, prompt, "document.jpg", max_tokens=512, temp=0.0)
print(output.text)
PY_EOF

uv run --python 3.11 --with mlx-vlm python run_ocr.py
```
Service Management (oMLX only)
```bash
# Check running models
curl http://localhost:8000/v1/models

# Restart oMLX
launchctl kickstart -k gui/$(id -u)/com.omlx-server
```
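Before issuing API calls, it is worth failing fast when the gateway is down rather than waiting on a hung request. A stdlib-only sketch, using the same `/v1/models` endpoint and default port shown above:

```python
import urllib.error
import urllib.request


def omlx_is_up(base: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if the oMLX gateway answers GET /v1/models with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: treat as "down"
        return False
```

If this returns False, restart the service with the `launchctl kickstart` command above.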
Model Storage Strategy
All models stored in ~/models/ using oMLX-compatible structure:
```
~/models/
├── Qwen3-Embedding-0.6B-4bit-DWQ/
├── Qwen3-ASR-1.7B-8bit/
├── PaddleOCR-VL-1.5-6bit/
└── Qwen3.5-35B-A3B-4bit/
```
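Since the models total roughly 26 GB and must be downloaded manually, a quick preflight check of the layout above can save a failed run. A small sketch (directory names taken from the models table; the helper itself is not part of the skill):

```python
from pathlib import Path

# Model directory names from the table above
EXPECTED_MODELS = [
    "Qwen3.5-35B-A3B-4bit",
    "Qwen3-Embedding-0.6B-4bit-DWQ",
    "Qwen3-ASR-1.7B-8bit",
    "PaddleOCR-VL-1.5-6bit",
]


def missing_models(root: str = "~/models") -> list[str]:
    """Return the expected model directories not present under root."""
    base = Path(root).expanduser()
    return [name for name in EXPECTED_MODELS if not (base / name).is_dir()]
```

An empty return value means all four model directories are in place.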
Requirements
- Apple Silicon Mac (M1/M2/M3/M4)
- `uv` installed (`curl -LsSf https://astral.sh/uv/install.sh | sh`)
