WSL2 Local AI

v1.0.0

WSL2 Local AI — run LLMs on Windows via WSL2 with NVIDIA GPU passthrough. WSL2 AI development with Ollama, CUDA, and Docker. WSL2 Ollama fleet routing for Wi...

by Twin Geeks (@twinsgeeks)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Pending
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (WSL2, Ollama, CUDA, Docker, herd routing) matches the instructions: enabling WSL2, verifying nvidia-smi, installing Ollama, pip-installing ollama-herd, and running herd/herd-node. Optional bins (python3, pip, nvidia-smi, wsl) are relevant. One small omission: the README shows docker usage but 'docker' is not listed in the metadata optional bins.
Instruction Scope
SKILL.md confines actions to installing/starting Ollama and herd, adding environment vars to ~/.bashrc, and querying local endpoints (localhost:11435). It does not instruct reading unrelated host files or exfiltrating secrets. It does recommend running remote install scripts (curl | sh) and pulling docker images — expected for this purpose but higher-risk operations.
Install Mechanism
This is instruction-only (no install spec). Install steps include piping a remote installer (https://ollama.ai/install.sh) to sh, pip install, and docker run. These are plausible for installing Ollama but are higher-risk patterns than purely local installs; the sources referenced (ollama.ai and Docker Hub image ollama/ollama) are consistent with the stated toolchain.
Credentials
No required environment variables or credentials are requested. The skill suggests OLLAMA_* env vars for runtime tuning (local-only) and uses a local API endpoint with no API key needed; this is proportionate to the task.
Persistence & Privilege
always is false and the skill does not request elevated platform privileges. It instructs persisting env vars to ~/.bashrc and creating/using ~/.fleet-manager state files — reasonable for a long-running local service. The skill runs local servers and background processes (herd), which is expected for this functionality.
Assessment
This skill appears to do what it says (set up Ollama + herd in WSL2). Before installing:

  • Inspect any remote install script before running 'curl | sh' (it executes code as you).
  • Expect to pull Docker images and run background services listening on localhost; make sure you trust the sources (ollama.ai, Docker Hub) and have the appropriate GPU drivers.
  • The metadata lists python/pip/nvidia-smi/wsl but not docker; if you plan to use the Docker example, install Docker Desktop.
  • Running herd creates ~/.fleet-manager state files; review and back them up if needed.

If you want lower risk, run these steps in a disposable WSL2 instance or VM, and review the referenced GitHub repo and install script first.

Like a lobster shell, security has layers — review code before you run it.

Version: latest (vk971gznc1913pbkhswj8raryc9844327)


Runtime requirements

Platform: Clawdis
OS: Windows
Bins (any of): curl, wget

SKILL.md

WSL2 Local AI — Windows Developer LLM Stack

Develop AI apps on Windows with full Linux performance. WSL2 gives you native Linux inside Windows with NVIDIA GPU passthrough — your RTX GPU runs CUDA in WSL2 at near-native speed. Ollama Herd routes AI requests across WSL2 instances and native Windows machines.

Why WSL2 for local AI

  • Full Linux + Windows GPU — WSL2 passes your NVIDIA GPU directly to Linux. CUDA works in WSL2.
  • Docker integration — Docker Desktop on Windows uses WSL2 backend. Containerize your AI workflows.
  • Best of both — VS Code on Windows, Ollama in WSL2, GPU shared between them.
  • Development workflow — write code on Windows, run inference in WSL2, same filesystem.

WSL2 AI setup

Step 1: Enable WSL2 with GPU support

# PowerShell (admin)
wsl --install -d Ubuntu          # recent Windows builds install new distros as WSL2 by default
wsl --set-default-version 2      # ensure any future distros also default to WSL2

Verify WSL2 NVIDIA GPU access:

# Inside WSL2
nvidia-smi    # should show your RTX GPU

Step 2: Install Ollama in WSL2

# Inside WSL2
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve &
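
ollama serve takes a moment to come up, so scripts that hit the API immediately can race it. A minimal wait-for-ready sketch in Python (port 11434 is Ollama's default; the backoff schedule and both helper names are arbitrary choices, not part of Ollama):

```python
import socket
import time

def backoff_schedule(retries: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff delays: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

def wait_for_ollama(host: str = "127.0.0.1", port: int = 11434, retries: int = 6) -> bool:
    """Poll the Ollama port until it accepts a TCP connection or retries run out."""
    for delay in backoff_schedule(retries):
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(delay)
    return False

# usage, right after `ollama serve &`:
# if not wait_for_ollama():
#     raise SystemExit("Ollama did not come up on 11434")
```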

Step 3: Install WSL2 Ollama Herd

# Inside WSL2
pip install ollama-herd
herd          # start WSL2 AI router on port 11435
herd-node     # register WSL2 as a node
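
Before moving on, you can confirm the router is actually listening on its port. A small Python check (11435 comes from the step above; is_port_open is a hypothetical helper, not part of ollama-herd):

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# usage, after starting `herd`:
# print(is_port_open("127.0.0.1", 11435))  # True once the router is listening
```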

Step 4: Access from Windows

Your WSL2 AI endpoint is accessible from Windows at http://localhost:11435 — WSL2 forwards ports automatically.

# From Windows PowerShell ('curl' there aliases Invoke-WebRequest, so call curl.exe explicitly)
curl.exe http://localhost:11435/api/tags    # see WSL2 AI models

Use WSL2 AI

Python (from Windows or WSL2)

from openai import OpenAI

# Same URL works from Windows and WSL2
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# WSL2 handles the inference via NVIDIA GPU
response = client.chat.completions.create(
    model="qwen3.5:32b",
    messages=[{"role": "user", "content": "Write a Docker Compose file for a Python API"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

VS Code + WSL2 AI

// .vscode/settings.json — Continue.dev configuration
{
  "continue.models": [{
    "title": "WSL2 Local",
    "provider": "openai",
    "model": "codestral",
    "apiBase": "http://localhost:11435/v1",
    "apiKey": "not-needed"
  }]
}

curl from WSL2

# WSL2 inference
curl http://localhost:11435/api/chat -d '{
  "model": "codestral",
  "messages": [{"role": "user", "content": "Refactor this Python function"}],
  "stream": false
}'
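
The same request can be issued from Python with only the standard library. A sketch mirroring the curl call above (build_chat_payload and chat are hypothetical helpers, and the response shape is assumed to match Ollama's non-streaming /api/chat reply):

```python
import json
import urllib.request

HERD_URL = "http://localhost:11435/api/chat"

def build_chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Assemble the JSON body expected by the /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def chat(model: str, prompt: str) -> str:
    """POST a chat request and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(model, prompt)).encode()
    req = urllib.request.Request(
        HERD_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["message"]["content"]

# usage (requires herd running):
# print(chat("codestral", "Refactor this Python function"))
```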

WSL2 + Docker AI workflow

Run Ollama in Docker on WSL2 for containerized AI:

# WSL2 Docker + Ollama
# Note: native Ollama also listens on 11434; if it's running, map a
# different host port instead, e.g. -p 11436:11434
docker run -d --gpus all -p 11434:11434 ollama/ollama

# Herd routes between Docker Ollama and native Ollama
pip install ollama-herd
herd &
herd-node

WSL2 AI hardware guide

Windows PC          GPU                      WSL2 AI models
RTX 4090 desktop    24GB shared with WSL2    llama3.3:70b, qwen3.5:32b
RTX 4080 desktop    16GB shared with WSL2    phi4, codestral, qwen3.5:14b
RTX 4060 laptop     8GB shared with WSL2     phi4-mini, gemma3:4b

WSL2 shares GPU memory with Windows. Close GPU-heavy Windows apps for more WSL2 AI vRAM.
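
As a rough rule of thumb, a quantized model needs about (parameters × bytes per weight) of VRAM plus some overhead for the KV cache and runtime. A back-of-the-envelope estimator (the 20% overhead factor is an assumption, not a measured value):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for a quantized model: weights x overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

# A 4-bit 32B model lands around 19 GB (fits a 24GB desktop GPU),
# while a 4-bit 14B model is roughly 8 GB (fits a 16GB GPU).
print(estimate_vram_gb(32))   # ~19.2
print(estimate_vram_gb(14))   # ~8.4
```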

WSL2 AI environment

# WSL2 Ollama optimization
export OLLAMA_KEEP_ALIVE=-1            # -1 keeps models loaded in memory indefinitely
export OLLAMA_MAX_LOADED_MODELS=-1

# Add to ~/.bashrc for persistence in WSL2
echo 'export OLLAMA_KEEP_ALIVE=-1' >> ~/.bashrc
echo 'export OLLAMA_MAX_LOADED_MODELS=-1' >> ~/.bashrc

Monitor WSL2 AI

# WSL2 fleet status
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# WSL2 health checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Dashboard at http://localhost:11435/dashboard — accessible from both Windows browser and WSL2.
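
The health endpoint can feed a simple watchdog. A sketch assuming the /dashboard/api/health response is JSON with a top-level status field (that schema is an assumption about this skill, not a standard, and both helpers are hypothetical):

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:11435/dashboard/api/health"

def fetch_health(url: str = HEALTH_URL) -> dict:
    """GET the herd health endpoint and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read())

def is_healthy(health: dict) -> bool:
    """Treat the fleet as healthy if a 'status' field reads ok/healthy (assumed schema)."""
    return str(health.get("status", "")).lower() in ("ok", "healthy")

# usage (requires herd running):
# print("healthy" if is_healthy(fetch_health()) else "degraded")
```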

Also available on WSL2 AI

Image generation

curl http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "developer workspace", "width": 1024, "height": 1024}'

Embeddings

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "WSL2 Windows development AI"}'

Contribute

Ollama Herd is open source (MIT). WSL2 developers are welcome to contribute.

Guardrails

  • WSL2 AI model downloads require explicit user confirmation.
  • WSL2 AI model deletion requires explicit user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • No models are downloaded automatically — all pulls are user-initiated or require opt-in.
