RTX Local AI

v1.0.0

RTX Local AI — turn your gaming PC into a local AI server. RTX 4090, RTX 4080, RTX 4070, RTX 3090 run Llama, Qwen, DeepSeek, Phi, Mistral locally. Gaming PC...

by Twin Geeks (@twinsgeeks)

Install

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for twinsgeeks/rtx-local-ai.

Prompt preview: Install & Setup
Install the skill "Rtx Local Ai" (twinsgeeks/rtx-local-ai) from ClawHub.
Skill page: https://clawhub.ai/twinsgeeks/rtx-local-ai
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install rtx-local-ai

ClawHub CLI

npx clawhub@latest install rtx-local-ai
Security Scan

VirusTotal: Pending
OpenClaw: Benign (medium confidence)

Purpose & Capability

The name and description describe turning an RTX gaming PC into a local AI server via Ollama Herd. The SKILL.md tells the user to pip install 'ollama-herd' and run 'herd'/'herd-node', references local endpoints (localhost:11435), optional tools (python3, pip, nvidia-smi), and config paths under ~/.fleet-manager — all consistent with that purpose.

Instruction Scope

Instructions are narrowly scoped to installing and operating Ollama Herd, querying local endpoints, and configuring environment variables or systemd to keep models resident. Two things to note: (1) herd-node auto-discovers the router via mDNS, which opens network discovery and can expose the service to other LAN hosts; (2) the doc asks users to edit systemd (sudo) and set persistent environment variables — expected for persistent model hosting, but requiring elevated privileges and care. There is no attempt to read unrelated files or exfiltrate secrets in the provided instructions.

Install Mechanism

The skill is instruction-only (no install spec). It tells users to run 'pip install ollama-herd' (PyPI). That is expected for this functionality, but it does mean you will install third-party code from the public package index — a moderate risk if the package or its version is unvetted. The skill itself doesn't bundle code or download arbitrary archives.

Credentials

The skill declares no required credentials, and the environment variables it references (OLLAMA_KEEP_ALIVE, OLLAMA_MAX_LOADED_MODELS) relate directly to the runtime behavior described. The config paths (~/.fleet-manager/...) are consistent with a fleet manager and are not system-wide secrets.

Persistence & Privilege

The skill does not request 'always: true' or elevated persistent privileges on its own. However, following its instructions (editing systemd, setting persistent environment variables, enabling mDNS-based discovery) requires administrator privileges and will make the service persist and potentially become accessible on the local network. That is coherent with the stated goal but increases exposure.

Assessment

This skill appears to be what it says: guidance for running Ollama Herd on RTX GPUs. Before proceeding:

  • Inspect the referenced project and the exact PyPI package/version (https://pypi.org/project/ollama-herd/ and the GitHub repo) to ensure you trust the publisher.
  • Prefer installing into a virtualenv or container rather than system Python.
  • Be aware herd-node uses mDNS auto-discovery — only enable it on trusted LANs, or firewall/bind the service to localhost if you don't want other devices to access it.
  • Editing systemd requires sudo; back up service files and understand the change.
  • Confirm model downloads are truly opt-in and that large models won't be pulled automatically.

If you need a higher-assurance review, provide the PyPI package contents or the GitHub repo code for deeper analysis.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Clawdis
OS: Linux · Windows
Any bin: curl, wget
Latest: vk977gzdng0wbj2cyzmymvex5zx8448hh
105 downloads · 0 stars · 1 version
Updated 3w ago
v1.0.0 · MIT-0 · Linux, Windows

RTX Local AI — Your Gaming PC Is an AI Server

Your RTX GPU already runs games at 4K. Now put it to work running LLMs locally. An RTX 4090 with 24GB vRAM handles quantized 70B-parameter models. An RTX 4080 with 16GB runs 14B-34B models fast. Stack multiple RTX PCs into a fleet and route AI requests to the best available RTX GPU.

RTX GPU model guide

| RTX GPU | vRAM | Best RTX models | RTX performance |
| --- | --- | --- | --- |
| RTX 4090 | 24GB | llama3.3:70b (Q4), qwen3.5:32b, deepseek-r1:32b | RTX king — 70B models at speed |
| RTX 4080 | 16GB | qwen3.5:14b, phi4, codestral, mistral-nemo | RTX sweet spot for most tasks |
| RTX 4070 Ti | 12GB | phi4, gemma3:12b, llama3.2:3b | Budget RTX with solid performance |
| RTX 4070 | 12GB | phi4-mini, gemma3:4b, qwen3.5:7b | Entry-level RTX for local AI |
| RTX 3090 | 24GB | Same as RTX 4090 | Last-gen RTX, still great for AI |
| RTX 3080 | 10GB | phi4-mini, llama3.2:3b | Older RTX, lightweight models |

Cross-platform: RTX Local AI works on Windows and Linux. Most RTX gaming PCs run Windows — that's fine.
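
If you script against the fleet, the guide above boils down to a lookup from card to suggested models. A minimal Python sketch; the RTX_MODEL_GUIDE dict and suggest_models helper are illustrative names, not part of ollama-herd:

RTX_MODEL_GUIDE = {
    "RTX 4090": (24, ["llama3.3:70b", "qwen3.5:32b", "deepseek-r1:32b"]),
    "RTX 4080": (16, ["qwen3.5:14b", "phi4", "codestral", "mistral-nemo"]),
    "RTX 4070 Ti": (12, ["phi4", "gemma3:12b", "llama3.2:3b"]),
    "RTX 4070": (12, ["phi4-mini", "gemma3:4b", "qwen3.5:7b"]),
    "RTX 3090": (24, ["llama3.3:70b", "qwen3.5:32b", "deepseek-r1:32b"]),
    "RTX 3080": (10, ["phi4-mini", "llama3.2:3b"]),
}

def suggest_models(gpu: str) -> list[str]:
    # Look up the table's suggestions for a given card.
    vram_gb, models = RTX_MODEL_GUIDE[gpu]
    print(f"{gpu} ({vram_gb}GB vRAM): try {', '.join(models)}")
    return models

suggest_models("RTX 4080")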

Set up your RTX AI server

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/

Single RTX gaming PC

herd         # start the RTX router
herd-node    # register this RTX machine

Multiple RTX PCs (RTX fleet)

On one RTX PC (the router):

herd
herd-node

On every other RTX PC:

herd-node    # auto-discovers the RTX router via mDNS

That's it. Every RTX PC in your fleet now shares the AI workload.
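
To confirm the fleet actually assembled, query the router's /fleet/status endpoint (the same one used in the monitoring section below). A stdlib-only sketch; the response schema isn't documented on this page, so it simply pretty-prints whatever the router returns:

import json
import urllib.request

# List registered nodes by dumping the router's fleet status.
with urllib.request.urlopen("http://localhost:11435/fleet/status") as resp:
    print(json.dumps(json.load(resp), indent=2))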

Use your RTX for AI

OpenAI SDK

from openai import OpenAI

# Your RTX GPU serves this
rtx_client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# RTX 4090 handles 70B models easily
response = rtx_client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Write a game engine ECS system in Rust"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

RTX-powered code generation

# Your RTX runs Codestral for code
response = rtx_client.chat.completions.create(
    model="codestral",
    messages=[{"role": "user", "content": "Optimize this HLSL shader for RTX ray tracing"}],
)
print(response.choices[0].message.content)

curl

# RTX inference
curl http://localhost:11435/api/chat -d '{
  "model": "qwen3.5:32b",
  "messages": [{"role": "user", "content": "Explain GPU memory architecture"}],
  "stream": false
}'
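
Note the two API surfaces: the OpenAI-compatible /v1 path used by the SDK above, and the Ollama-style /api/chat path used by this curl call. Here is the same request from Python, assuming Herd mirrors stock Ollama's non-streaming response, where the reply text sits under message.content:

import json
import urllib.request

payload = json.dumps({
    "model": "qwen3.5:32b",
    "messages": [{"role": "user", "content": "Explain GPU memory architecture"}],
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11435/api/chat",
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Assumes the stock Ollama response shape; verify against your router.
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])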

RTX vs cloud — cost comparison

| Option | Monthly cost | RTX advantage |
| --- | --- | --- |
| RTX 4090 (one-time $1,599) | $0/month | Your RTX runs unlimited inference forever |
| Cloud A100 (AWS) | $3.06/hour (~$2,200/month) | RTX pays for itself in weeks |
| OpenAI GPT-4o API | ~$100-500/month at scale | RTX has zero per-token cost |
| RTX 4080 (one-time $1,199) | $0/month | Even budget RTX beats cloud |
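
The "pays for itself in weeks" line is plain arithmetic on the table's own figures, assuming round-the-clock cloud usage; lighter usage stretches the timeline proportionally:

# Break-even math from the table above (assumes 24/7 cloud usage).
rtx_4090_price = 1599        # one-time cost, USD
cloud_a100_monthly = 2200    # ~$3.06/hour * ~720 hours

months = rtx_4090_price / cloud_a100_monthly
print(f"Break-even vs. a 24/7 cloud A100: {months:.2f} months (~{months * 30:.0f} days)")
# => about 0.73 months, roughly three weeks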

Monitor your RTX fleet

# RTX fleet overview
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Check RTX GPU health
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

# Models loaded on RTX GPUs
curl -s http://localhost:11435/api/ps | python3 -m json.tool

Dashboard at http://localhost:11435/dashboard — live RTX performance monitoring.
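
For unattended monitoring, a small poll loop is enough. A minimal sketch that only checks whether the health endpoint answers with HTTP 200; since the JSON schema isn't documented here, no fields are parsed:

import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:11435/dashboard/api/health"

while True:
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            print(f"router OK (HTTP {resp.status})")
    except (urllib.error.URLError, OSError) as exc:
        print(f"router unreachable: {exc}")
    time.sleep(30)   # poll every 30 seconds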

Optimize Ollama for RTX

Keep models loaded in your RTX vRAM permanently:

# Windows (most RTX gaming PCs; PowerShell)
[System.Environment]::SetEnvironmentVariable("OLLAMA_KEEP_ALIVE", "-1", "User")
[System.Environment]::SetEnvironmentVariable("OLLAMA_MAX_LOADED_MODELS", "-1", "User")
# Restart Ollama from the system tray

# Linux (systemd)
sudo systemctl edit ollama
# Add: Environment="OLLAMA_KEEP_ALIVE=-1"
# Add: Environment="OLLAMA_MAX_LOADED_MODELS=-1"
sudo systemctl restart ollama
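
Stock Ollama also honors a per-request keep_alive field (-1 keeps the model resident after the call), which avoids touching environment variables or systemd. Whether Herd forwards the field to the serving node is an assumption worth verifying on your setup:

import json
import urllib.request

# Warm-up request that asks Ollama to keep phi4 loaded indefinitely.
# keep_alive is a stock Ollama field; Herd forwarding it is assumed.
payload = json.dumps({
    "model": "phi4",
    "messages": [{"role": "user", "content": "warm-up"}],
    "stream": False,
    "keep_alive": -1,
}).encode()

req = urllib.request.Request(
    "http://localhost:11435/api/chat",
    data=payload,
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req).read()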

Also available on your RTX fleet

Image generation

curl http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "RTX-powered cyberpunk cityscape", "width": 1024, "height": 1024}'

Embeddings

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "NVIDIA RTX local AI inference"}'
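
Embeddings earn their keep in similarity search. A sketch that embeds two strings and compares them, assuming the endpoint mirrors stock Ollama's /api/embed response shape, an "embeddings" list holding one vector per input:

import json
import math
import urllib.request

payload = json.dumps({
    "model": "nomic-embed-text",
    "input": ["NVIDIA RTX local AI inference", "Running LLMs on a gaming GPU"],
}).encode()

req = urllib.request.Request(
    "http://localhost:11435/api/embed",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    a, b = json.load(resp)["embeddings"]   # one vector per input string

# Cosine similarity between the two embedding vectors.
cos = sum(x * y for x, y in zip(a, b)) / (
    math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
)
print(f"cosine similarity: {cos:.3f}")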

Full documentation

Contribute

Ollama Herd is open source (MIT). RTX gamers and AI builders welcome.

Guardrails

  • RTX model downloads require explicit user confirmation — models range from 1GB to 400GB+.
  • RTX model deletion requires explicit user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • No models are downloaded automatically — all pulls are user-initiated or require opt-in.
