{"skill":{"slug":"cuda-ollama","displayName":"Cuda Ollama","summary":"CUDA Ollama — route Ollama LLM inference across NVIDIA GPUs with automatic CUDA load balancing. CUDA Ollama cluster for RTX 4090, RTX 4080, A100, L40S, H100....","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":117,"installsAllTime":2,"installsCurrent":2,"stars":0,"versions":1},"createdAt":1775256818688,"updatedAt":1775256834459},"latestVersion":{"version":"1.0.0","createdAt":1775256818688,"changelog":"Initial release of CUDA Ollama fleet router.\n\n- Route Ollama LLM inference across multiple NVIDIA CUDA GPUs with automatic load balancing.\n- Supports GPU fleets including RTX 4090, 4080, 4070, 3090, A100, L40S, H100 on Linux and Windows.\n- Features 7-signal scoring, vRAM-aware fallback, and CUDA auto-retry for robust routing.\n- Provides cluster health monitoring, web dashboard, and OpenAI-compatible API endpoints.\n- Manual control for model downloads and deletions for safety.","license":"MIT-0"},"metadata":{"os":["linux","windows"],"systems":null},"owner":{"handle":"twinsgeeks","userId":"s17dgy27g44azc3tday4qh394d83ensj","displayName":"Twin Geeks","image":"https://avatars.githubusercontent.com/u/261838102?v=4"},"moderation":null}