Install
```bash
openclaw skills install ollama-skill
```

Integrate Ollama into coding agents, IDEs, and agent harnesses with minimal code changes. Supports local/on-prem/Docker deployment, Ollama Cloud, and OpenAI/Anthropic-compatible endpoints. Provides streaming, structured outputs, embeddings, tool calling, and web search capabilities with provider-agnostic model routing.
| Field | Value |
|---|---|
| Identifier | ollama-skill |
| Version | 1.0.0 |
| Author | Wai Yan Nyein Naing |
| Category | ai-ml |
| Installs | 0 |
| Rating | 0 / 5 (0 ratings) |
| License | MIT |
GitHub: WaiYanNyeinNaing/ollama-skill — ⭐ 0 | Forks: 0
Ollama Runtime helps AI coding agents integrate Ollama into applications, coding assistants, IDE plugins, and agent harnesses with minimal code changes. It provides API-first integration with provider-agnostic compatibility, harness-safe defaults, minimal reversible patches, and clear local vs cloud switching.
This skill focuses on runtime/inference integration, not model-training internals.
Use `/api/chat` for copilots, app assistants, and iterative agent loops. Full skill documentation lives in `SKILL.md` inside the installed directory.

| Deployment | API Base URL | Auth Required |
|---|---|---|
| Local | http://localhost:11434/api | No |
| Docker | http://localhost:11434/api | No |
| Direct Cloud | https://ollama.com/api | Yes (API key) |
Compatibility endpoints: OpenAI-compatible clients use `http://localhost:11434/v1/`; Anthropic-compatible clients use `http://localhost:11434`. Responses arrive as NDJSON chunks whenever `stream=True` / `stream: true` is set.

```bash
# Local / on-prem / Docker
export OLLAMA_HOST=http://localhost:11434
export OLLAMA_API_BASE=http://localhost:11434/api
export OLLAMA_MODEL=qwen3-coder
export OLLAMA_EMBED_MODEL=embeddinggemma

# Direct cloud API (optional)
export OLLAMA_API_KEY=your_api_key
```
```python
import requests
import os

OLLAMA_API_BASE = os.getenv("OLLAMA_API_BASE", "http://localhost:11434/api")
MODEL = os.getenv("OLLAMA_MODEL", "qwen3-coder")

response = requests.post(
    f"{OLLAMA_API_BASE}/chat",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False
    }
)
print(response.json()["message"]["content"])
```
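For direct cloud use, the same request shape targets `https://ollama.com/api` with Bearer authentication. A minimal sketch, assuming a standard `Authorization: Bearer` header (confirm against Ollama's cloud docs):

```python
import os
import requests

# Same /api/chat payload, pointed at Ollama Cloud with an API key
response = requests.post(
    "https://ollama.com/api/chat",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
    json={
        "model": "qwen3-coder",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False
    }
)
print(response.json()["message"]["content"])
```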
```bash
# Clone and setup
git clone https://github.com/WaiYanNyeinNaing/ollama-skill.git
cd ollama-skill
cp .env.example .env
pip install -r requirements.txt

# Run example
python examples/python_native_chat.py
```
Choose local/on-prem when data must stay on your own infrastructure (internal tools, local copilots, private workflows).
Choose cloud when you need larger hosted models or a managed backend.
Choose compatibility mode when the codebase already depends on the OpenAI or Anthropic SDKs.
When integrating Ollama into an app or coding agent, start with basic chat, then layer on streaming, structured outputs, embeddings, and tools as each is needed.

Use `/api/chat` for chat-style interactions and coding agents.
```python
import requests

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3-coder",
        "messages": [
            {"role": "system", "content": "You are a helpful coding assistant."},
            {"role": "user", "content": "Write a Python function to reverse a string."}
        ],
        "stream": False
    }
)
result = response.json()
print(result["message"]["content"])
```
Use streaming for chat UIs, terminals, coding copilots, and long generations.
```python
import requests
import json

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3-coder",
        "messages": [{"role": "user", "content": "Write a long story..."}],
        "stream": True
    },
    stream=True
)

# Each non-empty line is one NDJSON chunk
for line in response.iter_lines():
    if line:
        chunk = json.loads(line.decode("utf-8"))
        print(chunk["message"]["content"], end="", flush=True)
        if chunk.get("done"):
            break
```
Rules:
- Each streamed chunk may carry `content`, `thinking`, or `tool_calls` deltas; accumulate them until the final chunk reports `"done": true`.

Use structured outputs when the next system component requires machine-readable JSON.
```python
import requests

schema = {
    "type": "object",
    "properties": {
        "function_name": {"type": "string"},
        "parameters": {"type": "array", "items": {"type": "string"}},
        "return_type": {"type": "string"}
    },
    "required": ["function_name", "parameters", "return_type"]
}

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3-coder",
        "messages": [{"role": "user", "content": "Analyze this function..."}],
        "stream": False,
        "format": schema
    }
)
```
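When the model honors the schema, `message.content` comes back as a JSON string; loading it is the only extra step (field names here mirror the schema above):

```python
import json

# Content is a JSON string shaped by the schema when the model complies
parsed = json.loads(response.json()["message"]["content"])
print(parsed["function_name"], parsed["parameters"], parsed["return_type"])
```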
Rules:
- Set `stream: false` when requesting structured output.
- Use `format: "json"` for plain JSON; pass a full JSON schema as `format` when the shape matters.

Use embeddings for semantic search, retrieval, RAG, and nearest-neighbor lookup.
```python
import requests

response = requests.post(
    "http://localhost:11434/api/embed",
    json={
        "model": "embeddinggemma",
        "input": "The quick brown fox jumps over the lazy dog."
    }
)
embeddings = response.json()["embeddings"]
```
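For nearest-neighbor lookup, a common approach is cosine similarity over the returned vectors. A minimal sketch; `/api/embed` accepts a list of inputs and returns one vector per input:

```python
import math
import requests

def cosine_similarity(a, b):
    # Dot product normalized by the two vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

resp = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "embeddinggemma", "input": ["quick brown fox", "lazy dog"]}
)
vec_a, vec_b = resp.json()["embeddings"]
print(cosine_similarity(vec_a, vec_b))
```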
Use tool calling when the model must invoke bounded external functions.
```python
import requests

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3-coder",
        "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
        "tools": tools,
        "stream": False
    }
)
```
Rules:
- Check `message.tool_calls` in the response; if present, execute each tool and send the result back as a `tool` role message (see the sketch below).
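A minimal sketch of that round trip, continuing from the request above and assuming a hypothetical local `get_weather()` implementation:

```python
def get_weather(location):
    # Hypothetical stand-in for a real weather lookup
    return f"Sunny, 22°C in {location}"

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
result = response.json()["message"]

if result.get("tool_calls"):
    messages.append(result)  # keep the assistant turn that requested the tools
    for call in result["tool_calls"]:
        args = call["function"]["arguments"]  # Ollama returns a dict, not a JSON string
        messages.append({"role": "tool", "content": get_weather(args["location"])})

    followup = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "qwen3-coder", "messages": messages, "tools": tools, "stream": False}
    )
    print(followup.json()["message"]["content"])
```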
Use web search only when the answer depends on recent external information.
Rules:
- Web search goes through Ollama's hosted service, so it requires an `OLLAMA_API_KEY` even when inference runs locally (see the hedged sketch below).
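A hedged sketch of a direct call, assuming the hosted `https://ollama.com/api/web_search` endpoint with a `query` field; verify the exact path and response shape against current Ollama docs before relying on it:

```python
import os
import requests

resp = requests.post(
    "https://ollama.com/api/web_search",  # assumed endpoint; check current docs
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
    json={"query": "latest Ollama release"}
)
for item in resp.json().get("results", []):
    print(item.get("title"), item.get("url"))
```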
Prefer the native Ollama API when you need Ollama-specific features such as schema-based structured outputs, `/api/embed`, or local model management.

Prefer OpenAI-compatible mode when the codebase already uses the `openai` SDK or targets `/v1/chat/completions`:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1/",
    api_key="ollama"  # Required by the SDK but ignored locally
)

response = client.chat.completions.create(
    model="qwen3-coder",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
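Streaming works through the same compatibility layer; a short sketch reusing the client above:

```python
stream = client.chat.completions.create(
    model="qwen3-coder",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```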
Prefer Anthropic-compatible mode when the codebase already uses the `anthropic` SDK:

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:11434",
    api_key="ollama"  # Required by the SDK but ignored locally
)

response = client.messages.create(
    model="qwen3-coder",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.content[0].text)
```
Use Docker for repeatable local or server deployment.
CPU baseline:

```bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

With Docker Compose:

```bash
docker compose up -d
```
If GPU support is needed, follow NVIDIA Container Toolkit setup before launching GPU-enabled containers.
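Once the container is running, listing installed models via `GET /api/tags` is a quick reachability check:

```python
import requests

# Confirms the server responds and shows which models are pulled
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
print([m["name"] for m in resp.json()["models"]])
```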
Always check HTTP status codes and handle failures explicitly rather than assuming success. Common statuses to handle:
| Status | Meaning |
|---|---|
| 400 | Bad request |
| 404 | Model not found |
| 429 | Rate limit |
| 500 | Internal error |
| 502 | Upstream/cloud reachability issues |
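A hedged sketch of one way to wrap requests around that table, with a simple exponential backoff on 429/5xx (the retry policy is illustrative, not part of the skill):

```python
import time
import requests

def post_with_retry(url, payload, retries=3):
    """POST with status handling; retries 429/500/502 with backoff."""
    for attempt in range(retries):
        resp = requests.post(url, json=payload, timeout=120)
        if resp.status_code in (429, 500, 502) and attempt < retries - 1:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s...
            continue
        resp.raise_for_status()  # 400/404 and exhausted retries raise here
        return resp.json()
```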
Deployment patterns:
- Local / on-prem: best for internal tools, local copilots, private workflows.
- Local runtime bridged to hosted models: best when the user wants local tools with larger hosted models.
- Direct cloud: best for hosted backends or when the app should treat Ollama as a remote provider.
- Compatibility mode: best when the codebase already depends on OpenAI or Anthropic SDKs.
For agent systems, default to:
- `/api/chat` unless an existing provider SDK already dominates the codebase
- `stream=false` for planner/executor boundaries

Recommended adapter boundary (a sketch follows the examples table below):
- `send_chat()`
- `embed_texts()`
- `invoke_tools()`
- `search_web()`
- `healthcheck()`

| Example | Description |
|---|---|
| `python_native_chat.py` | Basic chat via native Ollama API |
| `python_streaming_chat.py` | Streaming responses |
| `python_embeddings.py` | Generate embeddings |
| `python_structured.py` | JSON structured output |
| `python_openai_compat.py` | OpenAI SDK compatibility |
| `python_anthropic_compat.py` | Anthropic SDK compatibility |
| `python_cloud_direct.py` | Direct cloud API usage |
| `javascript_chat.mjs` | Node.js chat example |
| `javascript_structured.mjs` | Node.js structured output |
| `curl_chat.sh` | Shell/curl example |
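A hedged sketch of the adapter boundary recommended above. The five method names come from the list; the implementations are illustrative, with `search_web()` left as a stub:

```python
import os
import requests

class OllamaAdapter:
    """Single seam between agent code and the Ollama runtime."""

    def __init__(self):
        self.base = os.getenv("OLLAMA_API_BASE", "http://localhost:11434/api")
        self.model = os.getenv("OLLAMA_MODEL", "qwen3-coder")
        self.embed_model = os.getenv("OLLAMA_EMBED_MODEL", "embeddinggemma")

    def send_chat(self, messages, tools=None):
        payload = {"model": self.model, "messages": messages, "stream": False}
        if tools:
            payload["tools"] = tools
        resp = requests.post(f"{self.base}/chat", json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json()["message"]

    def embed_texts(self, texts):
        resp = requests.post(
            f"{self.base}/embed",
            json={"model": self.embed_model, "input": texts},
            timeout=60
        )
        resp.raise_for_status()
        return resp.json()["embeddings"]

    def invoke_tools(self, message, registry):
        # Illustrative: dispatch each tool call to a registry of callables
        return [registry[c["function"]["name"]](**c["function"]["arguments"])
                for c in message.get("tool_calls", [])]

    def search_web(self, query):
        raise NotImplementedError("wire to Ollama's hosted web search")

    def healthcheck(self):
        try:
            return requests.get(f"{self.base}/tags", timeout=5).ok
        except requests.RequestException:
            return False
```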
When applying this skill, produce minimal, reversible patches with harness-safe defaults and a clear local vs cloud switch. Do not touch model-training internals; the skill covers runtime/inference integration only. The task is done when the integration completes a successful round trip against the configured endpoint.
Copyright (c) 2026 Dr. Wai Yan Nyein Naing
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.