Install
openclaw skills install qwen-nothinkCreate and verify no-thinking variants of local Qwen/Qwen3-series Ollama models. Use when a user asks to disable thinking, hide or remove think-tag output, make /no_think the default, create a non-thinking tag, or convert a local Qwen Ollama model such as qwen3, qwen3.5, qwen3.6, qwen-vl, or custom Qwen MLX/GGUF tags to answer directly without requiring ollama run --think=false every time.
openclaw skills install qwen-nothinkCreate a new Ollama tag that reuses an existing Qwen model's weights but defaults to direct answers. Prefer a reversible derived tag such as qwen3.6:35b-mlx-nothink; do not modify or delete the source model.
This skill exists because prompt-only approaches usually fail for Qwen thinking models in Ollama. The reliable path is to combine a no-thinking chat template with a local manifest/config patch that removes thinking renderer metadata from the derived tag.
Inspect the source model:
ollama show SOURCE_MODEL
ollama show --parameters SOURCE_MODEL
ollama show --template SOURCE_MODEL
Verify the runtime switch works:
ollama run --think=false SOURCE_MODEL "用一句话回答:1+1等于几?"
If this still emits thinking content, stop and report that the local Ollama/model build does not honor the runtime switch.
Create a derived no-thinking tag with the bundled script:
python3 scripts/create_qwen_nothink_ollama.py SOURCE_MODEL --target TARGET_MODEL
Confirm ollama show TARGET_MODEL lists completion and, if relevant, vision, but not thinking.
Verify both CLI and API output do not contain Thinking..., <think>, or reasoning prose:
ollama run TARGET_MODEL "用一句话回答:1+1等于几?"
curl -s http://127.0.0.1:11434/api/chat -d '{"model":"TARGET_MODEL","messages":[{"role":"user","content":"用一句话回答:1+1等于几?"}],"stream":false}'
In Codex sandboxes, local Ollama calls or writes under ~/.ollama may require user approval. Request escalation plainly when needed.
Run the script from the skill directory or pass an absolute path:
python3 /path/to/ollama-qwen-nothink/scripts/create_qwen_nothink_ollama.py qwen3.6:35b-mlx
Default target naming appends -nothink to the source tag:
qwen3.6:35b-mlx -> qwen3.6:35b-mlx-nothink
qwen3:latest -> qwen3-nothink:latest
Useful options:
python3 scripts/create_qwen_nothink_ollama.py qwen3.6:35b-mlx --target qwen3.6:35b-mlx-nothink
python3 scripts/create_qwen_nothink_ollama.py qwen3.6:35b-mlx --dry-run
python3 scripts/create_qwen_nothink_ollama.py qwen3.6:35b-mlx --skip-verify
python3 scripts/create_qwen_nothink_ollama.py custom-model:latest --allow-non-qwen
The script:
<think></think> block at the assistant prefix.ollama create for the derived target.thinking from capabilities and clears renderer/parser so Ollama CLI does not re-enable thinking mode.--skip-verify is set.This workflow is optimized for direct text answers. Clearing the thinking-aware renderer/parser can also remove Ollama's automatic tools capability for that derived tag, and vision behavior should be verified separately with a real image prompt if the user needs multimodal use. If tool calling or advanced renderer behavior matters more than a default no-think tag, prefer keeping the source model and calling it with --think=false or the API equivalent for each request.
If the script cannot run, create a Modelfile like this:
FROM SOURCE_MODEL
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
<think>
</think>
"""
PARAMETER stop "<|im_end|>"
SYSTEM """
你是一个直接回答的助手。
默认关闭思考模式;不要输出思考过程、推理草稿、<think>、thinking 或 reasoning 内容。
直接给出最终答案。
"""
Then run:
ollama create TARGET_MODEL -f Modelfile
After creation, patch the target's local config blob as described in references/manifest-patch.md. This second step is important; without it, Ollama may still treat the target as a thinking model.
--think=false at runtime as the reliable fallback.PARAMETER think false is not accepted by many Ollama versions, even though ollama run --think=false works./no_think in the system prompt often fails because Qwen thinking models may treat it as ordinary text after Ollama has already selected thinking rendering.thinking from the target config is not enough by itself if renderer/parser still point to a thinking-aware Qwen renderer.