Install
openclaw skills install flowyaipc-herdsman-skill-1-0-2-en
Integration package for the Herdsman model engine. Other agent platforms use the scripts and protocol specifications in this directory when connecting to OpenAI-, Anthropic-, or AGUI-compatible services.
This directory is not a single script but an integration package for reuse by other agent platforms, enabling external agents to reliably access the Herdsman local model engine.
- /agui
- http://127.0.0.1:8080
- http://127.0.0.1:8080/v1
- http://127.0.0.1:8080/v1/anthropic/messages
- http://127.0.0.1:8080/agui
- Authorization: Bearer <key>
- curl commands directly

Do not construct complex prompts, tools, base64 images, or long-timeout tasks directly in the shell. Prefer the Python scripts under scripts/, or generate temporary Python files that follow the same pattern.
Before calling any model, always run:
python headsman-skill/scripts/check_model.py
If you know the model name, you can also:
python headsman-skill/scripts/check_model.py "<model_id>"
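
As a rough idea of what a discovery step might look like, the sketch below lists model IDs from `GET /v1/models` using only the standard library. It assumes the response follows the OpenAI list shape (`{"data": [{"id": ...}]}`). That shape and the helper names `extract_model_ids` and `list_models` are illustrative assumptions, not the contents of `check_model.py`.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/v1"  # OpenAI-compatible base URL from this document


def extract_model_ids(payload, keyword=None):
    """Return model IDs from an OpenAI-style list payload, optionally filtered."""
    # Assumption: the payload looks like {"data": [{"id": "..."}, ...]}.
    ids = [item["id"] for item in payload.get("data", []) if "id" in item]
    if keyword:
        ids = [model_id for model_id in ids if keyword.lower() in model_id.lower()]
    return ids


def list_models(api_key=None, keyword=None, timeout=60):
    """Fetch GET /v1/models and return the (optionally filtered) model IDs."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    request = urllib.request.Request(f"{BASE_URL}/models", headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_model_ids(json.load(response), keyword)
```

Calling `list_models(keyword="whisper")` against a running engine would return only the IDs containing that substring. The parsing is kept in a separate function so it can be checked without a server.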
- timeout >= 120
- timeout >= 120
- timeout >= 120
- timeout >= 60

If results will be reused in subsequent conversations, save them to outputs/ and return the absolute path or cache URL to the user.
Preferred for:
Core endpoints:
- GET /v1/models
- POST /v1/chat/completions
- POST /v1/embeddings
- POST /v1/rerank
- POST /v1/images/generations
- POST /v1/images/edits
- POST /v1/images/img2img
- GET /v1/images/cache/:filename
- POST /v1/ocr
- POST /v1/audio/transcriptions
- GET /v1/audio/transcriptions/stream?model= (WebSocket)
- POST /v1/audio/speech
- GET /v1/audio/speech/stream/:token
- GET /v1/audio/info?model=

Additional parameters for chat completions (OpenAI Chat Completions compatible extensions):
| Parameter | Type | Description |
|---|---|---|
| reasoning_effort | string | Reasoning level: low / medium / high; local llama.cpp maps to template parameters |
| thinking_enabled | boolean | Enable or disable thinking mode for supported models; local llama.cpp maps to enable_thinking |
| thinking_tokens | number | Thinking token budget; local llama.cpp maps to reasoning_budget |
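
To make the three extension parameters concrete, here is a minimal sketch that builds (but does not send) a chat completion request carrying them. The helper name `build_chat_request` is hypothetical, and the sketch assumes the extensions sit at the top level of the JSON body next to `model` and `messages`, as is usual for OpenAI-compatible extensions.

```python
import json
import urllib.request

CHAT_URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_chat_request(model, messages, reasoning_effort=None,
                       thinking_enabled=None, thinking_tokens=None, api_key=None):
    """Build a POST request for chat completions with the optional extensions."""
    if reasoning_effort is not None and reasoning_effort not in ("low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")
    body = {"model": model, "messages": messages}
    # Include only the extension fields the caller actually set.
    optional = {
        "reasoning_effort": reasoning_effort,
        "thinking_enabled": thinking_enabled,
        "thinking_tokens": thinking_tokens,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        CHAT_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
```

The returned object can be passed to `urllib.request.urlopen(request, timeout=...)` with a generous timeout. Leaving unset fields out of the body lets the server apply its own defaults.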
For platforms that only support the Anthropic Messages style, the endpoint is:
POST /v1/anthropic/messages
Therefore:
- /v1/messages, add a lightweight proxy on the platform side or use raw HTTP requests

For platforms supporting the AG-UI protocol event stream:
POST /agui
AGUI is better suited to protocol clients or SDKs; raw HTTP is not recommended. Currently, state should provide at least model, and may optionally include webSearch, tools, task_type, and pass_through.
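
The `state` requirement above can be sketched as a small builder. This is only an illustration: the helper name `build_agui_state` is hypothetical, the value types of the optional keys are not specified in this document, and rejecting unlisted keys is a choice made for the sketch, not documented server behavior.

```python
# Optional keys named in this document; their value types are not specified here.
OPTIONAL_STATE_KEYS = ("webSearch", "tools", "task_type", "pass_through")


def build_agui_state(model, **optional):
    """Return a state dict with the required model plus any listed optional keys."""
    if not model:
        raise ValueError("state must provide at least a model")
    unknown = set(optional) - set(OPTIONAL_STATE_KEYS)
    if unknown:
        # Assumption for this sketch: fail early on keys the document does not list.
        raise ValueError(f"unexpected state keys: {sorted(unknown)}")
    state = {"model": model}
    state.update(optional)
    return state
```

The resulting dict would be passed as the `state` of an AG-UI run through whichever protocol client or SDK the platform uses.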
- scripts/herdsman_client.py: General HTTP client wrapper
- scripts/check_model.py: Model discovery and filtering
- scripts/chat_completion.py: OpenAI chat completion (supports reasoning_effort / thinking)
- scripts/generate_image.py: Text-to-image generation with auto-download
- scripts/edit_image.py: Image editing with support for local files, URLs, masks, and additional reference images
- scripts/img2img.py: Image-to-image (style transfer, inpainting)
- scripts/ocr.py: OCR text recognition, supports direct local image recognition
- scripts/transcribe_audio.py: Speech transcription, supports local files, URLs, and data URLs
- scripts/audio_speech.py: Text-to-speech (TTS), supports VoiceDesign, VoiceClone, and streaming
- scripts/anthropic_messages.py: Anthropic Messages compatible invocation
- references/api-examples.md: Capability-based call examples
- references/platform-integration.md: OpenAI / Anthropic / AGUI integration guide
- references/error-codes.md: Common errors and agent-side handling strategies
- references/model-capabilities.md: Model capabilities and endpoint mapping
- outputs/: Recommended directory for saving generated images

- check_model.py first to get installed models
- model_not_found, model_not_installed, invalid_model_capability, re-run model discovery
- audio field) and multipart/form-data (file field)
- check_model.py to confirm paddleocr-ppocrv5-server or another OCR model is installed

The following three scripts are advanced speech tools integrated with Herdsman, supporting a full workflow from audio conversion to ASR transcription to voice cloning.
| Script | Function | External Dependency |
|---|---|---|
| scripts/convert_audio.py | Audio format conversion (any format to 16kHz WAV) | ffmpeg |
| scripts/transcribe_standalone.py | ASR speech transcription (pure urllib, no herdsman_client dependency) | Herdsman ASR model |
| scripts/tts_voice_clone.py | Voice cloning TTS synthesis | Herdsman qwen3-tts-voiceclone |
Convert audio in any format (MP3/M4A/OGG, etc.) to 16kHz mono WAV. No Herdsman dependency.
uv run python scripts/convert_audio.py <input_path> [output_path]
Parameters:
- input_path: Path to the reference audio file
- output_path: Optional; defaults to the same directory as the input, with a .wav extension

Examples:
uv run python scripts/convert_audio.py ref.mp3
uv run python scripts/convert_audio.py ref.mp3 ref.wav
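
For readers curious about what such a conversion involves, the following is a minimal sketch of a 16kHz mono WAV conversion driven through ffmpeg. It is not the source of `convert_audio.py`, and the function names are hypothetical. The ffmpeg options used (`-i`, `-ar`, `-ac`, `-y`) are standard.

```python
import subprocess
from pathlib import Path


def default_output_path(input_path):
    """Same directory and stem as the input, with a .wav extension."""
    return Path(input_path).with_suffix(".wav")


def build_ffmpeg_command(input_path, output_path=None):
    """Build an ffmpeg command that resamples the input to 16 kHz mono WAV."""
    output = Path(output_path) if output_path else default_output_path(input_path)
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", str(input_path),
        "-ar", "16000",  # sample rate: 16 kHz
        "-ac", "1",      # channels: mono
        str(output),
    ]


def convert(input_path, output_path=None):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    command = build_ffmpeg_command(input_path, output_path)
    subprocess.run(command, check=True)
    return Path(command[-1])
```

If the input is already a `.wav` file, the default output path equals the input path, so an explicit output path is the safer choice in that case.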
Standalone ASR transcription script (pure urllib, no dependency on herdsman_client.py). Dynamic model selection, supports absolute output paths.
uv run python scripts/transcribe_standalone.py <audio_path> --model <model_id> [--language <language>] [--output <absolute_path>]
Parameters:
- audio_path: Input audio file path (.wav, .mp3, .m4a, etc.)
- --model: ASR model ID (required, dynamic selection)
- --language: Language code (optional, auto-detected by default)
- --output / -o: Absolute path of the output file; writes both .txt and .json (optional; prints only if not specified)
- --timeout: Timeout in seconds (default 300)

Tested model recommendations:
| Model | Recommendation | Notes |
|---|---|---|
| sherpa-onnx-paraformer-zh-small | ⭐ Preferred | Simplified Chinese, preserves filler words, ~5s fastest |
| whisper-base | Alternative | General high accuracy, Traditional Chinese output |
| funasr | ⚠️ | WebSocket streaming only, HTTP not supported |
| sherpa-onnx-streaming-zipformer-zh-14m | ⚠️ | Streaming only, HTTP does not support full transcription |
Examples:
# Recommended
uv run python scripts/transcribe_standalone.py audio.wav --model sherpa-onnx-paraformer-zh-small --output "D:/result.txt"
# Print only
uv run python scripts/transcribe_standalone.py audio.wav --model whisper-base
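
Because the script is described as pure urllib, the sketch below shows one way a multipart/form-data upload to `POST /v1/audio/transcriptions` could be assembled without third-party libraries. The form field names `file`, `model`, and `language` follow the OpenAI transcription convention and are assumptions here, as are the helper names. This is not the script's actual source.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

ASR_URL = "http://127.0.0.1:8080/v1/audio/transcriptions"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(audio_path, model, language=None):
    """Build a POST request that uploads a local audio file for transcription."""
    path = Path(audio_path)
    fields = {"model": model}
    if language:
        fields["language"] = language  # assumed field name; omitted for auto-detect
    body, content_type = encode_multipart(fields, "file", path.name, path.read_bytes())
    return urllib.request.Request(
        ASR_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

The request would then be sent with `urllib.request.urlopen(request, timeout=300)`, matching the script's default timeout.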
Voice cloning TTS synthesis using qwen3-tts-voiceclone. Three dynamic parameters: reference audio WAV, original text, target script.
uv run python scripts/tts_voice_clone.py <ref_audio_wav> <ref_text> <target_text> [--output <path>]
Parameters:
- ref_audio_wav: 16kHz mono WAV path
- ref_text: Original text corresponding to the reference audio
- target_text: Target text to be synthesized with the cloned voice
- --output / -o: Output audio path (default ripple_tts_cloned.wav)
- --timeout: Timeout in seconds (default 180)

Examples:
uv run python scripts/tts_voice_clone.py ref.wav "original text" "target synthesis text" -o output.wav
# 1. Convert to WAV
uv run python scripts/convert_audio.py source.mp3 ref.wav
# 2. ASR transcription (extract audio text for comparison)
uv run python scripts/transcribe_standalone.py ref.wav --model sherpa-onnx-paraformer-zh-small --output "D:/transcribed.txt"
# 3. Voice clone synthesis
uv run python scripts/tts_voice_clone.py ref.wav "original text" "target synthesis text" -o final.wav
--output for cross-directory use
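
The three steps above could also be chained from Python. The sketch below builds the same command lines and runs them in order. Feeding the transcript from step 2 into step 3 as the reference text is an assumption of this sketch (the document describes the transcript as being for comparison), as is the assumption that the `.txt` output contains only the plain transcript. The function names and file names inside `workdir` are hypothetical.

```python
import subprocess
from pathlib import Path

RUNNER = ["uv", "run", "python"]


def convert_cmd(source, ref_wav):
    """Step 1: convert the source audio to WAV."""
    return RUNNER + ["scripts/convert_audio.py", str(source), str(ref_wav)]


def transcribe_cmd(ref_wav, model, transcript_path):
    """Step 2: transcribe the reference audio to a text file."""
    return RUNNER + ["scripts/transcribe_standalone.py", str(ref_wav),
                     "--model", model, "--output", str(transcript_path)]


def clone_cmd(ref_wav, ref_text, target_text, final_wav):
    """Step 3: synthesize the target text with the cloned voice."""
    return RUNNER + ["scripts/tts_voice_clone.py", str(ref_wav), ref_text, target_text,
                     "-o", str(final_wav)]


def run_workflow(source, target_text, model, workdir):
    """Run all three steps; absolute paths keep the outputs independent of the cwd."""
    workdir = Path(workdir).resolve()
    ref_wav = workdir / "ref.wav"
    transcript = workdir / "transcribed.txt"
    final_wav = workdir / "final.wav"
    subprocess.run(convert_cmd(source, ref_wav), check=True)
    subprocess.run(transcribe_cmd(ref_wav, model, transcript), check=True)
    # Assumption: the .txt file holds only the transcript text.
    ref_text = transcript.read_text(encoding="utf-8").strip()
    subprocess.run(clone_cmd(ref_wav, ref_text, target_text, final_wav), check=True)
    return final_wav
```

Passing arguments as lists avoids shell quoting problems with text that contains spaces or quotes. Review the transcript before using it as the reference text, since recognition errors would carry into the cloning step.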