StepFun step-audio-r1.1

v0.1.1

Use StepFun Chat Completions with model step-audio-r1.1 for non-streaming speech turns that can send text with optional local audio input and save the return...

1· 137·0 current·0 all-time

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for praanmichael/stepfun-step-audio-r1-1.

Previewing Install & Setup.
Prompt PreviewInstall & Setup
Install the skill "StepFun step-audio-r1.1" (praanmichael/stepfun-step-audio-r1-1) from ClawHub.
Skill page: https://clawhub.ai/praanmichael/stepfun-step-audio-r1-1
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: STEPFUN_API_KEY, STEP_API_KEY
Required binaries: python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install stepfun-step-audio-r1-1

ClawHub CLI

Package manager switcher

npx clawhub@latest install stepfun-step-audio-r1-1
Security Scan
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description (StepFun non-streaming audio chat) align with the included script and docs. The skill only asks for python3 and StepFun API keys (STEPFUN_API_KEY, legacy STEP_API_KEY), which are appropriate for calling StepFun's chat/completions and voice-list endpoints.
Instruction Scope
SKILL.md and script limit behavior to building a non-streaming chat payload, optionally embedding local audio (base64), calling StepFun endpoints, and saving response artifacts (response.json, audio file, transcript). It may invoke ffmpeg/afconvert to normalize input audio when present — this is documented and expected.
Install Mechanism
Instruction-only skill with a helper script and no install spec; nothing is downloaded or installed automatically. This is low-risk and consistent with the skill's purpose.
Credentials
Only STEPFUN_API_KEY (primary) and a legacy alias STEP_API_KEY are required. An optional STEP_API_BASE_URL override exists for testing; otherwise the default target is https://api.stepfun.com. The requested env vars match the API usage and are proportionate.
Persistence & Privilege
always is false; the skill does not request persistent global privileges or modify other skills. It writes output artifacts to a local output directory (configurable), which is expected behavior for a helper script.
Assessment
This skill appears internally consistent and limited to calling StepFun's non-streaming audio chat API. Before installing, confirm the source (no homepage provided) and review the script yourself since it will: (1) read STEPFUN_API_KEY (or legacy STEP_API_KEY) from the environment, (2) optionally run ffmpeg/afconvert on local audio via subprocess, and (3) save request/response files to the current or specified output directory. Do not set STEP_API_BASE_URL to an untrusted URL (it can redirect API calls). If you plan to run it on a shared host, keep the STEPFUN API key private and inspect the script for any policy or logging changes you want to make (e.g., output directory).

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🔊 Clawdis
Binspython3
EnvSTEPFUN_API_KEY, STEP_API_KEY
Primary envSTEPFUN_API_KEY
audiovk97d7pt97ervzycq5nxnzyq5dx83hxs8latestvk97d7pt97ervzycq5nxnzyq5dx83hxs8stepfunvk97d7pt97ervzycq5nxnzyq5dx83hxs8
137downloads
1stars
2versions
Updated 1mo ago
v0.1.1
MIT-0

StepFun step-audio-r1.1

Call StepFun's POST /v1/chat/completions endpoint with stream: false and model: step-audio-r1.1.

Use this skill when the user explicitly wants StepFun audio generation, speech-style replies through the Chat API, a standard non-streaming chat completion object, or local audio input encoded as input_audio.

Do not use this skill for realtime duplex voice sessions. Use StepFun Realtime API instead when the user wants low-latency live conversation.

step-audio-r1.1 does not support tool call. If the user needs tool calling, prefer step-audio-2 instead of this skill.

Quick Start

Text in, audio out:

python3 {baseDir}/scripts/stepfun_audio_chat.py \
  --prompt "用中文介绍一下苏州的春天,语气自然一点。" \
  --voice wenrounansheng \
  --format wav

Check available voice ids before a run:

python3 {baseDir}/scripts/stepfun_audio_chat.py \
  --list-voices

Text + local audio in, audio out:

python3 {baseDir}/scripts/stepfun_audio_chat.py \
  --prompt "听完这段语音后,总结重点,并用更简洁的话复述。" \
  --input-audio /path/to/input.wav \
  --voice wenrounansheng \
  --format wav

Build and inspect the non-streaming request without sending it:

python3 {baseDir}/scripts/stepfun_audio_chat.py \
  --prompt "测试 step-audio-r1.1 非流式 payload" \
  --dry-run \
  --print-json

What The Script Produces

The helper writes a fresh output directory for each run unless --output-dir is provided. Typical files are:

  • request.json: saved only for --dry-run
  • response.json: full non-streaming response object
  • response.<format>: decoded audio from choices[0].message.audio.data
  • transcript.txt: choices[0].message.audio.transcript
  • content.txt: textual assistant content when present

Common Flags

python3 {baseDir}/scripts/stepfun_audio_chat.py --help

Important flags:

  • --prompt: user text to send with the request
  • --input-audio: local audio file that will be base64-encoded into input_audio; non-WAV files are converted to WAV first when ffmpeg or afconvert is available
  • --system: optional system instruction
  • --voice: output voice name
  • --list-voices: query StepFun for account-level custom/cloned voices and print a few official voice hints
  • --format: non-streaming output audio format; this skill uses wav
  • --no-audio-output: request text-only output while still using the Chat API
  • --temperature: optional sampling override
  • --max-tokens: optional generation cap
  • --print-json: echo request or response JSON to stdout
  • --dry-run: build payload and stop before the network call

Configuration

Set STEPFUN_API_KEY in the environment, or inject it through OpenClaw skill config:

{
  skills: {
    entries: {
      "stepfun-step-audio-r1-1": {
        env: {
          STEPFUN_API_KEY: "STEP_KEY_HERE",
        },
      },
    },
  },
}

The script still accepts STEP_API_KEY as a legacy alias for backward compatibility, but the official name is STEPFUN_API_KEY.

Optional environment variables:

  • STEP_API_BASE_URL: overrides the default https://api.stepfun.com

Input audio note:

  • StepFun expects input_audio.data in data:audio/wav;base64,... format
  • Official docs mention WAV and MP3 input support
  • This script normalizes local input to WAV for maximum compatibility
  • If you pass m4a, mp3, aiff, or similar, this script will try to convert to WAV via ffmpeg or macOS afconvert
  • The normalized input_audio payload must stay within StepFun's 10MB base64 limit

Voice selection note:

  • step-audio-r1.1 needs audio.voice whenever you request audio output
  • The script defaults to wenrounansheng, which was validated in real smoke tests for this skill
  • For production use, prefer passing --voice explicitly
  • Use --list-voices to inspect account-level custom/cloned voice ids
  • Read references/stepfun-voices.md for how step-audio-r1.1, step-audio-2, and step-tts-* differ in voice usage

Workflow

  1. Confirm the user wants StepFun step-audio-r1.1 through Chat API.
  2. Choose whether the turn is text-only or text plus local audio input.
  3. Run the helper script with stream: false.
  4. Return the saved transcript, the audio file path, and any important response fields to the user.

Reference

Read references/stepfun-chat-api.md when you need the exact request shape, supported audio fields, or the non-streaming response layout. Read references/stepfun-voices.md when you need voice-selection guidance.

Comments

Loading comments...