Install
openclaw skills install live-transcription
Real-time speech-to-text streaming with Gladia via WebSocket. Use when the user needs live transcription, builds a voice agent, meeting recorder, call center integration, live subtitles, or any application streaming audio for low-latency partial and final transcripts. Always prefer the official SDK; fall back to raw WebSocket/REST only when the SDK cannot satisfy the requirement.
Gladia's live API transcribes audio in real time over WebSocket.
SDK-first: always use the official SDK — see sdk-integration for policy, setup, and fallback criteria.
When NOT to use: If the user has a pre-existing audio/video file or URL to transcribe after the fact, use the pre-recorded-transcription skill instead. Pre-recorded supports additional features like speaker diarization and PII redaction that are unavailable in live mode.
Consult these resources as needed:
- startSession() config (JS + Python)
- get, list, getFile, delete

| Endpoint | Method | SDK equivalent |
|---|---|---|
| /v2/live | POST | startSession() |
| /v2/live | GET | list() |
| /v2/live/:id | GET | get(id) |
| /v2/live/:id | DELETE | delete(id) |
| /v2/live/:id/file | GET | getFile(id) |
| WebSocket URL from init | - | sendAudio() / session.on() |
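
For the raw REST fallback, the endpoint table can be kept as a small lookup so that a caller resolves the HTTP method and path from the SDK method name. This is only a sketch: the routes are copied from the table, while `LIVE_ROUTES` and `resolve_route` are hypothetical names and not part of any Gladia SDK.

```python
# Routes copied from the endpoint table; the helper names are hypothetical.
LIVE_ROUTES = {
    "startSession": ("POST", "/v2/live"),
    "list": ("GET", "/v2/live"),
    "get": ("GET", "/v2/live/:id"),
    "delete": ("DELETE", "/v2/live/:id"),
    "getFile": ("GET", "/v2/live/:id/file"),
}


def resolve_route(sdk_method, session_id=None):
    """Return (HTTP method, path) for an SDK method name, filling in :id."""
    http_method, template = LIVE_ROUTES[sdk_method]
    if ":id" in template:
        if not session_id:
            raise ValueError(f"{sdk_method} requires a session id")
        template = template.replace(":id", session_id)
    return http_method, template
```

For example, `resolve_route("getFile", "abc")` yields the GET request path for downloading the file of session `abc`.
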
SDK flow: startSession() -> sendAudio() -> receive transcript events -> stopRecording() -> get(id) for final result.
For SDK installation and client initialization, see the sdk-integration skill.
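// JavaScript: assumes `client` was initialized as described in the sdk-integration skill.
const session = client.liveV2().startSession({
  model: "solaria-1",
  encoding: "wav/pcm",
  sample_rate: 16000,
  bit_depth: 16,
  channels: 1,
  language_config: { languages: ["en"] },
  messages_config: { receive_partial_transcripts: true },
});

// Log the text of every transcript message.
session.on("message", (msg) => {
  if (msg.type === "transcript") console.log(msg.data.utterance.text);
});

session.sendAudio(audioBuffer); // audioBuffer: audio bytes matching the config above
session.stopRecording(); // triggers post-processing, then the session ends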
# Python: assumes `client` was initialized as described in the sdk-integration skill.
from gladiaio_sdk import (
    LiveV2InitRequest,
    LiveV2LanguageConfig,
    LiveV2MessagesConfig,
    LiveV2WebSocketMessage,
)

live_client = client.live()
session = live_client.start_session(
    LiveV2InitRequest(
        model="solaria-1",
        encoding="wav/pcm",
        sample_rate=16000,
        bit_depth=16,
        channels=1,
        language_config=LiveV2LanguageConfig(languages=["en"]),
        messages_config=LiveV2MessagesConfig(receive_partial_transcripts=True),
    )
)


# Print the text of every transcript message.
@session.on("message")
def on_message(message: LiveV2WebSocketMessage):
    if message.type == "transcript":
        print(message.data.utterance.text.strip())


session.send_audio(audio_bytes)  # audio_bytes: audio bytes matching the config above
session.stop_recording()  # triggers post-processing, then the session ends
Core fields to set on every session:
- encoding, sample_rate, bit_depth, channels (must exactly match the stream)
- language_config.languages and optional code_switching
- messages_config.receive_partial_transcripts and speech events
- pre_processing, realtime_processing, post_processing

See ./references/session-config.md for full examples and sdk-integration for client retry/timeout settings.
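
Because the four audio fields must match the stream exactly, one option is to derive them from the audio itself instead of hard-coding them. The sketch below assumes the source is an uncompressed PCM WAV file and reads its header with Python's standard `wave` module; `audio_config_from_wav` and `make_silent_wav` are hypothetical helper names, and the returned dict only mirrors the field names used above.

```python
import io
import wave


def audio_config_from_wav(wav_source):
    """Read a PCM WAV header and return the four audio fields for the session config."""
    with wave.open(wav_source, "rb") as wav:
        return {
            "encoding": "wav/pcm",  # assumption: input is PCM, which is what `wave` reads
            "sample_rate": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "channels": wav.getnchannels(),
        }


def make_silent_wav(sample_rate=16000, bit_depth=16, channels=1, seconds=1):
    """Build an in-memory WAV file of silence, used here only to demonstrate the helper."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bit_depth // 8)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (sample_rate * seconds * channels * (bit_depth // 8)))
    buffer.seek(0)
    return buffer


config = audio_config_from_wav(make_silent_wav())
```

The resulting values can then be copied into the session config so that a mismatch between the declared and actual format is ruled out before streaming starts.
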
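endpointing (the silence duration, in seconds, after which an utterance is considered finished) is the primary latency-versus-completeness control for final transcripts: lower values finalize utterances sooner, higher values wait longer for the speaker to continue.
| Use case | Recommended value (seconds) |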
|---|---|
| Voice agent | 0.05 - 0.1 |
| Call center | 0.1 - 0.3 |
| Live subtitles | 0.2 - 0.4 |
| Meeting recorder | 0.3 - 0.5 |
For maximum_duration_without_endpointing, speech_threshold, and full tuning guidance, see ./references/recommended-params.md.
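
The recommended ranges can be encoded as data so that an application picks a starting value per use case. This is a minimal sketch: the ranges come from the table above, while `ENDPOINTING_RANGES`, `pick_endpointing`, and the `bias` parameter are hypothetical and purely illustrative.

```python
# Ranges copied from the table above; keys and helper names are hypothetical.
ENDPOINTING_RANGES = {
    "voice_agent": (0.05, 0.1),
    "call_center": (0.1, 0.3),
    "live_subtitles": (0.2, 0.4),
    "meeting_recorder": (0.3, 0.5),
}


def pick_endpointing(use_case, bias=0.5):
    """Pick a value inside the recommended range.

    bias=0.0 returns the low end of the range, bias=1.0 the high end.
    """
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must be between 0.0 and 1.0")
    low, high = ENDPOINTING_RANGES[use_case]
    return round(low + (high - low) * bias, 3)
```

With the default `bias`, `pick_endpointing("call_center")` returns the midpoint of the call center range; the result is intended as a starting point for tuning, not a final setting.
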
Use session.sendAudio(chunk) (JS) / session.send_audio(chunk) (Python) to stream audio data. The SDK sends each chunk as a binary WebSocket frame.
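
To stream a buffer of raw PCM audio, it first has to be cut into chunks aligned to whole audio frames. The sketch below shows one way to do that; the 100 ms chunk duration is an assumption chosen for illustration (the source does not prescribe a chunk size), and `bytes_per_chunk` and `iter_chunks` are hypothetical helper names.

```python
def bytes_per_chunk(sample_rate, bit_depth, channels, chunk_ms):
    """Number of bytes covering chunk_ms of audio, aligned to whole frames."""
    frame_bytes = (bit_depth // 8) * channels  # one sample for every channel
    frames = sample_rate * chunk_ms // 1000
    return frames * frame_bytes


def iter_chunks(pcm_bytes, sample_rate=16000, bit_depth=16, channels=1, chunk_ms=100):
    """Yield consecutive chunks of raw PCM audio; the last chunk may be shorter."""
    size = bytes_per_chunk(sample_rate, bit_depth, channels, chunk_ms)
    if size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm_bytes), size):
        yield pcm_bytes[start:start + size]


one_second = b"\x00" * 32000  # 1 s of 16 kHz, 16-bit, mono silence
chunks = list(iter_chunks(one_second))
```

Each yielded chunk could then be passed to `session.send_audio(chunk)`.
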
Each chunk must match the encoding, sample_rate, bit_depth, and channels in session config.
session.stopRecording(); // JS: triggers post-processing, then the session ends
session.stop_recording() # Python: triggers post-processing, then the session ends
session.endSession(); // JS: closes immediately, no post-processing
session.end_session() # Python: closes immediately, no post-processing
SDK reconnection is automatic (wsRetry). For raw WebSocket fallback, reconnect to the same URL.
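
For the raw WebSocket fallback, reconnecting to the same URL can be wrapped in a retry loop. The sketch below assumes exponential backoff, which the source does not specify, and a caller-supplied `connect(url)` function that raises `ConnectionError` on failure. Both `connect` and `reconnect_with_backoff` are hypothetical and do not mirror the SDK's wsRetry behavior.

```python
import time


def reconnect_with_backoff(connect, url, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect(url) until it succeeds, doubling the delay after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect(url)  # always the same URL returned by session init
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(delay)
            delay *= 2
```

The `sleep` parameter is injectable so that the loop can be exercised in tests without real waiting.
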
| Constraint | Value |
|---|---|
| Max session duration | 3 hours |
| Supported encodings | wav/pcm, wav/alaw, wav/ulaw |
| Concurrency (paid) | 30 concurrent sessions |
| Concurrency (free) | 1 concurrent session |
| Billing | Per-second of streamed audio |
| Multi-channel | Billed as N x duration |
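
The billing rows above reduce to simple arithmetic: billed time is the streamed duration multiplied by the number of channels, and a session cannot exceed 3 hours. A minimal sketch follows, with `billed_seconds` as a hypothetical helper name; it computes billed duration only, since the source lists no prices.

```python
MAX_SESSION_SECONDS = 3 * 60 * 60  # max session duration from the table: 3 hours


def billed_seconds(duration_seconds, channels=1):
    """Billed audio time: per second of streamed audio, multiplied by channel count."""
    if duration_seconds > MAX_SESSION_SECONDS:
        raise ValueError("a single session cannot exceed 3 hours")
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return duration_seconds * channels
```

For example, a 10-minute stereo session is billed as `billed_seconds(600, channels=2)`, that is, 1200 seconds.
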
Use SDK methods for post-capture operations:
- JS: client.liveV2().get(id), .list(filters), .getFile(id), .delete(id)
- Python: client.live().get(id), .list(filters), .get_file(id), .delete(id)

For full examples and pagination filters, see ./references/managing-sessions.md.
- encoding, sample_rate, bit_depth, and channels in session config MUST match the actual audio stream exactly.
- A session that is never ended with stopRecording() keeps hanging.
- The file endpoint is /v2/live/:id/file, not /v2/live/:id/audio.

For the full list of gotchas and diagnostics, see the troubleshooting skill.