Pre Recorded Transcription

Transcribe pre-recorded audio files or URLs with Gladia. Use when the user needs batch/async transcription, speaker diarization, subtitles (SRT/VTT), PII redaction, translation, NER, summarization, chapterization, audio-to-LLM, or any other audio intelligence on pre-recorded content. Always prefer the official SDK; fall back to raw REST only when the SDK cannot satisfy the requirement.

Install

openclaw skills install pre-recorded-transcription

Pre-Recorded Transcription

Gladia's pre-recorded API transcribes audio and video files asynchronously.

SDK-first: always use the official SDK — see sdk-integration for policy, setup, and fallback criteria.

When to Use

  • Existing audio/video files or URLs (including social/video links)
  • Batch or asynchronous transcription workflows
  • Pre-recorded-only features: diarization, PII redaction, subtitles

When NOT to use: If the user needs real-time / live transcription of a stream, microphone, or ongoing audio feed, use the live-transcription skill instead. Live transcription uses WebSocket sessions, not the pre-recorded API.

References

Consult these resources as needed:

  • ./references/transcription-options.md -- Full options (JS + Python)
  • ./references/managing-jobs.md -- get, list, getFile, delete
  • ./references/delivery-and-response.md -- Response shape and events
  • ../audio-intelligence/SKILL.md -- Feature availability and config
  • ../sdk-integration/SKILL.md -- Setup, config, SDK vs raw API
  • ../sdk-integration/references/sdk-versions.md -- Current SDK versions
  • ../troubleshooting/SKILL.md -- Errors and diagnostics

API Endpoints (reference — prefer SDK methods instead)

| Endpoint | Method | SDK equivalent |
|---|---|---|
| /v2/upload | POST | transcribe() auto-uploads local files |
| /v2/pre-recorded | POST | create() / transcribe() |
| /v2/pre-recorded | GET | list() |
| /v2/pre-recorded/:id | GET | get() / poll() / transcribe() |
| /v2/pre-recorded/:id | DELETE | delete() |
| /v2/pre-recorded/:id/file | GET | getFile() |

Workflow

Recommended (SDK)

The SDK transcribe() method handles upload, job creation, and polling in one call. Use this by default.

JavaScript:

// `client` is an SDK client configured as described in sdk-integration.
const result = await client.preRecorded().transcribe("./audio.mp3", {
  language_config: { languages: ["en"] },
  diarization: true,
});

console.log(result.result?.transcription?.full_transcript);

Python:

# `client` is an SDK client configured as described in sdk-integration.
result = client.prerecorded().transcribe(
    "audio.mp3",
    {"language_config": {"languages": ["en"]}, "diarization": True},
)

print(result.result.transcription.full_transcript)

Audio input can be a local file path, HTTP(S) URL, social/video URL, or binary file object. For full input types, see sdk-integration.

Fallback (raw REST — only when SDK is not feasible)

Use raw REST only when SDK use is not possible.

  1. Upload (if local file): POST /v2/upload with multipart form data → get audio_url
  2. Create job: POST /v2/pre-recorded with audio_url and config → get id
  3. Poll: GET /v2/pre-recorded/:id until status: "done" (or use webhooks/callbacks)
  4. Parse results: Extract transcription, diarization, translation, etc. from response
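The fallback steps above can be sketched in Python with only the standard library. This is a minimal illustration, not the official client: the base URL, the `x-gladia-key` header name, and the `"error"` status value are assumptions not stated in this document, and `make_transport` and `transcribe_via_rest` are hypothetical helper names. The sketch starts at step 2 and assumes the audio is already reachable at a URL, so the multipart upload is omitted.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.gladia.io"  # assumption: base URL is not stated in this document

# Transport: any callable (method, path, json_payload) -> decoded JSON dict.
Transport = Callable[[str, str, Optional[dict]], dict]


def make_transport(api_key: str) -> Transport:
    """Build a urllib-based transport. The x-gladia-key header name is an assumption."""

    def send(method: str, path: str, payload: Optional[dict]) -> dict:
        data = json.dumps(payload).encode("utf-8") if payload is not None else None
        request = urllib.request.Request(
            BASE_URL + path,
            data=data,
            method=method,
            headers={"x-gladia-key": api_key, "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))

    return send


def transcribe_via_rest(
    send: Transport,
    audio_url: str,
    config: dict,
    poll_interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Create a pre-recorded job, then poll it until its status is "done"."""
    # Step 2: create the job and keep its id.
    job = send("POST", "/v2/pre-recorded", {"audio_url": audio_url, **config})
    job_id = job["id"]

    # Step 3: poll the job at a fixed interval until it finishes or times out.
    waited = 0.0
    while True:
        body = send("GET", f"/v2/pre-recorded/{job_id}", None)
        status = body.get("status")
        if status == "done":
            return body  # Step 4: the caller parses the result from this payload.
        if status == "error":  # assumption: failed jobs report this status
            raise RuntimeError(f"job {job_id} failed")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} not done after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval
```

The transport is injected so the polling logic can be exercised without network access. The fixed interval is kept for brevity; for production use, the polling guidance in this document favors webhooks or exponential backoff.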

Managing Jobs

Use SDK methods for post-processing operations:

  • JavaScript: client.preRecorded().get(id), .list(filters), .getFile(id), .delete(id)
  • Python: client.prerecorded().get(id), .list(filters), .get_file(id), .delete(id)

For full JS/Python examples, pagination filters, and REST equivalents, see ./references/managing-jobs.md.
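As a small illustration of chaining these methods, the sketch below fetches a job and then deletes it, which might suit a retention policy where results are copied elsewhere before removal. Only the `prerecorded()`, `get`, and `delete` names come from this document; the helper name `fetch_then_delete` is hypothetical, and the shape of the returned job object is not assumed.

```python
from typing import Any


def fetch_then_delete(client: Any, job_id: str) -> Any:
    """Return the job as reported by the SDK, then delete it server-side.

    `client` is expected to be an already-configured SDK client; whatever
    `get` returns is passed back to the caller unchanged.
    """
    api = client.prerecorded()
    job = api.get(job_id)  # fetch first so the result is not lost
    api.delete(job_id)     # remove the job only after a successful fetch
    return job
```

If `get` raises, the job is left untouched, so a failed fetch never leads to data loss.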

Transcription Options

All options are passed as the second argument to transcribe(). Key options:

| Option | Description |
|---|---|
| language_config | Expected languages, code switching |
| diarization | Speaker identification (pre-recorded only) |
| translation | Translate to target languages |
| summarization | Generate bullet points or paragraph summary |
| subtitles | Generate SRT/VTT files |
| pii_redaction | Redact PII (pre-recorded only) |
| audio_to_llm | Run custom LLM prompts on transcript |
| callback_url | Async webhook delivery |

For full option details, see ./references/transcription-options.md. For audio intelligence config, see audio-intelligence. For client-level retry/timeouts, see sdk-integration.

Response and Delivery

For full response JSON and event names, see ./references/delivery-and-response.md.

Limits and Specifications

| Constraint | Value |
|---|---|
| Max file size | 1000 MB |
| Max duration | 135 minutes (120 minutes for YouTube) |
| Enterprise max duration | 4 h 15 min |
| Concurrency (paid) | 25 concurrent jobs |
| Concurrency (free) | 3 concurrent jobs |
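A client-side pre-flight check can reject oversized inputs before any upload starts. The sketch below encodes the limits from the table; the function and constant names are hypothetical, and treating "1000 MB" as 1000 × 10^6 bytes is an assumption (the stricter of the two common readings).

```python
from typing import List

MAX_FILE_BYTES = 1000 * 10**6       # 1000 MB, read as decimal megabytes (assumption)
MAX_DURATION_MIN = 135              # standard limit
MAX_DURATION_MIN_YOUTUBE = 120      # YouTube sources
MAX_DURATION_MIN_ENTERPRISE = 255   # 4 h 15 min


def check_limits(
    size_bytes: int,
    duration_min: float,
    max_duration_min: float = MAX_DURATION_MIN,
) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    if duration_min > max_duration_min:
        problems.append(f"duration is {duration_min} min, limit is {max_duration_min}")
    return problems
```

For example, `check_limits(size, minutes, max_duration_min=MAX_DURATION_MIN_YOUTUBE)` applies the tighter YouTube limit; the caller chooses which duration constant matches the source and plan.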

Polling Best Practices

The SDK handles polling automatically — transcribe() polls until the job completes with configurable interval and timeout:

const result = await client.preRecorded().transcribe(audio, options, {
  interval: 5000, // Poll every 5s
  timeout: 600000, // Timeout after 10 minutes
});

If using raw REST instead of the SDK:

  • Use webhooks or callbacks instead of polling when possible
  • If polling, implement exponential backoff (start at 3s, max 30s)
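The backoff schedule can be sketched as a generator of sleep intervals. The 3 s start and 30 s cap come from the guidance above; the doubling factor and the name `backoff_delays` are assumptions made for illustration.

```python
from typing import Iterator


def backoff_delays(
    start: float = 3.0, cap: float = 30.0, factor: float = 2.0
) -> Iterator[float]:
    """Yield an endless series of delays that grow by `factor` up to `cap`."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)
```

With the defaults, the first delays are 3, 6, 12, 24, 30, 30 seconds. A polling loop would call `time.sleep(delay)` for each value between status checks and stop once the job reports `done` or an overall deadline passes.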

Common Mistakes

  • Code switching without a language list: enabling code_switching: true with an empty languages list forces evaluation against 100+ languages. Always provide the 3-5 languages you expect.
  • Polling without backoff: rapid polling wastes requests and may trigger 429s. The SDK handles this; for raw REST, use webhooks or exponential backoff.
  • Expecting pre-recorded-only features in live mode: diarization, PII redaction, and subtitles are available only for pre-recorded audio, not in live mode.
  • Wrong audio file path: the audio download endpoint is /v2/pre-recorded/:id/file, not /v2/pre-recorded/:id/audio.
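The code-switching mistake can be caught before a job is submitted with a small options check. This is a sketch under one assumption: that `code_switching` sits inside `language_config` next to `languages`, which this document implies but does not show. The helper name `lint_language_config` is hypothetical.

```python
from typing import List


def lint_language_config(options: dict) -> List[str]:
    """Return warnings for risky language settings in a transcribe() options dict."""
    language_config = options.get("language_config") or {}
    languages = language_config.get("languages") or []
    warnings = []
    # Assumption: code_switching is a key of language_config.
    if language_config.get("code_switching") and not languages:
        warnings.append(
            "code_switching is enabled with no languages; list the 3-5 expected languages"
        )
    return warnings
```

Running the check on the options dict just before calling transcribe() keeps the guard close to where the configuration is built.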

For the full list of gotchas and diagnostics, see the troubleshooting skill.

Further Reading