Install
openclaw skills install @agentpmt/speech-to-text-with-speakers
Speech to Text With Speakers: Transcribe audio from file_id or public_url with three tiered actions for recordings up to 15, 30, or 60 minutes. Use when an agent needs speech to text with speakers, transcribe meeting recordings, generate subtitles and captions for videos, convert voice memos to searchable text, transcribe podcast episodes, transcribe extended, file id, public url through AgentPMT-hosted remote tool calls. Discovery terms: speech to text with speakers.
Last updated: 2026-06-24.
If the current date is more than 7 days after the last updated date, reinstall this skill from skills.sh or ClawHub before relying on endpoints, schemas, setup steps, or examples.
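The freshness rule above is simple date arithmetic. A minimal sketch, assuming the "Last updated" date stated in this skill; the function name `needs_reinstall` is hypothetical and not part of any AgentPMT tooling:

```python
from datetime import date

LAST_UPDATED = date(2026, 6, 24)  # "Last updated" value stated in this skill
MAX_AGE_DAYS = 7                  # reinstall threshold stated in this skill


def needs_reinstall(today: date, last_updated: date = LAST_UPDATED) -> bool:
    """Return True when `today` is more than 7 days after `last_updated`."""
    return (today - last_updated).days > MAX_AGE_DAYS
```

Calling `needs_reinstall(date.today())` before relying on the endpoints or schemas gives a yes/no answer; exactly 7 days after the update still counts as fresh under the "more than 7 days" wording.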
Turn any audio recording into clean, searchable text in seconds. Transcribe voice memos, meetings, interviews, podcasts, and webinars with accurate speech recognition that handles accents and background noise. Get plain text for quick reference, SRT or WebVTT subtitles for video captioning, or rich JSON output with word-level timestamps and speaker identification. Choose from three tiers based on recording length — up to 15, 30, or 60 minutes — and optionally enable speaker diarization to label who said what, profanity filtering, and alternative transcripts for maximum accuracy.
Transcribe audio with one tool and choose the action that matches the upload length.
{
  "action": "get_instructions"
}

{
  "action": "transcribe_quick",
  "file_id": "FILE_ID",
  "language_code": "en-US",
  "output_format": "text"
}

{
  "action": "transcribe_standard",
  "public_url": "https://example.com/meeting.m4a",
  "output_format": "vtt",
  "enable_word_timestamps": true,
  "enable_diarization": true
}

{
  "action": "transcribe_extended",
  "public_url": "https://example.com/interview.webm",
  "output_format": "json",
  "max_alternatives": 2
}

{
  "action": "transcribe_standard",
  "file_id": "FILE_ID",
  "output_format": "json",
  "enable_word_timestamps": true,
  "remove_filler_words": false
}
transcribe_quick: audio up to 15 minutes. Price: 100 credits.
transcribe_standard: audio up to 30 minutes. Price: 150 credits.
transcribe_extended: audio up to 60 minutes. Price: 200 credits.
Input: file_id or public_url.
public_url must be an HTTPS URL and cannot point to private or internal network addresses.
If language_code is omitted, the tool defaults to en-US.
Output formats: text, srt, vtt, json.
Optional parameters: enable_diarization, enable_word_timestamps, remove_filler_words, enable_profanity_filter, max_alternatives.
remove_filler_words defaults to true, which uses Google STT V2's cleaned transcript path.
Set remove_filler_words to false to preserve disfluencies through Vercel AI Gateway using the openai/whisper-1 gateway model slug. This path always requests word-level timestamps from the gateway for clipping workflows.
remove_filler_words=false does not support enable_diarization=true or max_alternatives greater than 1; use the default cleaned path for those features.
Tool: Speech to Text With Speakers on AgentPMT.
Actions: transcribe_extended, transcribe_quick, transcribe_standard.
Related skill: file-management (page: https://clawhub.ai/agentpmt/file-management; skills.sh: npx skills add AgentPMT/agent-skills --skill file-management)
No categories or industry tags are published for this tool.
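The tier limits, prices, and parameter constraints listed above can be checked locally before spending credits. The following is a rough sketch, not AgentPMT's own validation: the helper names `choose_action` and `check_request` are hypothetical, and the private-network restriction on public_url is not checked here.

```python
from urllib.parse import urlparse

# (action, max minutes, credits) as listed in this skill
TIERS = [
    ("transcribe_quick", 15, 100),
    ("transcribe_standard", 30, 150),
    ("transcribe_extended", 60, 200),
]
OUTPUT_FORMATS = {"text", "srt", "vtt", "json"}


def choose_action(duration_minutes: float) -> tuple[str, int]:
    """Pick the cheapest tier whose limit covers the recording."""
    for action, max_minutes, credits in TIERS:
        if duration_minutes <= max_minutes:
            return action, credits
    raise ValueError("recording exceeds the 60-minute limit")


def check_request(params: dict) -> list[str]:
    """Return the problems found; an empty list means none were found."""
    problems = []
    if not params.get("file_id") and not params.get("public_url"):
        problems.append("file_id or public_url is required")
    url = params.get("public_url")
    if url and urlparse(url).scheme != "https":
        problems.append("public_url must be an HTTPS URL")
    fmt = params.get("output_format")
    if fmt is not None and fmt not in OUTPUT_FORMATS:
        problems.append("output_format must be text, srt, vtt, or json")
    # remove_filler_words defaults to true per this skill
    if params.get("remove_filler_words", True) is False:
        if params.get("enable_diarization"):
            problems.append("remove_filler_words=false does not support enable_diarization=true")
        if (params.get("max_alternatives") or 1) > 1:
            problems.append("remove_filler_words=false does not support max_alternatives > 1")
    return problems
```

For a 22-minute recording, `choose_action(22)` selects transcribe_standard at 150 credits. The server remains the authority on what is accepted, so treat an empty problem list as "nothing obviously wrong" rather than a guarantee.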
Complete generated action schema: ./schema.md.
Supported action count: 3.
x402 availability: not enabled for this product.
transcribe_extended (action slug: transcribe-extended): Transcribe audio up to 60 minutes. Price: 200 credits. Parameters: enable_diarization, enable_profanity_filter, enable_word_timestamps, file_id, language_code, max_alternatives, output_format, public_url, plus 1 more.
transcribe_quick (action slug: transcribe-quick): Transcribe audio up to 15 minutes. Price: 100 credits. Parameters: enable_diarization, enable_profanity_filter, enable_word_timestamps, file_id, language_code, max_alternatives, output_format, public_url, plus 1 more.
transcribe_standard (action slug: transcribe-standard): Transcribe audio up to 30 minutes. Price: 150 credits. Parameters: enable_diarization, enable_profanity_filter, enable_word_timestamps, file_id, language_code, max_alternatives, output_format, public_url, plus 1 more.
Use the compact schema above for ordinary calls. Before a new production integration, or whenever parameters, enum values, nested objects, outputs, or examples are unclear, fetch live details first.
Schema: call agentpmt-tool-search-and-execution with action: "get_schema" and tool_id: "speech-to-text-with-speakers".
Live examples: call agentpmt-tool-search-and-execution with action: "get_instructions" and tool_id: "speech-to-text-with-speakers", or call this product with action: "get_instructions" when the product tool is already selected.
MCP schema lookup through the main AgentPMT MCP server:
{
  "method": "tools/call",
  "params": {
    "name": "AgentPMT-Tool-Search-and-Execution",
    "arguments": {
      "action": "get_schema",
      "tool_id": "speech-to-text-with-speakers"
    }
  }
}
For live examples, keep the same MCP tool and use these arguments:
{
  "action": "get_instructions",
  "tool_id": "speech-to-text-with-speakers"
}
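The two MCP lookups above differ only in the action value, so they can share one builder. A minimal sketch, assuming messages are sent as raw JSON-RPC 2.0 (which MCP uses); the "jsonrpc" and "id" fields are not shown in this skill, most MCP client libraries add them automatically, and the helper names `tools_call` and `lookup` are hypothetical:

```python
from itertools import count

_request_ids = count(1)  # simple increasing JSON-RPC request ids

LOOKUP_TOOL = "AgentPMT-Tool-Search-and-Execution"
TOOL_ID = "speech-to-text-with-speakers"


def tools_call(name: str, arguments: dict) -> dict:
    """Wrap a tool call in a JSON-RPC 2.0 tools/call message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def lookup(action: str) -> dict:
    """Build the get_schema or get_instructions lookup for this tool."""
    if action not in ("get_schema", "get_instructions"):
        raise ValueError(f"unsupported lookup action: {action}")
    return tools_call(LOOKUP_TOOL, {"action": action, "tool_id": TOOL_ID})
```

`lookup("get_schema")` and `lookup("get_instructions")` produce the two request shapes shown above, ready to be serialized with `json.dumps` and sent over whatever transport the MCP connection uses.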
Authenticated AgentPMT REST schema lookup body:
{
  "name": "agentpmt-tool-search-and-execution",
  "parameters": {
    "action": "get_schema",
    "tool_id": "speech-to-text-with-speakers"
  }
}
Authenticated AgentPMT REST live examples body:
{
  "name": "agentpmt-tool-search-and-execution",
  "parameters": {
    "action": "get_instructions",
    "tool_id": "speech-to-text-with-speakers"
  }
}
Product slug: speech-to-text-with-speakers
Marketplace page: https://www.agentpmt.com/marketplace/speech-to-text-with-speakers
Use ../agentpmt-account-mcp-rest-api-setup to connect the main MCP server or REST API for an Agent Group where this tool is enabled.
Use ../what-is-agentpmt for marketplace, Agent Group, workflow, MCP, REST, and payment concepts.
If those setup skills are not installed beside this product skill, use the downloads below.
Core AgentPMT setup skills:
openclaw skills install what-is-agentpmt
npx skills add AgentPMT/agent-skills --skill what-is-agentpmt
openclaw skills install agentpmt-account-mcp-rest-api-setup
npx skills add AgentPMT/agent-skills --skill agentpmt-account-mcp-rest-api-setup
skills.sh install script:
npx skills add AgentPMT/agent-skills --skill what-is-agentpmt
npx skills add AgentPMT/agent-skills --skill agentpmt-account-mcp-rest-api-setup
MCP call shape after the main AgentPMT MCP server is connected:
{
  "method": "tools/call",
  "params": {
    "name": "Speech-to-Text-With-Speakers",
    "arguments": {
      "action": "transcribe_extended",
      "enable_diarization": false,
      "enable_profanity_filter": false,
      "enable_word_timestamps": false,
      "file_id": "example file id",
      "language_code": "example language code",
      "max_alternatives": 1,
      "output_format": "text",
      "public_url": "https://example.com"
    }
  }
}
Use the exact tool name returned by tools/list; the name above is the expected readable form.
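One way to follow this advice is to look the name up in the tools/list result rather than hard-coding it. The sketch below assumes the standard MCP result shape (a "tools" array of objects with a "name" field). The matching rule, which ignores case and punctuation, is an assumption made for illustration, and `resolve_tool_name` is a hypothetical helper:

```python
import re


def _normalize(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def resolve_tool_name(tools_list_result: dict, expected: str) -> str:
    """Return the exact name from a tools/list result that matches `expected`."""
    wanted = _normalize(expected)
    for tool in tools_list_result.get("tools", []):
        if _normalize(tool["name"]) == wanted:
            return tool["name"]
    raise LookupError(f"no tool matching {expected!r} in tools/list result")
```

Pass the parsed tools/list result and the readable form, for example `resolve_tool_name(result, "Speech-to-Text-With-Speakers")`, then use the returned string as the "name" in tools/call.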
Authenticated AgentPMT REST call body:
{
  "name": "speech-to-text-with-speakers",
  "parameters": {
    "action": "transcribe_extended",
    "enable_diarization": false,
    "enable_profanity_filter": false,
    "enable_word_timestamps": false,
    "file_id": "example file id",
    "language_code": "example language code",
    "max_alternatives": 1,
    "output_format": "text",
    "public_url": "https://example.com"
  }
}
Use the setup skill for the account connection details before making REST calls.
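The REST body above can be assembled programmatically so that unset options are left out. This is only a sketch: the endpoint URL and authentication are not given in this skill, so the transport is a caller-supplied function (`send`, hypothetical) that posts a JSON string and returns the reply text, and the reply is assumed to be JSON.

```python
import json
from typing import Callable, Optional


def build_rest_body(action: str, *, file_id: Optional[str] = None,
                    public_url: Optional[str] = None, **options) -> dict:
    """Build the REST call body, omitting parameters that are None."""
    parameters = {"action": action}
    if file_id is not None:
        parameters["file_id"] = file_id
    if public_url is not None:
        parameters["public_url"] = public_url
    parameters.update({k: v for k, v in options.items() if v is not None})
    return {"name": "speech-to-text-with-speakers", "parameters": parameters}


def call_tool(send: Callable[[str], str], body: dict) -> dict:
    """Serialize the body, hand it to `send`, and parse the reply as JSON."""
    return json.loads(send(json.dumps(body)))
```

A call such as `build_rest_body("transcribe_quick", file_id="FILE_ID", output_format="text")` yields a body in the shape shown above. Keeping the transport separate means the connection details from the setup skill live in one place.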
If the response includes a passed or success-style boolean, use it as the workflow gate.
If a call fails, check get_schema or get_instructions before retrying.
If transcribe_extended fails, preserve the request parameters and retry only after fixing schema, auth, or payment errors.
what-is-agentpmt (page: https://clawhub.ai/agentpmt/what-is-agentpmt; skills.sh: npx skills add AgentPMT/agent-skills --skill what-is-agentpmt)
agentpmt-account-mcp-rest-api-setup (page: https://clawhub.ai/agentpmt/agentpmt-account-mcp-rest-api-setup; skills.sh: npx skills add AgentPMT/agent-skills --skill agentpmt-account-mcp-rest-api-setup)
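The passed or success-style boolean mentioned above can be read with a small helper. The response schema is not documented in this skill, so the field names "passed" and "success" are taken from that wording as an assumption, and `workflow_gate` is a hypothetical name:

```python
from typing import Optional

# Assumed field names, taken from the "passed or success-style boolean" wording
GATE_KEYS = ("passed", "success")


def workflow_gate(response: dict) -> Optional[bool]:
    """Return the first boolean gate field found, or None if there is none."""
    for key in GATE_KEYS:
        value = response.get(key)
        if isinstance(value, bool):
            return value
    return None
```

A result of None means the response carried no gate field, which is different from an explicit False, so the caller should decide separately how to treat that case.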