Speech To Text With Speakers

Speech to Text With Speakers: Transcribe audio from a file_id or public_url with three tiered actions for recordings up to 15, 30, or 60 minutes. Use it when an agent needs to transcribe meeting recordings, generate subtitles and captions for videos, convert voice memos to searchable text, or transcribe podcast episodes through AgentPMT-hosted remote tool calls. Discovery terms: speech to text with speakers, transcribe extended, file id, public url.

AgentPMT (@agentpmt)

Install

openclaw skills install @agentpmt/speech-to-text-with-speakers

Speech to Text With Speakers

Freshness

Last updated: 2026-06-24.

If the current date is more than 7 days after the last updated date, reinstall this skill from skills.sh or ClawHub before relying on endpoints, schemas, setup steps, or examples.
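The freshness rule above is a simple date comparison. A minimal sketch, assuming the agent can read the current date; the function name `skill_is_stale` is hypothetical and not part of the skill:

```python
from datetime import date

LAST_UPDATED = date(2026, 6, 24)  # taken from the "Last updated" line above
MAX_AGE_DAYS = 7


def skill_is_stale(today: date, last_updated: date = LAST_UPDATED) -> bool:
    """Return True when more than MAX_AGE_DAYS have passed since last_updated."""
    return (today - last_updated).days > MAX_AGE_DAYS


# Exactly 7 days later is still fresh; the 8th day triggers a reinstall.
print(skill_is_stale(date(2026, 7, 1)))  # False
print(skill_is_stale(date(2026, 7, 2)))  # True
```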

What This Tool Does

Turn audio recordings into clean, searchable text. Transcribe voice memos, meetings, interviews, podcasts, and webinars with speech recognition that handles accents and background noise. Get plain text for quick reference, SRT or WebVTT subtitles for video captioning, or JSON output with word-level timestamps and speaker identification. Choose from three tiers based on recording length (up to 15, 30, or 60 minutes), and optionally enable speaker diarization to label who said what, profanity filtering, and alternative transcripts.

Product Instructions

Speech to Text

Transcribe audio with one tool and choose the action that matches the upload length.

Tool Call Format

json

{
  "action": "get_instructions"
}

json

{
  "action": "transcribe_quick",
  "file_id": "FILE_ID",
  "language_code": "en-US",
  "output_format": "text"
}

json

{
  "action": "transcribe_standard",
  "public_url": "https://example.com/meeting.m4a",
  "output_format": "vtt",
  "enable_word_timestamps": true,
  "enable_diarization": true
}

json

{
  "action": "transcribe_extended",
  "public_url": "https://example.com/interview.webm",
  "output_format": "json",
  "max_alternatives": 2
}

json

{
  "action": "transcribe_standard",
  "file_id": "FILE_ID",
  "output_format": "json",
  "enable_word_timestamps": true,
  "remove_filler_words": false
}

Actions

transcribe_quick: audio up to 15 minutes. Price: 100 credits.
transcribe_standard: audio up to 30 minutes. Price: 150 credits.
transcribe_extended: audio up to 60 minutes. Price: 200 credits.
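Picking the cheapest action that covers a recording can be sketched as a lookup over the three tiers listed above. This is an illustration only: the helper `choose_action` is hypothetical, and it assumes each limit is inclusive (a recording of exactly 15 minutes fits transcribe_quick).

```python
# (action name, maximum minutes, price in credits), as listed under Actions.
TIERS = [
    ("transcribe_quick", 15, 100),
    ("transcribe_standard", 30, 150),
    ("transcribe_extended", 60, 200),
]


def choose_action(duration_minutes: float) -> tuple[str, int]:
    """Return (action, credits) for the smallest tier that fits the recording."""
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    for action, max_minutes, credits in TIERS:
        if duration_minutes <= max_minutes:  # assumption: limits are inclusive
            return action, credits
    raise ValueError("recordings longer than 60 minutes do not fit any tier")


print(choose_action(22.5))  # ('transcribe_standard', 150)
```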

Notes

Provide either file_id or public_url.
public_url must be an HTTPS URL and cannot point to private or internal network addresses.
If language_code is omitted, the tool defaults to en-US.
Supported output formats: text, srt, vtt, json.
Optional controls: enable_diarization, enable_word_timestamps, remove_filler_words, enable_profanity_filter, max_alternatives.
remove_filler_words defaults to true, which uses Google STT V2's cleaned transcript path.
Set remove_filler_words to false to preserve disfluencies through Vercel AI Gateway using the openai/whisper-1 gateway model slug. This path always requests word-level timestamps from the gateway for clipping workflows.
remove_filler_words=false does not support enable_diarization=true or max_alternatives greater than 1; use the default cleaned path for those features.
Subtitle responses include inline subtitle content and may also include stored file links during normal platform invocations.
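The constraints in these notes can be checked client-side before spending credits. The sketch below is not the tool's own validation: `validate_request` is a hypothetical helper, it reads "either file_id or public_url" as exactly one of the two, and it only catches private addresses written as IP literals or `localhost` (it does not resolve DNS names).

```python
import ipaddress
from urllib.parse import urlparse

OUTPUT_FORMATS = {"text", "srt", "vtt", "json"}
PRIVATE_MSG = "public_url must not point to a private or internal address"


def validate_request(params: dict) -> list[str]:
    """Return a list of problems found in a transcription request (empty if none)."""
    errors = []
    has_file = bool(params.get("file_id"))
    has_url = bool(params.get("public_url"))
    # Assumption: "either" means exactly one audio source per call.
    if has_file == has_url:
        errors.append("provide exactly one of file_id or public_url")

    if has_url:
        parsed = urlparse(params["public_url"])
        if parsed.scheme != "https":
            errors.append("public_url must use https")
        host = parsed.hostname or ""
        if host == "localhost":
            errors.append(PRIVATE_MSG)
        else:
            try:
                ip = ipaddress.ip_address(host)
            except ValueError:
                ip = None  # a DNS name; this sketch does not resolve it
            if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
                errors.append(PRIVATE_MSG)

    fmt = params.get("output_format")
    if fmt is not None and fmt not in OUTPUT_FORMATS:
        errors.append(f"unsupported output_format: {fmt}")

    # remove_filler_words defaults to true; the false path drops two features.
    if params.get("remove_filler_words", True) is False:
        if params.get("enable_diarization", False):
            errors.append("enable_diarization requires remove_filler_words=true")
        if params.get("max_alternatives", 1) > 1:
            errors.append("max_alternatives > 1 requires remove_filler_words=true")
    return errors
```

An empty list means the request passes these local checks; the tool's live schema remains the authority on what is actually accepted.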

When To Use

Use this skill for Speech to Text With Speakers on AgentPMT.
Use it when an agent needs this specific tool's behavior, schema, inputs, outputs, and invocation shape.
Search and activation keywords: speech to text with speakers, transcribe meeting recordings, generate subtitles and captions for videos, convert voice memos to searchable text, transcribe podcast episodes, transcribe extended, file id, public url.
Supported action names: transcribe_extended, transcribe_quick, transcribe_standard.

Use Cases

Transcribe meeting recordings
Generate subtitles and captions for videos
Convert voice memos to searchable text
Transcribe podcast episodes
Create interview transcripts with speaker labels
Produce SRT or WebVTT subtitle files
Build searchable audio archives
Transcribe webinars and lectures
Analyze customer call recordings
Content repurposing from audio to text

Categories And Industries

No categories or industry tags are published for this tool.

Actions And Schema

Complete generated action schema: ./schema.md. Supported action count: 3. x402 availability: not enabled for this product.

transcribe_extended (action slug: transcribe-extended): Transcribe audio up to 60 minutes. Price: 200 credits. Parameters: enable_diarization, enable_profanity_filter, enable_word_timestamps, file_id, language_code, max_alternatives, output_format, public_url, plus 1 more.
transcribe_quick (action slug: transcribe-quick): Transcribe audio up to 15 minutes. Price: 100 credits. Parameters: enable_diarization, enable_profanity_filter, enable_word_timestamps, file_id, language_code, max_alternatives, output_format, public_url, plus 1 more.
transcribe_standard (action slug: transcribe-standard): Transcribe audio up to 30 minutes. Price: 150 credits. Parameters: enable_diarization, enable_profanity_filter, enable_word_timestamps, file_id, language_code, max_alternatives, output_format, public_url, plus 1 more.

Live Schema And Examples

Use the compact schema above for ordinary calls. Before a new production integration, or whenever parameters, enum values, nested objects, outputs, or examples are unclear, fetch live details first.

Exact schema: call agentpmt-tool-search-and-execution with action: "get_schema" and tool_id: "speech-to-text-with-speakers".
Detailed examples: call agentpmt-tool-search-and-execution with action: "get_instructions" and tool_id: "speech-to-text-with-speakers", or call this product with action: "get_instructions" when the product tool is already selected.
Treat returned live schema and instructions as more specific than this generated summary.

MCP schema lookup through the main AgentPMT MCP server:

json

{
  "method": "tools/call",
  "params": {
    "name": "AgentPMT-Tool-Search-and-Execution",
    "arguments": {
      "action": "get_schema",
      "tool_id": "speech-to-text-with-speakers"
    }
  }
}

For live examples, keep the same MCP tool and use these arguments:

json

{
  "action": "get_instructions",
  "tool_id": "speech-to-text-with-speakers"
}

Authenticated AgentPMT REST schema lookup body:

json

{
  "name": "agentpmt-tool-search-and-execution",
  "parameters": {
    "action": "get_schema",
    "tool_id": "speech-to-text-with-speakers"
  }
}

Authenticated AgentPMT REST live examples body:

json

{
  "name": "agentpmt-tool-search-and-execution",
  "parameters": {
    "action": "get_instructions",
    "tool_id": "speech-to-text-with-speakers"
  }
}
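The four lookup payloads above differ only in the wrapper (MCP versus REST) and the action. A small sketch that builds them from the names shown in this section; the helper functions `mcp_lookup` and `rest_lookup` are hypothetical and exist only for illustration:

```python
TOOL_ID = "speech-to-text-with-speakers"
LOOKUP_ACTIONS = ("get_schema", "get_instructions")


def _arguments(action: str) -> dict:
    """Shared arguments for both transports."""
    if action not in LOOKUP_ACTIONS:
        raise ValueError(f"unknown lookup action: {action}")
    return {"action": action, "tool_id": TOOL_ID}


def mcp_lookup(action: str) -> dict:
    """MCP tools/call payload for the main AgentPMT MCP server."""
    return {
        "method": "tools/call",
        "params": {
            "name": "AgentPMT-Tool-Search-and-Execution",
            "arguments": _arguments(action),
        },
    }


def rest_lookup(action: str) -> dict:
    """Body for the authenticated AgentPMT REST lookup."""
    return {
        "name": "agentpmt-tool-search-and-execution",
        "parameters": _arguments(action),
    }
```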

Call This Tool

Product slug: speech-to-text-with-speakers

Marketplace page: https://www.agentpmt.com/marketplace/speech-to-text-with-speakers

AgentPMT account route: first use ../agentpmt-account-mcp-rest-api-setup to connect the main MCP server or REST API for an Agent Group where this tool is enabled.
x402 route: not enabled for this product.
AgentPMT overview: use ../what-is-agentpmt for marketplace, Agent Group, workflow, MCP, REST, and payment concepts.

If those setup skills are not installed beside this product skill, use the downloads below.

Core AgentPMT setup skills:

What AgentPMT is: ../what-is-agentpmt
- ClawHub page: https://clawhub.ai/agentpmt/what-is-agentpmt
- OpenClaw install: openclaw skills install what-is-agentpmt
- skills.sh install: npx skills add AgentPMT/agent-skills --skill what-is-agentpmt
AgentPMT account MCP/REST setup: ../agentpmt-account-mcp-rest-api-setup
- ClawHub page: https://clawhub.ai/agentpmt/agentpmt-account-mcp-rest-api-setup
- OpenClaw install: openclaw skills install agentpmt-account-mcp-rest-api-setup
- skills.sh install: npx skills add AgentPMT/agent-skills --skill agentpmt-account-mcp-rest-api-setup

skills.sh install script:

bash

npx skills add AgentPMT/agent-skills --skill what-is-agentpmt
npx skills add AgentPMT/agent-skills --skill agentpmt-account-mcp-rest-api-setup

MCP call shape after the main AgentPMT MCP server is connected:

json

{
  "method": "tools/call",
  "params": {
    "name": "Speech-to-Text-With-Speakers",
    "arguments": {
      "action": "transcribe_extended",
      "enable_diarization": false,
      "enable_profanity_filter": false,
      "enable_word_timestamps": false,
      "file_id": "FILE_ID",
      "language_code": "en-US",
      "max_alternatives": 1,
      "output_format": "text"
    }
  }
}

Use the exact tool name returned by tools/list; the name above is the expected readable form.

Authenticated AgentPMT REST call body:

json

{
  "name": "speech-to-text-with-speakers",
  "parameters": {
    "action": "transcribe_extended",
    "enable_diarization": false,
    "enable_profanity_filter": false,
    "enable_word_timestamps": false,
    "file_id": "FILE_ID",
    "language_code": "en-US",
    "max_alternatives": 1,
    "output_format": "text"
  }
}

Use the setup skill for the account connection details before making REST calls.
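As a rough illustration of how the REST body might be sent, the sketch below prepares (but does not send) a request to the REST invoke endpoint listed under AgentPMT Reference. Several things here are assumptions: the POST method, the JSON content type, and the `auth_headers` argument, which stands in for whatever credentials the setup skill prescribes. The function `build_rest_request` is hypothetical.

```python
import json
import urllib.request

# REST invoke endpoint as listed in the AgentPMT Reference section.
REST_ENDPOINT = "https://api.agentpmt.com/products/purchase"


def build_rest_request(parameters: dict, auth_headers: dict) -> urllib.request.Request:
    """Wrap tool parameters in the REST call body and prepare an HTTP request."""
    body = {"name": "speech-to-text-with-speakers", "parameters": parameters}
    # Assumption: JSON over POST; auth_headers come from the setup skill.
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(
        REST_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_rest_request(
    {"action": "transcribe_quick", "file_id": "FILE_ID", "output_format": "text"},
    auth_headers={},  # fill in per the account setup skill
)
# Sending would be: urllib.request.urlopen(request), once credentials are in place.
```

Keeping credentials in a separate argument makes it easier to keep them out of prompts and logs.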

Response Handling

Treat the returned JSON as the source of truth for this tool call.
If the response includes warnings or correction targets, apply them before retrying.
If the response includes a passed or success-style boolean, use it as the workflow gate.
If validation fails or the response shape is unclear, call get_schema or get_instructions before retrying.
If transcribe_extended fails, preserve the request parameters and retry only after fixing schema, auth, or payment errors.
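The rules above amount to a small decision procedure. The sketch below is one possible reading, not the tool's documented response contract: the field names `passed`, `success`, and `warnings` are assumptions, and the returned labels are hypothetical.

```python
def next_step(response: dict) -> str:
    """Map a tool response to one of: proceed, apply_warnings_then_retry,
    fix_and_retry, check_schema."""
    # Assumption: the gate is a boolean under "passed" or "success".
    flag = None
    for key in ("passed", "success"):
        if isinstance(response.get(key), bool):
            flag = response[key]
            break

    if flag is None:
        # Response shape unclear: call get_schema or get_instructions first.
        return "check_schema"
    if flag:
        return "proceed"
    if response.get("warnings"):
        # Apply the reported warnings before retrying with the same parameters.
        return "apply_warnings_then_retry"
    return "fix_and_retry"


print(next_step({"success": True}))  # proceed
print(next_step({"passed": False, "warnings": ["unsupported language_code"]}))  # apply_warnings_then_retry
```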

Security

Do not place account secrets, wallet private keys, mnemonics, signatures, or payment headers in prompts or logs.
Keep tool inputs scoped to the minimum content needed for the task.
Use the setup skills for credential handling; this product skill only defines product-specific behavior.

AgentPMT Reference

What AgentPMT is: ../what-is-agentpmt (ClawHub: what-is-agentpmt, page: https://clawhub.ai/agentpmt/what-is-agentpmt; skills.sh: npx skills add AgentPMT/agent-skills --skill what-is-agentpmt)
AgentPMT account MCP/REST setup: ../agentpmt-account-mcp-rest-api-setup (ClawHub: agentpmt-account-mcp-rest-api-setup, page: https://clawhub.ai/agentpmt/agentpmt-account-mcp-rest-api-setup; skills.sh: npx skills add AgentPMT/agent-skills --skill agentpmt-account-mcp-rest-api-setup)
Marketplace product: https://www.agentpmt.com/marketplace/speech-to-text-with-speakers
AgentPMT main MCP server: https://api.agentpmt.com/mcp/
AgentPMT REST invoke endpoint: https://api.agentpmt.com/products/purchase
