Install

openclaw skills install meeting-to-text

Create a fully local, speaker-separated .txt transcript from a meeting recording, meeting screen recording, speech audio, or another local video/audio file. Use this whenever the user wants to transcribe a local recording into plain text, generate a meeting transcript, convert audio or video to txt, or explicitly asks to distinguish speakers with default labels such as 说话人1, 说话人2 (Speaker 1, Speaker 2), and so on. Trigger even if the user only provides an input file path and an output path and says things like "转文字" (convert to text), "做逐字稿" (make a verbatim transcript), "会议录音转 txt" (meeting recording to txt), or "区分发言人" (distinguish the speakers).

Use this skill when the job is a local file-to-transcript workflow.
Do not use this skill if the user only wants audio extraction, a meeting summary, environment setup, or an explanation of the models.
Always collect:

- the input source path
- the output target (a .txt file path or a directory)

Output target rules:

- If the output target ends in .txt, write exactly to that file.
- If the output target is a directory, write <source-stem>_transcript.txt inside it.

Supported source types:

- Video: .mp4, .mkv, .mov, .avi, .webm
- Audio: .wav, .mp3, .m4a, .aac, .flac, .ogg

Read references/runtime_paths.md before running the script.
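The output-target and source-type rules above can be sketched as small helpers. This is an illustrative sketch, not part of the bundled script; the function names are assumptions:

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}
AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".aac", ".flac", ".ogg"}

def detect_source_type(source: str) -> str:
    """Classify a local source file by extension, per the lists above."""
    ext = Path(source).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    raise ValueError(f"unsupported source type: {ext}")

def resolve_output_target(source: str, target: str) -> Path:
    """Apply the output-target rules: an explicit .txt path is used as-is;
    anything else is treated as a directory that receives
    <source-stem>_transcript.txt."""
    target_path = Path(target)
    if target_path.suffix.lower() == ".txt":
        return target_path
    return target_path / f"{Path(source).stem}_transcript.txt"
```

For example, `resolve_output_target("meeting.mp4", "out")` yields `out/meeting_transcript.txt`, while an explicit `notes.txt` target is written to exactly.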
Run the bundled entrypoint with the local ASR environment:
& '<YOUR_CONDA_ENV_PYTHON_PATH>' 'C:\path\to\your\meeting-to-text\scripts\meeting_to_text.py' --input '<SOURCE_PATH>' --output '<OUTPUT_TARGET>'
If you need a stable temp location, add:
--work-dir '<YOUR_WORKSPACE_TEMP_PATH>'
The script may print library noise before the final machine-readable result.
Always treat the last non-empty stdout line as the JSON result object.
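The last-line rule can be applied with a short parsing helper; a sketch, with an invented function name:

```python
import json

def parse_result(stdout: str) -> dict:
    """Return the JSON object on the last non-empty stdout line,
    ignoring any library noise printed before it."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no stdout produced")
    return json.loads(lines[-1])
```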
Interpret results this way:

- Exit code 0 with status: success: the transcript file was created with no warnings.
- Exit code 0 with status: warning: the transcript file was created, but you must report the warnings and any skipped segments.
- status: error: do not claim success; surface the warning list and the intended output path.

Important fields in the final JSON:

- output_path: final transcript file path
- speaker_count: number of detected 说话人N labels in the written transcript
- segment_count: normalized diarization segments sent into transcription
- transcribed_segment_count: segments that produced text
- skipped_segment_count: dropped or failed segments
- failed_segments: segment-level failures with start, end, and reason
- warnings: run-level warnings, such as only one speaker being detected

The entrypoint already enforces the workflow. Do not rewrite the pipeline ad hoc in the conversation.
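The interpretation rules above can be condensed into a single reporting helper. A sketch under the stated field names; the function itself is hypothetical:

```python
def summarize_result(exit_code: int, result: dict) -> str:
    """Map the exit code plus the JSON status field to the report
    to give, following the interpretation rules above."""
    status = result.get("status")
    warnings = result.get("warnings", [])
    if exit_code == 0 and status == "success":
        return f"Transcript written to {result['output_path']}."
    if exit_code == 0 and status == "warning":
        skipped = result.get("skipped_segment_count", 0)
        return (f"Transcript written to {result['output_path']}, "
                f"but {skipped} segment(s) were skipped; warnings: {warnings}")
    # status: error (or anything unexpected): never claim success
    return (f"Transcription failed; intended output was "
            f"{result.get('output_path')}; warnings: {warnings}")
```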
The script will:

- label each detected speaker as 说话人N (Speaker N) in the written transcript

On success, report:

- the final output_path
- any warnings and skipped segments

On failure, report:

- the warning list
- the intended output path
Read these only when needed:

- references/runtime_paths.md