meeting-to-text

Create a fully local speaker-separated .txt transcript from a meeting recording, meeting screen recording, speech audio, or local video/audio file. Use this...

MIT-0 · Free to use, modify, and redistribute. No attribution required.


by @henrCh1


Security Scan

VirusTotal

Suspicious


OpenClaw

Suspicious

medium confidence

Purpose & Capability

Name/description: local speaker-separated transcript. Implementation: expects local ASR/VAD model directories and a local 3D-Speaker repo, which is coherent. However, the script also calls modelscope.hub.snapshot_download to fetch a speaker model at runtime if it is not cached, which contradicts the 'fully local' claim. The skill declares no required env vars or binaries, yet it needs a local ffmpeg executable and several model/repo paths to be present (or network access to download a model).

Instruction Scope

SKILL.md instructs the agent to run the bundled Python entrypoint, to read runtime_paths.md, and to treat the last stdout line as JSON, which matches the script. It does not disclose that the script may perform a network download (modelscope.snapshot_download) if the speaker model cache is missing, nor does it highlight the strong dependency on a local 3D-Speaker repo layout and local models. This leaves the scope opaque and could surprise users who expect strictly offline operation.


Install Mechanism

No install spec is provided (the skill is instruction-only with a bundled Python script), so nothing is written by an installer. However, at runtime the script imports third-party libraries, calls snapshot_download (a network download), and invokes ffmpeg via subprocess, and no package install step is described.


Credentials

Requires no credentials or special env vars in metadata, which is appropriate. The code relies on several local path defaults (PROJECT_ROOT-based) and allows overrides via MEETING_TO_TEXT_* env vars; these are reasonable but not declared. No secrets are requested, but the skill reads and writes local files (models, repos, temp dirs) and may download model artifacts from ModelScope.


Persistence & Privilege

The always flag is false, and the skill does not request persistent platform privileges. It executes as a one-off script and does not modify other skills or global agent config.

What to consider before installing

This skill will execute the bundled Python script on your machine and expects local models, a 3D-Speaker repo, and ffmpeg; if those are absent it will try to download a speaker model from ModelScope at runtime (so it is not strictly offline). Before installing/running: (1) review the script and decide whether you trust running arbitrary Python code and subprocesses on your system; (2) prepare the local directories listed in references/runtime_paths.md (or set the MEETING_TO_TEXT_* env vars) to avoid runtime downloads; (3) ensure ffmpeg is available at the expected path or set MEETING_TO_TEXT_FFMPEG; (4) run the skill inside an isolated/temporary environment if you want to limit risk. If you require guaranteed offline behavior, do not use this skill unless you pre-populate the expected model cache and repo paths.
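Step (3) above can be checked before running anything. The following is a minimal preflight sketch, assuming that MEETING_TO_TEXT_FFMPEG holds the full path to the ffmpeg executable. The function name find_ffmpeg and the fallback to a PATH lookup are illustrative choices, not confirmed behavior of the bundled script.

```python
import os
import shutil
from typing import Mapping, Optional


def find_ffmpeg(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a usable ffmpeg path, or None if none can be found.

    Assumption: MEETING_TO_TEXT_FFMPEG contains the path to the executable.
    The PATH fallback is this sketch's choice, not the script's documented logic.
    """
    env = os.environ if env is None else env
    override = env.get("MEETING_TO_TEXT_FFMPEG")
    if override:
        # An explicit override must point at an existing file.
        return override if os.path.isfile(override) else None
    # No override set: look for ffmpeg on PATH.
    return shutil.which("ffmpeg")
```

A None result means ffmpeg should be installed or the override corrected before the skill is run.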


scripts/meeting_to_text.py:225

Dynamic code execution detected.

Patterns worth reviewing

These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.


Current version: v1.0.0


latestvk97f6y7p3b4pvr6c3v40dde901830brg

License

MIT-0


Terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Meeting To Text

Use this skill when the job is a local file-to-transcript workflow.

Do not use this skill if the user only wants audio extraction, a meeting summary, environment setup, or an explanation of the models.

Inputs To Collect

Always collect:

one local source file path
one output target path

Output target rules:

If the target ends with .txt, write exactly to that file.
Otherwise treat it as a directory and write <source-stem>_transcript.txt inside it.
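The two output target rules can be sketched as a small helper. This is an illustration only: the name resolve_output_path is hypothetical, and matching the .txt suffix case-sensitively is an assumption, since the rules above do not say how .TXT is treated.

```python
from pathlib import Path


def resolve_output_path(source: str, target: str) -> Path:
    """Apply the output target rules to a source path and a target path."""
    if target.endswith(".txt"):
        # Rule 1: a target ending in .txt is the exact output file.
        return Path(target)
    # Rule 2: anything else is a directory that receives <source-stem>_transcript.txt.
    return Path(target) / f"{Path(source).stem}_transcript.txt"
```

For example, a source named standup.mp4 with the target directory out resolves to out/standup_transcript.txt.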

Supported source types:

Video: .mp4, .mkv, .mov, .avi, .webm
Audio: .wav, .mp3, .m4a, .aac, .flac, .ogg
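Because the final report needs to say whether the source was audio or video, it can help to classify the file up front. A minimal sketch based on the extension lists above follows; classify_source is a hypothetical helper, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".aac", ".flac", ".ogg"}


def classify_source(path: str) -> str:
    """Return "video" or "audio" for a supported file, else raise ValueError."""
    # Assumption: extension case does not matter (.MP4 is treated like .mp4).
    suffix = Path(path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    raise ValueError(f"unsupported source type: {suffix or '(no extension)'}")
```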

Runtime

Read references/runtime_paths.md before running the script.

Run the bundled entrypoint with the local ASR environment:

& '<YOUR_CONDA_ENV_PYTHON_PATH>' 'C:\path\to\your\meeting-to-text\scripts\meeting_to_text.py' --input '<SOURCE_PATH>' --output '<OUTPUT_TARGET>'

If you need a stable temp location, add:

--work-dir '<YOUR_WORKSPACE_TEMP_PATH>'
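When the command is launched from Python rather than typed into PowerShell, building it as an argument list avoids quoting problems with paths that contain spaces. The sketch below uses only the flags shown above (--input, --output, --work-dir). The helper names are hypothetical, and decoding the script's output as UTF-8 is an assumption.

```python
import subprocess
from typing import List, Optional


def build_command(python_exe: str, script_path: str, source: str,
                  output: str, work_dir: Optional[str] = None) -> List[str]:
    """Assemble the entrypoint command as an argument list (no shell quoting)."""
    command = [python_exe, script_path, "--input", source, "--output", output]
    if work_dir is not None:
        # Optional stable temp location.
        command += ["--work-dir", work_dir]
    return command


def run_command(command: List[str]) -> "subprocess.CompletedProcess[str]":
    """Run the command and capture stdout/stderr as text."""
    # check=False: a non-zero exit code is a result to interpret, not an exception.
    # Assumption: the script writes UTF-8; undecodable bytes are replaced.
    return subprocess.run(command, capture_output=True, text=True,
                          encoding="utf-8", errors="replace", check=False)
```

The returned object exposes returncode and stdout, which are the two inputs the result handling rules rely on.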

Result Handling

The script may print library noise before the final machine-readable result.

Always treat the last non-empty stdout line as the JSON result object.
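A minimal sketch of that rule: skip blank lines, take the last remaining line, and parse it as JSON. The function name parse_result is hypothetical, and rejecting a non-object JSON value is this sketch's own safeguard.

```python
import json
from typing import Any, Dict


def parse_result(stdout: str) -> Dict[str, Any]:
    """Parse the last non-empty stdout line as the JSON result object."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("stdout contained no non-empty lines")
    # Earlier lines may be library noise; only the final line is the result.
    result = json.loads(lines[-1])
    if not isinstance(result, dict):
        raise ValueError("final stdout line is not a JSON object")
    return result
```

A parse failure raises ValueError (json.JSONDecodeError is a subclass), which the caller should treat as a failed run rather than a success.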

Interpret results this way:

Exit code 0 with status: success: transcript file was created with no warnings.
Exit code 0 with status: warning: transcript file was created, but you must report the warnings and any skipped segments.
Non-zero exit code or status: error: do not claim success; surface the warning list and the intended output path.
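These three rules can be written as a small decision function. The sketch below is illustrative: interpret_result is a hypothetical name, and mapping a missing or unrecognized status to "error" is a conservative assumption rather than something the rules state.

```python
from typing import Any, Dict


def interpret_result(exit_code: int, result: Dict[str, Any]) -> str:
    """Return "success", "warning", or "error" for a finished run."""
    status = result.get("status")
    if exit_code != 0 or status == "error":
        # Either signal alone is enough to rule out success.
        return "error"
    if status == "warning":
        return "warning"
    if status == "success":
        return "success"
    # Assumption: an unknown or missing status is treated as a failure.
    return "error"
```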

Important fields in the final JSON:

output_path: final transcript file path
speaker_count: number of distinct 说话人N ("Speaker N") labels detected in the written transcript
segment_count: normalized diarization segments sent into transcription
transcribed_segment_count: segments that produced text
skipped_segment_count: dropped or failed segments
failed_segments: segment-level failures with start, end, and reason
warnings: run-level warnings such as only one speaker detected
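To keep field access consistent, the listed fields can be loaded into a typed container. This is a sketch under assumptions: the field names come from the list above, but the Python types and the defaults used for missing fields are guesses, and TranscriptResult is a hypothetical name.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class TranscriptResult:
    # Types and defaults are assumptions; only the field names are documented.
    output_path: str = ""
    speaker_count: int = 0
    segment_count: int = 0
    transcribed_segment_count: int = 0
    skipped_segment_count: int = 0
    failed_segments: List[Dict[str, Any]] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "TranscriptResult":
        """Keep the documented fields and ignore any other keys."""
        known = {f.name: data[f.name] for f in fields(cls) if f.name in data}
        return cls(**known)
```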

Behavior Guarantees

The entrypoint already enforces the workflow. Do not rewrite the pipeline ad hoc in the conversation.

The script will:

normalize audio with FFmpeg instead of renaming extensions
use local SenseVoiceSmall for ASR
use local 3D-Speaker embeddings plus clustering for diarization
write a plain-text transcript with timestamps and 说话人N speaker labels
stop on diarization failure instead of silently emitting a non-speaker-separated transcript

Report Back To The User

On success, report:

the final transcript path
whether the source was audio or video
the detected speaker count
any warnings that matter for review

On failure, report:

the exit code category
the warning message from the JSON result
whether the failure happened during validation, media normalization, diarization, transcription, or output writing
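The two checklists above can be turned into a short plain-text report. The sketch below is one possible layout, not a required format. format_report is a hypothetical helper, and because the documented JSON fields do not include a failure stage, the caller is assumed to supply it.

```python
from typing import Any, Dict, List, Optional


def format_report(result: Dict[str, Any], exit_code: int, source_kind: str,
                  stage: Optional[str] = None) -> str:
    """Build the user-facing report for a finished run."""
    failed = exit_code != 0 or result.get("status") == "error"
    lines: List[str] = []
    if failed:
        lines.append(f"Transcription failed (exit code {exit_code}).")
        # Assumption: the stage is determined by the caller, not read from JSON.
        lines.append(f"Failure stage: {stage or 'unknown'}")
        lines.append(f"Intended output: {result.get('output_path', 'unknown')}")
    else:
        lines.append(f"Transcript: {result.get('output_path', 'unknown')}")
        lines.append(f"Source type: {source_kind}")
        lines.append(f"Speakers detected: {result.get('speaker_count', 'unknown')}")
    # Warnings are surfaced in both cases.
    lines.extend(f"Warning: {w}" for w in result.get("warnings", []))
    return "\n".join(lines)
```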

References

Read these only when needed:

references/runtime_paths.md: fixed local paths and command template
references/troubleshooting.md: common runtime issues and how to interpret them

Files

5 total
