Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

meeting-to-text

Create a fully local speaker-separated .txt transcript from a meeting recording, meeting screen recording, speech audio, or local video/audio file. Use this...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 31 · 0 current installs · 0 all-time installs
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
Name/description: local speaker-separated transcript. Implementation: expects local ASR/VAD model directories and a local 3D-Speaker repo, which is coherent. However, the script also calls modelscope.hub.snapshot_download to fetch a speaker model at runtime if it is not cached, which contradicts the "fully local" claim. The skill declares no required env vars or binaries, yet it requires a local ffmpeg executable and several model/repo paths to be present (or network access to download a model).
Instruction Scope
SKILL.md instructs the agent to run the bundled Python entrypoint, read runtime_paths.md, and treat the last stdout line as JSON, all of which matches the script. It does not disclose that the script may perform a network download (modelscope.snapshot_download) when the speaker-model cache is missing, nor does it highlight the strong dependency on a local 3D-Speaker repo layout and local models. This is scope-opaque and could surprise users who expect strictly offline operation.
Install Mechanism
No install spec (instruction-only plus a bundled Python script), so nothing is written by an installer. However, at runtime the script imports libraries, may call snapshot_download (a network download), and invokes ffmpeg via subprocess; no package install step is described.
Credentials
Requires no credentials or special env vars in metadata, which is appropriate. The code does rely on several local path defaults (PROJECT_ROOT-based) and allows overrides via MEETING_TO_TEXT_* env vars — these are reasonable but not declared. No secrets are requested, but the skill reads and writes local files (models, repos, temp dirs) and may download model artifacts from ModelScope.
Persistence & Privilege
The always flag is false, and the skill does not request persistent platform privileges. It runs as a one-off script and does not modify other skills or global agent config.
What to consider before installing
This skill executes the bundled Python script on your machine and expects local models, a 3D-Speaker repo, and ffmpeg; if those are absent, it will try to download a speaker model from ModelScope at runtime, so it is not strictly offline. Before installing or running:

  • Review the script and decide whether you trust running arbitrary Python code and subprocesses on your system.
  • Prepare the local directories listed in references/runtime_paths.md (or set the MEETING_TO_TEXT_* env vars) to avoid runtime downloads.
  • Ensure ffmpeg is available at the expected path, or set MEETING_TO_TEXT_FFMPEG.
  • Run the skill inside an isolated or temporary environment if you want to limit risk.

If you require guaranteed offline behavior, do not use this skill unless you pre-populate the expected model cache and repo paths.
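The checklist above can be partly automated with a pre-flight check that confirms the expected local paths exist before the script gets a chance to reach the network. The path values below are placeholders; the authoritative locations are listed in references/runtime_paths.md.

```python
from pathlib import Path

def missing_paths(paths: dict[str, str]) -> list[str]:
    """Return the names of required local paths that do not exist yet."""
    return [name for name, p in paths.items() if not Path(p).exists()]

# Placeholder locations; substitute the real ones from references/runtime_paths.md.
required = {
    "ffmpeg": "/usr/bin/ffmpeg",
    "asr_model": "/models/SenseVoiceSmall",
    "speaker_repo": "/repos/3D-Speaker",
}
```

If missing_paths(required) returns a non-empty list, populate those paths first rather than letting the script fall back to a ModelScope download.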
scripts/meeting_to_text.py:225
Dynamic code execution detected.
Patterns worth reviewing
These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.

Like a lobster shell, security has layers — review code before you run it.

Current version v1.0.0
Download zip
latest · vk97f6y7p3b4pvr6c3v40dde901830brg


SKILL.md

Meeting To Text

Use this skill when the job is a local file-to-transcript workflow.

Do not use this skill if the user only wants audio extraction, a meeting summary, environment setup, or an explanation of the models.

Inputs To Collect

Always collect:

  • one local source file path
  • one output target path

Output target rules:

  • If the target ends with .txt, write exactly to that file.
  • Otherwise treat it as a directory and write <source-stem>_transcript.txt inside it.
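The two output-target rules above can be sketched as a small helper. resolve_output is a hypothetical name for illustration; the bundled script's own logic may be structured differently.

```python
from pathlib import Path

def resolve_output(source: str, target: str) -> Path:
    """Apply the output-target rules: explicit .txt file, else directory."""
    target_path = Path(target)
    if target_path.suffix.lower() == ".txt":
        # Rule 1: write exactly to the given file.
        return target_path
    # Rule 2: treat the target as a directory and derive the filename
    # from the source stem.
    return target_path / f"{Path(source).stem}_transcript.txt"
```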

Supported source types:

  • Video: .mp4, .mkv, .mov, .avi, .webm
  • Audio: .wav, .mp3, .m4a, .aac, .flac, .ogg

Runtime

Read references/runtime_paths.md before running the script.

Run the bundled entrypoint with the local ASR environment:

& '<YOUR_CONDA_ENV_PYTHON_PATH>' 'C:\path\to\your\meeting-to-text\scripts\meeting_to_text.py' --input '<SOURCE_PATH>' --output '<OUTPUT_TARGET>'

If you need a stable temp location, add:

--work-dir '<YOUR_WORKSPACE_TEMP_PATH>'
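If you are driving the entrypoint programmatically rather than from a shell, the invocation can be assembled like this. The MEETING_TO_TEXT_FFMPEG variable name is taken from the scan notes above; build_command and build_env are hypothetical helpers, and all paths are placeholders.

```python
import os

def build_command(python_exe, script, source, output, work_dir=None):
    """Assemble the entrypoint invocation described above."""
    cmd = [python_exe, script, "--input", source, "--output", output]
    if work_dir:
        # Optional stable temp location.
        cmd += ["--work-dir", work_dir]
    return cmd

def build_env(ffmpeg_path=None):
    """Copy the current environment, adding the MEETING_TO_TEXT_FFMPEG
    override (named in the scan notes) when a custom path is given."""
    env = os.environ.copy()
    if ffmpeg_path:
        env["MEETING_TO_TEXT_FFMPEG"] = ffmpeg_path
    return env
```

Pass the results to subprocess.run(cmd, env=env, capture_output=True, text=True) and apply the result-handling rules below to its stdout.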

Result Handling

The script may print library noise before the final machine-readable result.

Always treat the last non-empty stdout line as the JSON result object.

Interpret results this way:

  • Exit code 0 with status: success: transcript file was created with no warnings.
  • Exit code 0 with status: warning: transcript file was created, but you must report the warnings and any skipped segments.
  • Non-zero exit code or status: error: do not claim success; surface the warning list and the intended output path.
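The rules above can be sketched as a small parser. parse_result is a hypothetical helper name; the status strings come from the interpretation rules above.

```python
import json

def parse_result(stdout: str, exit_code: int) -> dict:
    """Treat the last non-empty stdout line as the JSON result object;
    earlier lines may be library noise."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("script produced no stdout")
    result = json.loads(lines[-1])
    status = result.get("status")
    # A transcript was created only when the exit code is 0 AND the
    # status is success or warning; a warning status still requires
    # reporting the warning list to the user.
    result["ok"] = exit_code == 0 and status in ("success", "warning")
    return result
```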

Important fields in the final JSON:

  • output_path: final transcript file path
  • speaker_count: number of detected 说话人N ("Speaker N") labels in the written transcript
  • segment_count: normalized diarization segments sent into transcription
  • transcribed_segment_count: segments that produced text
  • skipped_segment_count: dropped or failed segments
  • failed_segments: segment-level failures with start, end, and reason
  • warnings: run-level warnings such as only one speaker detected

Behavior Guarantees

The entrypoint already enforces the workflow. Do not rewrite the pipeline ad hoc in the conversation.

The script will:

  • normalize audio with FFmpeg instead of renaming extensions
  • use local SenseVoiceSmall for ASR
  • use local 3D-Speaker embeddings plus clustering for diarization
  • write a plain text transcript with timestamps and 说话人N ("Speaker N") labels
  • stop on diarization failure instead of silently emitting a non-speaker-separated transcript
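The first guarantee (normalizing with FFmpeg rather than renaming extensions) typically looks like the command below. ffmpeg_normalize_cmd is a hypothetical helper, and the specific flags and sample rate are assumptions about common ASR preprocessing, not the bundled script's actual arguments.

```python
def ffmpeg_normalize_cmd(ffmpeg: str, source: str, dest_wav: str) -> list[str]:
    """Build an FFmpeg command that re-encodes the source to mono WAV
    instead of renaming its extension."""
    return [
        ffmpeg, "-y",       # overwrite any existing output
        "-i", source,
        "-vn",              # drop any video stream
        "-ac", "1",         # downmix to mono
        "-ar", "16000",     # 16 kHz, a common ASR input rate (assumed)
        dest_wav,
    ]
```

Run the result with subprocess.run(cmd, check=True) so a non-zero FFmpeg exit raises instead of silently continuing.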

Report Back To The User

On success, report:

  • the final transcript path
  • whether the source was audio or video
  • the detected speaker count
  • any warnings that matter for review

On failure, report:

  • the exit code category
  • the warning message from the JSON result
  • whether the failure happened during validation, media normalization, diarization, transcription, or output writing

References

Read these only when needed:

Files

5 total
