Audio Recognition

v1.0.0

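Audio speech recognition service (Speech-to-Text). Triggered when a user uploads an audio file and needs the speech converted to text, or needs specific information in the audio identified (such as keywords or song titles). Suitable for: (1) meeting recording transcription (2) audio content extraction (3) voice command recognition (4) subtitle generation for audio and video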

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description (speech-to-text, diarization, punctuation, multi-language) aligns with the SKILL.md content. The skill does not ask for unrelated credentials, binaries, or config paths.
Instruction Scope
The SKILL.md outlines preprocessing, feature extraction, ASR models (Whisper/WeNet/Paraformer), and postprocessing at a high level. It does not tell the agent to read unrelated files or exfiltrate data, but it is high-level and leaves implementation choices unspecified (e.g., which model/service to call), so runtime behavior depends on the agent environment and any integrations the agent has.
Install Mechanism
No install specification or code files are present — the skill is instruction-only, so nothing will be written or executed by default.
Credentials
The skill requires no environment variables or credentials as declared, which is coherent for a descriptive spec. However, real implementations often require API keys or local model binaries; the SKILL.md does not request them or describe secure handling, so users should verify how the agent will implement model calls.
Persistence & Privilege
always:false and default model invocation settings are used. The skill does not request persistent presence or system-level configuration changes.
Scan Findings in Context
[no-findings] expected: Regex scanner found no code files to analyze. This is expected because the skill is instruction-only (SKILL.md).
Assessment
This skill is a high-level spec for an audio speech-to-text pipeline and appears coherent, but it does not implement anything by itself. Before installing or enabling: (1) confirm which runtime or service the agent will actually use (local model vs cloud provider); (2) if it uses third-party cloud APIs, expect to need API keys and verify how audio is uploaded and stored — the SKILL.md's privacy promise is descriptive but not enforceable; (3) verify accuracy claims (95% for Mandarin) against your expected audio conditions; and (4) ensure you have legal/consent coverage for processing any sensitive audio.

Like a lobster shell, security has layers — review code before you run it.

latest: vk9782ej92vq5q47ehz2hpw2ch18423fs
