Pronunciation Coach

v1.0.4

Pronunciation coaching with real voice analysis using Azure Speech Services. Analyzes audio files for phoneme-level accuracy, fluency, prosody, and intonatio...

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name, description, SKILL.md, scripts, and skill.json consistently describe the same capability: using Azure Speech for pronunciation assessment of voice messages read from ~/.openclaw/media/inbound/. One metadata inconsistency should be corrected: the top-level registry summary included with the evaluation stated 'Required env vars: none', while SKILL.md and skill.json both declare AZURE_SPEECH_KEY and AZURE_SPEECH_REGION as required.
Instruction Scope
The runtime instructions are narrowly scoped: locate the latest .ogg files in ~/.openclaw/media/inbound/, convert them to WAV via ffmpeg, call Azure Speech, and produce a human-readable report. These actions are consistent with the stated purpose. Note, however, that SKILL.md instructs the agent to 'send a voice message (via TTS) demonstrating the correct pronunciation' and to 'send the text report to the user' but provides no code for Telegram messaging or TTS; those actions require the agent to have separate messaging and TTS capabilities or permissions that the skill files do not include.
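The file-handling steps described above can be sketched roughly as follows. This is a minimal Python illustration, not the skill's own Node.js scripts, which are not reproduced in this review. The helper names latest_ogg and ffmpeg_wav_command are hypothetical, and the mono 16 kHz output is an assumption about a suitable input format for speech recognition, not something the review confirms.

```python
from pathlib import Path
from typing import List, Optional

# Inbound directory named in the review; everything else here is illustrative.
INBOUND_DIR = Path.home() / ".openclaw" / "media" / "inbound"


def latest_ogg(directory: Path = INBOUND_DIR) -> Optional[Path]:
    """Return the most recently modified .ogg file, or None if there is none."""
    candidates = [p for p in directory.glob("*.ogg") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def ffmpeg_wav_command(src: Path, dst: Path) -> List[str]:
    """Build (but do not run) an ffmpeg command converting src to a WAV file."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", str(src),  # input file
        "-ac", "1",      # one audio channel (assumed target format)
        "-ar", "16000",  # 16 kHz sample rate (assumed target format)
        str(dst),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True). Building the command as a list rather than a shell string avoids quoting problems with unusual file names.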
Install Mechanism
No install spec is provided; the skill consists of instructions and scripts only. This is low-risk from an installation perspective, but the agent environment executes the scripts directly, and they depend on ffmpeg and Node.js being present on PATH.
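One way to confirm the PATH dependency before running the skill is a small check like the one below. It is a sketch in Python rather than part of the skill; the function name missing_tools is hypothetical, and "node" is assumed to be the executable name for Node.js.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str] = ("ffmpeg", "node")) -> List[str]:
    """Return the executables from names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

An empty result means both tools were found; any names returned need to be installed or added to PATH in the agent environment.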
Credentials
Only Azure Speech credentials (AZURE_SPEECH_KEY, AZURE_SPEECH_REGION) are required by the scripts and skill.json; this is proportionate to the declared function. The registry metadata, which listed no required env vars, is inconsistent with the skill's own manifest and SKILL.md. No unrelated secrets are requested.
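A simple guard for the two required variables might look like this. It is an illustrative Python sketch, and the helper name missing_credentials is hypothetical; only the variable names come from the review.

```python
import os
from typing import List, Mapping

# The two variables the review says SKILL.md and skill.json declare as required.
REQUIRED_ENV_VARS = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]
```

Accepting the environment as a parameter keeps the check easy to exercise without touching the real process environment.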
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system settings. skill.json declares read permission for ~/.openclaw/media/inbound/ and outbound network access to *.stt.speech.microsoft.com, which are consistent with its behavior. Autonomous invocation is permitted (platform default) but not combined with other high-risk factors here.
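To illustrate what the declared network scope covers, the sketch below tests whether a URL's host falls under *.stt.speech.microsoft.com. It is an assumption about how such a wildcard could be interpreted, not the platform's actual enforcement logic, and host_allowed is a hypothetical name.

```python
from urllib.parse import urlsplit

# Suffix corresponding to the wildcard host declared in skill.json.
ALLOWED_SUFFIX = ".stt.speech.microsoft.com"


def host_allowed(url: str) -> bool:
    """Return True if the URL's host is a subdomain under ALLOWED_SUFFIX."""
    host = (urlsplit(url).hostname or "").lower()
    # Require at least one label before the suffix, so a bare suffix fails.
    return host.endswith(ALLOWED_SUFFIX) and len(host) > len(ALLOWED_SUFFIX)
```

Matching on the parsed hostname rather than the raw string means a look-alike such as stt.speech.microsoft.com.example.com is rejected.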
Assessment
This skill appears to do what it claims: it will read audio files from ~/.openclaw/media/inbound/, convert them (ffmpeg), and upload them to Microsoft Azure Speech for pronunciation assessment. Before installing:
1) Confirm you are comfortable sending users' audio to Microsoft (privacy and billing/usage matters).
2) Provide an Azure Speech key and region via AZURE_SPEECH_KEY and AZURE_SPEECH_REGION.
3) Ensure ffmpeg and Node.js are available in the agent environment.
4) Note that SKILL.md suggests sending results back to users (text and TTS), but the skill does not implement Telegram messaging or TTS; you will need the agent or other skills to have those permissions.
5) Fix the registry metadata mismatch (it should declare required env vars) and verify the skill's source/homepage if provenance matters.
If you need stronger assurance, review the scripts locally or run them in a sandboxed environment before granting access to real user audio or credentials.

Like a lobster shell, security has layers — review code before you run it.


