Telegram Voice to Voice (macOS)

v0.1.3

Telegram voice-to-voice for macOS Apple Silicon: transcribe inbound .ogg voice notes with yap (Speech.framework) and reply with Telegram voice notes via say+ffmpeg. Not compatible with Linux/Windows.

by Fiberian (@fiberian1981)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description align with what the skill asks for: yap for Speech.framework transcription, say + ffmpeg for TTS/encoding, and defaults for reading macOS locale. The included helper scripts implement transcription and TTS and operate on the documented ~/.openclaw media/workspace paths.
Instruction Scope
SKILL.md instructs the agent to read inbound .ogg files from ~/.openclaw/media/inbound and to write reply files under workspace paths; the helper scripts do the transcription and TTS but do not implement the per-user 'voice_state/telegram.json' preference logic described in SKILL.md (that state management is expected to be done by the agent). The instructions do not request secrets or contact unknown external endpoints — sending replies is delegated to the agent's message tool as expected.
Install Mechanism
No install spec (instruction-only plus two small shell scripts). Nothing downloads or executes remote code; risk from install-time actions is low.
Credentials
The skill requires no credentials or sensitive environment variables. It accesses files under the user's home (~/.openclaw/*) and the macOS system locale, which are proportionate to the described functionality.
Persistence & Privilege
The skill is not always-enabled and does not request elevated privileges, but it does read and write files in the user's home (~/.openclaw/workspace and voice_state paths). Autonomous invocation is allowed by default (normal for skills); combined with file I/O, this is expected for the workflow but worth noting.
Assessment
This skill appears to do exactly what it says: transcribe .ogg voice notes locally (yap) and produce Telegram voice notes via say+ffmpeg. Before installing, confirm you are on macOS Apple Silicon and that you trust the local 'yap' and 'ffmpeg' binaries you will provide. Understand the skill will read inbound .ogg files from ~/.openclaw/media/inbound and create TTS output in ~/.openclaw/workspace/voice_out and (per the SKILL.md) expects a per-user state file voice_state/telegram.json in the workspace — the provided scripts don't manage that state file, so the agent must handle toggling between voice/text. No network endpoints or credentials are requested by the skill itself.


Runtime requirements

OS: macOS
Bins: yap, ffmpeg, say, defaults
latest: vk9769s77fttw22c6dc0a346frd810a50
Downloads: 1.6k
Stars: 0
Versions: 4
Updated 1mo ago
License: MIT-0

Telegram voice-to-voice (macOS Apple Silicon only)

This is an OpenClaw skill.

Requirements

  • macOS on Apple Silicon.
  • yap CLI available in PATH (Speech.framework transcription).
  • ffmpeg available in PATH.

Compatibility note (important)

This skill is macOS-only (uses say + Speech.framework). The skill registry cannot enforce OS restrictions, so installing/running it on Linux/Windows will result in runtime failures.

Persistent reply mode (voice vs text)

Store a small per-user preference file in the workspace:

  • State file: voice_state/telegram.json
  • Key: Telegram sender user id (string)
  • Values:
    • "voice" (default): reply with a Telegram voice note
    • "text": reply with a single text message

If the file does not exist or the sender id is missing: assume "voice".
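The lookup rule above can be sketched as follows. This is a minimal illustration, not the skill's own code: it assumes the state file is a flat JSON object mapping sender id to mode, and the function name load_reply_mode is hypothetical. Treating a corrupt file or an unrecognized value as "voice" is also an assumption.

```python
import json
from pathlib import Path

DEFAULT_MODE = "voice"
VALID_MODES = ("voice", "text")


def load_reply_mode(state_file: Path, sender_id) -> str:
    """Return the stored reply mode for sender_id, or "voice" if unknown."""
    try:
        state = json.loads(state_file.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing (or, by assumption, unreadable) file: fall back to the default.
        return DEFAULT_MODE
    if not isinstance(state, dict):
        return DEFAULT_MODE
    # Keys are Telegram sender ids stored as strings.
    mode = state.get(str(sender_id))
    return mode if mode in VALID_MODES else DEFAULT_MODE
```

A call such as load_reply_mode(Path("voice_state/telegram.json"), "123456") would then return "voice" for any sender who has never toggled the setting.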

Toggle commands

If an inbound text message is exactly:

  • /audio off → set state to "text" and confirm with a short text reply.
  • /audio on → set state to "voice" and confirm with a short text reply.
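One way the agent might handle the toggle commands is sketched below. The function name apply_toggle, the flat JSON layout of the state file, and the wording of the confirmation text are all assumptions made for illustration; only the two commands and the resulting states come from the rules above.

```python
import json
from pathlib import Path
from typing import Optional

# Exact command text mapped to the mode it selects.
TOGGLES = {"/audio off": "text", "/audio on": "voice"}


def apply_toggle(state_file: Path, sender_id, message_text: str) -> Optional[str]:
    """Persist the new mode if message_text is exactly a toggle command.

    Returns a short confirmation string, or None when the message is not a toggle.
    """
    new_mode = TOGGLES.get(message_text)  # exact match only, no trimming
    if new_mode is None:
        return None
    try:
        state = json.loads(state_file.read_text(encoding="utf-8"))
        if not isinstance(state, dict):
            state = {}
    except (FileNotFoundError, json.JSONDecodeError):
        state = {}
    state[str(sender_id)] = new_mode
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(state, indent=2), encoding="utf-8")
    # Hypothetical confirmation wording.
    return "Audio replies on." if new_mode == "voice" else "Audio replies off."
```

Because the match is exact, a message such as "/audio off please" is passed through as ordinary text rather than changing the stored preference.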

Getting the inbound audio (.ogg)

Telegram voice notes often show up as <media:audio> in message text. OpenClaw saves the attachment to disk (typically .ogg) under:

  • ~/.openclaw/media/inbound/

Recommended approach:

  1. If the inbound message context includes an attachment path, use it.
  2. Otherwise, take the most recent *.ogg from ~/.openclaw/media/inbound/.
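The two-step lookup above could look roughly like this. It is a sketch under stated assumptions: "most recent" is taken to mean the latest modification time, an attachment path that does not exist on disk falls through to the directory scan, and find_inbound_ogg is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

INBOUND_DIR = Path.home() / ".openclaw" / "media" / "inbound"


def find_inbound_ogg(attachment_path: Optional[str] = None,
                     inbound_dir: Path = INBOUND_DIR) -> Optional[Path]:
    """Prefer the attachment path from the message context, else the newest *.ogg."""
    if attachment_path:
        candidate = Path(attachment_path).expanduser()
        if candidate.is_file():
            return candidate
    # Fallback: newest .ogg by modification time (an assumption about "most recent").
    oggs = [p for p in inbound_dir.glob("*.ogg") if p.is_file()]
    if not oggs:
        return None
    return max(oggs, key=lambda p: p.stat().st_mtime)
```

Note that the fallback can pick the wrong file when several voice notes arrive close together, which is why the attachment path is preferred whenever it is available.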

Transcription

Default locale: macOS system locale.

Optional env:

  • YAP_LOCALE — override the transcription locale (e.g. it-IT, en-US).

Preferred:

  • yap transcribe --locale "${YAP_LOCALE:-<system>}" <path.ogg>
    • If YAP_LOCALE is not set, the helper script will use the macOS system locale (from defaults read -g AppleLocale).

If transcription fails or is empty: ask the user to repeat or send text.

Helper script:

  • scripts/transcribe_telegram_ogg.sh [path.ogg]
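The locale resolution and the yap invocation described in this section can be sketched as below. The helper script itself is not shown here, so this is not its implementation: the function names are hypothetical, and the normalization of AppleLocale values (dropping any "@..." suffix and replacing the underscore with a hyphen to match the it-IT / en-US examples) is an assumption about what yap expects.

```python
import os
import subprocess
from typing import Callable, List, Mapping, Optional


def read_system_locale() -> str:
    """Read the macOS system locale via `defaults` (macOS only)."""
    result = subprocess.run(
        ["defaults", "read", "-g", "AppleLocale"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def resolve_locale(env: Mapping[str, str] = os.environ,
                   system_locale: Callable[[], str] = read_system_locale) -> str:
    """Use YAP_LOCALE when set and non-empty, else the system locale."""
    override = env.get("YAP_LOCALE", "").strip()
    if override:
        return override
    raw = system_locale()
    # Assumption: normalize e.g. "en_US@rg=gbzzzz" to "en-US".
    return raw.split("@", 1)[0].replace("_", "-")


def build_yap_command(ogg_path: str, locale: str) -> List[str]:
    """Argument list for: yap transcribe --locale <locale> <path.ogg>"""
    return ["yap", "transcribe", "--locale", locale, ogg_path]


def clean_transcript(stdout: str) -> Optional[str]:
    """Return the transcript, or None when it is empty (ask the user to repeat)."""
    text = stdout.strip()
    return text or None
```

On a Mac, the command list from build_yap_command can be passed to subprocess.run with capture_output=True, and a None result from clean_transcript signals that the user should be asked to repeat or send text.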

Reply behavior

Mode: voice (default)

Voice default: SYSTEM (uses the current macOS system voice). You can override by passing a specific voice name to the helper script.

  1. Generate the reply text.
  2. Convert reply text to an OGG/Opus voice note using:
  • scripts/tts_telegram_voice.sh "<reply text>" [SYSTEM|VoiceName]

The script prints the generated .ogg path to stdout.

  3. Send the .ogg back to Telegram as a voice note (not a generic audio file):
  • use the message tool with asVoice: true and media: <path.ogg>
  • optionally set replyTo to thread the response
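The voice-mode steps above might be wired together as in the following sketch. The script path, the asVoice, media, and replyTo fields come from the text; everything else is assumed for illustration, including the function names, taking the last stdout line as the output path, and representing the message tool's arguments as a plain dictionary.

```python
import subprocess
from typing import Any, Callable, Dict, Optional

TTS_SCRIPT = "scripts/tts_telegram_voice.sh"


def synthesize_voice_note(reply_text: str, voice: str = "SYSTEM",
                          run: Callable[..., Any] = subprocess.run) -> str:
    """Run the TTS helper and return the .ogg path it prints to stdout."""
    result = run([TTS_SCRIPT, reply_text, voice],
                 capture_output=True, text=True, check=True)
    lines = result.stdout.strip().splitlines()
    if not lines:
        raise RuntimeError("TTS helper printed no output path")
    # Assumption: if anything else is printed, the path is the last line.
    return lines[-1].strip()


def voice_message_args(ogg_path: str, reply_to: Optional[str] = None) -> Dict[str, Any]:
    """Arguments for the message tool so the file is sent as a voice note."""
    args: Dict[str, Any] = {"asVoice": True, "media": ogg_path}
    if reply_to is not None:
        args["replyTo"] = reply_to  # optional threading
    return args
```

The run parameter exists only so the function can be exercised without macOS; in normal use the default subprocess.run executes the helper script.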

Notes:

  • Use SYSTEM to rely on the current macOS system voice (recommended).
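For readers curious about what a say + ffmpeg pipeline of this kind generally involves, the sketch below builds the two command lines. It is not taken from the helper script, whose contents are not shown here: the intermediate AIFF file, the libopus encoder settings (32k, mono), and the name build_tts_commands are all assumptions, and the ffmpeg build must include libopus for the second command to work.

```python
from typing import List, Tuple


def build_tts_commands(reply_text: str, aiff_path: str, ogg_path: str,
                       voice: str = "SYSTEM") -> Tuple[List[str], List[str]]:
    """Return (say_cmd, ffmpeg_cmd) for text -> AIFF -> OGG/Opus."""
    say_cmd = ["say", "-o", aiff_path]
    if voice != "SYSTEM":
        # Only pass -v for an explicit voice; otherwise say uses the system voice.
        say_cmd += ["-v", voice]
    say_cmd.append(reply_text)
    ffmpeg_cmd = [
        "ffmpeg", "-y", "-i", aiff_path,
        "-c:a", "libopus",   # Opus audio in an Ogg container
        "-b:a", "32k",       # assumed bitrate for speech
        "-ac", "1",          # assumed mono output
        ogg_path,
    ]
    return say_cmd, ffmpeg_cmd
```

Each list can be run with subprocess.run(cmd, check=True) on macOS, with the say command first and the ffmpeg command second.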

Mode: text

Reply with a single text message:

  • Transcription: <...>
  • Reply: <...>
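A text-mode reply can be assembled with a small formatter like the one below. The two labels come from the layout above; putting them on separate lines of one message is an assumption, and format_text_reply is a hypothetical name.

```python
def format_text_reply(transcription: str, reply: str) -> str:
    """Build the single text message sent in "text" mode."""
    return f"Transcription: {transcription.strip()}\nReply: {reply.strip()}"
```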
