article-tts

v2.1.0

Photo or text to audio: extracts text from article photos via OCR, or accepts text directly, and generates Microsoft Edge TTS speech. Supports Chinese and English, automatic transcription, speech-rate adjustment, and sentence-by-sentence splitting.

by退役前写代码的@54meteor

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for 54meteor/article-tts.

Install the skill "article-tts" (54meteor/article-tts) from ClawHub.
Skill page: https://clawhub.ai/54meteor/article-tts
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

openclaw skills install article-tts

ClawHub CLI

npx clawhub@latest install article-tts

Security Scan

VirusTotal: Pending
OpenClaw: Benign (medium confidence)

Purpose & Capability
The name/description (image OCR + Edge TTS) matches the declared runtime steps and required tools: Tesseract for OCR, Python + Pillow for image preprocessing, and uvx/edge-tts for TTS. Requiring tessdata and language packs for Chinese OCR is expected. No unexplained external credentials or unrelated binaries are requested.
Instruction Scope
Instructions stay within the stated task: preprocess image, run tesseract, produce text, optionally split into sentences, and call edge-tts via uvx. Two things to note: (1) skipConfirmation is explicitly warned as a privacy risk because it will convert OCR output (which may contain sensitive data) directly to audio; (2) the doc includes examples that run other scripts by absolute path (e.g., a feishu-voice-send script under /mnt/d/wslspace/...), which assumes local files/skills exist and could execute arbitrary code if present. The skill does not request extra env vars and relies on OpenClaw's message(...) tool for channel delivery.
Install Mechanism
This is instruction-only (no packaged install). The SKILL.md suggests apt-get to install tesseract and language packs (standard). It also relies on uvx auto-downloading edge-tts on first run — an implicit network fetch of code at runtime. That auto-download is reasonable for convenience but is a higher-risk action than purely using already-installed binaries because it pulls remote package(s) dynamically.
Credentials
No environment variables or credentials are requested; the skill defers to OpenClaw's channel authentication. This is proportionate for a messaging-forwarding TTS skill. Caveat: forwarding via other skills (e.g., feishu-voice-send) may require credentials/configuration outside this skill.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. It is a runtime instruction-only skill and does not modify other skills or require persistent configuration changes.
Assessment
This skill appears to do what it says: OCR images or accept text, then produce Edge TTS audio and send it over the active channel. Before installing or using it:

  1. Be cautious with skipConfirmation; don't enable it for images that may contain private data.
  2. Expect an apt-get step to install tesseract and a first-run network download (uvx will auto-fetch edge-tts); if you need stricter supply-chain control, preinstall and vet the edge-tts package source.
  3. The docs include absolute example paths and an example call to another skill/script (feishu-voice-send); verify those scripts exist and review them before executing.
  4. Run the skill in a sandbox or test environment first if you are uncertain about auto-downloaded components.

If you want stronger assurance, ask the publisher for explicit sources/URLs for uvx/edge-tts and the feishu helper script, or request a packaged release rather than instruction-only steps.


Version: v2.1.0 (latest, vk97f8n4f2avgq2cc8hjdbjf67984b8sf)
License: MIT-0
Stats: 183 downloads, 1 star, 8 versions; updated 3w ago

Article TTS Skill

Default Configuration

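| Parameter | Default | Description |
|---|---|---|
| lang | en | Language: en or zh |
| skipConfirmation | false | Whether to skip the text confirmation step |
| speed | 90% | TTS speech rate (--rate=-10% corresponds to 90%) |
| voice | en-US-EmmaNeural (English) / zh-CN-XiaoxiaoNeural (Chinese) | TTS voice |
| splitSentences | false | Whether to also generate one audio file per sentence |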
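
The relationship between the speed default and the --rate flag can be sketched as follows. This is a minimal illustration, not the skill's own code: the TTSConfig class and its field names are hypothetical, and only the default values and voice names come from the table above.

```python
from dataclasses import dataclass

# Default voices per language, taken from the table above.
DEFAULT_VOICES = {"en": "en-US-EmmaNeural", "zh": "zh-CN-XiaoxiaoNeural"}


@dataclass
class TTSConfig:
    """Hypothetical container mirroring the documented defaults."""
    lang: str = "en"
    skip_confirmation: bool = False
    speed: int = 90  # percent of normal speed
    split_sentences: bool = False

    @property
    def voice(self) -> str:
        return DEFAULT_VOICES[self.lang]

    @property
    def rate_flag(self) -> str:
        # edge-tts takes a signed offset from 100%, so 90% becomes --rate=-10%
        return f"--rate={self.speed - 100:+d}%"
```

With the defaults, TTSConfig().rate_flag yields the same --rate=-10% used in the commands throughout this page.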

Supported Languages

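| Language | OCR language pack | TTS Voice |
|---|---|---|
| en | eng (preinstalled) | en-US-EmmaNeural |
| zh | chi_sim (must be installed) | zh-CN-XiaoxiaoNeural |

Installing the Chinese OCR language pack:

  • Linux (WSL/Debian/Ubuntu): apt-get install tesseract-ocr-chi-sim
  • macOS: brew install tesseract-lang (includes Chinese)
  • Windows: download chi_sim.traineddata and place it in the tessdata folder of the Tesseract installation directory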
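
Before running Chinese OCR, it can help to confirm that the language pack is actually present. The sketch below is one possible check, assuming tesseract is on PATH and that `tesseract --list-langs` prints one header line followed by one pack name per line; the function names are hypothetical.

```python
import subprocess

# OCR language pack per skill language, from the table above.
OCR_PACKS = {"en": "eng", "zh": "chi_sim"}


def parse_installed_langs(list_langs_output: str) -> set:
    """Parse the text printed by `tesseract --list-langs`.

    Assumes the first line is a header and each later line is one pack name.
    """
    lines = list_langs_output.strip().splitlines()
    return {line.strip() for line in lines[1:] if line.strip()}


def has_ocr_pack(lang: str) -> bool:
    """Return True if the pack for `lang` ('en' or 'zh') is installed."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Some Tesseract versions print the list to stderr, so fall back to it.
    installed = parse_installed_langs(result.stdout or result.stderr)
    return OCR_PACKS[lang] in installed
```

If has_ocr_pack("zh") returns False, install the pack with one of the commands listed above.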

Workflow

Input Types

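  • Image: text is extracted by OCR (lang must specify the language)
  • Plain text: goes straight to TTS, no OCR needed

Standard Flow (default, confirmation required)

Image → OCR extracts text → show text to the user → user confirms → generate TTS → send
Text → generate TTS directly → send

Skip-Confirmation Flow ⚠️

When the user says "no confirmation needed" or "generate directly", the confirmation step is skipped.

⚠️ Security note: skipConfirmation skips the text confirmation step, so OCR-extracted text (which may contain sensitive information) is converted to audio and sent directly. Use it only for trusted sources and low-sensitivity content. Keeping it off by default (skipConfirmation: false) is recommended.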
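
The two flows differ only in whether the confirmation steps run. A minimal sketch of that decision is shown here; the plan_steps function and the step labels are illustrative names, not part of the skill.

```python
def plan_steps(input_type: str, skip_confirmation: bool = False) -> list:
    """Return the ordered step names for one request.

    `input_type` is "image" or "text"; the step names are illustrative labels.
    """
    if input_type == "text":
        # Plain text needs neither OCR nor confirmation.
        return ["tts", "send"]
    if input_type != "image":
        raise ValueError(f"unsupported input type: {input_type!r}")
    steps = ["ocr"]
    if not skip_confirmation:
        # Default: show the OCR text and wait for the user to confirm.
        steps += ["show_text", "await_confirmation"]
    return steps + ["tts", "send"]
```

Keeping skip_confirmation as an explicit argument with a False default mirrors the recommendation to leave the option off unless the user asks for it.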

OCR Step

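# Image preprocessing (Python)
from PIL import Image, ImageOps

img = Image.open(image_path)  # image_path: path to the input photo
# Convert to grayscale and stretch contrast, clipping 10% at each end
img = ImageOps.autocontrast(img.convert('L'), cutoff=10)
w, h = img.size
# Upscale 4x so small print is easier for Tesseract to read
img = img.resize((w * 4, h * 4), Image.LANCZOS)
img.save('/tmp/ocr_input.jpg', quality=99)

# OCR (shell): English
tesseract /tmp/ocr_input.jpg stdout -l eng --psm 4

# OCR (shell): Chinese
tesseract /tmp/ocr_input.jpg stdout -l chi_sim --psm 4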
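
To keep preprocessing and OCR in a single Python script, the tesseract call could be wrapped with subprocess. This is a sketch under the assumption that tesseract is on PATH; build_ocr_command and run_ocr are hypothetical helper names, while the flags are the ones shown above.

```python
import subprocess

# OCR language pack per skill language (eng for en, chi_sim for zh).
OCR_PACKS = {"en": "eng", "zh": "chi_sim"}


def build_ocr_command(image_path: str, lang: str = "en") -> list:
    """Build the tesseract argument list used in the commands above."""
    return ["tesseract", image_path, "stdout", "-l", OCR_PACKS[lang], "--psm", "4"]


def run_ocr(image_path: str, lang: str = "en") -> str:
    """Run tesseract and return the recognized text."""
    result = subprocess.run(
        build_ocr_command(image_path, lang),
        capture_output=True, text=True, check=True,  # raise if tesseract fails
    )
    return result.stdout.strip()
```

For example, run_ocr('/tmp/ocr_input.jpg', lang='zh') would return the text that is then shown to the user for confirmation.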

TTS Step

Full-article audio

# English
uvx edge-tts \
  -t "FULL TEXT" \
  -v en-US-EmmaNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/full_article.mp3

# Chinese
uvx edge-tts \
  -t "CHINESE TEXT" \
  -v zh-CN-XiaoxiaoNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/full_article.mp3
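
For long articles, passing the whole text through -t can be awkward because of shell quoting. One alternative, sketched below, is to save the text to a file and hand it to edge-tts with its -f (--file) option. The helper names are hypothetical, and the file names follow the output directory layout described on this page.

```python
import subprocess
from pathlib import Path

VOICES = {"en": "en-US-EmmaNeural", "zh": "zh-CN-XiaoxiaoNeural"}


def build_tts_command(text_file: str, out_path: str, lang: str = "en",
                      rate: str = "-10%") -> list:
    """Build the uvx edge-tts argument list for a text file."""
    return [
        "uvx", "edge-tts",
        "-f", text_file,   # read the text from a file instead of -t
        "-v", VOICES[lang],
        f"--rate={rate}",  # the "=" form keeps "-10%" from parsing as a flag
        "--write-media", out_path,
    ]


def synthesize(text: str, out_dir: str, lang: str = "en") -> Path:
    """Save `text` as original_text.md and render full_article.mp3 from it."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text_file = out / "original_text.md"
    text_file.write_text(text, encoding="utf-8")
    mp3 = out / "full_article.mp3"
    subprocess.run(build_tts_command(str(text_file), str(mp3), lang), check=True)
    return mp3
```

Writing the text file first also leaves a copy of the confirmed text next to the audio.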

Per-sentence split (only when splitSentences=true)

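import re
import subprocess

def split_sentences(text, lang='en'):
    """Split text into sentences on terminal punctuation."""
    if lang == 'zh':
        # Chinese: split after 。!? (full-width) or ASCII !?; whitespace is optional
        sentences = re.split(r'(?<=[。!?!?])\s*', text)
    else:
        # English: split after . ! ? followed by whitespace
        sentences = re.split(r'(?<=[.!?])\s+', text)
    return [s.strip() for s in sentences if s.strip()]

# text and lang come from the earlier OCR / confirmation step
voice = 'zh-CN-XiaoxiaoNeural' if lang == 'zh' else 'en-US-EmmaNeural'
sentences = split_sentences(text, lang=lang)
for i, sentence in enumerate(sentences, 1):
    num = str(i).zfill(2)  # 01, 02, ...
    subprocess.run([
        "uvx", "edge-tts",
        "-t", sentence,
        "-v", voice,
        "--rate=-10%",
        # Replace OUTPUT_DIR with the real output directory
        "--write-media", f"OUTPUT_DIR/sentence_{num}.mp3",
    ], check=True)  # raise if edge-tts fails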

Output Directory

/mnt/d/wslspace/workspace/articles/YYYY-MM-DD-article-slug/
├── original_text.md
├── full_article.mp3
└── sentence_01.mp3 ...
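
The YYYY-MM-DD-article-slug directory name could be derived from a title and a date, as in the sketch below. The slugify rule (lowercase ASCII words, first six only, falling back to "article") is an assumption made for illustration; the page does not say how the skill picks the slug.

```python
import re
from datetime import date
from pathlib import Path

# Root taken from the layout above; adjust for your own machine.
ARTICLES_ROOT = Path("/mnt/d/wslspace/workspace/articles")


def slugify(title: str, max_words: int = 6) -> str:
    """Lowercase ASCII slug built from the first few words of a title."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    # Titles with no ASCII letters or digits fall back to a generic slug.
    return "-".join(words[:max_words]) or "article"


def article_dir(title: str, day: date, root: Path = ARTICLES_ROOT) -> Path:
    """Return root/YYYY-MM-DD-article-slug without creating it."""
    return root / f"{day.isoformat()}-{slugify(title)}"
```

Call article_dir(...).mkdir(parents=True, exist_ok=True) before writing original_text.md and the mp3 files into it.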

Sending via Message Channel

The agent detects the active channel from the runtime context and calls message(...) accordingly. No channel is hardcoded: the agent uses whichever channel the user is currently chatting through.

# Detect the active channel automatically (from runtime inbound metadata)
# channel is inferred: feishu / telegram / discord / whatsapp / signal / imessage / openclaw-weixin

# Send the full article
message(action="send", channel="{active_channel}",
        message="📄 Full-article audio",
        media="PATH/full_article.mp3",
        filename="full_article.mp3")

# Send each sentence
for i, sentence in enumerate(sentences, 1):
    num = str(i).zfill(2)
    message(action="send", channel="{active_channel}",
            message=f"📝 {num}: {sentence}",
            media=f"PATH/sentence_{num}.mp3",
            filename=f"sentence_{num}.mp3")

Channel Behavior Notes

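| Channel | Audio support | Notes |
|---|---|---|
| Feishu | | Recommended: send voice messages with the feishu-voice-send skill |
| Telegram | | Sends the mp3 directly |
| Discord | | Sent as an attachment |
| WhatsApp | | Sends the mp3 directly |
| Signal | ⚠️ | Depends on signal strength; may not be supported |
| iMessage | ⚠️ | Sent through macOS; mp3 compatibility is mediocre |
| WeChat Work | | Same as Feishu |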

If the channel does not support audio, the agent saves the file to OUTPUT_DIR and sends the file path as a text message instead.
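
The fallback described above can be sketched as a small helper. Here `send` is a stand-in for OpenClaw's message(...) tool, and `supports_audio` is a hypothetical flag supplied by the caller, since this page does not define how audio support is detected.

```python
from pathlib import Path
from typing import Callable


def deliver_audio(send: Callable[..., None], channel: str, mp3_path: str,
                  supports_audio: bool) -> None:
    """Send the mp3 if the channel supports audio, else send its path as text.

    `send` stands in for OpenClaw's message(...) tool; `supports_audio` is a
    hypothetical flag decided by the caller.
    """
    if supports_audio:
        send(action="send", channel=channel, message="📄 Full-article audio",
             media=mp3_path, filename=Path(mp3_path).name)
    else:
        # No media argument: the user only receives the saved file's location.
        send(action="send", channel=channel,
             message=f"Audio saved to: {mp3_path}")
```

Passing the sender in as an argument keeps the helper easy to try out with a fake function before wiring it to a real channel.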


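How to send as a voice message (rather than an attachment)

Important: OpenClaw's built-in Feishu media sending has a bug (the duration parameter is missing), so .ogg files sometimes appear as attachments instead of voice messages.

Recommended: use the feishu-voice-send skill

That skill calls the official Feishu API and passes the duration parameter correctly, so voice messages display properly.

Option 1: send through the feishu-voice-send skill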

# Send an existing .ogg file
python3 /mnt/d/wslspace/workspace/skills/feishu-voice-send/scripts/send_voice.py \
    /path/to/audio.ogg \
    <recipient_open_id>

# Or generate TTS and send it in one step
python3 /mnt/d/wslspace/workspace/skills/feishu-voice-send/scripts/tts_and_send.py \
    "TEXT TO CONVERT" \
    <recipient_open_id> \
    -v zh-CN-YunjianNeural \
    -r -10%

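Option 2: call manually (not recommended)

If you must use OpenClaw's built-in message tool, you need to:

  1. Convert the mp3 to standard Ogg Opus format
  2. Always include the message parameter when sending
  3. Note: even with the message parameter, the file may still display as an attachment because duration is missing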
# 1. Generate an mp3 with edge-tts (shell)
uvx edge-tts \
  -t "Your text here" \
  -v en-US-EmmaNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/voice.mp3

# 2. Convert to standard Ogg Opus with ffmpeg (shell)
ffmpeg -i OUTPUT_DIR/voice.mp3 \
  -c:a libopus \
  -b:a 32k \
  -ar 24000 \
  -ac 1 \
  OUTPUT_DIR/voice.ogg

# 3. Send with the message tool (may still display as an attachment)
message(action="send", channel="feishu",
        message="📄 Voice message",
        media="OUTPUT_DIR/voice.ogg")
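
Since the missing duration is what causes the attachment display, it may be useful to see how the conversion and a duration measurement could be scripted. This is a sketch, assuming ffmpeg and ffprobe are on PATH; the function names are hypothetical, the -y overwrite flag is an addition, and reporting milliseconds is an assumption about what a sender would want.

```python
import subprocess


def build_opus_command(mp3_path: str, ogg_path: str) -> list:
    """ffmpeg settings from the steps above (mono Opus, 32 kbit/s, 24 kHz).

    Adds -y so an existing output file is overwritten without prompting.
    """
    return ["ffmpeg", "-y", "-i", mp3_path, "-c:a", "libopus",
            "-b:a", "32k", "-ar", "24000", "-ac", "1", ogg_path]


def parse_duration_ms(ffprobe_output: str) -> int:
    """Convert ffprobe's duration in seconds (e.g. '2.5') to milliseconds."""
    return round(float(ffprobe_output.strip()) * 1000)


def audio_duration_ms(path: str) -> int:
    """Ask ffprobe for the container duration of an audio file."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return parse_duration_ms(result.stdout)
```

The measured value is only informative here: the built-in message tool shown above has no documented duration argument, which is why the feishu-voice-send skill remains the recommended route.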

Available TTS Voices

English

en-US-EmmaNeural, en-US-BrianNeural, en-GB-LibbyNeural, ...

Chinese

zh-CN-XiaoxiaoNeural (female), zh-CN-YunxiNeural (male), zh-CN-YunyangNeural (male, news style), ...

See the full list: uvx edge-tts -l | grep "zh-CN"

Notes

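  • Tesseract with English is preinstalled; Chinese requires apt-get install tesseract-ocr-chi-sim
  • edge-tts runs through uvx, so no separate installation is needed
  • Image quality directly affects OCR accuracy; keep the photo well lit and squarely framed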
