Install
openclaw skills install microsoft-edge-tts
Use Microsoft Edge's online TTS service to convert text to speech. It supports command-line and module invocation and requires no API key. Use this skill when users need to convert text to speech, generate audio files, or read content aloud.
Trigger this skill when the user mentions any of the following keywords:
# Basic usage
npx node-edge-tts -t 'Hello World'
# Specify output file
npx node-edge-tts -t 'Hello World' -f './output.mp3'
# Specify voice and language
npx node-edge-tts -t 'Hello world' -v 'en-US-AriaNeural' -l 'en-US'
# Adjust speaking rate and pitch
npx node-edge-tts -t 'Hello World' -r '+10%' --pitch '-5%'
# Generate subtitle file
npx node-edge-tts -t 'Hello World' -s
const { EdgeTTS } = require('node-edge-tts')
// or
import { EdgeTTS } from 'node-edge-tts'
const tts = new EdgeTTS()
await tts.ttsPromise('Hello World', './output.mp3')
| Parameter | Short | Description | Default |
|---|---|---|---|
| --text | -t | Text to convert (required) | - |
| --filepath | -f | Output file path | ./output.mp3 |
| --voice | -v | Voice name | zh-CN-XiaoyiNeural |
| --lang | -l | Language code | zh-CN |
| --outputFormat | -o | Output format | audio-24khz-48kbitrate-mono-mp3 |
| --rate | -r | Speaking rate | default |
| --pitch | - | Pitch | default |
| --volume | - | Volume | default |
| --saveSubtitles | -s | Save subtitles | false |
| --proxy | -p | Proxy settings | - |
| --timeout | - | Timeout (ms) | 10000 |
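
When the CLI is driven from a script, it can help to build the argument list from a plain options object rather than by string concatenation. The following is a minimal sketch under that assumption: `buildArgs` and `FLAGS` are hypothetical names introduced here for illustration and are not part of node-edge-tts; only the long flag names come from the table above.

```javascript
// Long flag names taken from the parameter table.
// FLAGS and buildArgs are hypothetical helpers, not part of node-edge-tts.
const FLAGS = {
  text: '--text',
  filepath: '--filepath',
  voice: '--voice',
  lang: '--lang',
  outputFormat: '--outputFormat',
  rate: '--rate',
  pitch: '--pitch',
  volume: '--volume',
  proxy: '--proxy',
  timeout: '--timeout',
}

function buildArgs(options) {
  if (!options || typeof options.text !== 'string' || options.text === '') {
    throw new Error('text is required')
  }
  const args = []
  for (const [name, flag] of Object.entries(FLAGS)) {
    const value = options[name]
    if (value !== undefined && value !== null) {
      args.push(flag, String(value))
    }
  }
  // --saveSubtitles is used as a switch without a value in the examples above.
  if (options.saveSubtitles) args.push('--saveSubtitles')
  return args
}

const args = buildArgs({ text: 'Hello World', voice: 'en-US-AriaNeural', rate: '+10%' })
console.log(args.join(' '))
// prints: --text Hello World --voice en-US-AriaNeural --rate +10%
```

The resulting array could then be passed to something like `child_process.execFile('npx', ['node-edge-tts', ...args])`, which avoids shell quoting problems with text that contains quotes or spaces.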
const tts = new EdgeTTS({
  voice: 'zh-CN-XiaoxiaoNeural',
  lang: 'zh-CN',
  outputFormat: 'audio-24khz-96kbitrate-mono-mp3',
  saveSubtitles: true,
  proxy: 'http://localhost:7890',
  pitch: '-10%',
  rate: '+10%',
  volume: '-50%',
  timeout: 10000
})
await tts.ttsPromise('Text to convert', './output.mp3')
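
The pitch, rate, and volume values above are signed percentage strings. If those adjustments start out as numbers in your own code, a small formatter keeps the sign explicit. This is only a sketch: `formatPercent` is a hypothetical helper, and it does not check which ranges the service actually accepts.

```javascript
// Hypothetical helper: turn a numeric adjustment into a signed percent string
// of the same shape as the '+10%' and '-50%' values used above.
function formatPercent(delta) {
  if (!Number.isFinite(delta)) {
    throw new TypeError('delta must be a finite number')
  }
  const rounded = Math.round(delta)
  const sign = rounded < 0 ? '-' : '+'
  return `${sign}${Math.abs(rounded)}%`
}

console.log(formatPercent(10))  // prints +10%
console.log(formatPercent(-50)) // prints -50%
```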
Chinese: zh-CN-XiaoyiNeural, zh-CN-XiaoxiaoNeural, zh-CN-YunjianNeural, zh-CN-YunxiNeural, zh-CN-YunxiaNeural
English: en-US-AriaNeural, en-US-GuyNeural, en-US-JennyNeural
Japanese: ja-JP-KeitaNeural, ja-JP-NanamiNeural
Enabling -s generates a .json subtitle file with the same name:
[
{ "part": "Hello", "start": 100, "end": 500 },
{ "part": "World", "start": 500, "end": 900 }
]
The start and end values are in milliseconds; part is the text segment.
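
Because each entry carries a text segment plus start and end times in milliseconds, the data can be converted into other subtitle formats. The sketch below turns entries of the shape shown above into SRT text. `toSrtTime` and `subtitlesToSrt` are hypothetical helper names, and the code assumes the times are non-negative integers.

```javascript
// Format integer milliseconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(ms) {
  const pad = (n, width) => String(n).padStart(width, '0')
  const h = Math.floor(ms / 3600000)
  const m = Math.floor((ms % 3600000) / 60000)
  const s = Math.floor((ms % 60000) / 1000)
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`
}

// Convert [{ part, start, end }, ...] into SRT cues separated by blank lines.
function subtitlesToSrt(entries) {
  return entries
    .map((e, i) => `${i + 1}\n${toSrtTime(e.start)} --> ${toSrtTime(e.end)}\n${e.part}\n`)
    .join('\n')
}

const entries = [
  { part: 'Hello', start: 100, end: 500 },
  { part: 'World', start: 500, end: 900 },
]
console.log(subtitlesToSrt(entries))
// prints:
// 1
// 00:00:00,100 --> 00:00:00,500
// Hello
//
// 2
// 00:00:00,500 --> 00:00:00,900
// World
```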
npx node-edge-tts -t 'Welcome to speech synthesis'
For very long texts, it is recommended to process in segments and then merge.
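
One way to process long text in segments and then merge is sketched below. It splits on sentence boundaries, synthesizes each segment to a temporary file, and concatenates the results. `splitText`, `synthesizeLongText`, and the 500-character limit are assumptions made for illustration, not part of node-edge-tts; the `tts` argument is any object exposing `ttsPromise(text, filepath)`, such as an `EdgeTTS` instance. The merge step assumes that byte-level concatenation of MP3 parts sharing one output format plays back acceptably, which holds in many players but is not guaranteed.

```javascript
const fs = require('node:fs/promises')
const os = require('node:os')
const path = require('node:path')

// Split text into segments of at most maxLen characters, preferring
// sentence boundaries. A single sentence longer than maxLen is hard-cut.
function splitText(text, maxLen = 500) {
  const sentences = text.match(/[^.!?。！？\n]+[.!?。！？\n]*/g) || []
  const segments = []
  let current = ''
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxLen) {
      segments.push(current.trim())
      current = ''
    }
    current += sentence
    while (current.length > maxLen) {
      segments.push(current.slice(0, maxLen).trim())
      current = current.slice(maxLen)
    }
  }
  if (current.trim()) segments.push(current.trim())
  return segments.filter(Boolean)
}

// Synthesize each segment in order, then merge the parts into outputPath.
// Returns the number of segments that were synthesized.
async function synthesizeLongText(tts, text, outputPath, maxLen = 500) {
  const segments = splitText(text, maxLen)
  const tmpDir = await fs.mkdtemp(path.join(os.tmpdir(), 'edge-tts-'))
  try {
    const buffers = []
    for (const [i, segment] of segments.entries()) {
      const partPath = path.join(tmpDir, `part-${i}.mp3`)
      await tts.ttsPromise(segment, partPath)
      buffers.push(await fs.readFile(partPath))
    }
    // Assumption: plain concatenation is good enough for same-format MP3 parts.
    await fs.writeFile(outputPath, Buffer.concat(buffers))
  } finally {
    // Remove the temporary parts whether or not synthesis succeeded.
    await fs.rm(tmpDir, { recursive: true, force: true })
  }
  return segments.length
}
```

Segments are synthesized one at a time rather than in parallel, which keeps the audio in order and avoids sending many simultaneous requests to the service. If subtitles are needed for the merged file, the per-segment timestamps would have to be offset by the duration of the preceding parts, which this sketch does not attempt.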
# Chinese
npx node-edge-tts -t 'Hello World' -v 'zh-CN-XiaoxiaoNeural'
# English
npx node-edge-tts -t 'Hello World' -v 'en-US-AriaNeural'
To route requests through a proxy, use the -p parameter.