## Install

openclaw skills install faster-whisper-gpu

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to external services.
# Install dependencies
pip install faster-whisper torch
# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3
# Specify language explicitly
python transcribe.py audio.mp3 --language pt
# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt
# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3
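The GPU check from the quick start can be made a little more defensive in Python: it degrades gracefully when torch is not installed at all, and reports the device name when a GPU is found. A minimal sketch:

```python
# Check whether a CUDA-capable GPU is visible to PyTorch before transcribing.
# Falls back gracefully when torch is not installed.
try:
    import torch
    has_cuda = torch.cuda.is_available()
    device_name = torch.cuda.get_device_name(0) if has_cuda else None
except ImportError:
    has_cuda, device_name = False, None

print(f"CUDA available: {has_cuda}" + (f" ({device_name})" if device_name else ""))
```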
python transcribe.py <audio_file> [options]
Options:
--model {tiny,base,small,medium,large-v1,large-v2,large-v3}
Model size to use (default: base)
--language LANG Language code (e.g., 'pt', 'en', 'es'). Auto-detect if not specified.
--format {txt,srt,json,vtt}
Output format (default: txt)
--output FILE Output file path (default: stdout)
--device {cuda,cpu} Device to use (default: cuda if available)
--compute_type {int8,int8_float16,int16,float16,float32}
Computation precision (default: float16)
--task {transcribe,translate}
Task: transcribe or translate to English (default: transcribe)
--vad_filter Enable voice activity detection filter
--vad_parameters MIN_DURATION_ON,MIN_DURATION_OFF
VAD parameters as comma-separated values
--condition_on_previous_text
Condition the decoder on previously transcribed text (default: True)
--initial_prompt PROMPT
Initial prompt to guide transcription
--word_timestamps Include word-level timestamps (for SRT/JSON)
--hotwords WORDS Comma-separated hotwords to boost recognition
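The option table above could be wired up with `argparse` roughly as follows. This is a sketch, not the actual `transcribe.py` source; in particular, `parse_vad_parameters` and its dict keys are hypothetical names for handling the documented `MIN_DURATION_ON,MIN_DURATION_OFF` format.

```python
import argparse

def parse_vad_parameters(value: str) -> dict:
    """Parse 'MIN_DURATION_ON,MIN_DURATION_OFF' into a dict (hypothetical helper)."""
    on, off = (float(v) for v in value.split(","))
    return {"min_duration_on": on, "min_duration_off": off}

parser = argparse.ArgumentParser(description="Transcribe audio with Faster Whisper")
parser.add_argument("audio_file")
parser.add_argument("--model", default="base",
                    choices=["tiny", "base", "small", "medium",
                             "large-v1", "large-v2", "large-v3"])
parser.add_argument("--language", default=None)
parser.add_argument("--format", default="txt", choices=["txt", "srt", "json", "vtt"])
parser.add_argument("--output", default=None)
parser.add_argument("--device", default="cuda", choices=["cuda", "cpu"])
parser.add_argument("--compute_type", default="float16",
                    choices=["int8", "int8_float16", "int16", "float16", "float32"])
parser.add_argument("--task", default="transcribe", choices=["transcribe", "translate"])
parser.add_argument("--vad_filter", action="store_true")
parser.add_argument("--vad_parameters", type=parse_vad_parameters, default=None)
parser.add_argument("--condition_on_previous_text",
                    action=argparse.BooleanOptionalAction, default=True)
parser.add_argument("--initial_prompt", default=None)
parser.add_argument("--word_timestamps", action="store_true")
parser.add_argument("--hotwords", default=None)

# Parse a sample command line to show the resulting namespace.
args = parser.parse_args(["audio.mp3", "--language", "pt",
                          "--vad_parameters", "0.25,0.1"])
print(args.language, args.vad_parameters)
```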
python transcribe.py meeting.mp3 --language pt --format srt --output meeting.srt
python transcribe.py japanese_audio.mp3 --task translate --format txt
python transcribe.py podcast.mp3 --model large-v3 --vad_filter --word_timestamps
python transcribe.py audio.mp3 --device cpu --compute_type int8
from faster_whisper import WhisperModel
# Load model
model = WhisperModel("base", device="cuda", compute_type="float16")
# Transcribe
segments, info = model.transcribe("audio.mp3", language="pt")
print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
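The segments yielded above map directly onto SRT subtitles. Here is a minimal sketch of that conversion; `Segment` is a stand-in dataclass with the same `start`/`end`/`text` fields as the objects faster-whisper yields, and `srt_timestamp`/`to_srt` are illustrative helpers, not part of the library.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str

def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (the SRT convention)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Number each segment and join them into an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)

demo = [Segment(0.0, 2.5, " Olá, mundo"), Segment(2.5, 5.0, " Bem-vindo")]
print(to_srt(demo))
```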
| Model | Parameters | VRAM Required | Relative Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39 M | ~1 GB | ~32x | Basic |
| base | 74 M | ~1 GB | ~16x | Good |
| small | 244 M | ~2 GB | ~6x | Better |
| medium | 769 M | ~5 GB | ~2x | Great |
| large-v3 | 1550 M | ~10 GB | 1x | Best |
Benchmarks measured on an NVIDIA RTX 4090; speed is relative to large-v3.
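The VRAM column above can drive model selection automatically. A sketch, using the table's approximate figures (the `pick_model` helper is illustrative, not part of the skill):

```python
# Approximate VRAM requirements from the table above, in GB.
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large-v3": 10}
# Most accurate first.
ORDER = ["large-v3", "medium", "small", "base", "tiny"]

def pick_model(free_vram_gb: float) -> str:
    """Return the most accurate model that fits in the given free VRAM."""
    for name in ORDER:
        if VRAM_GB[name] <= free_vram_gb:
            return name
    return "tiny"  # smallest model as a last resort

print(pick_model(6))   # a 6 GB budget fits medium but not large-v3
print(pick_model(24))
```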
Faster Whisper supports 99 languages, including pt (Portuguese), en (English), es (Spanish), fr (French), de (German), it (Italian), ja (Japanese), zh (Chinese), and ru (Russian).

If you run out of GPU memory:

# Use a smaller model
python transcribe.py audio.mp3 --model tiny
# Or use CPU
python transcribe.py audio.mp3 --device cpu
# Or reduce precision
python transcribe.py audio.mp3 --compute_type int8
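The out-of-memory fallbacks above can be expressed as one selection rule: prefer `cuda` with `float16`, drop to `cpu` with `int8` when no GPU (or too little free VRAM) is available. A minimal sketch with a hypothetical `choose_runtime` helper:

```python
def choose_runtime(cuda_available: bool, free_vram_gb: float = 0.0):
    """Return a (device, compute_type) pair following the fallback chain above."""
    if cuda_available and free_vram_gb >= 1:
        return ("cuda", "float16")
    return ("cpu", "int8")

print(choose_runtime(True, 8.0))
print(choose_runtime(False))
```

The returned pair maps straight onto the `--device` and `--compute_type` flags.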
Models are automatically downloaded on first use to ~/.cache/huggingface/hub/.
To use a custom cache directory, set:
export HF_HOME=/path/to/custom/cache
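The same override works from Python, as long as the variable is set before the model is first loaded (the path below is the same placeholder as above):

```python
import os

# Point Hugging Face downloads at a custom cache directory;
# must be set before the first model load.
os.environ["HF_HOME"] = "/path/to/custom/cache"
print(os.environ["HF_HOME"])
```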
Run nvidia-smi during transcription to confirm the GPU is actually being used.

## Contributing

Contributions are welcome!
MIT License - See LICENSE for details.
Faster Whisper is developed by SYSTRAN and based on OpenAI's Whisper.
Made with ❤️ for the OpenClaw community