audio-transcribe-summarize
Transcribe audio/video files to text and generate structured summaries using SenseAudio ASR API. Use when the user asks to transcribe, summarize, or take not...
Audio/Video Transcription & Summarization
Transcribe audio/video files using the SenseASR API (api.senseaudio.cn), then summarize the content into structured notes.
{baseDir} refers to this skill's directory.
Prerequisites
- Environment variable `SENSEAUDIO_API_KEY` configured (get your key at https://senseaudio.cn/platform/api-key)
- Python 3.8+ with `requests` installed
- For large files (>10MB): `ffmpeg` installed for splitting (macOS: `brew install ffmpeg`; Windows: download from ffmpeg.org and add it to PATH; Linux: `apt install ffmpeg`)
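The prerequisites above can be verified before running anything. The following is a minimal sketch, not part of the skill itself: `check_prerequisites` is a hypothetical helper name, and it only checks what the list states (Python version, the environment variable, the `requests` package, and `ffmpeg` on PATH).

```python
import importlib.util
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a list of problems found; an empty list means everything is in place."""
    # `env` is injectable for testing; by default the real environment is used.
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    if not env.get("SENSEAUDIO_API_KEY"):
        problems.append("SENSEAUDIO_API_KEY is not set")
    if importlib.util.find_spec("requests") is None:
        problems.append("the 'requests' package is not installed")
    if shutil.which("ffmpeg") is None:
        # Only matters for files over 10MB, which need splitting.
        problems.append("ffmpeg is not on PATH (needed only for files > 10MB)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

An empty result means the environment matches the list; a missing `ffmpeg` is only a blocker when the input exceeds the per-request size limit.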
Quick Start
- Run the transcription script:
python {baseDir}/scripts/transcribe.py <audio_file> [--model sense-asr-pro] [--language zh] [--speakers] [--sentiment] [--translate en]
- The script outputs a transcript `.txt` file alongside the source file
- Read the transcript and generate a summary (see Summary Format below)
Workflow
Step 1: Assess the Audio File
Check file size and format:
- Supported formats: wav, mp3, ogg, flac, aac, m4a, mp4
- Max file size per request: 10MB
- If the file is larger than 10MB, the script splits it automatically using ffmpeg
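The checks in this step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the logic of `transcribe.py`: the helper names `assess` and `assess_file` are hypothetical, and "10MB" is read here as 10 × 1024 × 1024 bytes, which may differ from how the API counts.

```python
from pathlib import Path

# Formats listed in this step.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".aac", ".m4a", ".mp4"}
# Assumption: "10MB" is read as 10 * 1024 * 1024 bytes; the API may count differently.
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def assess(filename, size_bytes):
    """Classify a file by extension and size without touching the filesystem."""
    ext = Path(filename).suffix.lower()
    return {
        "extension": ext,
        "supported": ext in SUPPORTED_EXTENSIONS,
        "needs_split": size_bytes > MAX_REQUEST_BYTES,
    }


def assess_file(path):
    """Same check for a file on disk, using its actual size."""
    p = Path(path)
    return assess(p.name, p.stat().st_size)
```

Keeping the pure check separate from the filesystem call makes the rule easy to test; `assess_file("recording.mp3")` would apply it to a real file.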
Step 2: Choose the Right Model
| Model | Use When |
|---|---|
| `sense-asr-lite` | Quick batch transcription, simple audio, cost-sensitive |
| `sense-asr` | General transcription, need speaker separation or timestamps |
| `sense-asr-pro` | High accuracy needed: meetings, interviews, complex audio |
| `sense-asr-deepthink` | Noisy audio, dialects, heavy jargon, speech-to-clean-text |
Default to sense-asr-pro for best quality.
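One way to encode the selection rule is a small lookup with `sense-asr-pro` as the fallback. This is only a sketch: the profile keys (`batch`, `general`, `meeting`, and so on) and the `choose_model` helper are hypothetical labels invented for illustration; only the model names come from the table.

```python
DEFAULT_MODEL = "sense-asr-pro"

# Hypothetical profile names mapped to the models from the table.
MODEL_BY_PROFILE = {
    "batch": "sense-asr-lite",
    "general": "sense-asr",
    "meeting": "sense-asr-pro",
    "interview": "sense-asr-pro",
    "noisy": "sense-asr-deepthink",
    "dialect": "sense-asr-deepthink",
    "jargon": "sense-asr-deepthink",
}


def choose_model(profile=None):
    """Return the model for a profile, falling back to the default."""
    if profile is None:
        return DEFAULT_MODEL
    return MODEL_BY_PROFILE.get(profile.lower(), DEFAULT_MODEL)
```

Unknown or missing profiles fall back to the default, which mirrors the guidance to prefer `sense-asr-pro` when in doubt.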
Step 3: Transcribe
Run the transcription script. Key options:
# Basic transcription
python {baseDir}/scripts/transcribe.py recording.mp3
# Meeting with multiple speakers + emotion
python {baseDir}/scripts/transcribe.py meeting.wav \
  --model sense-asr-pro \
  --speakers --max-speakers 4 \
  --sentiment \
  --timestamps segment
# Transcribe and translate to English
python {baseDir}/scripts/transcribe.py lecture.mp3 \
  --model sense-asr \
  --translate en
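When the script is driven from Python rather than a shell, the same invocations can be assembled as an argument list. The sketch below uses only the flags shown in this document; `build_transcribe_command` and `run_transcription` are hypothetical helper names, and whether flags may be combined freely is an assumption that `transcribe.py` itself decides.

```python
import subprocess
import sys


def build_transcribe_command(script_path, audio_file, model=None, language=None,
                             speakers=False, max_speakers=None, sentiment=False,
                             timestamps=None, translate=None, python=sys.executable):
    """Assemble the argv list for transcribe.py from keyword options."""
    cmd = [python, script_path, audio_file]
    if model:
        cmd += ["--model", model]
    if language:
        cmd += ["--language", language]
    if speakers:
        cmd.append("--speakers")
    if max_speakers is not None:
        cmd += ["--max-speakers", str(max_speakers)]
    if sentiment:
        cmd.append("--sentiment")
    if timestamps:
        cmd += ["--timestamps", timestamps]
    if translate:
        cmd += ["--translate", translate]
    return cmd


def run_transcription(cmd):
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces. Here `script_path` stands for the resolved `{baseDir}/scripts/transcribe.py`.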
Step 4: Summarize
After transcription, read the transcript file and produce a summary using the format below.
Summary Format
Generate summaries in this structure:
# [Title - inferred from content]
**Source**: filename.mp3
**Duration**: X min Y sec
**Date**: YYYY-MM-DD
**Speakers**: [if speaker diarization was used]
## Key Points
- Point 1
- Point 2
- ...
## Detailed Summary
[2-4 paragraph summary of the content organized by topic/chronology]
## Action Items
- [ ] Action item 1 (assigned to Speaker X, if applicable)
- [ ] Action item 2
## Notable Quotes
> "Direct quote from transcript" — Speaker X, [timestamp if available]
## Full Transcript
<details>
<summary>Click to expand full transcript</summary>
[Full transcript text here, with speaker labels and timestamps if available]
</details>
Adapt the template based on content type:
- Meeting: emphasize action items, decisions, speaker contributions
- Lecture/Talk: emphasize key concepts, learning points, structure
- Interview: emphasize Q&A pairs, key responses
- Podcast: emphasize topics discussed, interesting insights
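To keep summaries consistent, the template can be filled programmatically once the content has been drafted. The following is a minimal sketch with hypothetical helper names (`format_duration`, `render_summary`); it emits the same headings as the template, skips optional sections that have no content, and expects quotes to arrive already formatted with their attribution.

```python
def format_duration(total_seconds):
    """Format seconds as 'X min Y sec', matching the template's Duration line."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes} min {seconds} sec"


def render_summary(title, source, duration_seconds, date, key_points,
                   detailed_summary, action_items=(), quotes=(),
                   transcript="", speakers=None):
    """Render the summary template as a markdown string."""
    lines = [
        f"# {title}",
        f"**Source**: {source}",
        f"**Duration**: {format_duration(duration_seconds)}",
        f"**Date**: {date}",
    ]
    if speakers:  # only when speaker diarization was used
        lines.append(f"**Speakers**: {', '.join(speakers)}")
    lines.append("## Key Points")
    lines += [f"- {point}" for point in key_points]
    lines += ["## Detailed Summary", detailed_summary]
    if action_items:
        lines.append("## Action Items")
        lines += [f"- [ ] {item}" for item in action_items]
    if quotes:
        lines.append("## Notable Quotes")
        lines += [f"> {quote}" for quote in quotes]
    lines += [
        "## Full Transcript",
        "<details>",
        "<summary>Click to expand full transcript</summary>",
        transcript,
        "</details>",
    ]
    return "\n".join(lines) + "\n"
```

Dropping empty sections is one way to adapt the template by content type: a lecture summary with no action items simply omits that heading.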
API Reference
For full SenseASR API parameters and response formats, see api-reference.md.
