Music Analysis
v3.0.2
Analyze music/audio files locally without external APIs. Extract tempo, pocket/groove feel, pulse stability, swing proxy, section/repetition structure, key c...
Security Scan
OpenClaw
Benign (high confidence)

Purpose & Capability
The name and description (local music/audio analysis) align with the included Python scripts and declared dependencies (librosa, numpy, ffmpeg/ffprobe). The code implements tempo, timbre, structure, and instrument detection, plus optional lyric alignment via a local Whisper CLI/model, which fits the stated purpose. Minor note: SKILL.md suggests using yt-dlp to fetch YouTube audio (network use) as an optional audio-sourcing method; that falls outside the 'no external APIs' claim but is presented as an explicit optional workflow.
Instruction Scope
Runtime instructions and scripts operate on local audio files, run ffmpeg/ffprobe and (optionally) whisper-cli, and write analysis reports to disk if requested. The SKILL.md and scripts do not instruct reading unrelated system files, environment secrets, or posting data to remote endpoints. Whisper usage is optional and the code includes a fallback path if Whisper is missing.
Install Mechanism
There is no install spec, and requirements.txt lists only librosa and numpy. The skill relies on system binaries (ffmpeg/ffprobe) and, optionally, a locally installed whisper-cli and model file; nothing in the repository pulls code from arbitrary URLs or writes external installers. This is a low-risk install posture.
Credentials
The skill declares no required environment variables or credentials. The code does reference concrete filesystem paths (a Homebrew whisper-cli path and a home-dir model path) but these are optional and not secrets. No credentials or sensitive environment access is requested.
Persistence & Privilege
The skill is user-invocable, not always-included, and does not modify other skills or agent-wide configuration. It runs as a normal local tool and does not request elevated or persistent privileges.
Assessment
This skill appears to do what it claims: offline analysis of audio files using librosa and local tools. Before installing or running:
1. Ensure you trust the local binaries it calls (ffmpeg/ffprobe and optionally whisper-cli), because the scripts invoke them via subprocess.
2. The hardcoded whisper-cli path (/opt/homebrew/bin/whisper-cli) and model path (~/.local/share/whisper-cpp/...) are macOS- and home-directory-specific. If you don't have Whisper or the model, the code skips transcription; if you do, be aware that Whisper models can be large.
3. The README suggests using yt-dlp to fetch YouTube audio; following that workflow downloads data from the network.
4. No credentials are requested and there are no network callbacks in the code, but you should still review the included scripts yourself (they are present) before running them on sensitive data.
If you want higher confidence, run the scripts in an isolated environment (a temporary folder or container), and verify the absence or presence of whisper-cli and the model if you do not want transcription.

Like a lobster shell, security has layers: review code before you run it.
Music Analysis (Local, No External APIs)
Primary tool: a full listen that combines snapshot analysis, structure, groove, harmonic tension, temporal mood mapping, and optional Whisper lyric alignment into one report.
1. Full Listen — primary / recommended
python3 skills/music-analysis/scripts/listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/listen.py track.mp3 --json
python3 skills/music-analysis/scripts/listen.py track.mp3 --out report.txt
python3 skills/music-analysis/scripts/listen.py track.mp3 --json --out report.json
What it does in one pass:
- Snapshot analysis: tempo, pulse stability, swing proxy, key clarity, harmonic tension, timbre, structure
- Whisper lyric transcription and filtering first — keep only real lyric text, drop artifact tags like [MUSIC]
- Temporal listen: windowed energy / mood / tension journey
- Synthesis layer that aligns lyrics with peak / tension / quiet windows and lets the lyric layer override the final vibe when confidence is high
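The CLI invocations above can also be driven programmatically. A minimal sketch, assuming the `--json` flag prints the report to stdout (the report's field names are not documented here, so any key access on the returned dict is an assumption):

```python
import json
import subprocess

SCRIPT = "skills/music-analysis/scripts/listen.py"

def build_listen_cmd(audio_path, json_out=False, out_file=None, script=SCRIPT):
    """Assemble a listen.py command line matching the invocations above."""
    cmd = ["python3", script, str(audio_path)]
    if json_out:
        cmd.append("--json")
    if out_file:
        cmd += ["--out", str(out_file)]
    return cmd

def run_full_listen(audio_path):
    """Run a full listen with --json and parse stdout as a report dict."""
    result = subprocess.run(build_listen_cmd(audio_path, json_out=True),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

`run_full_listen` assumes the script exits non-zero on failure, so `check=True` surfaces errors as exceptions.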
Human-readable output structure
- SNAPSHOT
- groove/pocket
- structure summary + repeated sections
- harmony (key clarity + tension)
- timbre descriptor tags
- INSTRUMENT READ
- likely instrument palette (strong/likely/possible confidence)
- per-section instrument entrances and exits
- how instruments color the emotional feel
- written as natural language, not clinical data
- TEMPORAL JOURNEY
- opening / middle / closing mood-energy-tension read
- peak / quietest / tensest moments
- mood journey and transition count
- EMOTIONAL READ
- explainable emotion summary based on measured features
- LYRICS
- Whisper segment count
- excerpt or graceful skip note
- SYNTHESIS
- lyric-energy/tension alignment
- peak / tension / quiet lyric moments
- ALIGNED TIMELINE
- per-window moments where transitions / lyrics / tension spikes occur
2. Snapshot Analysis — standalone
python3 skills/music-analysis/scripts/analyze_music.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/analyze_music.py track.mp3 --json
Reports:
- tempo / pulse stability / pulse confidence / swing proxy / pocket
- key estimate / key clarity / chroma entropy / harmonic change / tonal motion / tension
- timbre descriptors (brightness, richness, low-end, contrast, dynamic range)
- section labels (A/B/C...) and repeated material detection
- explainable emotional read with reasons
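The scripts' exact key-estimation method is not shown on this page; a common approach that matches the "key estimate / key clarity" outputs is template correlation against the Krumhansl-Kessler major profile. A sketch under that assumption (pure numpy, major keys only for brevity):

```python
import numpy as np

# Krumhansl-Kessler major-key profile, ordered C..B (C major template).
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def estimate_major_key(chroma):
    """Correlate a 12-bin mean chroma vector with all 12 rotations of the
    major profile; the best rotation is the key guess, and the margin over
    the runner-up serves as a crude key-clarity score."""
    chroma = np.asarray(chroma, dtype=float)
    scores = [np.corrcoef(chroma, np.roll(MAJOR_PROFILE, k))[0, 1]
              for k in range(12)]
    order = np.argsort(scores)[::-1]
    clarity = scores[order[0]] - scores[order[1]]
    return PITCH_CLASSES[order[0]], clarity
```

In practice the chroma vector would come from something like `librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)`.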
3. Temporal Listen — standalone
python3 skills/music-analysis/scripts/temporal_listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/temporal_listen.py track.mp3 --json
Reports:
- sliding-window timeline (4s windows, 2s hops)
- energy contour
- mood labels
- harmonic tension + tonal motion
- transition types (drop hits, pulls back, tightens harmonically, shifts color, evolves)
- narrative arc (mountain / ascending / descending / plateau / wave)
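The windowing scheme above (4 s windows, 2 s hops) can be sketched for the energy contour alone with plain numpy; this is an illustration of the granularity, not the script's actual implementation:

```python
import numpy as np

def energy_contour(y, sr, win_s=4.0, hop_s=2.0):
    """RMS energy per sliding window (4 s windows, 2 s hops by default),
    mirroring the temporal_listen timeline granularity."""
    win = int(win_s * sr)
    hop = int(hop_s * sr)
    rms = []
    for start in range(0, max(len(y) - win, 0) + 1, hop):
        frame = y[start:start + win]
        rms.append(float(np.sqrt(np.mean(frame ** 2))))
    return np.array(rms)
```

Mood labels and transition types would then be derived by comparing adjacent windows (energy deltas, tension changes), which is where labels like "drop hits" or "pulls back" come from.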
Interpretation rules
- Structure labels are similarity labels, not verse/chorus claims.
- Swing proxy is a feel estimate, not drummer-grade microtiming truth.
- Emotion is explainable, derived from pulse + timbre + harmonic tension rather than a black-box mood guess.
- Lyrics can override the final vibe when filtered Whisper text is confident and emotionally clear.
Audio sourcing
The tool needs a real audio file on disk.
- Direct file (mp3, wav, flac, ogg, m4a — anything ffmpeg/librosa can read)
- YouTube / supported URLs:
yt-dlp -x --audio-format mp3 -o "output.mp3" "URL_OR_SEARCH"
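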
Whisper lyrics transcription
listen.py uses:
- CLI: /opt/homebrew/bin/whisper-cli
- Model: ~/.local/share/whisper-cpp/ggml-large-v3-turbo.bin
- Preprocess: convert input to mono 16kHz WAV via ffmpeg
- Fallback: skip gracefully if Whisper is missing or errors
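The fallback behavior can be sketched as follows. The `-m`/`-f` flags match whisper.cpp's CLI, but the exact flags used by listen.py are an assumption, as is the `find_whisper` helper name:

```python
import os
import shutil
import subprocess

def find_whisper(cli="/opt/homebrew/bin/whisper-cli"):
    """Return a usable whisper-cli path, or None if it is not installed."""
    if os.path.exists(cli):
        return cli
    return shutil.which("whisper-cli")

def transcribe(wav_path, cli_path,
               model="~/.local/share/whisper-cpp/ggml-large-v3-turbo.bin"):
    """Run whisper-cli on a mono 16kHz WAV; return stdout text, or None
    if the binary is missing or the call fails (graceful skip)."""
    if not cli_path:
        return None
    try:
        result = subprocess.run(
            [cli_path, "-m", os.path.expanduser(model), "-f", wav_path],
            capture_output=True, text=True, timeout=600, check=True)
        return result.stdout
    except (OSError, subprocess.SubprocessError):
        return None
```

A caller would do `text = transcribe(wav, find_whisper())` and treat `None` as "no lyrics available", which is the graceful-skip path described above.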
Dependencies
Python:
- librosa
- numpy
System:
- ffmpeg
- ffprobe
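A quick preflight check for the system binaries above can be sketched as (the function name is illustrative, not part of the skill):

```python
import shutil

def missing_system_deps(binaries=("ffmpeg", "ffprobe")):
    """Return the required system binaries that are not on PATH."""
    return [b for b in binaries if shutil.which(b) is None]
```

If the returned list is non-empty, install the missing tools (e.g. via your system package manager) before running the analysis scripts.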
Workspace hygiene
- Keep temporary audio files in a dedicated temp/output folder for the skill.
- Avoid modifying unrelated project files while working on audio analysis tasks.