Music Analysis

v1.0.0

Analyze music/audio files locally without external APIs. Extract tempo, pocket/groove feel, pulse stability, swing proxy, section/repetition structure, key c...

by @dahuangfortoby

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for dahuangfortoby/music-analysis-84.

Install the skill "Music Analysis" (dahuangfortoby/music-analysis-84) from ClawHub.
Skill page: https://clawhub.ai/dahuangfortoby/music-analysis-84
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install music-analysis-84

ClawHub CLI

npx clawhub@latest install music-analysis-84

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

Purpose & Capability

The name/description (local music/audio analysis) matches the code: Python scripts use librosa, numpy, ffmpeg/ffprobe and optionally Whisper for transcription. That is coherent. Notes of concern: the setup.sh uses a hard-coded SKILL_DIR (/Users/huang/.openclaw/workspace/skills/music-analysis) and writes aliases to ~/.zshrc; those filesystem targets are specific and not declared in metadata. The SKILL.md suggests using yt-dlp for sourcing audio (external downloads) though yt-dlp is not installed by the skill — this is reasonable but should be explicit.

Instruction Scope

SKILL.md and scripts operate on local audio files and call ffmpeg/ffprobe and an optional whisper-cli; they do not contain code that exfiltrates analysis to external endpoints. Whisper usage and a model file are expected to be local; however SKILL.md and setup.sh instruct downloading large model binaries and suggest using yt-dlp to fetch YouTube audio (network actions). The scripts also modify/inspect user files (via setup.sh) and create temp files. No instructions ask for unrelated system credentials or to read arbitrary unrelated files, but the install script modifies shell config.

Install Mechanism

There is no registry install spec, but the included setup.sh performs several actions: it creates a venv, pip installs packages from PyPI, and uses curl to download a ~1.5GB Whisper model binary from huggingface.co (a well-known host). Downloading a large prebuilt model is higher-risk than pure pip installs, but the URL points to an expected whisper.cpp model. The script expects brew-managed whisper-cli/ffmpeg or gives instructions for installing them. No obfuscated or shortener URLs were found.

Credentials

The skill does not request environment variables or credentials. It does rely on certain filesystem locations (WHISPER model under ~/.local/share/whisper-cpp and the hard-coded SKILL_DIR) and presence of ffmpeg/whisper-cli. Because it writes aliases to ~/.zshrc and assumes a specific /Users/huang path, the requested filesystem access is broader than strictly necessary for a generic, relocatable skill.

Persistence & Privilege

The skill is not force-enabled (always: false), but setup.sh modifies the user's ~/.zshrc to add aliases and creates a venv at a hard-coded path. Those actions change the user's shell environment and drop files into the home/workspace; they increase persistence and have side effects beyond simply running analysis. This is legitimate for a local developer convenience script but should be flagged to users who expect non-invasive install behavior.

What to consider before installing

This skill appears to implement the advertised local audio analysis (librosa-based features, temporal analysis, optional Whisper transcription). However:

1) Inspect setup.sh before running it. It will create a virtualenv, pip install packages, download a ~1.5GB Whisper model from huggingface.co, and append aliases to ~/.zshrc. The script uses a hard-coded SKILL_DIR (/Users/huang/...), so it will likely fail or write unexpected files on your machine unless you edit it.
2) If you don't want the aliases or files in your home directory, do not run setup.sh as-is. Instead, create a venv manually, install the packages listed in requirements.txt (pip install -r requirements.txt), and run the scripts from a controlled directory.
3) The tool invokes ffmpeg/ffprobe and, optionally, whisper-cli. Make sure you trust these binaries and verify their provenance.
4) The skill does not request credentials and does not appear to exfiltrate data, but it does perform network downloads during setup (the model binary) and suggests using yt-dlp for audio sourcing.

Recommendation: treat the package as useful but potentially intrusive. Run it in a dedicated environment or container, review and adapt setup.sh (remove or fix the hard-coded paths and the ~/.zshrc modifications), and verify external downloads before executing them.

Like a lobster shell, security has layers — review code before you run it.

Updated 3w ago

MIT-0

Music Analysis (Local, No External APIs)

Primary tool: a full listen that combines snapshot analysis, structure, groove, harmonic tension, temporal mood mapping, and optional Whisper lyric alignment into one report.

1. Full Listen — primary / recommended

python3 skills/music-analysis/scripts/listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/listen.py track.mp3 --json
python3 skills/music-analysis/scripts/listen.py track.mp3 --out report.txt
python3 skills/music-analysis/scripts/listen.py track.mp3 --json --out report.json
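If you want to drive these commands from another Python script, the sketch below shows one way to assemble the same argument list. Only the script path and the `--json` / `--out` flags come from the commands above; `build_listen_command` and `run_listen` are hypothetical helper names, and what listen.py prints to stdout is an assumption.

```python
import subprocess
import sys
from pathlib import Path

# Script location as shown in the commands above; adjust if the skill lives elsewhere.
LISTEN_SCRIPT = Path("skills/music-analysis/scripts/listen.py")


def build_listen_command(audio_path, as_json=False, out_path=None):
    """Return the argv list for one listen.py run (hypothetical helper)."""
    cmd = [sys.executable, str(LISTEN_SCRIPT), str(audio_path)]
    if as_json:
        cmd.append("--json")
    if out_path is not None:
        cmd += ["--out", str(out_path)]
    return cmd


def run_listen(audio_path, as_json=False, out_path=None):
    """Run listen.py and return whatever it wrote to stdout.

    Assumption: the report is printed to stdout when --out is not given.
    """
    cmd = build_listen_command(audio_path, as_json=as_json, out_path=out_path)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Calling `build_listen_command("track.mp3", as_json=True, out_path="report.json")` yields the same arguments as the last command above, with the current interpreter substituted for `python3`.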

What it does in one pass:

Snapshot analysis: tempo, pulse stability, swing proxy, key clarity, harmonic tension, timbre, structure
Whisper lyric transcription and filtering first — keep only real lyric text, drop artifact tags like [MUSIC]
Temporal listen: windowed energy / mood / tension journey
Synthesis layer that aligns lyrics with peak / tension / quiet windows and lets the lyric layer override the final vibe when confidence is high
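The lyric filtering step can be pictured roughly as follows. This is a minimal sketch, not the skill's actual filter: it assumes each Whisper segment is a dict with a `text` field, and that artifact tags appear in square brackets or parentheses (for example `[MUSIC]`). `filter_lyric_segments` is a hypothetical name.

```python
import re

# Assumption: artifact tags look like "[MUSIC]" or "(applause)".
TAG = re.compile(r"[\[\(][^\]\)]*[\]\)]")


def filter_lyric_segments(segments):
    """Strip artifact tags and keep only segments that still contain text."""
    kept = []
    for seg in segments:
        cleaned = TAG.sub(" ", seg["text"])
        cleaned = " ".join(cleaned.split())  # collapse leftover whitespace
        if cleaned:
            kept.append({**seg, "text": cleaned})
    return kept


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.0, "text": "[MUSIC]"},
        {"start": 2.0, "end": 5.0, "text": " Hold me close [MUSIC] tonight"},
    ]
    for seg in filter_lyric_segments(demo):
        print(seg["start"], seg["text"])
```

Note that removing parenthesized text would also drop genuine parenthetical lyrics, so a real filter may want a narrower pattern.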

Human-readable output structure

SNAPSHOT
- groove/pocket
- structure summary + repeated sections
- harmony (key clarity + tension)
- timbre descriptor tags
INSTRUMENT READ
- likely instrument palette (strong/likely/possible confidence)
- per-section instrument entrances and exits
- how instruments color the emotional feel
- written as natural language, not clinical data
TEMPORAL JOURNEY
- opening / middle / closing mood-energy-tension read
- peak / quietest / tensest moments
- mood journey and transition count
EMOTIONAL READ
- explainable emotion summary based on measured features
LYRICS
- Whisper segment count
- excerpt or graceful skip note
SYNTHESIS
- lyric-energy/tension alignment
- peak / tension / quiet lyric moments
ALIGNED TIMELINE
- per-window moments where transitions / lyrics / tension spikes occur

2. Snapshot Analysis — standalone

python3 skills/music-analysis/scripts/analyze_music.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/analyze_music.py track.mp3 --json

Reports:

tempo / pulse stability / pulse confidence / swing proxy / pocket
key estimate / key clarity / chroma entropy / harmonic change / tonal motion / tension
timbre descriptors (brightness, richness, low-end, contrast, dynamic range)
section labels (A/B/C...) and repeated material detection
explainable emotional read with reasons
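To make the harmonic terms above more concrete, here is a sketch of how a key estimate, a key clarity score, and chroma entropy could be derived from a 12-bin chroma vector (for example, a time-averaged chromagram). It uses the common Krumhansl-Kessler profile correlation approach, which is an assumption and not necessarily what analyze_music.py does. Defining clarity as the gap between the best and second-best correlation is also an illustrative choice.

```python
import math

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles (tonic at index 0).
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a <= 0 or var_b <= 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_key(chroma):
    """Return (key name, clarity) for a 12-element chroma list starting at C."""
    scores = []
    for tonic in range(12):
        # Rotate so the candidate tonic sits at index 0, matching the profiles.
        rotated = chroma[tonic:] + chroma[:tonic]
        scores.append((pearson(rotated, MAJOR), NOTES[tonic] + " major"))
        scores.append((pearson(rotated, MINOR), NOTES[tonic] + " minor"))
    scores.sort(reverse=True)
    best, runner_up = scores[0], scores[1]
    return best[1], best[0] - runner_up[0]


def chroma_entropy(chroma):
    """Shannon entropy of the chroma distribution, normalized to [0, 1]."""
    total = sum(chroma)
    if total <= 0:
        return 0.0
    probs = [c / total for c in chroma if c > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(12)
```

Under this sketch, a track whose energy is concentrated in a few pitch classes gets low entropy and a large clarity gap, while an evenly spread chroma vector gets entropy near 1 and clarity near 0.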

3. Temporal Listen — standalone

python3 skills/music-analysis/scripts/temporal_listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/temporal_listen.py track.mp3 --json

Reports:

sliding-window timeline (4s windows, 2s hops)
energy contour
mood labels
harmonic tension + tonal motion
transition types (drop hits, pulls back, tightens harmonically, shifts color, evolves)
narrative arc (mountain / ascending / descending / plateau / wave)
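The windowing and arc labels can be sketched as below. The 4s window and 2s hop come from the list above; everything else is an assumption for illustration, including the `classify_arc` thresholds and the rule of comparing the opening, middle, and closing thirds of the energy contour. The real temporal_listen.py may label arcs differently.

```python
def window_bounds(duration, window=4.0, hop=2.0):
    """Return (start, end) pairs in seconds; the last window is clipped to the track end."""
    bounds = []
    start = 0.0
    while start < duration:
        bounds.append((start, min(start + window, duration)))
        if start + window >= duration:
            break  # this window already reaches the end of the track
        start += hop
    return bounds


def classify_arc(energy, tol=0.1):
    """Label a per-window energy contour (hypothetical thresholds)."""
    n = len(energy)
    if n < 3:
        return "plateau"
    span = max(energy) - min(energy)
    if span <= tol * max(max(energy), 1e-9):
        return "plateau"  # contour barely moves relative to its peak
    third = max(1, n // 3)
    opening = sum(energy[:third]) / third
    closing = sum(energy[-third:]) / third
    middle_vals = energy[third:n - third]
    middle = sum(middle_vals) / len(middle_vals)
    margin = tol * span
    if middle > opening + margin and middle > closing + margin:
        return "mountain"
    if closing > opening + margin:
        return "ascending"
    if opening > closing + margin:
        return "descending"
    return "wave"


if __name__ == "__main__":
    print(window_bounds(10.0))
    print(classify_arc([0.1, 0.5, 0.9, 0.5, 0.1]))
```

With a 2s hop, each moment in the track (apart from the edges) is covered by two windows, which smooths the timeline at the cost of some redundancy.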

Interpretation rules

Structure labels are similarity labels, not verse/chorus claims.
Swing proxy is a feel estimate, not drummer-grade microtiming truth.
Emotion is explainable, derived from pulse + timbre + harmonic tension rather than a black-box mood guess.
Lyrics can override the final vibe when filtered Whisper text is confident and emotionally clear.
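To illustrate why the swing figure is only a proxy, here is one simple way such an estimate could be computed from beat and onset times. This is a sketch under stated assumptions, not the skill's algorithm: `swing_proxy` is a hypothetical function, and the `lo` / `hi` bounds that decide which onsets count as off-beats are arbitrary.

```python
def swing_proxy(beat_times, onset_times, lo=0.35, hi=0.85):
    """Estimate a long:short swing ratio from off-beat onset placement.

    Returns 1.0 for straight eighths, about 2.0 for triplet swing,
    and None when no off-beat onsets are found.
    """
    positions = []
    for start, end in zip(beat_times, beat_times[1:]):
        length = end - start
        if length <= 0:
            continue
        for t in onset_times:
            frac = (t - start) / length  # position of the onset inside this beat
            if lo <= frac <= hi:
                positions.append(frac)
    if not positions:
        return None
    mean_pos = sum(positions) / len(positions)
    return mean_pos / (1.0 - mean_pos)


if __name__ == "__main__":
    beats = [0.0, 1.0, 2.0, 3.0]
    print(swing_proxy(beats, [0.5, 1.5, 2.5]))
```

Because the result averages over every detected onset, it depends heavily on beat-tracking and onset-detection accuracy, which is why it should be read as a feel estimate.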

Audio sourcing

The tool needs a real audio file on disk.

Direct file (mp3, wav, flac, ogg, m4a — anything ffmpeg/librosa can read)
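YouTube / supported URLs: yt-dlp -x --audio-format mp3 -o "output.mp3" "URL_OR_SEARCH" (with default yt-dlp settings, a search query needs the ytsearch: prefix, e.g. "ytsearch:artist title")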

Whisper lyrics transcription

listen.py uses:

CLI: /opt/homebrew/bin/whisper-cli
Model: ~/.local/share/whisper-cpp/ggml-large-v3-turbo.bin
Preprocess: convert input to mono 16kHz WAV via ffmpeg
Fallback: skip gracefully if Whisper is missing or errors
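The preprocess-then-transcribe flow with a graceful skip can be sketched like this. The CLI path, model path, and mono 16kHz target come from the list above; the function names are hypothetical, and the exact ffmpeg and whisper-cli arguments used by listen.py are an assumption (`-m` and `-f` are whisper.cpp's model and input-file options).

```python
import os
import shutil
import subprocess
import tempfile

WHISPER_CLI = "/opt/homebrew/bin/whisper-cli"
WHISPER_MODEL = os.path.expanduser("~/.local/share/whisper-cpp/ggml-large-v3-turbo.bin")


def ffmpeg_to_wav_command(src, dst):
    """ffmpeg argv that converts any readable input to mono 16 kHz 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000",
            "-c:a", "pcm_s16le", dst]


def whisper_available(cli=WHISPER_CLI, model=WHISPER_MODEL):
    """True only if the CLI is an executable file and the model file exists."""
    return os.path.isfile(cli) and os.access(cli, os.X_OK) and os.path.isfile(model)


def transcribe_or_skip(audio_path, cli=WHISPER_CLI, model=WHISPER_MODEL):
    """Return whisper-cli stdout, or None if anything is missing or fails."""
    if not whisper_available(cli, model) or shutil.which("ffmpeg") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        wav = os.path.join(tmp, "input_16k_mono.wav")
        try:
            subprocess.run(ffmpeg_to_wav_command(audio_path, wav),
                           check=True, capture_output=True)
            result = subprocess.run([cli, "-m", model, "-f", wav],
                                    check=True, capture_output=True, text=True)
        except (subprocess.CalledProcessError, OSError):
            return None
    return result.stdout
```

Returning `None` instead of raising lets the caller emit a "lyrics skipped" note and continue with the rest of the report.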

Dependencies

Python:

librosa
numpy

System:

ffmpeg
ffprobe
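Before running the scripts, it can help to confirm that the listed dependencies are present. The following sketch checks for them without importing the heavy packages; `check_dependencies` is a hypothetical helper, not part of the skill.

```python
import importlib.util
import shutil

PYTHON_PACKAGES = ["librosa", "numpy"]
SYSTEM_BINARIES = ["ffmpeg", "ffprobe"]


def check_dependencies(packages=PYTHON_PACKAGES, binaries=SYSTEM_BINARIES):
    """Return the names of missing Python packages and system binaries."""
    missing = {"python": [], "system": []}
    for name in packages:
        # find_spec locates a top-level package without importing it.
        if importlib.util.find_spec(name) is None:
            missing["python"].append(name)
    for name in binaries:
        # shutil.which searches PATH the same way the shell would.
        if shutil.which(name) is None:
            missing["system"].append(name)
    return missing


if __name__ == "__main__":
    report = check_dependencies()
    if report["python"] or report["system"]:
        print("Missing:", report)
    else:
        print("All listed dependencies found.")
```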

Workspace hygiene

Keep temporary audio files in a dedicated temp/output folder for the skill.
Avoid modifying unrelated project files while working on audio analysis tasks.
