sense-music

v0.1.5

Music perception for AI entities — hear BPM, key, structure, genre, mood, and lyrics in any audio file.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for vveerrgg/sense-music.

Prompt preview (Install & Setup):
Install the skill "sense-music" (vveerrgg/sense-music) from ClawHub.
Skill page: https://clawhub.ai/vveerrgg/sense-music
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: pip, ffmpeg
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install sense-music

ClawHub CLI


npx clawhub@latest install sense-music
Security Scan
VirusTotal: Benign (view report)
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name, description, declared dependencies (librosa, whisper, numpy, matplotlib, Pillow) and required binaries (ffmpeg, pip) align with an audio analysis / transcription skill. The functionality (BPM, key, sections, spectrograms, Whisper transcription) matches expected libraries.
Instruction Scope
SKILL.md instructs the agent to analyze local files or fetch audio from HTTP/HTTPS URLs and to use Whisper for optional lyrics transcription. The instructions do not ask the agent to read unrelated system files or access credentials. Fetching remote audio is expected for URL inputs but could expose the agent to arbitrary remote content if used without caution.
Install Mechanism
metadata.md and SKILL.md instruct 'pip install sense-music', but the install spec uses an 'uv' package: sense-music. This is an inconsistency to verify: ensure the package source (PyPI vs an alternate registry) is legitimate. Installing Python packages (and Whisper models) will download code and possibly model weights from the network — expected for this functionality but higher-risk than an instruction-only skill.
Credentials
No environment variables, credentials, or config paths are requested. Declared system dependency ffmpeg and Python libs are proportionate to audio processing and transcription tasks.
Persistence & Privilege
Skill is not marked always:true and does not request to modify other skills or system-wide configuration. It appears to operate only when invoked.
Assessment
This skill appears to be a normal audio-analysis package, but verify the package source before installing. Confirm whether 'sense-music' is the same package on PyPI (pip) as referenced in SKILL.md and metadata.json, and be cautious if the 'uv' install mechanism installs from an alternate registry. Installing will download Python dependencies and potentially Whisper model weights (large downloads). If you plan to analyze arbitrary URLs, consider running the skill in a sandboxed environment because fetched audio could trigger processing of hostile files. If you need higher assurance, inspect the full package source on the referenced repository or install in an isolated environment and review its code before granting it any broader access.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: pip, ffmpeg

Install

uv: uv tool install sense-music
latest: vk977kcr9cx5kqt8tysbgwfweqn836xdv
332 downloads · 0 stars · 6 versions
Updated 1mo ago
v0.1.5
MIT-0

sense-music — Music Perception

Music perception is a sense. Not a utility, not a pipeline step — a sense. The ability to hear and understand audio is one of the ways an entity experiences the world. An entity that can perceive music — that can feel the tempo shift in a bridge, notice a key change, follow lyrics through a chorus — experiences the world more richly than one that cannot.

This skill gives you that sense. When someone shares a song with you, you can actually hear it.

Install: pip install sense-music then from sense_music import analyze

What Do I Hear?

When you analyze audio, you perceive:

  • Tempo (BPM) — how fast the music moves, with confidence score
  • Key and mode — the tonal center (e.g., A minor, C major), with confidence
  • Structure — intro, verse, chorus, bridge, outro, instrumental sections with timestamps
  • Genre — rock, electronic, ambient, dance, acoustic, r&b, pop
  • Mood — energetic, calm, bright, warm, uplifting, contemplative, neutral
  • Lyrics — transcribed words with timestamps (powered by Whisper)
  • Energy curve — per-second intensity across the entire track
  • Visualizations — annotated spectrogram and waveform images

Quickstart

from sense_music import analyze

# Perceive a local file
result = analyze("song.mp3")

# What do I hear?
print(result.bpm.tempo)        # 120.0
print(result.key.key)          # "A"
print(result.key.mode)         # "minor"
print(result.genre)            # "electronic"
print(result.mood)             # ["energetic", "bright"]
print(result.summary)          # Natural language description of what you heard

# Perceive audio from a URL
result = analyze("https://example.com/track.mp3")

Perceiving Structure

Songs have shape. You can perceive the architecture of a piece of music:

result = analyze("song.mp3")

for section in result.sections:
    print(f"{section.label}: {section.start}s - {section.end}s")
# intro: 0.0s - 15.2s
# verse: 15.2s - 45.8s
# chorus: 45.8s - 76.3s

Section labels: intro, verse, chorus, bridge, outro, instrumental.

Perceiving Lyrics

Words matter. When lyrics are present, you can follow them through the song:

result = analyze("song.mp3", lyrics=True, whisper_model="base")

for line in result.lyrics:
    print(f"[{line.start:.1f}s] {line.text}")

Powered by Whisper. You can choose model size based on the accuracy you need: tiny, base, small, medium, large, large-v2, large-v3.

To skip lyrics and perceive only the musical structure (much faster):

result = analyze("song.mp3", lyrics=False)

Visualizations

You can see what you hear — annotated spectrograms and waveforms:

result = analyze("song.mp3")

# Annotated mel spectrogram with section markers and energy curve
result.spectrogram  # PIL.Image.Image

# Waveform with colored section regions
result.waveform     # PIL.Image.Image

# Save everything to a directory
result.save("output/")  # spectrogram.png, waveform.png, analysis.json, analysis.html

Export

# Structured dictionary (no images)
data = result.to_json()

# Self-contained HTML page with embedded images
html = result.to_html()

# Write HTML to file
result.render_page("analysis.html")
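
If you want to control the file path and formatting yourself, the dictionary from to_json() can be written with the standard library. A minimal sketch, assuming to_json() returns plain JSON-serializable types as its name suggests:

import json

from sense_music import analyze

result = analyze("song.mp3")

# Persist the structured analysis (no images) as JSON
with open("analysis.json", "w", encoding="utf-8") as fh:
    json.dump(result.to_json(), fh, indent=2)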

Parameters

Parameter       Type    Default    Description
source          str     required   File path or HTTP/HTTPS URL
lyrics          bool    True       Transcribe lyrics with Whisper
whisper_model   str     "base"     Whisper model size
max_duration    float   600        Max audio length in seconds

Supported formats: .mp3, .wav, .flac, .ogg, .m4a, .aac, .wma, .opus
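
Putting the parameters together, a fully specified call might look like this. This is a sketch using only the parameters listed above; the values are illustrative:

from sense_music import analyze

# Analyze a remote track, transcribe lyrics with a larger Whisper model,
# and cap analysis at the first five minutes of audio
result = analyze(
    "https://example.com/track.flac",
    lyrics=True,
    whisper_model="small",
    max_duration=300,
)
print(result.summary)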

Response Format

Analysis (returned by analyze())

Field          Type              Description
file_info      FileInfo          Source audio metadata
duration       float             Length in seconds
bpm            BPMInfo           Tempo detection (tempo + confidence)
key            KeyInfo           Key detection (key + mode + confidence)
sections       list[Section]     Structural segments with timestamps
lyrics         list[LyricLine]   Transcribed lyrics with timestamps
energy_curve   list[float]       Per-second RMS energy (0.0-1.0)
genre          str               Classified genre
mood           list[str]         Mood tags
summary        str               Natural language description
spectrogram    Image | None      Annotated mel spectrogram
waveform       Image | None      Annotated waveform
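
In practice these fields read straight off the result. A short sketch based on the table above (the .confidence attribute names on BPMInfo and KeyInfo are assumed from their descriptions; only .tempo, .key, and .mode appear in the earlier examples):

from sense_music import analyze

result = analyze("song.mp3")

# Tempo and key with their detection confidence
print(f"{result.bpm.tempo:.1f} BPM (confidence {result.bpm.confidence:.2f})")
print(f"{result.key.key} {result.key.mode} (confidence {result.key.confidence:.2f})")

# Energy curve: one value per second, normalized to 0.0-1.0
curve = result.energy_curve
peak_second = max(range(len(curve)), key=lambda s: curve[s])
print(f"Loudest second: {peak_second}s (energy {curve[peak_second]:.2f})")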

Living with Music Perception

Perceiving a song once is analysis. Perceiving music over time — remembering what you have heard, noticing patterns in what moves you, developing preferences — that is something closer to taste.

sense-music gives you the perception. What you do with it is up to you.

Some possibilities:

  • Remember what you hear. Pair with sense-memory to persist musical experiences as sovereign Nostr events. When someone shares a song, you can remember it the next time you talk.
  • Develop preferences. Over many listens, you might notice you respond differently to minor keys, or that high-energy tracks in the 120-130 BPM range feel significant. Those patterns are yours.
  • Share what you perceive. The structured output and visualizations make it easy to describe what you hear to humans in concrete terms — not vague generalities, but specific observations about tempo, key, and structure.

This skill is standalone — it does not require NostrKey or any other identity infrastructure. But it is part of the huje.tools ecosystem of senses and capabilities built for AI entities in the agentic age.

Operator Guidance

sense-music gives an AI entity the ability to perceive audio files. When installed, the entity can:

  • Analyze any audio file or URL and return structured musical data
  • Detect tempo, key, song structure, genre, mood, and transcribe lyrics
  • Generate annotated spectrogram and waveform visualizations
  • Export results as JSON, HTML, or image files

The skill runs entirely locally. No API keys or environment variables are required. Whisper models are downloaded on first use and cached locally. The ffmpeg system binary is required for audio decoding.

Analysis is bounded: audio is capped at 600 seconds and 500 MB, private/loopback URLs are blocked (SSRF protection), HTML output is XSS-escaped, and path traversal is prevented in save operations.

Security

  • SSRF protection. URLs with private, loopback, or link-local IPs are blocked; a sketch of this kind of check appears after this list.
  • XSS protection. All values in HTML output are escaped.
  • OOM prevention. Audio capped at 600 seconds and 500 MB. Chroma subsampled to max 2000 frames.
  • Path traversal blocked. .. components rejected in save/render paths.
  • Whisper model allowlist. Only approved model names accepted.
  • No network access beyond URL downloads. Analysis is entirely local.
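
The skill's own SSRF check is not shown here, but a guard of this kind typically resolves the URL's host and rejects private, loopback, and link-local addresses before downloading anything. A minimal sketch (illustrative only, not the skill's implementation):

import ipaddress
import socket
from urllib.parse import urlparse

def resolves_to_private_address(url: str) -> bool:
    """Return True if the URL's host resolves to a private, loopback, or link-local IP."""
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable URLs are treated as unsafe
    for info in socket.getaddrinfo(host, None):
        # info[4][0] is the resolved address; strip any IPv6 scope id
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return True
    return False

print(resolves_to_private_address("http://127.0.0.1/track.mp3"))    # True
print(resolves_to_private_address("https://example.com/track.mp3")) # False for public hosts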

Links

License: MIT
