Music Craft — MiniMax

Advanced music generation for OpenClaw, using the MiniMax Music 2.6 token plan. Use for cover and style transfer, two-song mashup, lyrics generation API, emotion-driven prompt engineering, and fine control via the mmx CLI. Extends music-craft with MiniMax-specific features.

LuisCharro@luischarro

Install

openclaw skills install @luischarro/music-craft-minimax

Music Craft — MiniMax

Cloud duration is approximate. Verified field run (2026-06-12): 18/18 cloud jobs saved an MP3, but output length ranged from 57-135% of requested duration (12 truncated, 3 close, 3 extended). Use music-craft with local ACE-Step when exact duration matters; use this skill when speed, MiniMax cover workflows, lyrics API, emotion analysis, or mmx flag control matter more.

This is the power-user upgrade of music-craft. It does everything that skill does, plus the features that require the MiniMax Music 2.6 token plan:

Cover and style transfer from a reference audio file (preserves melody)
Two-song mashup (Song A's content and emotion + Song B's style)
Lyrics generation via the MiniMax API endpoint (with edit mode for iteration)
Emotion analysis on input audio to drive prompt construction (vocal speed, intensity curve, pitch bends)
Fine control over generation parameters (BPM, key, structure, avoid list as separate flags via mmx)

For everything else (standard song generation, instrumentation, anti-sparse prompt engineering, structure tags, user preference flow), this skill uses the same workflow as music-craft. Read that skill first to understand the base, then come back here for the MiniMax-specific extensions.

Data, Consent, and Local Side Effects

This skill is allowed to process music media, but it should not surprise the user:

Cloud generation: prompts, lyrics, reference audio, and cover/mashup inputs may be sent to MiniMax through mmx or MiniMax API calls. Confirm consent before sending sensitive, private, or third-party-owned audio.
Local files only: audio input must be a local file path. URLs are not accepted in v1.5.0+; use the private music-source-fetch skill to fetch audio by title first.
Lyrics: Whisper transcription of the local audio file is the only lyrics source in this skill. LRCLib web lookup moved to music-source-fetch.
Local outputs: analysis JSON, lyrics, prompts, temporary media, and generated audio are written to user-selected paths or temporary directories.
Overwrites: helper scripts refuse to replace existing user-visible outputs unless the operator passes an explicit --overwrite flag.

There is no image, face, OCR, or VLM pipeline in this skill. Album art, video frames, and similar visual inputs are not analyzed.

Required MiniMax Workflow

Operator rules for MiniMax generations:

Run MiniMax generations sequentially, not in parallel.

Do not assume the CLI will honor a requested output path — pass --out to mmx and verify the file exists after each run.

Treat requested duration as a target, not a guarantee. Verified range: 57-135% of requested duration.

If mmx exits with SIGTERM/SIGKILL after saving, verify the file before rerunning; file existence is the source of truth.

Before running multi-output MiniMax generations, load references/minimax-generation-caveats.md.

For every cover, style-transfer, mashup, or precision mmx generation, follow this checklist in order:

Analyze source audio with scripts/analysis_orchestrator.py when audio is available.
Build M1 + M2 prompts from the analysis: M1 is the primary style, M2 is a strong contrast style.
Lint both prompts and lyrics with scripts/lint_music_request.py before generation. Use --lyrics-file when lyrics exist so invalid tags and duration density are caught. Stop on blockers.
Generate with retry via scripts/generate_with_retry.py --output-path <final.mp3> -- music generate ... --out <final.mp3> or -- music cover ... --out <final.mp3>. --output-path does not replace mmx --out.
Verify outputs: duration, LUFS/peak, file size, audible completeness, and lyrics alignment. When lyrics matter, listen or transcribe and use scripts/verify_lyrics_alignment.py before delivery.
Finalize delivery copy with scripts/finalize_track.sh input.mp3 output.mp3 when the user wants production-ready loudness.
Deliver both versions with a short analysis summary and any caveats.

Preserve the anti-sparse guard in prompts: fully arranged, instruments keep playing, no a cappella dropouts unless explicitly requested.

Reliability Caveats

Load references/minimax-generation-caveats.md before any multi-output, long-duration, or quota-sensitive cloud run.

--length is a hint, not a contract. Cloud output can undershoot or overshoot the request; local ACE-Step is the exact-duration fallback.
A shell SIGTERM/SIGKILL after save can still be a successful generation. Check file existence, file size, and ffprobe duration before retrying.
Keep standard cloud generation prompts under about 500 characters when possible. Put detailed production sheets in the local ACE-Step route.
Treat --references as optional and potentially flaky. Inline concise references in the prompt when reliability matters.
Expected cloud output is MP3, stereo, 44.1 kHz, about 256 kbps. There are no documented flags for FLAC, 48 kHz, or bitrate selection.
There is no documented MiniMax batch API or preview/draft mode. For repeated covers, use scripts/batch_cover.py so runs remain sequential and verified.
Quota usage may not be visible from the CLI. Check the MiniMax dashboard when planning large batches.

Short cloud prompt recipes: references/short-prompt-recipes.md.

Routing and Blocker Checks

Classify the request before analysis or generation:

Text-only style reference means the user gave a song name, artist, era, or genre cue without source audio. Treat it as style inference, not cover analysis.
Reference audio means the user provided a local file that should be analyzed. URLs are not accepted; ask for a local file path.
Cover preserves melody and usually needs a source file plus a target style decision.
Style transfer uses a reference track or analyzed audio as style input, then changes the production direction.
Mashup needs Song A and Song B, plus a decision about which one contributes content and which one contributes style.
Emotion prompt means the user wants analysis turned into descriptive prompt language, not a full cover.

The scripts/lint_music_request.py helper emits one of these routes:

Route	When
`base_prompt`	Standard generation, no MiniMax-specific feature needed.
`minimax_cover`	Melody-preserving cover from a local audio file.
`minimax_mashup`	Two-song mashup (A + B, both identified).
`minimax_style_transfer`	Style transfer that does not preserve the source melody.
`minimax_emotion_prompt`	Emotion analysis, or precision `mmx` flag usage.
`needs_clarification`	At least one blocker is unresolved; ask the user first.

Surface blockers before analysis:

no local source file (URLs are not accepted)
unclear which track is Song A versus Song B
missing target style
missing lyrics decision, such as original, translated, rewritten, or instrumental
conflicting cover/style-transfer intent: the user asked for both "cover" (preserve melody) and "style transfer" (reproduce style) at once. These are mutually exclusive. Ask the user to pick one.

After you have prompt text and mmx flags, lint them together before generation:

compare prompt BPM with --bpm
compare prompt key with --key
compare prompt structure line with --structure
compare prompt duration with --duration (or implicit length expectation)
compare prompt vocal mode with --vocals
compare prompt language with --language
compare prompt avoid language with --avoid
stop when the prompt says one thing and the flags say another
warn when prompt text exceeds 1800 UTF-8 bytes
stop when prompt text exceeds 2000 UTF-8 bytes (observed API rejection at 2079 bytes)

If the user only has a text reference, route to the free-tool path in references/free-tool-inputs.md first. If the user has audio, analyze first and only then build the prompt. The linter returns a retry_guidance array with one hint per conflict so the operator can re-align prompt and flags on the next attempt.

When To Use

Use this skill when the task involves:

generating a cover of an existing song with a different style (chanson version of a rock track, reggaeton version of a pop hit, and so on). Source must be a local file.
style transfer from a local audio file to a target genre
two-song mashup where Song A's lyrics and emotional arc are kept, but Song B's style is applied
emotion analysis on input audio to extract intensity curves, vocal speed, pitch bends, and emotion classifications
generating lyrics in a specific language and theme via the MiniMax lyrics_generation API
editing existing lyrics to match a target style or emotional arc (MiniMax lyrics_generation edit mode)
using mmx CLI directly for fine control over --avoid, --bpm, --key, --structure, --vocals, --instruments as separate flags
accessing MiniMax's music-cover or music-cover-free models for melody preservation

Request Intake (adapted for MiniMax features)

After the Routing and Blocker Checks classify the request, run this 2-pass intake to extract the full set of fields the user cares about. Label each field's confidence: clear (user said it), inferred (sensible default), missing (need to ask), or conflicting (user said two incompatible things — pause to resolve).

Fields checklist (MiniMax-specific)

#	Field	What to look for	MiniMax-specific notes
1	Route	Cover / style transfer / mashup / standard / emotion prompt	From the Routing and Blocker Checks section. Determines which MiniMax features to use.
2	Source audio	Local file path	Required for cover, mashup, style transfer. For standard, optional (text-only style reference is also fine).
3	Song A identity	Name, artist, audio	For mashup: needed. For cover: this is the source.
4	Song B identity	Name, artist, audio	For mashup only.
5	Target style	Genre / mood / reference	The destination of the cover or style transfer. If user says "like Rosalía", that's clear. If user says "something good", that's missing.
6	Lyrics decision	Original / translated / new / instrumental	For cover, default to original (translated if user requests it). For standard, default to new (or user-provided).
7	Vocal mode	Solo / duet / choir / instrumental	Drives `--vocals` and `--language` flags.
8	Language	BCP-47 code (en, fr, es, etc.)	For lyrics language AND vocal language.
9	Duration	Approximate length (jingle ~30s, standard ~3min, epic ~6min)	`--length` is a hint in milliseconds, not a guarantee. Length is still driven mainly by lyrics + structure.
10	BPM, key, structure	Exact values if user wants `--bpm`/`--key`/`--structure`	Optional. If provided, the prompt AND flags must agree (lint them).
11	Emotion arc	For emotion-prompt workflows: which emotions to emphasize	Drives the analysis-to-prompt translation.
12	Output location	Where the audio and analysis files go	Same as the base skill — per-song subfolder in `~/Music mix/<project>/<song-slug>/`.

Confidence map example: references/examples.md.

If any field is missing or conflicting, that's a question to ask. The Ambiguity Questions section below has specific patterns for each route. If everything is clear or inferred, the request is ready to translate.

User Preference Flow (message patterns → action)

The skill does not start with a questionnaire. It starts by reading and inferring from the user's natural-language request.

User says...	Skill does...
"Haz un cover de X en Y"	Route: `minimax_cover`. Ask: local source audio file path, target language for lyrics, vocal register.
"Make this song sound like Rosalía"	Route: `minimax_style_transfer`. Ask: source audio, which album/era of Rosalía.
"I have audio of A, mash with B, keep A's melody"	Route: `minimax_mashup`. Ask: A vs B confirmation, source audio for A, B can be name or audio.
"Analyze the emotion curve of this track"	Route: `minimax_emotion_prompt` (analysis-only). Run `analysis_orchestrator.py --audio` first, then read the JSON.
"I want the lyrics to be about X, in French, melancholic"	Route: `base_prompt` (standard). Use the lyrics API to generate, then pass to `mmx music generate --lyrics-file`. Ask: target BPM/key/structure or derive from analysis.
"Recreate the song but in 90 BPM D minor"	Route: `base_prompt` with `mmx` flags. Lint prompt vs flags before generation. Verify BPM/key consistency.
"I don't know, surprise me"	Pick a coherent default (e.g. upbeat indie pop, EN, ~3min, auto-lyrics, standard generation) and confirm with the user before generating.
"Same song again but as a reggaeton version"	Route: `minimax_cover` with the existing song as source. Use the same project/song subfolder, suffix the MP3 (`M1_original.mp3` + `M2_reggaeton.mp3`).

This table is the abstract of references/user-preference-flow.md (which lives in the base skill). If you want a more detailed case, defer to the base skill's table and combine with this skill's route mapping.

Output File Layout (Per-Song Subfolders)

MiniMax-specific additions (drop these into the per-song subfolder alongside the base items):

File	Source	Notes
`<song-slug>_analysis.json`	`analysis_orchestrator.py --output`	MiniMax-specific analysis results (emotion, BPM, key, segments)
`<song-slug>_lyrics.txt`	`mmx music generate --lyrics-file`	Optional if user provided lyrics inline
`<song-slug>_<style>_prompt.txt`	The exact text passed to `--prompt`	For reproducibility

The LLM should aim for the base skill's layout by default. The MiniMax-specific files are added on top when MiniMax features are used (cover workflow, mashup, analysis, etc.).

If the runtime needs a MEDIA: delivery path, use a path without spaces or copy the final file into a workspace media folder first. Keep the archival copy in the per-song output folder.

Quick Start with the Orchestrator

One entry point for all input analysis:

bash

python3 scripts/analysis_orchestrator.py --audio /tmp/song.wav

It routes local audio files and two-song pairs to the right extractors. URLs, video, and images are no longer supported as input to this skill. Full per-input commands, extraction guidance, and the per-song output layout: references/orchestrator-quickstart.md.

When NOT To Use

Do not use this skill when:

the user only needs standard song generation without cover, mashup, or analysis — use music-craft instead (lighter, no MiniMax dependency)
the runtime does not expose a music_generate tool and there is no MINIMAX_API_KEY configured — both skills need the runtime
the user wants deterministic, single-shot generation with no iteration — overkill
the user wants to mutate a specific existing audio file (pitch shift, time stretch, stem split) — that is post-production, not generation
the user is not on a MiniMax Token Plan — the advanced features (cover, mmx per-flag control, lyrics API, emotion-driven prompts) require the plan
the user needs a reliable full-length 3:00+ song with exact duration — prefer music-craft with ACE-Step instead; MiniMax is the right tool when speed, convenience, cover workflows, mashups, or mmx flag control matter more than exact output length

Decision Tree

Use the base skill unless one of these MiniMax-specific needs is present:

melody-preserving cover or style transfer from a local audio file
two-song mashup
lyrics API preview/edit flow
emotion analysis that feeds the prompt
exact mmx control for BPM, key, structure, or avoid lists

If the user wants a new song that only borrows a style, stay in music-craft unless they also need exact flag control or lyrics API iteration.

If the user provides a URL, do not attempt to download it. Tell them this skill accepts only local files, and suggest music-source-fetch if they need to fetch audio by title.

Audio Source (Local Only)

The source for cover, mashup, or style transfer must be a local audio file path. URL inputs trigger a soft warning and the agent asks for a local file. If you have a URL, fetch it locally first with the private music-source-fetch skill.

First Response Defaults

Use these defaults on the first pass:

Cover from a local audio file: start with the one-step cover path. Switch to two-step only if the user wants translated lyrics, edited ASR lyrics, or custom lyrics.
Style transfer only: do not use cover unless melody preservation matters. Use standard generation plus mmx flags if exact BPM/key/structure matter.
Two-song mashup: anchor on Song A. If Song A has audio, default to the cover two-step workflow; if Song B is only named, ask for a short style description or fetch more context if free tools are available.
Lyrics API generation or edit: use write_full_song for blank-page generation and edit for revisions.
Emotion-analysis-to-prompt: run analysis first, then convert to a prompt; only ask whether the output should be cover, mashup, or standard generation, plus the target language if missing.
Exact BPM/key/structure control: make mmx flags the source of truth and keep the prompt descriptive but non-conflicting.

Ambiguity Questions

Ask at most 1-3 questions. Separate blockers from quality tweaks:

Required blockers first: local source file, which song is A vs B, whether lyrics already exist, whether the output must preserve melody.
Optional quality after blockers: target language, target style, BPM, key, structure, instruments, vocal color, avoid list.

Use these exact patterns when clarification is needed:

Cover: "Which source should I use?" "Do you want the original lyrics, translated lyrics, or new lyrics?" "Any target style, or should I derive it from the source?"
Mashup: "Which song is A and which is B?" "Do you have audio for Song B, or only the name?" "Should the lyrics stay the same or be rewritten?"
Lyrics API: "Write from scratch or edit existing lyrics?" "What language should I target?" "Any hard structure requirements?"
Emotion prompt: "Do you want cover, mashup, or standard generation?" "What language should the output use?" "Should I prioritize tenderness, energy, or structure?"
mmx precision: "Which values are mandatory: BPM, key, structure, or avoid list?" "Any instruments or vocals that must stay in or stay out?"

Relationship to `music-craft`

This skill extends the base skill, it does not replace it. The shared concepts are:

Concept	Where it lives
Pre-Flight Check (platform detection)	This skill (extended required list)
Anti-sparse rules (canonical text)	Base skill, referenced from here
Prompt formula (production sheet)	Base skill, referenced from here
Structure tags (14 tags)	Base skill, referenced from here
User preference flow (auto-detect + ask)	Base skill, referenced from here
Output file layout (per-song subfolders, slug rules, version prefix)	Base skill, referenced from here; MiniMax adds analysis.json and lyrics.txt
Rate limits (generic)	Base skill
Quality verification checklist	Base skill, extended here for MiniMax
Operating rules (6-step loop)	Base skill, summarized here with MiniMax-specific extensions

The MiniMax-specific additions are:

MiniMax concept	Where it lives
`mmx` CLI quick reference	This skill
`mmx` full flag reference	This skill, `references/mmx-flags-reference.md`
Cover workflow (one-step, two-step)	This skill, `references/cover-workflow.md`
Lyrics generation API	This skill, `references/lyrics-generation.md`
Mashup workflow (A + B)	This skill, `references/mashup-workflow.md`
Emotion analysis (vocal speed, intensity, pitch)	This skill, `references/emotion-analysis.md`
MiniMax-specific error handling	This skill, `references/error-handling.md`
Audio analysis scripts	This skill, `scripts/`
Free tool inputs (web, memory; image removed in v1.5.0)	Both skills — base layer in `music-craft`, MiniMax layer here in `references/free-tool-inputs.md`

Pre-Flight Check

Run the extended pre-flight in references/setup-and-preflight.md before the first generation or analysis. Never install anything without explicit user consent; required: the music_generate tool, MINIMAX_API_KEY, python3, mmx — if one is missing, ask, do not degrade silently.

Free Tool Augmentation (Input Enrichment)

The OpenClaw runtime exposes several free tools (web_fetch, web_search, memory, browser) that enrich the music generation workflow. The base layer is documented in music-craft → Free Tool Augmentation and references/free-tool-inputs.md. This section shows how they compose with MiniMax-specific features.

v1.5.0+: The image tool flow was removed (no album art / OCR / face / VLM analysis in this skill). The skill is audio-only. Web tools (web_fetch, web_search) are still available for metadata-only enrichment (artist info, genre descriptions) but never for downloading audio.

Free-tool routing, blocker checks, prompt/flag lint, and MiniMax combos: references/free-tool-inputs.md.

Operating Rules

Same 6-step loop as music-craft, with MiniMax-specific extensions:

Read and auto-detect — same
Ask only the ambiguous parts — same, plus ask if the user wants cover / mashup / standard
Translate to a production-sheet prompt — same, but consider whether to use mmx flags (see references/mmx-flags-reference.md) instead of packing everything into the prompt
Structure the lyrics — same, plus consider lyrics API for generation or edit (see references/lyrics-generation.md)
Generate and verify — same, plus the music-cover model for melody preservation
Iterate — same, plus emotion analysis to inform the next prompt adjustment

For the full 6-step detail, see music-craft → Operating Rules.

Song length (`--length` is a hint, not a guarantee)

mmx music generate --length accepts milliseconds as a duration hint. It is useful, but it is not precise. Don't expect mmx to hit 3:30 exactly. In the 2026-06-12 field run, cloud outputs ranged from 57-135% of requested duration. If you need precise length, ACE-Step is the right tool (it has audio_duration). If you want MiniMax's speed and the song length is flexible, mmx is fine.

Details: references/mmx-flags-reference.md.

mmx CLI Quick Reference

The shape of a complete generation command:

bash

mmx music generate \
  --prompt "<production-sheet prompt>" \
  --lyrics-file lyrics.txt \
  --model music-2.6 \
  --bpm 96 --key "D major" --structure "intro-verse-chorus-verse-chorus-bridge-chorus-outro" \
  --vocals "<vocal description>" --genre "<genre>" --mood "<mood>" --instruments "<instruments>" \
  --avoid "<what to avoid>" --out output.mp3

Full reference with all flags, examples, and model selection guidance: references/mmx-flags-reference.md.

mmx Music Generation — verified patterns (June 2026)

Pattern A: full song with detailed prompt + 6 metadata flags (--vocals, --genre, --mood, --instruments, --bpm, --key) — production-grade output. Pattern B: crazy combo experiments (e.g. opera vocals over heavy metal), uses scripts/generate_with_retry.py wrapper. Model selection guidance covers music-2.6 vs music-2.6-free and when to use each; --instrumental and --lyrics-optimizer flags bypass the --lyrics requirement for BGM and auto-lyrics workflows. Prompt length safety: prompts >2000 UTF-8 bytes fail with invalid params, prompt length not valid — run scripts/lint_music_request.py before generation. Lyrics tag safety: run scripts/lint_lyrics.py or pass --lyrics-file to lint_music_request.py; non-whitelisted bracket tags may be sung. URL expiration: when using --output-format url, the returned URL has a 24h time limit — always use --out or download promptly.

For cloud reliability, keep standard music generate prompts under about 500 characters when possible. Inline reference artists in the prompt instead of relying on --references for batch runs.

Details: references/mmx-flags-reference.md.

Cover Workflow

Cover workflow preserves the original song's melody while applying a different style. Two paths exist: one-step (scripts/generate_with_retry.py -- music cover ..., MiniMax extracts lyrics via ASR and applies the new style) and two-step (preprocess to get a cover_feature_id, edit ASR lyrics, then generate — better when lyrics need correction or the user wants different lyrics in the new style).

Full workflow with payloads, error handling, and use cases: references/cover-workflow.md.

Lyrics Generation

MiniMax has a dedicated lyrics_generation endpoint that produces structured lyrics (with [Verse], [Chorus], etc. tags) from a theme prompt. Two modes:

write_full_song — create new lyrics from a theme
edit — modify existing lyrics (e.g., make the chorus stronger, shift to a hopeful ending)

The output is structured lyrics that can be passed directly to music_generate or mmx music generate.

Full detail with API examples, parameters, and use cases: references/lyrics-generation.md.

Lyrics Source

Whisper transcription of the local audio file is the only lyrics source in this skill. The LRCLib web lookup moved to the private music-source-fetch skill in v1.5.0.

Mashup Workflow

The signature MiniMax-specific feature: combine Song A (content + emotion) with Song B (style).

Workflow:

Get Song A (local audio file or song name)
Get Song B (local audio file or song name)
Run emotion analysis on Song A (if audio available) to extract the emotional arc
Build a prompt that applies Song B's style to Song A's content and emotion
Generate using the cover workflow (preserves melody) or standard generation (creative reimagining)

This is the most powerful feature in this skill. The output preserves what makes Song A recognizable (lyrics, melody, emotion) while applying Song B's production style.

Full detail with the emotion-to-prompt conversion and the two-song analysis script: references/mashup-workflow.md and references/emotion-analysis.md.

Emotion Analysis

Emotion analysis extracts per-section features from input audio (intensity, pitch, vocal effort, breathiness, spectral centroid, emotion classification, repetitive intensification, emotional shifts, vocal speed, pitch bends). The analysis outputs JSON that the emotion_to_prompt.py script converts into a ready-to-use production-sheet prompt. Run analysis first when audio is available; use the local-only path (assemble prompt from JSON without the cloud helper) when MiniMax API access is unavailable.

Full detail with detection cookbook, pipeline, scripts, local-only path, and the 25+ emotion set: references/emotion-analysis.md. For emotion recipes in the OUTPUT, see references/emotion-delivery.md.

Analysis Quality (Summary Format, Confidence, Fallbacks)

Analysis scripts in scripts/ produce different views (emotion, beats, melody, structure, instrumentation). The skill expects them to converge on a single compact summary so downstream code and humans can read the same shape regardless of which scripts ran.

Compact Analysis Summary

Every analysis result should include a summary object with these keys:

Key	Type	Meaning
`tempo`	string	BPM value with confidence, e.g. `120 BPM (confidence 0.92)`
`key`	string	Detected key, e.g. `E minor (confidence 0.71)`
`sections`	list	Section labels with timing, e.g. `[{"label": "verse", "start": 0.0, "end": 28.5}, ...]`
`instrumentation`	list	Detected instrument palette, e.g. `["electric guitar", "drums", "bass"]`
`vocal_traits`	dict	Breathiness, intensity, pitch range, e.g. `{"breathiness": "high", "intensity": "medium"}`
`energy_curve`	list	Per-section energy values, e.g. `[{"t": 0, "energy": 0.6}, ...]`
`hook_points`	list	Timestamps of detected hooks, e.g. `[12.4, 48.0]`
`mix_notes`	list	Short strings, e.g. `["vocal upfront", "wide stereo drums", "rolled-off highs"]`

Scripts may add their own fields, but every script must return at least the keys above (use empty list / unknown string when a key has no data).

Confidence levels and fallback behavior for missing optional dependencies: references/advanced-audio-analysis.md.

For arranger-style triage, extract stems with scripts/extract_stems.py, then run scripts/per_stem_analysis.py <stems.json>. Use full-mix Whisper for lyrics; stems are for timbre, pitch, masking, and mix decisions. If the user asks to change one instrument, scripts/hybrid_remix.py is experimental and requires a validated transformed stem unless --allow-stem-cover is explicitly used for a smoke test.

Rate Limits (MiniMax-specific)

Hard limits: 120 RPM, 20 concurrent connections, output URLs expire in 24 hours, cover feature IDs expire in 24 hours. Under Token Plan 3.0 (June 2026+), the actual ceiling is credit-based: the documented 120 RPM is the API limit, but the Token Plan 3.0 quota is what determines your real ceiling.

Full detail, Token Plan 3.0 credit-pool mechanics, 429 recovery steps, and the usage-check command: references/error-handling.md.

Anti-Sparse (MiniMax-Specific Deep Dive)

The base anti-sparse rules live in ../music-craft/SKILL.md. MiniMax adds a more severe failure mode: MiniMax interprets "sparse" or "minimal" as "remove all instruments", even more aggressively than other providers — never use those words in a prompt without pairing them with an explicit instrument list.

Full deep-dive with observed failure modes, mitigation steps, and the canonical phrase blocklist: references/error-handling.md.

Quality Verification Checklist

Same 8-point checklist as the base skill, plus 4 MiniMax-specific items:

Cover preserves melody recognisably. If the user said "make it sound like Song X", the new version should be recognisable as Song X's melody with Song Y's style.
Emotion curve matches Song A (for mashups). The dynamic arc of the output should follow the original's intensity, not flatten to a single energy.
--avoid flags are respected. If the user said "no electronic sounds", the output should not have synths.
Per-flag control worked (BPM, key, structure). If the user asked for 80 BPM in E minor, the output should be in that range, not "close enough".

Output Verification (Covers, Mashups, Style Transfer)

After generation, run a post-generation check that is specific to the route. Every cover, mashup, and style-transfer output is verified against its route checklist before delivery.

Route checklists (cover / mashup / style transfer / emotion prompt), failure-signature table, and revision prompt templates: references/error-handling.md.

Lyrics Optimizer Behavior

Same as the base skill — when music_generate is called without explicit lyrics, MiniMax auto-generates. With this skill, you can also call the lyrics_generation API directly to preview the lyrics before generation, or to iterate via the edit mode.

If the user wants specific words, the lyrics_generation API's edit mode lets you modify auto-generated lyrics to match the user's intent without regenerating the whole song.

Reference Map

references/setup-and-preflight.md — extended pre-flight: platform notes, required/optional dependencies, ask-the-user pattern, local analysis memory
references/mmx-flags-reference.md — full mmx CLI flag reference with worked examples
references/examples.md — practical MiniMax examples with routing, first questions, workflow shapes, and prompt/flag lint catches
references/cover-workflow.md — one-step and two-step cover workflow with payloads, error handling, use cases
references/lyrics-generation.md — the lyrics_generation API endpoint, both modes, examples
references/mashup-workflow.md — two-song mashup workflow, emotion-to-prompt conversion, decision tree
references/emotion-analysis.md — 25+ emotion classifications + per-emotion detection cookbook + emotion combinations + the analysis pipeline
references/emotion-delivery.md — 21 emotion recipes for the OUTPUT + iteration loop + common mistakes
references/orchestrator-quickstart.md — per-input orchestrator commands (audio, two-song pairs), extraction guidance, per-song output layout
references/minimax-generation-caveats.md — sequential-run rules, output-file verification, duration-is-a-target caveats, and delivery copy templates
references/short-prompt-recipes.md — short prompt recipes for reliable cloud iterations under about 500 characters
references/advanced-audio-analysis.md — advanced free tools (Essentia, Demucs, Basic Pitch, Music21, CREPE) for deeper analysis when basic librosa/parselmouth is not enough
references/error-handling.md — MiniMax-specific error table, recovery patterns, anti-sparse failure recovery
references/free-tool-inputs.md — MiniMax layer: free-tool routing, blocker checks, and prompt/flag conflict lint before analysis
scripts/check_environment.py — lightweight preflight diagnostic for Python, env vars, CLI tools, and optional packages
scripts/lint_music_request.py — standard-library helper for routing, blocker, missing-field, prompt, lyrics-tag, duration-density, and mmx flag conflict checks
scripts/lint_lyrics.py — standard-library lyrics preflight for section-tag whitelist checks and syllable/BPM duration estimates
scripts/verify_lyrics_alignment.py — standard-library post-generation transcript-vs-lyrics overlap check for semantic delivery
scripts/verify_cloud_output.sh — shell helper for file existence, size, MP3 probe, and expected-duration range checks after cloud generation
scripts/batch_cover.py — sequential batch wrapper for repeated music cover prompts via generate_with_retry.py
scripts/per_stem_analysis.py — lightweight arranger triage from extract_stems.py stems.json
scripts/hybrid_remix.py — experimental preview remix planner; stem-to-cover is gated behind explicit smoke-test consent
scripts/smoke_test.py — standard-library smoke tests for pure helper behavior
scripts/ — Python helpers for audio analysis (download, segment, analyze, convert emotion to prompt)
music-source-fetch — private skill (NOT published) that holds the YouTube/JioSaavn/mx3.ch/LRCLib download code Luis uses to fetch source audio + lyrics by title
music-craft — base skill with shared concepts (Pre-Flight, anti-sparse, prompt formula, structure tags, Request Intake, User Preference Flow)
music-craft → references/free-tool-inputs.md — base layer for free tool inputs (web_fetch, web_search, image, memory)
references/changelog.md — release history (v1.4.1, v1.4.0, v1.3.0, v1.1.0, v1.0.0, v0.3.0); operating guidance lives in the topic references

Music Craft — MiniMax

Install

Music Craft — MiniMax

Data, Consent, and Local Side Effects

Required MiniMax Workflow

Reliability Caveats

Routing and Blocker Checks

When To Use

Request Intake (adapted for MiniMax features)

Fields checklist (MiniMax-specific)

User Preference Flow (message patterns → action)

Output File Layout (Per-Song Subfolders)

Quick Start with the Orchestrator

When NOT To Use

Decision Tree

Audio Source (Local Only)

First Response Defaults

Ambiguity Questions

Relationship to music-craft

Pre-Flight Check

Free Tool Augmentation (Input Enrichment)

Operating Rules

Song length (--length is a hint, not a guarantee)

mmx CLI Quick Reference

mmx Music Generation — verified patterns (June 2026)

Cover Workflow

Lyrics Generation

Lyrics Source

Mashup Workflow

Emotion Analysis

Analysis Quality (Summary Format, Confidence, Fallbacks)

Compact Analysis Summary

Rate Limits (MiniMax-specific)

Anti-Sparse (MiniMax-Specific Deep Dive)

Quality Verification Checklist

Output Verification (Covers, Mashups, Style Transfer)

Lyrics Optimizer Behavior

Reference Map

Related skills

Relationship to `music-craft`

Song length (`--length` is a hint, not a guarantee)