# Cover Workflow

The cover workflow preserves the original song's melody while applying a different style. It is the feature that turns a rock track into a French chanson, a reggaeton track into a ballad, or performs any other style transfer while keeping the original recognisable.

## When to Use Cover

- The user has reference audio (file or YouTube URL) and wants to keep the melody
- The user wants to change style, era, or genre of an existing song
- The user wants a "what if" reimagining (what if Bohemian Rhapsody was bossa nova?)

## When NOT to Use Cover

- The user wants to write a new song inspired by another (use standard generation with references)
- The user wants to combine two songs into one (use mashup workflow)
- The user only wants to change tempo or key (use standard generation with those parameters)
- The user does not have reference audio and just describes a style (use standard generation)

## Two Paths: One-Step vs Two-Step

### One-Step (Quick)

```bash
mmx music cover \
  --prompt "French chanson, accordion, strings, passionate French vocal, 80 BPM" \
  --audio-file /tmp/original.ogg \
  --out /tmp/cover.mp3
```

Or from URL:

```bash
mmx music cover \
  --prompt "French chanson, accordion, strings" \
  --audio "https://example.com/song.mp3" \
  --out /tmp/cover.mp3
```

**What happens:**

1. MiniMax takes the audio from the supplied file or the provided URL.
2. Extracts lyrics via ASR (automatic speech recognition).
3. Detects the structure (verse / chorus / bridge).
4. Applies the target style while preserving the melody.
5. Returns the cover.

**Limitations:**

- ASR may mis-hear lyrics, especially in noisy or non-English audio.
- The detected structure may not match the user's intent.
- No way to review or edit the extracted lyrics before generation.

### Two-Step (More Control)

**Step 1: Preprocess the audio**

The preprocess step returns a `cover_feature_id` (valid 24 hours) plus the auto-extracted `formatted_lyrics` and a `structure_result` with section timestamps.

```bash
curl --request POST \
  --url https://api.minimax.io/v1/music_cover_preprocess \
  --header "Authorization: Bearer $MINIMAX_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "music-cover",
    "audio_url": "https://example.com/original-song.mp3"
  }'
```

The response includes:

- `cover_feature_id` — valid 24 hours, use in step 2
- `formatted_lyrics` — editable, with structure tags
- `structure_result` — JSON with section timestamps

Note: do NOT use `mmx music cover` here. The mmx cover subcommand is the one-step path. For two-step, you must call `music_cover_preprocess` directly to get the `cover_feature_id` for step 2.
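
To carry the ID from step 1 into step 2, the response can be parsed on the command line. The helper below is a minimal sketch that assumes `jq` is installed and that `cover_feature_id` sits at the top level of the response JSON. The function name `feature_id_from_response` is hypothetical, and the exact response shape should be checked against a real reply.

```bash
# Read a preprocess response on stdin and print its cover_feature_id.
# Assumption: the field is a top-level string. Prints nothing and returns
# non-zero when the field is missing or null.
feature_id_from_response() {
  jq -e -r '.cover_feature_id // empty'
}

# Example wiring (not executed here):
#   resp=$(curl --silent --request POST ... )   # the preprocess call above
#   id=$(printf '%s' "$resp" | feature_id_from_response) || echo "preprocess failed" >&2
#   printf '%s' "$resp" | jq -r '.formatted_lyrics' > /tmp/lyrics.txt   # edit before step 2
```

Saving `formatted_lyrics` to a file gives you something to edit by hand before the generation call.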

**Step 2: Generate cover with modified lyrics**

```bash
curl --request POST \
  --url https://api.minimax.io/v1/music_generation \
  --header "Authorization: Bearer $MINIMAX_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "music-cover",
    "cover_feature_id": "ID_FROM_STEP_1",
    "lyrics": "[Verse]\nModified French lyrics here\n\n[Chorus]\nMore lyrics",
    "prompt": "French chanson, accordion, strings, passionate vocal",
    "output_format": "url",
    "audio_setting": { "sample_rate": 44100, "bitrate": 256000, "format": "mp3" }
  }'
```

**What the two-step path gives you:**

- Edit the ASR'd lyrics before generation (fix errors, change wording, add structure tags)
- Use a different language (translate the original lyrics to the target language)
- Use no lyrics at all (just preserve the melody, instrumental cover)
- Use external lyrics (the user provided their own)
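
Edited lyrics usually contain newlines and quotation marks, which are easy to break when pasted into a JSON string by hand. The sketch below builds the step 2 request body with `jq` so that escaping is handled automatically. It assumes `jq` is installed; `build_cover_payload` is a hypothetical helper name, and the field values mirror the generation request shown above.

```bash
# build_cover_payload <cover_feature_id> <lyrics> <prompt>
# Prints a JSON body for the music_generation call. jq escapes newlines
# and quotes inside the lyrics, so multi-line text can be passed as-is.
build_cover_payload() {
  jq -n \
    --arg id "$1" \
    --arg lyrics "$2" \
    --arg prompt "$3" \
    '{
      "model": "music-cover",
      "cover_feature_id": $id,
      "lyrics": $lyrics,
      "prompt": $prompt,
      "output_format": "url",
      "audio_setting": {"sample_rate": 44100, "bitrate": 256000, "format": "mp3"}
    }'
}

# Example wiring (not executed here):
#   build_cover_payload "$id" "$(cat /tmp/lyrics.txt)" "French chanson, accordion, strings" |
#     curl --request POST --url https://api.minimax.io/v1/music_generation \
#       --header "Authorization: Bearer $MINIMAX_API_KEY" \
#       --header "Content-Type: application/json" \
#       --data-binary @-
```

Passing an empty string as the second argument corresponds to the "no lyrics" case in the list above.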

## Lyrics Strategies for Cover

### Same Lyrics, New Style

The user wants the same words, different style. Workflow:

1. Use one-step or two-step.
2. Pass the prompt for the new style.
3. Let MiniMax extract the lyrics (one-step), or review the extracted `formatted_lyrics` and pass them back in the `lyrics` field of the generation call (two-step).

### New Lyrics, Same Melody

The user wants different words in the same melody. Workflow:

1. Use two-step (you need to provide the new lyrics).
2. Preprocess to get the `cover_feature_id`.
3. Provide the new lyrics in the generation call.

Example: Take the melody of "Yesterday" by the Beatles and set new lyrics in Spanish about a modern-day break-up.

### Translated Lyrics

The user wants the original lyrics in a different language. Workflow:

1. Use two-step.
2. Translate the original lyrics yourself (or with the LLM).
3. Pass the translated lyrics in the generation call.

### No Lyrics (Instrumental Cover)

The user wants the melody as an instrumental. Use either option:

- One-step with the `--instrumental` flag
- Two-step with empty lyrics in the generation call

## Style Transfer Without Melody Preservation

If the user wants the *style* of Song A applied to *new* lyrics (not preserving the melody), use standard generation with the style as a reference:

```bash
mmx music generate \
  --prompt "Style: similar to Bohemian Rhapsody (epic rock, multi-section, choir, dramatic dynamics), but with original lyrics in Spanish about leaving home" \
  --lyrics "..." \
  --model music-2.6 \
  --out /tmp/song.mp3
```

This is NOT a cover — the melody is new, the style is borrowed.

## Cover Workflow with YouTube

If the source is a YouTube URL:

1. Download with `yt-dlp`:
   ```bash
   yt-dlp -x --audio-format wav -o "/tmp/song_a.%(ext)s" "https://youtube.com/watch?v=..."
   ```
2. Make sure the file is in a format the cover API accepts (usually MP3 or WAV) and within the input limits: under 50 MB, between 6 seconds and 6 minutes.
3. Trim if needed with `ffmpeg`:
   ```bash
   ffmpeg -i /tmp/song_a.wav -ss 0 -t 180 /tmp/song_trimmed.wav
   ```
4. Run the cover workflow with the trimmed file.

## Audio Input Limits

- **Minimum length:** 6 seconds
- **Maximum length:** 6 minutes
- **Maximum file size:** 50 MB
- **Supported formats:** mp3, wav, flac, ogg, m4a, and others

If the input is too long, trim it. If too large, convert to a lower bitrate.
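
Checking these limits locally avoids a failed upload. The function below is a minimal sketch: it takes a duration in whole seconds and a size in bytes and reports every limit that is violated. Treating 50 MB as 50 × 1024 × 1024 bytes is an assumption (the API may count in decimal megabytes), and `check_audio_limits` is a hypothetical helper name.

```bash
# Limits from this section; the byte interpretation of "50 MB" is an assumption.
MIN_SECONDS=6
MAX_SECONDS=360
MAX_BYTES=$((50 * 1024 * 1024))

# check_audio_limits <duration_seconds> <size_bytes>
# Prints one line per violated limit; returns non-zero if any limit is violated.
check_audio_limits() {
  rc=0
  if [ "$1" -lt "$MIN_SECONDS" ]; then
    echo "too short: under $MIN_SECONDS seconds"
    rc=1
  fi
  if [ "$1" -gt "$MAX_SECONDS" ]; then
    echo "too long: over $MAX_SECONDS seconds, trim with ffmpeg"
    rc=1
  fi
  if [ "$2" -gt "$MAX_BYTES" ]; then
    echo "too large: over $MAX_BYTES bytes, re-encode at a lower bitrate"
    rc=1
  fi
  return "$rc"
}

# Example wiring (needs ffprobe on PATH; not executed here):
#   dur=$(ffprobe -v error -show_entries format=duration \
#         -of default=noprint_wrappers=1:nokey=1 "$f")
#   check_audio_limits "${dur%.*}" "$(($(wc -c < "$f")))"
```

The `${dur%.*}` expansion rounds the duration down to whole seconds, so a file a fraction of a second over the maximum would still pass this local check.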

## Cover Feature ID Lifecycle

`cover_feature_id` is valid for 24 hours from the preprocess call. After that, run the preprocess step again to get a new ID.

If you need to regenerate the same cover multiple times (e.g., to iterate on the prompt), cache the `cover_feature_id` and reuse it.
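
One way to cache the ID is a small file holding the creation time next to the ID, checked against the 24-hour window before each reuse. This is a sketch under stated assumptions: the helper names and the cache file format are hypothetical, and the timestamp is taken on the local machine when the ID is saved, not from the server.

```bash
TTL_SECONDS=$((24 * 60 * 60))   # 24-hour validity from this section

# feature_id_is_fresh <created_epoch> [now_epoch]
# Succeeds while fewer than TTL_SECONDS have passed since creation.
feature_id_is_fresh() {
  now=${2:-$(date +%s)}
  [ $((now - $1)) -lt "$TTL_SECONDS" ]
}

# save_feature_id <cache_file> <id>: store the id with the current time.
save_feature_id() {
  printf '%s %s\n' "$(date +%s)" "$2" > "$1"
}

# load_feature_id <cache_file>: print the cached id if it is still fresh,
# otherwise print nothing and return non-zero.
load_feature_id() {
  [ -f "$1" ] || return 1
  read -r created id < "$1" || return 1
  feature_id_is_fresh "$created" && printf '%s\n' "$id"
}
```

Because the local timestamp is written slightly after the server issued the ID, subtracting a few minutes from `TTL_SECONDS` as a safety margin may be sensible.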

## Quality Verification for Covers

After generating a cover, check:

1. **Melody recognisability** — Would a friend say "That's [Song X]"?
2. **Style application** — Would a friend say "but it sounds like [style Y]"?
3. **Lyrics alignment** — Are the lyrics recognisable (either the original or the new ones)?
4. **Structure preservation** — Does the new version follow the original's structure (verse-chorus-verse-chorus-bridge-chorus)?
5. **No a cappella or sparse drops** — Same anti-sparse rules as standard generation

If 3+ of these fail, adjust the prompt or try the two-step path for more control.

## Anti-Sparse for Covers

The anti-sparse rules apply even more strictly to covers, because a cover changes the style and the model can read a prompt such as "intimate ballad version" as an instruction to remove all instruments.

Always include in the cover prompt:

```
ALL instruments ALWAYS playing throughout, NEVER go a cappella or silent,
quiet sections: reduced to [specific instruments] only, still fully played
```

And in `--avoid`:

```
sparse, a cappella, minimal, silence, electronic sounds (unless desired)
```
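
To avoid retyping the clause, the anti-sparse text can be appended to any style prompt with a small helper. The sketch below only assembles the string from the template above; `anti_sparse_prompt` is a hypothetical name, and the second argument fills the `[specific instruments]` placeholder.

```bash
# anti_sparse_prompt <style_prompt> <quiet_section_instruments>
# Prints the style prompt followed by the anti-sparse clause on one line.
anti_sparse_prompt() {
  printf '%s, ALL instruments ALWAYS playing throughout, NEVER go a cappella or silent, quiet sections: reduced to %s only, still fully played\n' "$1" "$2"
}

# Example wiring (not executed here):
#   mmx music cover \
#     --prompt "$(anti_sparse_prompt "French chanson, accordion, strings" "piano and upright bass")" \
#     --audio-file /tmp/original.ogg \
#     --out /tmp/cover.mp3
```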

## Errors and Recovery

| Error | Cause | Fix |
|---|---|---|
| `audio_url unreachable` | URL is dead, requires auth, or blocked | Download the file with `yt-dlp` or `curl` first, then pass the local file |
| `audio too long` | Source > 6 minutes | Trim with `ffmpeg -ss <start> -t <duration>` |
| `audio too large` | Source > 50 MB | Convert to lower bitrate with `ffmpeg` |
| ASR extracted wrong lyrics | Noisy audio, accented vocals, non-English | Use two-step with manual lyrics |
| Output melody is unrecognisable | Style transfer too aggressive | Reduce the prompt's style intensity, or use a less dramatic target style |
| Output is sparse | Anti-sparse rules not applied | Add explicit instruments and "ALL instruments ALWAYS playing" |
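
For the `audio too large` row, a quick calculation shows which bitrates fit under the size limit for a given duration. The sketch below assumes 50 MB means 50 × 1024 × 1024 bytes and ignores container overhead; `max_bitrate_kbps` is a hypothetical helper name.

```bash
# max_bitrate_kbps <duration_seconds>
# Highest constant bitrate (kbit/s, rounded down) that keeps a file of the
# given length under 50 MB. Assumes 50 MB = 50 * 1024 * 1024 bytes.
max_bitrate_kbps() {
  echo $((50 * 1024 * 1024 * 8 / $1 / 1000))
}

max_bitrate_kbps 360   # prints 1165

# Example re-encode (needs ffmpeg; not executed here):
#   ffmpeg -i /tmp/song_a.wav -codec:a libmp3lame -b:a 192k /tmp/song_a.mp3
```

At the 6-minute maximum, the ceiling is about 1165 kbit/s. Uncompressed 16-bit stereo WAV at 44.1 kHz runs at roughly 1411 kbit/s, so a full-length WAV exceeds the limit, while an MP3 at any common bitrate fits comfortably.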

## Worked Example: Rock to Chanson

User request: "Take this rock anthem I wrote and turn it into a French chanson."

Workflow:

1. Get the audio file (user upload or YouTube URL).
2. Run the cover, passing the target style and the French lyrics:
   ```bash
   mmx music cover \
     --prompt "French chanson, accordion, upright bass, orchestral strings, piano, light percussion, 80 BPM in E minor, passionate French male vocal, melancholic romantic dramatic" \
     --audio-file /tmp/rock_track.mp3 \
     --lyrics "[Verse]\nJ'ai trouvé ta lettre\nMais je n'ai pas vraiment vérifié\nCe mot unique que tu as écrit\nL'histoire est déjà terminée\n\n[Pre Chorus]\nNon, je ne veux pas connaître les raisons\nCar l'amour n'a pas besoin de raison pour exister\n\n[Chorus]\nJe sais que je, je sais que je, je ne peux pas continuer avec toooooou" \
     --out /tmp/chanson_cover.mp3
   ```
3. Verify melody is recognisable.
4. If sparse, retry with explicit anti-sparse text.
5. Deliver the cover.