light-game-bgm

Creative
Music, game, bgm

Compose light, melodic, loopable background music in the spirit of classic cozy game town, village, and overworld themes (the calm 2D RPG / retro online-game vibe) and render it to a real playable audio file using sampled instruments. Use this whenever the user wants to create original music, a game soundtrack, BGM, a chiptune or town/overworld theme, a calm instrumental loop, or asks you to "make a song", "write some music", or produce a seamless looping audio track — even if they don't say the word "skill". Also use it when an existing track sounds too synthetic or stiff (the "stock General-MIDI" sound) and the user wants more realistic instruments, expressive strings, concert-hall space, or a clean loop. Covers the full pipeline: waveform synthesis OR (preferred) MIDI + soundfont rendering, expressive performance, convolution reverb, and seamless looping.

Install

openclaw skills install @yzwu2017/light-game-bgm

Light game BGM composer

Produce original, nostalgic, loopable instrumental music and render it to an actual audio file the user can play. This skill captures a pipeline that was tuned on a real cozy 2D-game town theme, including the mistakes that taught each lesson.

The core insight

Two things, in order, decide whether the music sounds real:

  1. Use sampled instruments, not hand-built oscillators. From-scratch synthesis can only ever sound like a synth. Write the piece as MIDI and render it through FluidSynth + a soundfont so every note plays a recorded instrument sample.
  2. Perform the MIDI; don't just place notes. A correct-but-static MIDI played through a great soundfont still sounds like the stiff, stock General-MIDI cliché. The realism comes from how the notes are played — dynamics, timing, articulation. This is the part people underestimate.

There is a hard ceiling: the realism of an actually recorded string section needs dedicated VST sample libraries in a DAW, which the CLI can't load. Be upfront about this and offer the MIDI export as the bridge. See references/soundfonts.md (realism ladder).

Environment check

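python3 -c "import numpy, scipy" && echo ok      # synthesis + reverb math
python3 -c "import mido" || pip install mido       # MIDI authoring
which fluidsynth || brew install fluid-synth       # soundfont renderer (macOS; on Debian/Ubuntu: apt install fluidsynth)
which ffmpeg || brew install ffmpeg                # wav -> mp3 (macOS; on Debian/Ubuntu: apt install ffmpeg)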

Then fetch a soundfont (see references/soundfonts.md). Default to GeneralUser GS for light/cozy pieces — it's pre-balanced and blends; the bigger FluidR3_GM is brighter and more forward (good for leads, can over-expose strings). Bigger is not better.

Workflow

1. Decide the musical brief

Key (major = bright/cozy), tempo (~100–120 BPM for town themes), instrument roster, and form. A satisfying loop form is A → A′ (variation) → B (contrast) → A″ (return), ~32 bars ≈ 70 s at 110 BPM.
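The length figure is simple arithmetic. A minimal sketch follows; the function name is illustrative, and 4 beats per bar is an assumption that fits a 4/4 town theme:

```python
def loop_length(bars, bpm, beats_per_bar=4):
    """Return (total_beats, seconds) for a loop body."""
    beats = bars * beats_per_bar
    seconds = beats * 60.0 / bpm
    return beats, seconds

beats, seconds = loop_length(bars=32, bpm=110)
print(beats, round(seconds, 1))  # 128 69.8
```

The beat total (128 here) is the same number the reverb and verify steps take as --beats.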

2. Write the composition as a per-song Python script

Import scripts/midi_helpers.py and build one NoteBuilder per voice. Keep the note data (melody, chords, counter-lines) in the song script — it changes every time — and lean on the helpers for the reusable expression machinery. See the template below.

3. Apply the expression layer (this is what kills the fake feeling)

  • Humanize timing — nudge note starts a few ticks (hum=). Perfectly quantized = robotic.
  • Velocity variation (vvar=) — no two notes identical.
  • CC11 swells (.swell(...)) — phrases breathe louder/softer. The single biggest fix for stiff string pads.
  • Legato / held common tones (held_runs(...)) — sustain a chord tone across bar lines instead of re-attacking it every bar. Block chords that re-hit every downbeat are the stock-library giveaway.
  • Slow-attack patch for pads (Slow Strings, prog 49 counting from 0, the GM String Ensemble 2 slot): mimics bowing in.
  • A little vibrato (vib=, CC1) on strings.
  • Pan voices (CC10) — e.g. cello left, violin right, so a two-part string dialogue (call-and-response) reads spatially. Giving strings their own conversing lines beats parking them on block-chord pads.
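To make the first three bullets concrete, here is a rough, standalone sketch of timing jitter, velocity variation, and a CC11 swell curve. It is not the bundled midi_helpers.py: the names humanize and swell_curve, the tuple layout, and the reading of base/amp as "floor plus lift" are all assumptions made for illustration.

```python
import math
import random

def humanize(notes, hum=8, vvar=6, seed=1):
    """Jitter start ticks by up to +/-hum and velocities by up to +/-vvar.

    notes: list of (start_tick, pitch, dur_ticks, velocity) tuples.
    """
    rng = random.Random(seed)  # seeded so a render is reproducible
    out = []
    for start, pitch, dur, vel in notes:
        start = max(0, start + rng.randint(-hum, hum))
        vel = min(127, max(1, vel + rng.randint(-vvar, vvar)))
        out.append((start, pitch, dur, vel))
    return out

def swell_curve(n_beats, base=76, amp=28, phrase_beats=8, steps_per_beat=4):
    """CC11 values that rise and fall once per phrase (raised-cosine shape).

    Returns (beat_position, cc_value) pairs clamped to 0..127.
    """
    points = []
    for i in range(n_beats * steps_per_beat):
        beat = i / steps_per_beat
        phase = (beat % phrase_beats) / phrase_beats      # 0..1 within the phrase
        lift = 0.5 - 0.5 * math.cos(2 * math.pi * phase)  # 0 at the edges, 1 mid-phrase
        value = int(round(base + amp * lift))
        points.append((beat, min(127, max(0, value))))
    return points

# A quantized three-note phrase (480 ticks per beat) and an 8-beat swell.
grid = [(0, 69, 480, 90), (480, 66, 480, 90), (960, 62, 720, 90)]
played = humanize(grid)
curve = swell_curve(8)
```

Each (beat, value) pair would become one CC11 control-change event on the voice's channel; a fixed seed per voice keeps re-renders identical while the voices still differ from each other.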

4. Render dry

fluidsynth -ni -R 0 -C 0 -g 0.8 -r 44100 -F song_dry.wav GeneralUser.sf2 song.mid

Reverb/chorus OFF — the next step owns the space.
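If the render is driven from the per-song script rather than typed by hand, the same command can be assembled in Python. This is a small sketch; the function names are illustrative, and only the flags shown in the command above are used.

```python
import shutil
import subprocess

def dry_render_cmd(soundfont, midi_path, wav_path, gain=0.8, rate=44100):
    """Build the FluidSynth command shown above (reverb and chorus off)."""
    return [
        "fluidsynth", "-ni",       # no MIDI input driver, no interactive shell
        "-R", "0", "-C", "0",      # reverb off, chorus off
        "-g", str(gain),           # master gain
        "-r", str(rate),           # sample rate in Hz
        "-F", wav_path,            # fast-render straight to a file
        soundfont, midi_path,
    ]

def render_dry(soundfont, midi_path, wav_path):
    """Run the render, failing loudly if FluidSynth is missing or errors out."""
    if shutil.which("fluidsynth") is None:
        raise RuntimeError("fluidsynth not found on PATH")
    subprocess.run(dry_render_cmd(soundfont, midi_path, wav_path), check=True)

cmd = dry_render_cmd("GeneralUser.sf2", "song.mid", "song_dry.wav")
```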

5. Concert-hall reverb + seamless loop

python3 scripts/hall_reverb_loop.py --input song_dry.wav --output song_loop.wav \
    --bpm 110 --beats 128 --decay 2.1 --x2

--beats is the loop body length in beats: bars × beats per bar (32 × 4 = 128 here). The script convolves the dry render with a synthesised hall impulse response and, for a loop, wraps the post-loop tail (note releases + reverb) back onto the start, so it repeats with no click and no fade. --decay sets the room size (1.2 room · 2.1 hall · 3.5 cathedral). --x2 also writes a two-loop file (song_loop_x2.wav, the input to the next step) for auditioning the seam.
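The two ideas in this step, convolving with a synthesised impulse response and folding the tail back onto the start, can be sketched with numpy alone. This is not the code in hall_reverb_loop.py: the function names, the noise-under-exponential-decay impulse response, and the reading of the decay value as an RT60-style time in seconds are assumptions. Mono audio is assumed throughout.

```python
import numpy as np

def hall_ir(decay_s, sr=44100, seed=0):
    """Simple hall impulse response: white noise under an exponential decay.

    Assumption: decay_s is an RT60-style time (level falls by 60 dB).
    """
    n = int(decay_s * sr)
    t = np.arange(n) / sr
    rng = np.random.default_rng(seed)
    ir = rng.standard_normal(n) * np.exp(-6.908 * t / decay_s)  # ln(1000) ~ 6.908
    return ir / np.sqrt(np.sum(ir ** 2))                         # unit energy

def fft_convolve(dry, ir):
    """Linear convolution via FFT; output length is len(dry) + len(ir) - 1."""
    n = len(dry) + len(ir) - 1
    return np.fft.irfft(np.fft.rfft(dry, n) * np.fft.rfft(ir, n), n)

def wrap_tail(wet, loop_len):
    """Fold everything after loop_len back onto the start of the loop body."""
    body = wet[:loop_len].copy()
    for start in range(loop_len, len(wet), loop_len):
        chunk = wet[start:start + loop_len]
        body[:len(chunk)] += chunk
    return body

def loop_samples(bpm, beats, sr=44100):
    """Loop body length in samples for a given tempo and beat count."""
    return int(round(beats * 60.0 / bpm * sr))

# Toy check: a click on the last sample of a 100-sample loop.
dry = np.zeros(100)
dry[99] = 1.0
wet = fft_convolve(dry, np.array([1.0, 0.5, 0.25]))
loop = wrap_tail(wet, 100)   # the two tail samples land on loop[0] and loop[1]
```

Because the tail is added to the start rather than faded out, the last bar's releases ring into the first bar on every repeat, which is what makes the seam continuous. A real render would typically mix the convolved signal with the dry one and rescale to avoid clipping.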

6. Verify, then encode

python3 scripts/verify_loop.py --input song_loop_x2.wav --bpm 110 --beats 128
ffmpeg -y -i song_loop.wav -af "loudnorm=I=-15:TP=-1.5" -b:a 192k song_loop.mp3

Don't ship a loop you haven't verified — verify_loop.py confirms the seam jump is inaudible (jump/rms well under 0.06) and nothing clips.
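For intuition about what such a check measures, here is a rough standalone sketch. The exact metric inside verify_loop.py is not shown in this document, so the definition used here (the sample-to-sample step across the loop boundary, divided by the RMS of the whole file) is an assumption, as is the function name.

```python
import math

def seam_report(x2, loop_len, limit=0.06):
    """Check the seam in a two-loop render (mono samples in -1..1).

    Assumption: "jump" is the step across the loop boundary, compared
    with the RMS level of the whole file.
    """
    jump = abs(x2[loop_len] - x2[loop_len - 1])
    rms = math.sqrt(sum(s * s for s in x2) / len(x2))
    peak = max(abs(s) for s in x2)
    ratio = jump / rms if rms > 0 else 0.0
    return {"jump_over_rms": ratio, "seam_ok": ratio < limit, "clips": peak >= 1.0}

n = 1000
# Two whole sine cycles per loop: the end flows straight into the start.
good = [0.5 * math.sin(2 * math.pi * 2 * i / n) for i in range(n)] * 2
# A ramp that snaps from ~1 back to 0 at the boundary: an audible click.
bad = [i / n for i in range(n)] * 2

good_report = seam_report(good, n)
bad_report = seam_report(bad, n)
```

The function accepts any indexable sequence of floats, so a numpy array read from the WAV works as well as a plain list.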

7. Deliver

Give the user the mp3 (plays everywhere), and offer the .mid (for a DAW) and the per-song .py (to tweak the composition). Present concrete next-step options (longer bridge, wider panning, bigger/smaller hall, different lead instrument) rather than asking open-ended questions.

Per-song script template

import sys; sys.path.insert(0, 'scripts')
from midi_helpers import NoteBuilder, new_midi, held_runs, realize, midi

BPM, TPB = 110, 480
mid = new_midi(BPM, TPB)

# --- melody (music box, centered) ---
lead = NoteBuilder(0, 10, TPB, pan=64, rev=55, vol=104, seed=1)
lead.line([(0,'A4',1),(1,'F#4',1),(2,'D4',1.5),(3.5,'E4',.5)], vel=90, gate=0.9, hum=8)
# ... more phrases ...

# --- strings as a dialogue: cello left, violin right ---
cello  = NoteBuilder(4, 42, TPB, pan=36, vib=30, rev=68, vol=92, seed=2)
violin = NoteBuilder(1, 40, TPB, pan=92, vib=38, rev=70, vol=84, seed=3)
cello.line([(64,'G3',1),(65,'A3',1),(66,'B3',2)], vel=66, gate=0.97)   # call
violin.line([(72,'B4',1),(73,'C#5',1),(74,'D5',2)], vel=62, gate=0.96) # answer
cello.swell(128, base=76, amp=28)
violin.swell(128, base=72, amp=30)

# --- soft pad via held common tones (legato, no re-attacks) ---
voicing = [['D3','F#3','A3'], ['C#3','E3','A3'], ...]   # one per bar
pad = NoteBuilder(5, 49, TPB, pan=64, vib=18, rev=90, vol=50, seed=4)
for start, pitch, dur in held_runs(voicing):
    pad.note(start, pitch, dur, vel=30, gate=0.99, hum=20)
pad.swell(128, base=55, amp=22)

for nb in (lead, violin, cello, pad):
    mid.tracks.append(nb.track())
mid.save('song.mid')

(A complete, working realisation of this exact arrangement, a full cello/violin-dialogue town theme in the four-section loop form, was built in the parent project. If it is present, use its per-song MIDI builder and post-processing script as a reference implementation.)
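The pad section of the template relies on held_runs to turn per-bar voicings into long notes. One way such a helper could work is sketched below. This is a hypothetical stand-in, not the bundled implementation: the name held_runs_sketch, the 4-beats-per-bar default, and the beat-based return units are assumptions.

```python
def held_runs_sketch(voicings, beats_per_bar=4):
    """Merge a pitch that appears in consecutive bars into one long note.

    voicings: one list of pitch names per bar.
    Returns (start_beat, pitch, duration_beats) tuples sorted by start.
    """
    runs = []
    open_runs = {}  # pitch -> index into runs of the note still sounding
    for bar, chord in enumerate(voicings):
        still_open = {}
        for pitch in dict.fromkeys(chord):  # ignore duplicates within a bar
            if pitch in open_runs:
                idx = open_runs[pitch]
                start, _, dur = runs[idx]
                runs[idx] = (start, pitch, dur + beats_per_bar)  # extend, no re-attack
            else:
                idx = len(runs)
                runs.append((bar * beats_per_bar, pitch, beats_per_bar))
            still_open[pitch] = idx
        open_runs = still_open  # pitches absent from this bar are released
    return sorted(runs, key=lambda r: r[0])

runs = held_runs_sketch([['D3', 'F#3', 'A3'], ['C#3', 'E3', 'A3']])
```

With these two bars, A3 is common to both chords and comes out as a single 8-beat note, while the other four tones each last one bar.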

Bundled resources

  • scripts/midi_helpers.py: NoteBuilder (humanize, velocity, CC11 swells, panning), held_runs (legato pads), realize (arps), midi/new_midi.
  • scripts/hall_reverb_loop.py: convolution hall reverb + seamless-loop wrap.
  • scripts/verify_loop.py: objective seam/clipping check.
  • references/soundfonts.md: realism ladder, GeneralUser GS vs FluidR3, download + validation, GM program numbers, CC reference.

Fallback: pure synthesis (no soundfont)

If FluidSynth/soundfonts are unavailable, you can still synthesise from oscillators with numpy (sine/triangle + ADSR + simple reverb) and write a WAV directly. Accept that it will sound chiptune/synthetic — set that expectation with the user rather than presenting it as realistic.
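A minimal sketch of that fallback, assuming mono 16-bit output and a linear ADSR envelope. The function names and envelope defaults are illustrative, and the reverb stage is left out to keep the example short.

```python
import io
import wave
import numpy as np

SR = 44100

def adsr(n, sr=SR, a=0.01, d=0.08, s=0.6, r=0.15):
    """Linear attack/decay/sustain/release envelope, n samples long."""
    na, nd, nr = int(a * sr), int(d * sr), int(r * sr)
    ns = max(0, n - na - nd - nr)
    env = np.concatenate([
        np.linspace(0.0, 1.0, na, endpoint=False),  # attack
        np.linspace(1.0, s, nd, endpoint=False),    # decay to sustain level
        np.full(ns, s),                             # sustain
        np.linspace(s, 0.0, nr),                    # release to silence
    ])
    return env[:n]

def tone(freq, seconds, sr=SR, shape="sine"):
    """One enveloped note from a sine or triangle oscillator."""
    n = int(seconds * sr)
    t = np.arange(n) / sr
    if shape == "triangle":
        osc = 2.0 * np.abs(2.0 * ((t * freq) % 1.0) - 1.0) - 1.0
    else:
        osc = np.sin(2 * np.pi * freq * t)
    return osc * adsr(n, sr)

def write_wav(target, samples, sr=SR):
    """Write mono float samples (-1..1) as 16-bit PCM to a path or file object."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(target, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sr)
        w.writeframes(pcm.tobytes())

# A4 then F#4, half a second each; written to memory here, or pass a filename.
melody = np.concatenate([tone(440.0, 0.5), tone(369.99, 0.5, shape="triangle")])
buf = io.BytesIO()
write_wav(buf, 0.5 * melody)
```

The envelope is what keeps each note from starting and ending with a click; the oscillator shape only changes the timbre, which stays recognisably synthetic either way.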