# Audio-Video Skill — Feature Overview

A complete reference of every capability in this skill, organized by section.

---

## SECTION A — Format Conversion & Transcoding

Convert and transcode between most video formats and codecs. Handles the full range from modern codecs (AV1, HEVC) to universal compatibility (H.264), with hardware acceleration on all major platforms.

**Capabilities:**

- Transcode to H.264, H.265/HEVC, AV1, VP9/WebM
- Hardware-accelerated encoding: VideoToolbox (macOS), NVENC (NVIDIA), QSV (Intel)
- Container remux (MP4 ↔ MKV ↔ MOV) without re-encoding via stream copy
- Convert PNG/JPEG image sequences into video
- Extract video frames as image sequences (all frames, at a specific fps, or keyframes only)

**Key tools:** `libx264`, `libx265`, `libaom-av1`, `libvpx-vp9`, `h264_videotoolbox`, `h264_nvenc`, `-c copy`, `-crf`, `-preset`, `-movflags +faststart`

**Example Use Cases:**

1. **Uploading to a platform that only accepts MP4/H.264** — Transcode any source (MKV, MOV, AVI, HEVC) to a universally compatible H.264 MP4 with one command.
2. **Archiving at smaller file sizes** — Re-encode old H.264 files to H.265 or AV1, which often cuts file size by roughly 40–60% at comparable visual quality. Useful for long-term storage where disk space matters.
3. **Changing containers without re-encoding** — Remux MKV to MP4 (or vice versa) in seconds via stream copy, provided the target container supports the codecs involved. There is no quality loss and no encode to wait for: the existing streams are simply rewrapped in a new container.
4. **Creating a video from a photo sequence** — Convert a timelapse or animation render (PNG sequence) into a video file, controlling the output fps independently of the input frame count.
5. **Extracting frames for machine learning or review** — Pull every frame, every Nth frame, or only keyframes from a video as individual images for dataset building or manual review.

---

## SECTION B — Audio Processing

Extract, convert, filter, and analyze audio from any media file.
Covers everything from simple format conversion to EBU R128 broadcast-standard loudness normalization.

**Capabilities:**

- Extract audio tracks as AAC, MP3, FLAC, WAV, or Opus
- Convert between audio formats; downmix multi-channel to stereo; change sample rate
- EBU R128 loudness normalization (two-pass, broadcast standard)
- Audio filters: volume adjustment, high-pass, low-pass, noise reduction, dynamic range compression, fade in/out, stereo↔mono
- Advanced: remove silence, change speed with pitch preservation (0.5–2.0×), pitch shift without tempo change, generate waveform stats, generate spectrogram PNG

**Key tools:** `libmp3lame`, `loudnorm`, `highpass`, `lowpass`, `anlmdn`, `acompressor`, `afade`, `silenceremove`, `atempo`, `showspectrumpic`

**Example Use Cases:**

1. **Extracting audio from a video for a podcast or music release** — Pull the audio track from any video as MP3, AAC, FLAC, or WAV without touching the video stream.
2. **Normalizing loudness before publishing a podcast** — Apply EBU R128 two-pass normalization so every episode hits the same loudness target (-16 LUFS for Apple Podcasts, -14 LUFS for Spotify) regardless of how it was recorded.
3. **Cleaning up a noisy recording** — Chain high-pass (remove low rumble/hum), `anlmdn` noise reduction, and dynamic range compression to make a poor-quality microphone recording more listenable.
4. **Speeding up an audiobook or lecture** — Use `atempo` to play back at 1.5× or 2× speed with pitch preserved, producing a natural-sounding faster version rather than a chipmunk effect.
5. **Generating a spectrogram for audio QA** — Visualize the frequency content of an audio file as a PNG to spot noise floors, clipping, or encoding artifacts before delivery.

---

## SECTION C — Video Editing

The full editing toolkit: cut, join, resize, rotate, crop, overlay, and color grade, all without leaving FFmpeg.

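
As a rough illustration of the "cut" operation in this toolkit, the two trim modes (fast stream copy versus frame-accurate re-encode) can be sketched as FFmpeg argument lists. This is a minimal sketch, not the skill's actual implementation: the helper name `build_trim_command`, the file names, and the `-crf 18` quality value are assumptions made for the example. The flags themselves (`-ss`, `-to`, `-c copy`, `-c:v libx264`, `-c:a aac`) are standard FFmpeg options.

```python
import shlex


def build_trim_command(src, dst, start, end, frame_accurate=False):
    """Return an ffmpeg argument list that cuts [start, end] out of src.

    Hypothetical helper for illustration only: it builds the command
    but does not run ffmpeg.
    """
    if frame_accurate:
        # Re-encode so the cut can land on any frame, not just a keyframe.
        # With -ss/-to placed after -i, ffmpeg decodes and discards
        # everything before the start time.
        return [
            "ffmpeg", "-i", src, "-ss", start, "-to", end,
            "-c:v", "libx264", "-crf", "18", "-c:a", "aac", dst,
        ]
    # Stream copy: nothing is re-encoded, so the cut starts at a keyframe
    # near the requested time rather than on the exact frame.
    return ["ffmpeg", "-ss", start, "-to", end, "-i", src, "-c", "copy", dst]


if __name__ == "__main__":
    fast = build_trim_command("input.mp4", "clip.mp4", "00:10:00", "00:12:30")
    exact = build_trim_command(
        "input.mp4", "clip.mp4", "00:10:00", "00:12:30", frame_accurate=True
    )
    # Print shell-ready commands; pass the lists to subprocess.run to execute.
    print(shlex.join(fast))
    print(shlex.join(exact))
```

Building the command as a list rather than a string keeps file names with spaces safe when the list is later handed to `subprocess.run`.
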

**Capabilities:**

- **Trim & cut:** Fast keyframe-aligned trim (stream copy) or frame-accurate trim (re-encode); remove specific time ranges
- **Concatenation:** Join same-codec files without re-encoding; join different codecs/resolutions with re-encode
- **Scaling:** Resize to exact dimensions or platform targets (4K, 1080p, 720p, 480p); preserve aspect ratio with padding
- **Frame rate:** Change fps; smooth slow-motion via frame interpolation (`minterpolate`)
- **Rotation & flipping:** 90°/180° rotation, horizontal/vertical flip, auto-rotate from phone metadata
- **Cropping:** Crop to coordinates, square center crop, auto-detect and remove black bars (`cropdetect`)
- **Overlays & watermarks:** Image watermark, text watermark with custom font/color/alpha, timed animated overlays, picture-in-picture (PiP)
- **Color grading:** Brightness, contrast, saturation, gamma; LUT (`.cube`) color grading; curves (S-curve); hue/saturation shift

**Key tools:** `fps`, `minterpolate`, `transpose`, `vflip`, `hflip`, `crop`, `cropdetect`, `overlay`, `drawtext`, `eq`, `lut3d`, `curves`, `hue`, `scale`, `filter_complex`

**Example Use Cases:**

1. **Trimming a highlight clip from a long recording** — Cut a specific segment from a 2-hour recording in seconds using stream copy (no re-encode, so the cut lands on the nearest keyframe). Use frame-accurate trim when you need to cut on an exact frame instead.
2. **Joining multiple recordings into one file** — Concatenate episodes, segments, or daily recordings into a single file. Same-codec files join without re-encoding; mixed sources are re-encoded to a common format.
3. **Resizing for a specific platform** — Scale a 4K master down to 1080p, 720p, or any platform target while automatically preserving the aspect ratio and padding with black bars if needed.
4. **Fixing a video shot in portrait mode on a phone** — Correct sideways or upside-down footage either by resetting the rotation metadata flag (`-metadata:s:v rotate=0`) or by re-orienting the frames with the `transpose` filter.
5. **Adding a branded watermark to every video** — Overlay a logo PNG at a fixed position with configurable opacity. Apply to a single file or use with batch processing (Section I) to brand an entire library.
6. **Color grading with a LUT** — Apply a `.cube` LUT from any color grading tool (DaVinci Resolve, Lightroom exports, free LUT packs) to give footage a consistent look without a GUI editor.

---

## SECTION D — Subtitles & Captions

Add, remove, convert, and embed subtitle tracks in any format.

**Capabilities:**

- Burn subtitles permanently into video (hard subtitles, not removable)
- Add soft subtitle tracks that viewers can toggle on/off
- Extract subtitle tracks from MKV, MP4, and other containers
- Convert between SRT and ASS/SSA formats
- Tag subtitle tracks with language metadata

**Key tools:** `subtitles` filter, `mov_text`, `-c:s`, `-metadata:s:s:0 language=`

**Example Use Cases:**

1. **Burning subtitles for social media** — Hard-burn SRT or ASS subtitles into the video so they always show regardless of player or platform, which is essential for silent autoplay on Instagram, TikTok, and LinkedIn.
2. **Adding soft subtitles for a streaming platform** — Embed toggleable subtitle tracks in MKV or MP4 without burning them in, so viewers can turn them on/off. Add multiple language tracks to the same file.
3. **Extracting subtitles from a downloaded MKV** — Pull the subtitle track out as a standalone `.srt` or `.ass` file for editing or translation.
4. **Converting subtitle formats** — Convert SRT to ASS for advanced styling (custom fonts, colors, positioning) or back to SRT for platforms that only accept plain subtitles.

---

## SECTION E — Thumbnails & Screenshots

Extract frames and generate thumbnails for any use: preview images, video players, web galleries.

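
To make the two most common thumbnail jobs concrete, a single frame grab and a tiled contact sheet can be sketched as FFmpeg argument lists. The helper names, default grid size, and thumbnail width are hypothetical choices for this example rather than part of the skill. The options used (`-ss`, `-frames:v`, and the `fps`, `scale`, and `tile` filters) are standard FFmpeg features; `-frames:v 1` is the current spelling of `-vframes 1`.

```python
def build_frame_grab(src, dst, timestamp):
    """Argument list that saves the frame at `timestamp` as one image.

    Hypothetical helper: builds the command only, does not run ffmpeg.
    """
    # -ss before -i seeks in the input; -frames:v 1 stops after one frame.
    return ["ffmpeg", "-ss", timestamp, "-i", src, "-frames:v", "1", dst]


def build_contact_sheet(src, dst, every_seconds=10, columns=4, rows=4,
                        thumb_width=320):
    """Argument list that tiles one thumbnail per interval into a grid."""
    # fps=1/N keeps one frame every N seconds, scale shrinks each frame
    # (height -1 preserves the aspect ratio), and tile lays them out in a
    # columns x rows grid.
    vf = f"fps=1/{every_seconds},scale={thumb_width}:-1,tile={columns}x{rows}"
    # A single output frame holds the whole grid.
    return ["ffmpeg", "-i", src, "-vf", vf, "-frames:v", "1", dst]


if __name__ == "__main__":
    print(build_frame_grab("input.mp4", "still.png", "00:01:30"))
    print(build_contact_sheet("input.mp4", "sheet.png"))
```

With the defaults shown, one sheet covers the first 160 seconds of the source (16 tiles at 10-second intervals); a longer video needs a larger grid or a longer interval.
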

**Capabilities:**

- Extract a single frame at any timestamp
- Generate high-quality PNG thumbnails
- Auto-select the "best" frame (the most representative frame in a batch, as picked by the `thumbnail` filter)
- Sprite sheets / contact sheets: multiple thumbnails tiled into a grid image
- Extract one thumbnail every N seconds (for timeline preview strips)

**Key tools:** `-ss`, `-vframes 1`, `thumbnail` filter, `scale`, `tile`, `fps`

**Example Use Cases:**

1. **Generating a YouTube upload thumbnail** — Extract the best-looking frame from a video automatically, or pull from a specific timestamp, as a high-quality PNG ready for upload.
2. **Building a video preview strip** — Extract one frame every 10 seconds and tile them into a contact sheet. Used by video players and streaming platforms to show a timeline preview when hovering over the progress bar.
3. **Creating a preview image for a web gallery** — Auto-select a representative frame (which tends to avoid black fades and blurry motion) so the thumbnail actually represents the content.
4. **Extracting a still from a specific moment** — Pull a single frame at an exact timestamp for use in documentation, blog posts, or as a reference image.

---

## SECTION F — Streaming & Adaptive Bitrate

Produce industry-standard streaming outputs (HLS, DASH, and live RTMP) from any source.

**Capabilities:**

- **HLS:** Generate segments + `.m3u8` playlist; multi-bitrate ABR ladder (1080p/720p/480p) with master playlist
- **DASH:** Generate `.mpd` manifest with segmented output for MPEG-DASH delivery
- **RTMP live streaming:** Stream to Twitch, YouTube, or any RTMP endpoint; screen capture to RTMP

**Key tools:** `-hls_time`, `-hls_playlist_type`, `-hls_segment_filename`, `-var_stream_map`, `-master_pl_name`, `-f dash`, `-seg_duration`, `-f flv`, `-re`, `-g`

**Example Use Cases:**

1. **Self-hosting video on your own server** — Generate HLS segments and a `.m3u8` playlist so any browser can stream your video natively via `