{"skill":{"slug":"tom-video-understanding","displayName":"Tom Video Understanding","summary":"Local video comprehension skill. Use ffmpeg to extract audio and frames, FunASR for speech recognition, and qwen3-vl for image understanding.","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":166,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1775923021190,"updatedAt":1775923907187},"latestVersion":{"version":"1.0.0","createdAt":1775923021190,"changelog":"Initial release of the video-understanding skill.\n\n- Enables local video content comprehension using ffmpeg, FunASR, and qwen3-vl.\n- Extracts audio and key frames from videos via ffmpeg commands.\n- Performs local Chinese speech recognition with FunASR.\n- Provides detailed image understanding for video frames using qwen3-vl through Ollama.\n- Outlines a step-by-step workflow and key prerequisites for setup and usage.","license":"MIT-0"},"metadata":null,"owner":{"handle":"tomuiv","userId":"s171935xz6xsqmpn3z12w1jnk184m1a9","displayName":"TOMUIV","image":"https://avatars.githubusercontent.com/u/232025981?v=4"},"moderation":null}