Meta Video Ad Analyzer

Creative

Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.

Install

openclaw skills install @fortytwode/meta-video-ad-analyzer

Video Ad Analyzer

AI-powered video content extraction using Google Gemini Vision.

What This Skill Does

Frame Extraction: Smart sampling with scene change detection
OCR Text Detection: Extract text overlays using EasyOCR
Audio Transcription: Convert speech to text with Google Cloud Speech
AI Scene Analysis: Describe each scene using Gemini Vision
Native Video Analysis: Direct video understanding for longer content
Thumbnail Generation: Auto-generate thumbnails from first frame

Setup

1. Environment Variables

# Required for Gemini Vision
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Required for audio transcription
# (same service account needs Speech-to-Text API enabled)

2. Dependencies

pip install opencv-python pillow easyocr ffmpeg-python google-cloud-speech vertexai google-api-python-client

Also requires ffmpeg and ffprobe installed on system.

Usage

Basic Video Analysis

from scripts.video_extractor import VideoExtractor
from scripts.models import ExtractedVideoContent
import vertexai
from vertexai.generative_models import GenerativeModel

# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")
gemini_model = GenerativeModel("gemini-1.5-flash")

# Create extractor
extractor = VideoExtractor(gemini_model=gemini_model)

# Analyze video
result = extractor.extract_content("/path/to/video.mp4")

print(f"Duration: {result.duration}s")
print(f"Scenes: {len(result.scene_timeline)}")
print(f"Text overlays: {len(result.text_timeline)}")
print(f"Transcript: {result.transcript[:200]}...")

Extract Only Frames

frames, timestamps, text_timeline, scene_timeline, thumbnail = extractor.extract_smart_frames(
    "/path/to/video.mp4",
    scene_interval=2,    # Check for scene changes every 2s
    text_interval=0.5    # Check for text every 0.5s
)

Analyze Images

# Works with images too
result = extractor.extract_content("/path/to/image.jpg")
print(result.scene_timeline[0]['description'])

Output Structure

ExtractedVideoContent(
    video_path="/path/to/video.mp4",
    duration=30.5,
    transcript="Here's what we found...",
    text_timeline=[
        {"at": 0.0, "text": ["Download Now"]},
        {"at": 5.5, "text": ["50% Off Today"]}
    ],
    scene_timeline=[
        {"timestamp": 0.0, "description": "Woman using phone app..."},
        {"timestamp": 2.0, "description": "Product showcase with features..."}
    ],
    thumbnail_url="/static/thumbnails/video_thumb.jpg",
    extraction_complete=True
)

Key Features

Feature	Description
Scene Detection	Histogram-based change detection (threshold=65)
OCR Confidence	Tiered thresholds (0.5 high, 0.3 low)
AI Proofreading	Gemini cleans up OCR errors
Source Reconciliation	Merges OCR + Vision text intelligently
Native Video	Direct Gemini analysis for <20MB files

Prompts

Customize AI behavior by editing prompts in the prompts/ folder:

scene_analysis.md - Frame analysis prompts
scene_reconciliation.md - Scene enrichment prompts

Common Questions This Answers

"What text appears in this video ad?"
"Describe each scene in this creative"
"What does the narrator say?"
"Extract the call-to-action from this ad"