Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

YouTube Model Feeder

Food for your model — extract transcripts, key frames, OCR, slides, and LLM summaries from YouTube videos into structured AI-ready knowledge.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Maxime Roy (new.blacc) · @celstnblacc
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The declared functionality (download video, extract frames, OCR, transcript, slide detection, and LLM summaries) aligns with the tools and binaries mentioned (yt-dlp, ffmpeg, Tesseract, Whisper/Ollama, LLM providers). Required bins (docker, ffmpeg) are reasonable for this task.
Instruction Scope
SKILL.md gives explicit runtime instructions (git clone, docker-compose up, use local FastAPI, submit jobs via API). It does not instruct reading unrelated host files or exfiltrating secrets, but it does expect the user/agent to run and interact with a local service that will process videos and persist data. The document references storing provider API keys (OpenAI/Anthropic) via the service API — this implies the service will accept and store secrets.
Install Mechanism
No formal install spec is bundled; instead the guide instructs cloning a GitHub repo and running docker-compose. That will download and execute code from a remote repository and start multiple persistent containers (api, db, redis, etc.). Running docker-compose on an unreviewed repo is a moderate-to-high risk action because it executes remote code and may pull arbitrary images or mount host resources.
Credentials
The skill itself declares no required env vars, which is consistent because it supports a local Ollama default. It documents optional use of OPENAI_API_KEY and ANTHROPIC_API_KEY for summarization providers — these are reasonable for the stated features. However, the service claims to encrypt API keys at rest (Fernet) without explaining where the encryption key is stored or protected, which merits review before supplying credentials.
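For context on what "encrypted at rest (Fernet)" implies: Fernet is symmetric, so anyone who can read the Fernet key can decrypt every stored provider key. A minimal sketch with the cryptography package (an illustration, not the service's actual code) shows why key storage matters:

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Fernet is symmetric: whoever holds this one key can decrypt
# every provider API key the service has stored.
fernet_key = Fernet.generate_key()
f = Fernet(fernet_key)

token = f.encrypt(b"sk-ant-example")  # what ends up in the database
plain = f.decrypt(token)              # trivial for anyone holding fernet_key

# So the real question is where the service keeps fernet_key
# (env var? file in a container volume?) and who can read it.
```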
Persistence & Privilege
The instructions create persistent services (Postgres, Redis, API server) and expose ports (8000, 3000). Although the skill package itself does not force persistent inclusion in the agent, following the install steps results in long-running components that may store processed data and user API keys. Users should verify network exposure and storage locations in the docker-compose and repo before installing.
What to consider before installing
This skill appears to do what it says (extracts transcripts, frames, OCR, and summaries), but it relies on cloning a GitHub repo and running it via docker-compose, which downloads and executes remote code and creates persistent services. Before installing:

  1. Inspect the repository and docker-compose.yml: images used, volume mounts, environment variables, network settings (a quick inspection sketch follows this list).
  2. Confirm the Docker images come from trusted registries and do not mount sensitive host paths.
  3. Check where the service stores the Fernet key and any API keys; avoid supplying high-privilege credentials until you have reviewed storage and encryption.
  4. Run it initially in an isolated environment (a VM or dedicated host) and restrict network access with a firewall to avoid unintended exposure.
  5. Prefer local-only providers (Ollama) if you want to avoid sending data or keys to external LLM services.

If you cannot inspect the repo or are uncomfortable with these risks, do not run docker-compose for this project.
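One quick way to surface the images and host mounts before running anything. A minimal sketch, assuming PyYAML is installed and the repo uses a standard Compose schema:

```python
import yaml  # pip install pyyaml

# List images and volume mounts declared in docker-compose.yml
# before running `docker-compose up`.
with open("docker-compose.yml") as fh:
    compose = yaml.safe_load(fh)

for name, svc in compose.get("services", {}).items():
    print(f"{name}: image={svc.get('image', svc.get('build', '?'))}")
    for vol in svc.get("volumes", []):
        print(f"  mounts: {vol}")  # host paths appear on the left of the colon
```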

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.1

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Bins: docker
Any bin: ffmpeg

SKILL.md

YouTube Model Feeder

Food for your model.

Stop pausing videos every 30 seconds to screenshot, paste into Obsidian, and caption. Every 20-minute tutorial shouldn't take an hour to document.

YouTube Model Feeder extracts everything from a YouTube video — timestamped transcript, key frame snapshots, OCR of code and slides, presentation slide detection, and LLM-generated summaries — and packages it into structured knowledge your AI assistant can search, reference, and reason about.

Why This Exists

The problem isn't transcription — ten tools do that. The problem is structured context. When you feed a raw transcript to a model, it has no visual context. It doesn't know what was on screen when the speaker said "as you can see here." It can't read the code in the terminal, the diagram on the slide, or the config file being edited.

YouTube Model Feeder captures all of that. The output isn't just text — it's a knowledge bundle: transcript segments aligned to timestamps, screenshots of every key moment, OCR text from code snippets and slides, and an LLM summary that ties it all together.

Combined with obsidian-semantic-search (also on ClawHub), every video you watch becomes permanently searchable by meaning in your Obsidian vault.

What It Extracts

Full Pipeline

| Step | Tool | What it produces |
| --- | --- | --- |
| Download | yt-dlp | Video + audio + metadata (title, duration, thumbnail) |
| Transcribe | Whisper (Ollama) or YouTube captions | Timestamped transcript segments |
| Frame extraction | FFmpeg | Key frame snapshots every 5s (configurable) |
| Slide detection | SSIM analysis (OpenCV) | Identifies presentation slides via structural similarity between frames |
| OCR | Tesseract | Reads code, terminal output, and text from captured frames |
| LLM summary | Ollama / OpenAI / Anthropic | Structured markdown with sections, code blocks, and key takeaways |
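The frame-extraction and OCR steps can be reproduced outside the service. A minimal sketch, assuming ffmpeg and Tesseract are on PATH and pytesseract/Pillow are installed; the repo's snapshot.py is the authoritative implementation and is not reproduced here:

```python
import subprocess
from pathlib import Path

import pytesseract      # pip install pytesseract (needs the tesseract binary)
from PIL import Image   # pip install pillow

def extract_frames(video: str, out_dir: str, interval_s: int = 5) -> list[Path]:
    """Grab one frame every `interval_s` seconds via ffmpeg's fps filter."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video, "-vf", f"fps=1/{interval_s}",
         str(out / "frame_%04d.jpg")],
        check=True,
    )
    return sorted(out.glob("frame_*.jpg"))

def ocr_frames(frames: list[Path]) -> dict[str, str]:
    """Run Tesseract OCR on each captured frame."""
    return {f.name: pytesseract.image_to_string(Image.open(f)) for f in frames}

frames = extract_frames("video.mp4", "frames")
text_by_frame = ocr_frames(frames)
```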

Slide Detection (Deep)

Not just frame captures — intelligent slide boundary detection:

  1. Layout detection — classifies video as full-frame, picture-in-picture, or split panel
  2. SSIM transition scan — compares consecutive frames for structural changes (threshold: SSIM < 0.85); see the sketch after this list
  3. LLM disambiguation — borderline transitions (0.85–0.93 SSIM) sent to LLM for classification
  4. Slide grouping — merges transitions into slides with enforced minimum duration (3s)
  5. Final-state capture — saves the last frame of each slide as JPEG
  6. OCR extraction — runs Tesseract on each slide image
  7. Transcript alignment — maps transcript segments to slide time ranges
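A minimal sketch of the transition scan (steps 2–4), assuming opencv-python for decoding and scikit-image's structural_similarity; the thresholds mirror the ones above, but the repo's slide_detection.py is the authoritative implementation:

```python
import cv2  # pip install opencv-python
from skimage.metrics import structural_similarity as ssim  # pip install scikit-image

CUT, BORDERLINE = 0.85, 0.93  # thresholds from the pipeline description above

def scan_transitions(video_path: str, step_s: float = 1.0):
    """Yield (timestamp, score, verdict) for consecutive sampled frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    stride = max(1, int(fps * step_s))
    prev, idx = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev is not None:
                score = ssim(prev, gray)
                if score < CUT:
                    yield idx / fps, score, "slide transition"
                elif score < BORDERLINE:
                    yield idx / fps, score, "borderline: defer to LLM"
            prev = gray
        idx += 1
    cap.release()
```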

Output Formats

| Format | What you get |
| --- | --- |
| Markdown | Timestamped sections with headings, code blocks, image references |
| HTML | Styled single-page doc with embedded screenshots |
| Obsidian bundle | ZIP export: markdown + images, ready to drop into your vault |

Installation

Prerequisites

```bash
# macOS
brew install ffmpeg tesseract

# Linux
apt install ffmpeg tesseract-ocr
```

Docker Desktop must be running for the full backend.

Start the Stack

```bash
git clone https://github.com/celstnblacc/youtube-model-feeder.git
cd youtube-model-feeder
docker-compose up -d
```

This starts 5 services:

| Service | Port | Purpose |
| --- | --- | --- |
| api | 8000 | FastAPI backend + Swagger docs at /docs |
| celery_worker | | Background video processing |
| postgres | 5432 | Job tracking, transcripts, documents |
| redis | 6379 | Task queue (Celery broker) |
| web | 3000 | Next.js frontend (optional) |

Verify

Open http://localhost:8000/docs — you should see the Swagger API documentation.

Usage

Via AI Assistant

Extract a video:

"Extract everything from this YouTube video and save it to my vault: https://youtube.com/watch?v=..."

Transcript only:

"Get the timestamped transcript for this video"

Slides and code screenshots:

"Extract all the code screenshots and presentation slides from this tutorial"

Obsidian export:

"Convert this video into an Obsidian note with screenshots and timestamps"

Via API

```bash
# Submit a video for processing
curl -X POST http://localhost:8000/jobs \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=dQw4w9WgXcQ"}'

# Check job status
curl http://localhost:8000/jobs/{job_id}

# Get the generated document
curl http://localhost:8000/videos/{video_id}
```
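The same flow from Python, as a minimal sketch using requests; the field names (job_id, status, video_id) and status values are assumptions inferred from the endpoints above, so check the Swagger docs at /docs for the actual schema:

```python
import time

import requests  # pip install requests

BASE = "http://localhost:8000"

# Submit a video for processing (endpoint from the curl examples above)
job = requests.post(
    BASE + "/jobs",
    json={"url": "https://youtube.com/watch?v=dQw4w9WgXcQ"},
).json()
job_id = job["job_id"]  # hypothetical field name

# Poll until the job finishes
while True:
    status = requests.get(f"{BASE}/jobs/{job_id}").json()
    if status.get("status") in ("completed", "failed"):  # hypothetical values
        break
    time.sleep(5)

# Fetch the generated document
doc = requests.get(f"{BASE}/videos/{status['video_id']}").json()  # hypothetical field
print(doc)
```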

Via Web UI

Open http://localhost:3000, paste a YouTube URL, and watch the extraction happen in real time with progress tracking.

LLM Provider Selection

Per-user configuration — choose your summarization engine:

| Provider | Model (default) | Setup | Cost |
| --- | --- | --- | --- |
| Ollama (default) | Mistral 7B | Pre-installed locally | Free |
| OpenAI | GPT-4o-mini | Set `OPENAI_API_KEY` | Per-token |
| Anthropic | Claude Sonnet 4.6 | Set `ANTHROPIC_API_KEY` | Per-token |

Configure via the API: PATCH /settings/me with your preferred provider and API key (encrypted at rest with Fernet).
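A hedged example of that call; the request body shape (provider, api_key) is an assumption based on the description above, so verify the real schema in the Swagger docs before sending a key:

```python
import requests  # pip install requests

# Hypothetical body shape; check http://localhost:8000/docs for the real schema
resp = requests.patch(
    "http://localhost:8000/settings/me",
    json={"provider": "anthropic", "api_key": "sk-ant-..."},
)
resp.raise_for_status()
```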

The Knowledge Pipeline

YouTube Model Feeder is designed to work with other ClawHub skills:

```
YouTube Video
  → youtube-model-feeder (extract transcript + snapshots + OCR + summary)
    → Obsidian vault (structured markdown + images)
      → obsidian-semantic-search (index with embeddings, search by meaning)
        → Your AI assistant (full context from every video you've ever watched)
```

Every video becomes permanent, searchable knowledge. Not buried in a playlist — indexed and queryable.

Architecture

```
youtube-model-feeder/
├── backend/           # FastAPI + Celery (Python 3.12)
│   ├── app/
│   │   ├── services/  # Core extraction pipeline (6372 LOC)
│   │   │   ├── youtube.py          # yt-dlp wrapper
│   │   │   ├── transcript.py       # Whisper + YouTube captions
│   │   │   ├── snapshot.py         # FFmpeg frame extraction
│   │   │   ├── slide_detection.py  # SSIM + OCR + alignment (643 LOC)
│   │   │   ├── llm.py              # Summarization engine (1229 LOC)
│   │   │   └── llm_providers.py    # Ollama/OpenAI/Anthropic abstraction
│   │   ├── routes/    # REST API endpoints
│   │   └── tasks.py   # Celery background jobs
├── frontend/          # Next.js 14 + React 18 + Tailwind + Zustand
├── docker-compose.yml # 5 services
└── migrations/        # Alembic database migrations
```

Troubleshooting

| Problem | Fix |
| --- | --- |
| VirusTotal "suspicious" warning on install | False positive: the skill describes video-extraction patterns. Use `clawhub install youtube-model-feeder --force` |
| Docker services won't start | Ensure Docker Desktop is running. Check `docker-compose logs api` for errors |
| Transcription is slow | First run pulls the Whisper model (~1.5 GB); subsequent runs are fast. Try YouTube captions first (faster, no model needed) |
| No slides detected | The SSIM threshold may need tuning for your video type. Presentation-style videos work best |
| LLM summary is empty | Check the LLM provider config. Default is Ollama; ensure Ollama is running with a model pulled |
| FFmpeg not found | `brew install ffmpeg` (macOS) or `apt install ffmpeg` (Linux) |

Built by celstnblacc — food for your model. 226 tests, 6 extraction stages, 3 LLM providers, Obsidian-ready output.
