Install
openclaw skills install youtube-transcript-skill
YouTube transcript extraction and content reformatting: given a YouTube video URL, opens the video's transcript panel, extracts all timestamped segments, and transforms the raw transcript into summaries, chapter outlines, Twitter/X threads, blog posts, or notable quotes. Use when the user shares a YouTube URL or video link, asks to summarize a video, get a transcript, extract content from a YouTube video, get YouTube captions, extract YouTube captions, download YouTube captions, transcribe YouTube video, YouTube video to text, make a thread from YouTube, YouTube to blog post, YouTube to article, pull transcript from YouTube, YouTube content extraction, convert YouTube to text, video to transcript. Also applies when user wants to reformat any YouTube video content into structured output (chapters, threads, blog articles, key quotes).
YouTube video URL → timestamped transcript → summary / chapters / thread / blog / quotes
All process output shown to the user (progress updates, process notifications) follows the user's language.
Extract the full transcript from a YouTube video's built-in transcript panel, then transform it into the output format the user requests.
https://www.youtube.com/watch?v={VIDEO_ID}
If browser-act has been confirmed available in the current session → skip this step.
Invoke browser-act via the Skill tool to load its usage instructions. If installation or configuration issues arise, follow its guidance to resolve them, then retry.
This Skill's operational boundary = what the user can manually do in their browser. It only reads data already displayed to the user on the page, never bypassing authentication or access controls.
JS code is encapsulated in Python files under the scripts/ directory, invoked via eval "$(python scripts/xxx.py)". Use the bash tool for execution.
eval "$(python scripts/get-languages.py)"
No parameters. Reads ytInitialPlayerResponse from the current page.
Output example:
{
  "available_languages": [
    {"code": "en", "name": "English", "kind": "manual", "is_auto": false},
    {"code": "en", "name": "English (auto-generated)", "kind": "asr", "is_auto": true}
  ],
  "count": 2
}
Returns {"error": true, "message": "..."} when transcripts are disabled or the page is not a YouTube video.
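A minimal sketch of how the caller might read this output. The function name `pick_language` and the preference for manual captions over auto-generated ones are assumptions for illustration; the skill itself only says to note the language list.

```python
import json


def pick_language(raw_output, preferred="en"):
    """Return one track from get-languages.py output, or None.

    Hypothetical helper: prefers the requested language code, then
    manually written captions over auto-generated (ASR) ones.
    """
    result = json.loads(raw_output)
    if result.get("error"):
        return None  # transcripts disabled or not a YouTube video page
    tracks = result.get("available_languages", [])
    if not tracks:
        return None
    # False sorts before True, so matching code and manual tracks win.
    return min(
        tracks,
        key=lambda t: (t.get("code") != preferred, t.get("is_auto", False)),
    )


sample = """{"available_languages": [
  {"code": "en", "name": "English (auto-generated)", "kind": "asr", "is_auto": true},
  {"code": "en", "name": "English", "kind": "manual", "is_auto": false}
], "count": 2}"""

print(pick_language(sample))
```

If no track matches the preferred code, the sketch still returns a track (manual first), which lets the caller tell the user which language will be used.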
eval "$(python scripts/open-transcript-panel.py)"
No parameters. Clicks the "Show transcript" button below the video (handles multiple UI language variants automatically for robustness).
You must call wait stable after this so the panel can load fully.
Output example:
{"success": true, "label": "Show transcript"}
eval "$(python scripts/extract-transcript-segments.py)"
No parameters. Scrolls the open transcript panel to trigger lazy loading for long videos, then extracts all segments.
Output example:
{
  "segment_count": 24,
  "segments": [
    {"ts": "0:18", "text": "We're no strangers to love"},
    {"ts": "0:27", "text": "You know the rules and so do I"}
  ],
  "full_text": "We're no strangers to love You know the rules...",
  "timestamped_text": "0:18 We're no strangers to love\n0:27 You know the rules..."
}
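The segment structure above can be post-processed with a few lines of Python. This is a sketch only: `ts_to_seconds` and `render_timestamped_text` are hypothetical helpers, and the "timestamp, space, text, newline" layout is inferred from the example output rather than documented.

```python
def ts_to_seconds(ts):
    """Convert a 'M:SS' or 'H:MM:SS' timestamp string to seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def render_timestamped_text(segments):
    """Rebuild a timestamped_text-style string from the segments list.

    Assumes one "<ts> <text>" line per segment, as in the example output.
    """
    return "\n".join(f"{seg['ts']} {seg['text']}" for seg in segments)


segments = [
    {"ts": "0:18", "text": "We're no strangers to love"},
    {"ts": "0:27", "text": "You know the rules and so do I"},
]

print(ts_to_seconds(segments[1]["ts"]))
print(render_timestamped_text(segments))
```

Converting timestamps to seconds is useful when a chapter outline needs to be sorted or when gaps between segments need to be measured.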
1. navigate https://www.youtube.com/watch?v={VIDEO_ID} → wait stable
2. eval "$(python scripts/get-languages.py)": confirm transcripts are available; note the language list
3. eval "$(python scripts/open-transcript-panel.py)": open the panel
4. wait stable: wait for panel content to load
5. eval "$(python scripts/extract-transcript-segments.py)": extract all segments

Use timestamped_text from the output as input for the Transform step below.
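The navigate step expects the canonical watch URL, but users often paste short or embed links. The sketch below shows one way to normalize them; `watch_url` is a hypothetical helper, and both the set of accepted path prefixes and the 11-character ID pattern are assumptions based on commonly seen YouTube links, not something this skill specifies.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def watch_url(url):
    """Return https://www.youtube.com/watch?v={VIDEO_ID}, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    video_id = None
    if host == "youtu.be":
        # Short link: the ID is the first path component.
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
                video_id = parts[1]
    if video_id and VIDEO_ID_RE.match(video_id):
        return f"https://www.youtube.com/watch?v={video_id}"
    return None


print(watch_url("https://youtu.be/dQw4w9WgXcQ?t=18"))
```

The function returns None for anything it cannot parse, including links pasted without a scheme, so the caller can ask the user for a valid link instead of navigating to a wrong page.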
After fetching the transcript, transform it based on what the user requests. If the user did not specify a format, default to the Full Document — output all five sections in order.
Default Full Document output order (when no specific format is requested): summary, chapters, thread, blog post, quotes.
1. Check that segment_count >= 1. If it is empty, tell the user the video has transcripts disabled.
2. If full_text exceeds ~50,000 characters, split timestamped_text into overlapping chunks (~40K characters with 2K overlap) and summarize each chunk before merging.
3. Use the timestamped_text field as input. If no format is specified, produce all five sections.

0:00 Introduction — host opens with the problem statement
3:45 Background — prior work and why existing solutions fall short
12:20 Core method — walkthrough of the proposed approach
24:10 Results — benchmark comparisons and key takeaways
31:55 Q&A — audience questions on scalability and next steps
1/ Just watched an incredible video on [topic]. Key takeaways 🧵
2/ First insight: [point]. This matters because [reason].
3/ The surprising part: [finding]. Most assume [belief], but this shows otherwise.
4/ Practical takeaway: [action].
5/ Full video: [URL]
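The chunking rule for long transcripts (split timestamped_text once full_text exceeds ~50,000 characters, using ~40K-character chunks with 2K overlap) can be sketched as follows. `chunk_transcript` is a hypothetical helper; cutting at fixed character offsets is a simplification, and a cut may fall in the middle of a line, which the overlap is meant to absorb.

```python
def chunk_transcript(result, limit=50_000, size=40_000, overlap=2_000):
    """Split result["timestamped_text"] into overlapping chunks.

    The split is triggered by the length of result["full_text"], and
    each chunk repeats the last `overlap` characters of the previous one.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    text = result["timestamped_text"]
    if len(result["full_text"]) <= limit:
        return [text]  # short enough to process in one pass
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap  # step back by `overlap` for context
    return chunks


long_text = "".join(str(i % 10) for i in range(100_000))
chunks = chunk_transcript({"full_text": long_text, "timestamped_text": long_text})
print([len(c) for c in chunks])
```

Each chunk would then be summarized on its own and the partial summaries merged, as the transform steps describe.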
- get-languages.py returns error; tell user and suggest checking if captions are available on the video page
- open-transcript-panel.py + wait stable + extract-transcript-segments.py once
- segment_count >= 1 AND full_text length > 0
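The check `segment_count >= 1 AND full_text length > 0` translates directly into code. The sketch below also reads the "once" in the open/wait/extract sequence as a single retry; that reading is an assumption, as are the names `is_valid_extraction` and `extract_with_retry` and the `extract` callable, which stands in for running the three steps and parsing the JSON.

```python
def is_valid_extraction(result):
    """True when the output has at least one segment and non-empty text."""
    return (
        not result.get("error")
        and result.get("segment_count", 0) >= 1
        and len(result.get("full_text", "")) > 0
    )


def extract_with_retry(extract, retries=1):
    """Call `extract` and repeat it up to `retries` times if invalid.

    `extract` is a hypothetical zero-argument callable that opens the
    panel, waits, runs the extraction script, and returns the parsed dict.
    Returns the valid result, or None if every attempt failed.
    """
    result = extract()
    for _ in range(retries):
        if is_valid_extraction(result):
            break
        result = extract()
    return result if is_valid_extraction(result) else None
```

Returning None rather than raising keeps the decision about what to tell the user with the caller.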
Path: {working-directory}/browser-act-skill-forge-memories/youtube-content-youtube-transcript.memory.md
Before execution: If the file exists, read it first — it records unexpected situations encountered during past executions (e.g., a strategy has become ineffective); adjust strategy order accordingly.
After execution: If an unexpected situation is encountered (strategy became ineffective, page redesigned, anti-scraping upgraded, better path discovered), append a line:
{YYYY-MM-DD}: {what happened} → {conclusion}
Normal execution does not write to the file.
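A small sketch of the memory-file convention described above, assuming plain UTF-8 text with one entry per line. The helper names `read_memory` and `append_memory` are hypothetical, and the caller supplies the working directory.

```python
from datetime import date
from pathlib import Path

MEMORY_RELATIVE_PATH = (
    "browser-act-skill-forge-memories/"
    "youtube-content-youtube-transcript.memory.md"
)


def read_memory(working_dir):
    """Return past entries as a list of lines; empty if no file exists."""
    path = Path(working_dir) / MEMORY_RELATIVE_PATH
    if not path.exists():
        return []
    return path.read_text(encoding="utf-8").splitlines()


def append_memory(working_dir, what_happened, conclusion, today=None):
    """Append one '{YYYY-MM-DD}: {what happened} → {conclusion}' line.

    Call this only after an unexpected situation; normal runs skip it.
    """
    path = Path(working_dir) / MEMORY_RELATIVE_PATH
    path.parent.mkdir(parents=True, exist_ok=True)
    stamp = (today or date.today()).isoformat()
    line = f"{stamp}: {what_happened} → {conclusion}"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Reading the file before execution and appending only on surprises keeps it short enough to scan at the start of every run.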