Web Video Transcribe DOCX

v1.0.2

Offline-first workflow for turning Chinese web page video or audio into text and Word deliverables. Use when Codex needs to (1) extract playable media stream...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt below, then paste it into OpenClaw to install c-narcissus/web-video-transcribe-docx.

Prompt preview: Install & Setup
Install the skill "Web Video Transcribe DOCX" (c-narcissus/web-video-transcribe-docx) from ClawHub.
Skill page: https://clawhub.ai/c-narcissus/web-video-transcribe-docx
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install web-video-transcribe-docx

ClawHub CLI


npx clawhub@latest install web-video-transcribe-docx
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The skill's name and description (extract web media, download streams, run local SenseVoice ASR, produce TXT/DOCX) align with the included scripts. Declared runtime requirement (python) matches the Python scripts. Required actions such as browser automation, media downloading, ffmpeg usage, and model download are expected for this functionality.
Instruction Scope
SKILL.md and the scripts confine behavior to extracting media URLs from pages, downloading media, running local ASR, and producing DOCX. The runtime instructions explicitly state not to request/store cookies or tokens and not to bypass DRM or logins. Extractors capture request headers but sanitize them (only keeping Referer/Origin), and pipelines only operate on user-supplied or page-extracted URLs.
Install Mechanism
No marketplace install spec is present (instruction-only), but the bootstrap script will pip-install several Python packages and can run Playwright's browser installer if invoked. The SenseVoice model is downloaded from a GitHub releases URL and extracted with a safe-path check; these behaviors are appropriate for the task but do involve network downloads and installing Python packages on first run, which is expected but worth noting.
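
The review above notes that the model archive is extracted "with a safe-path check". A minimal sketch of what such a check typically looks like (this is an illustrative implementation, not the skill's actual code — the function name safe_extract is hypothetical):

```python
import os
import tarfile

def safe_extract(archive_path: str, dest_dir: str) -> None:
    """Extract a tar archive, rejecting members that would escape dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            # Resolve each member's final path and confirm it stays inside dest_root.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if not (target == dest_root or target.startswith(dest_root + os.sep)):
                raise ValueError(f"blocked path traversal: {member.name}")
        tar.extractall(dest_root)
```

This blocks classic "tar slip" entries such as ../../etc/passwd; on Python 3.12+ the stdlib's extraction filters (filter="data") provide similar protection natively.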
Credentials
The skill requests no environment variables or external credentials. It does look for a local Chrome/Edge executable and writes cache/model files to a per-user cache directory. The only remote endpoint used for code operation is a GitHub releases URL to download the ASR model (appropriate).
Persistence & Privilege
The skill is not 'always: true' and does not claim to modify other skills or global agent settings. It writes files to its own cache and output directories and installs packages only when the bootstrap script is run; this level of presence is appropriate for the described offline transcription workflow.
Assessment
This skill is coherent with its purpose but will: (1) download a large ASR model from GitHub into a user cache the first time you run it, (2) pip-install Python packages if you run scripts/bootstrap_env.py, and (3) use Playwright and a local Chrome/Edge to extract media from pages when needed. Those actions require network access and will write files to your user cache and whatever output directory you choose. If you install or run it: review the bootstrap step before running, run initial tests on non-sensitive/public pages, consider using an isolated environment (virtualenv/container), and inspect the small agents/openai.yaml file if you want to confirm there are no hidden external endpoints or API keys. The skill explicitly avoids capturing cookies/tokens and uses a safe tar extraction check for the model archive.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

  • Any of: python, python3, py

latest: vk9733hxecesfv9jgdqjqbjj36h850fnw
68 downloads · 0 stars · 1 version
Updated 1 week ago
v1.0.2 · MIT-0

Web Video Transcribe DOCX

Overview

Use the bundled scripts to extract media, download it, transcribe it offline, and render DOCX output.

Prefer the deterministic scripts before hand-rolling new code.

Use {baseDir} when constructing file paths inside this skill so the instructions stay portable across agents and marketplaces.
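
In practice, expanding the {baseDir} placeholder is a simple string substitution. A minimal sketch (the helper name resolve_skill_path is an assumption for illustration):

```python
from pathlib import Path

def resolve_skill_path(base_dir: str, template: str) -> Path:
    """Expand the {baseDir} placeholder used throughout this skill's docs."""
    return Path(template.replace("{baseDir}", base_dir))
```

For example, resolve_skill_path("/opt/skill", "{baseDir}/scripts/download_url.py") yields /opt/skill/scripts/download_url.py regardless of where the agent installed the skill.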

Environment

  • Require Python 3 and local filesystem access.
  • Require network access for first-run model download and for fetching page or media URLs.
  • Require a local Chrome or Edge browser only when extracting media from a web page.

Quick Start

  1. Run python {baseDir}/scripts/bootstrap_env.py once in the target environment.
  2. For a generic web page URL, run python {baseDir}/scripts/pipeline_web_to_docx.py <url> --output-dir <dir>.
  3. For a direct media URL, run python {baseDir}/scripts/download_url.py <url> <output> and then python {baseDir}/scripts/transcribe_sensevoice.py --input <file> --output-txt <txt> --output-docx <docx>.
  4. For a local media file, run python {baseDir}/scripts/transcribe_sensevoice.py --input <file> --output-txt <txt> --output-docx <docx>.
  5. If the user asks for a polished reading version rather than a raw transcript, read references/cleanup-guidelines.md, produce a refined .txt, and then render it with python {baseDir}/scripts/transcript_to_docx.py.
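
If an agent drives these steps programmatically, the invocations can be built as argv lists rather than shell strings. A sketch assuming only the script names and flags shown above (the helper names are hypothetical):

```python
import sys
from pathlib import Path

def pipeline_cmd(base_dir: str, url: str, output_dir: str) -> list[str]:
    """Build the argv for the generic web-page pipeline (step 2 above)."""
    script = str(Path(base_dir) / "scripts" / "pipeline_web_to_docx.py")
    return [sys.executable, script, url, "--output-dir", output_dir]

def transcribe_cmd(base_dir: str, media: str, txt: str, docx: str) -> list[str]:
    """Build the argv for offline transcription (steps 3 and 4 above)."""
    script = str(Path(base_dir) / "scripts" / "transcribe_sensevoice.py")
    return [sys.executable, script, "--input", media,
            "--output-txt", txt, "--output-docx", docx]
```

Passing these lists to subprocess.run avoids shell-quoting problems with URLs that contain & or ?.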

Example Requests

  • "Transcribe the Chinese audio from this web video and export it as a Word document."
  • "Turn this MP4 into a transcript, then reorganize it into chaptered reading notes."
  • "This page needs a Referer header for media download. Extract the media stream and convert it to DOCX."

Workflow

1. Classify the source

  • Generic page URL: Use python {baseDir}/scripts/pipeline_web_to_docx.py first. For especially stubborn Toutiao pages, python {baseDir}/scripts/pipeline_toutiao_to_docx.py and python {baseDir}/scripts/extract_toutiao_media.py remain available as site-specific fallbacks.
  • Direct media URL: Use python {baseDir}/scripts/download_url.py, then transcribe.
  • Local file: Transcribe directly.
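
The triage above can be sketched as a small heuristic — local path versus direct media URL versus generic page. This is an illustrative classifier, not the skill's own logic; the extension list is an assumption:

```python
from pathlib import Path
from urllib.parse import urlparse

# Common container/stream extensions treated as "direct media" (assumed list).
MEDIA_EXTS = {".mp4", ".m4a", ".mp3", ".aac", ".wav", ".flac", ".ts", ".m3u8", ".mpd"}

def classify_source(source: str) -> str:
    """Rough triage: 'local', 'direct-media', or 'page'."""
    parsed = urlparse(source)
    if parsed.scheme not in ("http", "https"):
        return "local"
    suffix = Path(parsed.path).suffix.lower()
    return "direct-media" if suffix in MEDIA_EXTS else "page"
```

A page URL like https://example.com/watch?v=1 has no media extension in its path and falls through to the full pipeline.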

2. Preserve raw outputs

  • Keep the raw transcript as its own .txt.
  • If you produce a cleaned or chapterized version, save it as a separate file.
  • Do not overwrite the raw transcript unless the user explicitly asks.
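
One way to honor the rule above is to derive the refined filename from the raw one and refuse to clobber the original by default. A sketch, with hypothetical names (save_refined, the .refined infix):

```python
from pathlib import Path

def save_refined(raw_txt: Path, refined_text: str, overwrite_raw: bool = False) -> Path:
    """Write a cleaned transcript next to the raw one without clobbering it."""
    if overwrite_raw:
        out = raw_txt  # only when the user explicitly asked for this
    else:
        # talk.txt -> talk.refined.txt, keeping the raw file untouched
        out = raw_txt.with_name(raw_txt.stem + ".refined" + raw_txt.suffix)
    out.write_text(refined_text, encoding="utf-8")
    return out
```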

3. Prefer the audio stream

  • If a page exposes a dedicated audio stream, prefer downloading that instead of the full video stream.
  • If the page only exposes a video stream, let ffmpeg decode audio during transcription.
  • If the page exposes HLS or DASH manifests, prefer downloading them through the bundled downloader or pipeline instead of raw HTTP GET.
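
The audio-first preference can be expressed as a simple selection over the extractor's captured streams. This assumes a hypothetical manifest entry shape with 'kind' and 'url' keys — the real manifest schema may differ:

```python
def pick_stream(streams: list[dict]) -> dict:
    """Prefer a dedicated audio stream; fall back to video.

    `streams` is a hypothetical list of manifest entries, each with
    at least 'kind' ('audio' or 'video') and 'url' keys.
    """
    audio = [s for s in streams if s.get("kind") == "audio"]
    if audio:
        return audio[0]  # smaller download, same transcript quality
    video = [s for s in streams if s.get("kind") == "video"]
    if video:
        return video[0]  # ffmpeg will decode the audio track during ASR
    raise ValueError("no usable media stream in manifest")
```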

4. Refine conservatively

  • Preserve meaning.
  • Fix obvious ASR mistakes, punctuation, paragraph breaks, headings, and chapter boundaries.
  • Do not invent quotes or historical claims that are not supported by the transcript.
  • If a passage is too noisy to restore confidently, keep it neutral instead of fabricating detail.

5. Stay within scope

  • Only download URLs that the user supplied directly or that the extractor captured from the target page.
  • Do not request, store, or exfiltrate cookies, access tokens, or account credentials.
  • Do not attempt to bypass DRM, login walls, or geo-restriction controls.
  • If a page requires authenticated browser state that is not already available, say so plainly and stop at the supported boundary.
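
The security review notes that the extractors capture request headers but keep only Referer and Origin. A minimal sketch of that kind of allow-list sanitization (illustrative, not the skill's actual code):

```python
# Only these request headers survive sanitization (case-insensitive match).
ALLOWED_HEADERS = {"referer", "origin"}

def sanitize_headers(captured: dict[str, str]) -> dict[str, str]:
    """Keep only Referer/Origin; drop cookies, tokens, and other credentials."""
    return {k: v for k, v in captured.items() if k.lower() in ALLOWED_HEADERS}
```

Anything like Cookie or Authorization is discarded before the headers are written into the manifest, so downstream download commands never see credentials.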

6. Render deliverables

  • Use python {baseDir}/scripts/transcript_to_docx.py for generic TXT-to-DOCX rendering.
  • Use the raw transcript for auditability and the refined transcript for reading quality.

Scripts

  • scripts/bootstrap_env.py Install or verify the Python packages used by this skill.
  • scripts/extract_web_media.py Open a generic web page in a real browser, capture likely media URLs plus common download headers, and emit a JSON manifest.
  • scripts/extract_toutiao_media.py Open a Toutiao page in a real browser, capture audio/video URLs, and emit a JSON manifest with the same schema as the generic extractor.
  • scripts/download_url.py Download a direct media URL to disk with a stable user agent, optional headers, and HLS/DASH handling.
  • scripts/transcribe_sensevoice.py Download the SenseVoice model on demand, segment media, run offline ASR, and emit TXT and optional DOCX.
  • scripts/transcript_to_docx.py Render timestamped transcripts or chapterized notes into a Word document.
  • scripts/pipeline_web_to_docx.py Run the generic end-to-end pipeline: extract, download, transcribe, and render.
  • scripts/pipeline_toutiao_to_docx.py Run the Toutiao-specialized end-to-end pipeline for cases where the generic extractor is not preferred.
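
When chaining the extractor to the downloader manually, the JSON manifest is the hand-off point. A sketch assuming a hypothetical manifest shape of {"media": [{"url": ..., "headers": {...}}]} — the real schema emitted by the extractors may differ:

```python
import json
import sys
from pathlib import Path

def manifest_download_cmds(base_dir: str, manifest_json: str, out_dir: str) -> list[list[str]]:
    """Turn an extractor manifest into download_url.py invocations.

    Assumes entries look like {"url": ..., "headers": {...}}; adjust to
    the actual manifest schema before relying on this.
    """
    entries = json.loads(manifest_json).get("media", [])
    script = str(Path(base_dir) / "scripts" / "download_url.py")
    cmds = []
    for i, entry in enumerate(entries):
        dest = str(Path(out_dir) / f"media_{i}")
        cmds.append([sys.executable, script, entry["url"], dest])
    return cmds
```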

References

  • references/cleanup-guidelines.md Guidelines for refining a raw transcript into a polished reading version (see Quick Start step 5).

Validation

  • Run python {baseDir}/scripts/bootstrap_env.py before first use in a fresh environment.
  • Validate the skill folder with skill-creator/scripts/quick_validate.py.
  • Prefer testing --help and one representative happy path after changing the scripts.
  • If extraction fails on a page, capture a direct media URL with browser tooling and continue with the downloader + transcriber.
  • Do not promise support for DRM-protected streams, authenticated cookies, or sites that only expose encrypted EME playback.
