Install
openclaw skills install summarize-anything
Extract, transcribe, clean, segment, and analyze long-form content from URLs, local media files, existing transcripts, and pasted text. Use when a user provides a podcast, video, interview, article, WeChat post, social video link, subtitle file, or transcript and wants: (1) text extraction, (2) whisper-based transcription, (3) cleaned transcripts, (4) rough speaker segmentation based on semantics, or (5) detailed summaries and insight memos.
Use this skill to turn long-form content into usable text and high-signal insights. The skill treats Codex as the workflow orchestrator: Codex decides how to acquire the content, when to call local tools, how to clean the transcript, how to assign rough speakers, and how to write the final insight memo.
Classify the input before doing any extraction work. Supported inputs:
txt, srt, or json transcript
Prefer direct text over transcription when possible, but require a full transcript body.
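The classification step can be sketched as a small routing function. This is a minimal illustration, not the skill's actual logic: the category names (`url`, `transcript_file`, `media_file`, `pasted_text`) and the media extension set are assumptions, and only the txt, srt, and json transcript extensions come from the text above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical category names and media extensions; only txt, srt, and json
# transcripts are named explicitly in the skill description.
TRANSCRIPT_EXTS = {".txt", ".srt", ".json"}
MEDIA_EXTS = {".mp3", ".wav", ".m4a", ".mp4", ".mov", ".mkv"}


def classify_input(value: str) -> str:
    """Return a coarse input class for a user-supplied string."""
    text = value.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Only single-line values are considered candidate file paths.
    if "\n" not in text:
        suffix = PurePosixPath(text).suffix.lower()
        if suffix in TRANSCRIPT_EXTS:
            return "transcript_file"
        if suffix in MEDIA_EXTS:
            return "media_file"
    # Everything else is treated as pasted text.
    return "pasted_text"
```

Routing on the input class first makes it easy to prefer an existing transcript over a transcription run.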
Use local runtime tools for audio workflows.
Helper scripts live in scripts/.
scripts/ensure_whisper_cpp.sh makes sure a local whisper-cli exists inside the skill runtime.
scripts/ensure_whisper_model.sh makes sure the requested ggml model exists inside the skill runtime.
scripts/extract_audio.sh converts video to wav when needed.
scripts/run_whisper_cpp.sh is the default entry point for transcription.
scripts/runtime_status.sh reports how much space the runtime currently uses.
scripts/maintain_runtime.sh warns or auto-cleans when the runtime grows beyond configured thresholds.
scripts/cleanup_runtime.sh clears temporary runtime state and can optionally prune models or build sources.
Treat transcript cleanup and speaker assignment as LLM tasks, not hard-rule tasks.
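One way to picture how the transcription scripts fit together is as an ordered plan. The ordering below is an assumption for illustration only: the text does not say whether scripts/run_whisper_cpp.sh already performs the ensure steps itself, and no script arguments are shown because none are documented here. The function name `plan_transcription_steps` is hypothetical.

```python
def plan_transcription_steps(is_video: bool) -> list[str]:
    """List helper scripts in one plausible order for a local media file.

    The order is an assumption; the scripts may chain these steps internally.
    """
    steps = [
        "scripts/ensure_whisper_cpp.sh",    # make sure whisper-cli exists
        "scripts/ensure_whisper_model.sh",  # make sure the ggml model exists
    ]
    if is_video:
        # Video is converted to wav first; audio input skips this step.
        steps.append("scripts/extract_audio.sh")
    # Default transcription entry point.
    steps.append("scripts/run_whisper_cpp.sh")
    return steps
```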
Produce layered outputs instead of a single summary. Recommended artifacts:
Personalize only when there is clear signal.
Use scripts/run_whisper_cpp.sh as the default transcription entry point.
Use the small model unless the user asks for a different latency/quality tradeoff.
... runtime/ tree when possible.
Write outputs to output/summarize-anything/<job-id>/ unless the user requested another path.
... output/ and outputs/.
Read references/transcript-cleaning.md when doing substantial cleanup.
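The default output location can be sketched as a small path helper. This is a minimal illustration that assumes the job id is an opaque string chosen by the caller; the function name `resolve_output_dir` and the example job id are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Default root taken from the output/summarize-anything/<job-id>/ convention.
DEFAULT_OUTPUT_ROOT = PurePosixPath("output/summarize-anything")


def resolve_output_dir(job_id: str, requested: Optional[str] = None) -> PurePosixPath:
    """Return the user-requested path if given, else the default job directory."""
    if requested:
        return PurePosixPath(requested)
    return DEFAULT_OUTPUT_ROOT / job_id
```

For example, `resolve_output_dir("job-001")` yields `output/summarize-anything/job-001`, while passing a second argument keeps the user's own path.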
Read references/speaker-segmentation.md for the segmentation rubric.
... references/insight-template.md.
Include a 总摘要 (overall summary) or equivalent top-level synthesis section before the detailed breakdown.
总摘要 (overall summary)
按内容结构详细总结 (detailed summary following the content's structure) or another natural equivalent
最核心的观点 (the core viewpoints), 最有价值的地方 (the most valuable parts), or 值得注意的局限 (notable limitations)
1800-3000 characters when writing in Chinese, Japanese, or similarly dense scripts
1200-2000 words when writing in English or similarly spaced languages
Read references/insight-template.md before writing the final memo.
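The length targets can be expressed as a small lookup. This is a sketch under stated assumptions: the language codes in `DENSE_SCRIPT_LANGS` stand in for "Chinese, Japanese, or similarly dense scripts", and both function names are hypothetical.

```python
# Assumption: language tags such as "zh-CN" or "ja" identify dense scripts.
DENSE_SCRIPT_LANGS = {"zh", "ja"}


def memo_length_target(lang: str) -> tuple[int, int, str]:
    """Return (minimum, maximum, unit) for the memo length in a language."""
    primary = lang.lower().split("-")[0]
    if primary in DENSE_SCRIPT_LANGS:
        return (1800, 3000, "characters")
    return (1200, 2000, "words")


def within_target(text: str, lang: str) -> bool:
    """Check a draft memo against the target range for its language."""
    low, high, unit = memo_length_target(lang)
    # Characters are counted with len(); words by splitting on whitespace.
    size = len(text) if unit == "characters" else len(text.split())
    return low <= size <= high
```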
When the user does not specify a format, aim for these four deliverables:
Raw acquisition artifact
Cleaned transcript
Rough speaker transcript
Insight memo
For long-form content such as multi-hour podcasts, lectures, or interviews, the default memo should include:
When the source is an interview, podcast, or lecture and the user asks for a detailed summary, prefer this response shape unless the user requests another format:
总摘要 (overall summary)
For especially long interview-style content, prefer this execution order:
In normal use, present the insight memo directly in the chat response. Use files for transcript artifacts and optional supporting materials, not as a substitute for the main answer.
A multi-speaker media job is not complete if it returns only an insight memo without a cleaned transcript and a rough speaker transcript, unless the user explicitly requested summary-only output.
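The completeness rule for multi-speaker media jobs can be written as a simple check. This is an illustrative sketch: the `JobOutputs` fields and the `summary_only` flag are hypothetical names for the artifacts and the user request described above.

```python
from dataclasses import dataclass


@dataclass
class JobOutputs:
    """Hypothetical record of which artifacts a job produced."""
    has_insight_memo: bool
    has_cleaned_transcript: bool
    has_speaker_transcript: bool


def is_multi_speaker_job_complete(outputs: JobOutputs, summary_only: bool = False) -> bool:
    """Apply the rule: a memo alone is not enough unless summary-only was requested."""
    if not outputs.has_insight_memo:
        return False
    if summary_only:
        return True
    return outputs.has_cleaned_transcript and outputs.has_speaker_transcript
```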
For long-form content, always do this order:
Never lead with artifact references. Never compress the memo just because files were created. Do not replace the in-chat analysis with a file pointer unless the user explicitly asked for file-only output.
The job is incomplete if the final in-chat answer is primarily:
Creating transcript files does not reduce the obligation to provide a full in-chat analytical memo.
If the user provides only a URL or media file without format instructions:
The following do not count as successful transcript acquisition:
transcriptMediaId
If only these are available, continue to media download and ASR fallback.
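The fallback decision can be sketched as a check for an actual transcript body. The payload shape and the `transcript_text` key are hypothetical; only `transcriptMediaId` is named in the text, as an identifier that does not count as a transcript on its own.

```python
def needs_asr_fallback(payload: dict) -> bool:
    """Return True when the payload carries no usable transcript body.

    `transcript_text` is a hypothetical key. An identifier such as
    `transcriptMediaId` alone does not count as a transcript.
    """
    body = payload.get("transcript_text")
    has_body = isinstance(body, str) and body.strip() != ""
    return not has_body
```

A payload that contains only an ID therefore routes to media download and ASR rather than being treated as a finished transcript.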
Before finalizing a response for a media URL, verify all of the following:
Before finalizing, verify all of the following:
Read references/workflow.md for the acquisition decision tree and output policy.
Read references/invocation-examples.md for reusable prompt templates.
Read references/transcript-cleaning.md when cleaning ASR-heavy transcripts.
Read references/speaker-segmentation.md when the content is multi-speaker.
Read references/insight-template.md before writing the final analysis.
SKILL.md
agents/openai.yaml
scripts/
references/
runtime/
bin/ for local executables
models/ for ggml models
cache/ for downloads and reusable fetches
src/ for locally built tool sources when needed
work/ for temporary files and smoke-test artifacts
... runtime/.
... runtime/, unless the artifact is explicitly internal.
Treat runtime/work/ and runtime/cache/ as reclaimable space.
scripts/maintain_runtime.sh runs automatically in bootstrap and whisper flows.
By default it warns at 1536MB and auto-cleans runtime/cache/ and runtime/work/ at 2048MB.
The thresholds are configurable through CONTENT_INSIGHT_RUNTIME_WARN_MB and CONTENT_INSIGHT_RUNTIME_CLEAN_MB.
Run scripts/runtime_status.sh before or after large jobs when size matters.
Run scripts/cleanup_runtime.sh after large jobs to clear temporary files.
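The threshold behavior can be sketched as a pure decision function. This is not the logic of scripts/maintain_runtime.sh itself; it assumes the two environment variables hold integer megabyte values and that reaching a threshold triggers the action. The function name `runtime_action` and its return labels are hypothetical.

```python
import os
from typing import Mapping, Optional

# Defaults taken from the 1536MB and 2048MB figures in the text.
DEFAULT_WARN_MB = 1536
DEFAULT_CLEAN_MB = 2048


def runtime_action(size_mb: float, env: Optional[Mapping[str, str]] = None) -> str:
    """Return "ok", "warn", or "clean" for a given runtime size in MB."""
    env = os.environ if env is None else env
    # Assumption: the variables, when set, contain plain integers in MB.
    warn_mb = int(env.get("CONTENT_INSIGHT_RUNTIME_WARN_MB", DEFAULT_WARN_MB))
    clean_mb = int(env.get("CONTENT_INSIGHT_RUNTIME_CLEAN_MB", DEFAULT_CLEAN_MB))
    if size_mb >= clean_mb:
        return "clean"  # reclaim runtime/cache/ and runtime/work/
    if size_mb >= warn_mb:
        return "warn"
    return "ok"
```

Passing an explicit `env` mapping keeps the function testable without touching the real environment.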