Audio Summary
Automatically extracts audio from video, transcribes it using qwen3-asr-flash, and generates segmented text summaries saved alongside the original file.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 0 · 167 · 3 current installs · 3 all-time installs
by @alanOO7
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The code does what the name/description claim: it uses ffmpeg to extract/compress audio and calls a qwen3-asr-flash ASR model via the OpenAI Python client. Declared dependencies in SKILL.md (ffmpeg, openai SDK) match the implementation. However, the skill does not declare any required environment variables or primary credential in the registry metadata, yet the script contains a hard-coded API key and a custom base_url — an inconsistency between what the skill claims to require and what it actually contains.
Instruction Scope
Runtime instructions and the script convert entire audio files to a Base64 data URI and send that data to a remote model endpoint. The SKILL.md references the '百炼 API KEY' but does not disclose the actual network endpoint used by the code (the code targets dashscope.aliyuncs.com). Sending full audio data to an undeclared third‑party endpoint is a privacy/exfiltration risk. The instructions also recommend running the exact included script path, which will use the embedded key by default.
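To make the data flow described above concrete, here is a minimal sketch of how a whole audio file becomes a Base64 data URI. The function name and the default MIME type are assumptions for illustration; the skill's actual script may build the payload differently.

```python
import base64


def to_data_uri(audio_bytes: bytes, mime_type: str = "audio/mpeg") -> str:
    """Encode raw audio bytes as a Base64 data URI.

    The MIME type default is an assumption; the source does not state
    which audio container the script produces.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    # Three bytes stand in for a real audio file here.
    print(to_data_uri(b"abc"))  # prints data:audio/mpeg;base64,YWJj
```

Every byte of the input ends up in the returned string, which is why sending it to a remote endpoint amounts to uploading the complete recording.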
Install Mechanism
There is no install spec (instruction-only with a single Python script). That lowers supply-chain risk because nothing will be automatically downloaded or extracted during install.
Credentials
The skill requires an API credential to call the ASR model, but instead of declaring a required env var or asking the user to supply a key, the script hard-codes an API key string and a non-standard base_url. The registry metadata declared no required credentials; embedding a key in the code is disproportionate and insecure. The endpoint in code (dashscope.aliyuncs.com) is not the public qwen/openai domain and is not explained in SKILL.md.
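One way the credential could be supplied by the user instead of being embedded is sketched below. The environment variable name `DASHSCOPE_API_KEY` and the helper names are assumptions, not something the skill currently declares; the base URL is the one reported by the scan.

```python
import os

# Endpoint reported by the scan; the variable name below is an assumption.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
API_KEY_ENV = "DASHSCOPE_API_KEY"


def load_api_key(env=None) -> str:
    """Return the API key from the environment, or fail loudly if unset."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV} before running; no key is bundled.")
    return key


def make_client():
    """Build an OpenAI-compatible client using the user-supplied key."""
    from openai import OpenAI  # imported lazily so the key check works without the SDK

    return OpenAI(api_key=load_api_key(), base_url=BASE_URL)
```

With this shape, the registry metadata could list the variable as a required credential, and no secret would ship inside the script.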
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges or modify other skills/config. It runs only when invoked and does not persist configuration beyond writing its own summary output file in the same directory as the input.
Scan Findings in Context
[HARD_CODED_SECRET] unexpected: The Python file contains a hard-coded API key constant (starts with 'sk-76735...'). A transcription skill should accept a user-supplied API key via environment/config rather than bundling one. Hard-coded credentials are insecure and unexpected.
[NON_STANDARD_BASE_URL] unexpected: The OpenAI client is configured with base_url = 'https://dashscope.aliyuncs.com/compatible-mode/v1' instead of a documented qwen or openai host. The skill's README does not explain this endpoint. Audio data and the embedded API key will be transmitted to this third-party host.
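A reviewer can reproduce both findings with a quick manual check of the script text. The sketch below is a rough illustration, not the scanner's actual logic: the key pattern (`sk-` followed by at least 16 alphanumerics) and the function name are assumptions.

```python
import re

# Assumed key shape: "sk-" plus at least 16 letters or digits.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{16,}")
# Any http(s) URL up to the next quote or whitespace.
URL_PATTERN = re.compile(r"https?://[^\s'\"]+")


def audit_source(text: str) -> dict:
    """List suspected hard-coded keys (truncated) and every URL in the text."""
    return {
        # Only the first six characters are kept so the report never leaks a key.
        "secrets": [match[:6] + "..." for match in SECRET_PATTERN.findall(text)],
        "urls": sorted(set(URL_PATTERN.findall(text))),
    }
```

Running `audit_source(open("audio_summary_skill.py", encoding="utf-8").read())` on the downloaded script would surface any embedded key and the hosts it talks to before the code is ever executed.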
What to consider before installing
This skill appears to implement its advertised audio-extraction and transcription functionality, but it contains a hard-coded API key and sends full Base64-encoded audio to an undocumented third-party endpoint (dashscope.aliyuncs.com). Before installing or running it:
(1) Do not run it on private/confidential audio, as it will transmit the entire audio to that endpoint.
(2) Ask the author to remove the embedded API key and require the user to supply their own key via an environment variable or secure config.
(3) Verify the identity/trustworthiness of the endpoint (dashscope.aliyuncs.com) and the ASR provider (qwen3-asr-flash).
(4) Prefer a version that documents required env vars and exposes the network destination; optionally modify the script to point to a trusted API host or your own account.
If you cannot verify the endpoint or the provenance of the embedded key, treat this skill as high-risk and avoid using it with sensitive data.
Like a lobster shell, security has layers: review code before you run it.
Current version v1.0.0
SKILL.md
audio-summary Skill
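An assistant that turns audio/video into text summaries.
Features
- Automatic audio extraction: uses ffmpeg to extract compressed 16k mono audio from MP4 and other video files, so the result fits within the model's size limit.
- Transcription to summary: uses the 百炼 (Bailian) qwen3-asr-flash model to convert the audio to text and generate a segmented summary of the content.
- Large-file support: with 48k compression, videos of up to roughly 5-8 minutes can be transcribed directly in a single pass.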
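The extraction step could look roughly like the sketch below. It assumes that "16k mono" means a 16 kHz sample rate with one channel and that "48k" means a 48 kbit/s audio bitrate; the function name and output file are hypothetical, and the skill's real script may use different flags.

```python
from pathlib import Path


def build_ffmpeg_command(video: Path, audio_out: Path) -> list[str]:
    """Build an ffmpeg command that extracts compressed mono audio."""
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-i", str(video),    # input video
        "-vn",               # drop the video stream
        "-ac", "1",          # one audio channel (mono)
        "-ar", "16000",      # assumed meaning of "16k": 16 kHz sample rate
        "-b:a", "48k",       # assumed meaning of "48k": 48 kbit/s bitrate
        str(audio_out),      # container is inferred from the extension
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a shell string avoids quoting problems with paths that contain spaces.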
Dependencies
- ffmpeg (installed and available on the system path)
- openai Python SDK (installed)
- 百炼 (Bailian) API key (configured in the script as sk-76735...)
Usage
Run from the command line
# Extract audio from the specified video and summarize it
python .openclaw/workspace/skills/audio-summary/audio_summary_skill.py "C:\Path\To\Your\Video.mp4"
File location
- The generated summary text is saved automatically in the same directory as the video and is named VideoName_summary.txt.
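The naming rule above can be expressed in a few lines. This is a sketch of the convention as described, with a hypothetical function name; it is not taken from the skill's script.

```python
from pathlib import Path


def summary_path(video: Path) -> Path:
    """Return the summary file path next to the video: <stem>_summary.txt."""
    return video.with_name(f"{video.stem}_summary.txt")
```

For example, `summary_path(Path("/tmp/clip.mp4"))` yields `/tmp/clip_summary.txt`, so the summary always lands beside the input file.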
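Notes
- A single Base64 transcription request is currently limited to 6MB; for long videos over 10 minutes, split the file manually first or lower the bitrate further.
- API usage is billed at the qwen3-asr-flash model rate.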
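Because Base64 turns every 3 bytes into 4 characters, a file can exceed the limit after encoding even when the raw audio is smaller. The sketch below checks this before sending; it assumes "6MB" means 6 MiB and that the limit applies to the encoded payload, neither of which the source confirms.

```python
# Assumption: "6MB" is 6 MiB and is measured on the Base64-encoded payload.
LIMIT_BYTES = 6 * 1024 * 1024


def base64_length(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_limit(raw_bytes: int, limit: int = LIMIT_BYTES) -> bool:
    """True if the encoded audio stays within the request limit."""
    return base64_length(raw_bytes) <= limit
```

Under these assumptions, calling `fits_limit(os.path.getsize(audio_path))` after extraction indicates whether the file needs to be split or re-encoded at a lower bitrate.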
