Volcengine Ata Subtitle
v0.1.0
Generate subtitles with automatic time alignment using the Volcengine ATA API. Use when the user wants to: (1) add time-aligned subtitles to videos, (2) convert...
Security Scan
OpenClaw
Suspicious
Confidence: medium
Purpose & Capability
The SKILL.md and volc_ata.py consistently implement Volcengine ATA subtitle creation: they read a local WAV file and a text file, base64-encode the audio, POST it to an API, poll for the result, and save SRT/ASS output. However, the registry metadata lists no required environment variables or primary credential, even though both the docs and the code expect VOLC_ATA_APP_ID, VOLC_ATA_TOKEN and, optionally, a config file (~/.volcengine_ata.conf). This mismatch makes the manifest inconsistent with the code.
Instruction Scope
Runtime instructions and the Python code stay within the stated scope: they read the provided audio and text files, optionally read ~/.volcengine_ata.conf or specific env vars, call the ATA API (or produce a local mock subtitle in demo mode), then write an output subtitle file. There are no instructions to read unrelated system files or to transmit data to third parties beyond the configured API endpoint.
Install Mechanism
There is no install spec (instruction-only installation) and the package ships a single Python script. No remote downloads or installers are invoked by the skill itself; this is low-risk from an install mechanism perspective.
Credentials
The skill legitimately needs API credentials (app ID and access token) and may use a secret_key shown in the example config, but the registry metadata does not declare any required env vars or a primary credential. This omission prevents automated gating and review and is a notable manifest inconsistency. Also note that the code allows overriding api_base, which controls where the base64-encoded audio and text are POSTed; if it is pointed at a malicious endpoint, it could exfiltrate data.
Persistence & Privilege
The skill does not request always:true or other elevated platform privileges. It reads a per-user config file (~/.volcengine_ata.conf) and writes only the specified output subtitle file; there is no evidence it modifies other skills or system-wide settings.
What to consider before installing
- Manifest mismatch: The registry lists no required environment variables, but SKILL.md and the included Python script expect VOLC_ATA_APP_ID and VOLC_ATA_TOKEN (and optionally a config file with a secret_key). Ask the publisher to update the manifest to declare these secrets so you (or an automated gate) can review them.
- Credential handling: This skill will send your audio (base64) and text to the configured API base. Only provide real API credentials if you trust the Volcengine endpoint (default is https://openspeech.bytedance.com). If you only want to test, use demo mode (omit credentials) which creates mock subtitles locally.
- api_base risk: The code honors an api_base/config override. Do not set api_base to an untrusted host; doing so would cause the script to POST the full audio and text to that host (a data exfiltration risk). Leave api_base at the official Volcengine URL, or verify the URL before running.
- Token format oddity: The docs/code expect an Authorization header of the form "Bearer; {token}" (note the semicolon). Confirm with Volcengine docs that this is correct for your account; otherwise you may get auth failures.
- Source verification: The skill's homepage/source are unknown in the registry. If you plan to use it with real credentials, verify the repository origin (review history, publisher, and any published releases) or obtain an official client from Volcengine.
- Practical steps: Inspect volc_ata.py locally, run it in demo mode first, and avoid storing long-lived secrets in world-readable files. If you need to use it in production, request the maintainer to fix the manifest (declare required env vars), and optionally restrict api_base to the official domain in the code or configuration.
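One way to "restrict api_base to the official domain" is an allowlist check before any request is sent. This is a minimal sketch, not code from volc_ata.py: the function name check_api_base and the ALLOWED_HOSTS set are hypothetical, and the allowlist assumes the default base URL quoted above (https://openspeech.bytedance.com) is the only host you intend to trust.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: assumes the default base URL above is the only trusted host.
ALLOWED_HOSTS = {"openspeech.bytedance.com"}


def check_api_base(api_base: str) -> str:
    """Return api_base unchanged if it is HTTPS and its host is allowlisted; raise otherwise."""
    parsed = urlparse(api_base)
    if parsed.scheme != "https":
        raise ValueError(f"api_base must use https, got: {api_base!r}")
    # hostname is lowercased by urlparse, so look-alikes such as
    # "openspeech.bytedance.com.evil.example" are rejected by exact match.
    if parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"api_base host is not allowlisted: {parsed.hostname!r}")
    return api_base
```

Comparing the parsed hostname exactly, rather than testing whether the string starts with the official URL, is what blocks suffix tricks like a look-alike subdomain on another domain.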
Volcengine ATA Subtitle (Automatic Subtitle Timing)
Generate subtitles with automatic time alignment using Volcengine's ATA (Automatic Time Alignment) API.
Prerequisites
Set the following environment variables or create a config file:
Option A: Environment Variables
export VOLC_ATA_APP_ID="your-app-id"
export VOLC_ATA_TOKEN="your-access-token"
export VOLC_ATA_API_BASE="https://openspeech.bytedance.com"
Option B: Config File
Create ~/.volcengine_ata.conf:
[credentials]
appid = your-app-id
access_token = your-access-token
secret_key = your-secret-key
[api]
base_url = https://openspeech.bytedance.com
submit_path = /api/v1/vc/ata/submit
query_path = /api/v1/vc/ata/query
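The two options above can be combined in a loader that reads the config file and lets environment variables override it. This is a sketch under stated assumptions: the function name load_ata_config is hypothetical, the precedence (environment over file) is assumed rather than confirmed for volc_ata.py, and the sketch ignores secret_key, submit_path and query_path. It also raises on missing credentials, whereas the real script reportedly falls back to a local demo mode.

```python
import configparser
import os
from pathlib import Path

DEFAULT_BASE = "https://openspeech.bytedance.com"


def load_ata_config(conf_path=None, environ=None):
    """Return (appid, token, base_url); env vars override the config file (assumed precedence)."""
    environ = os.environ if environ is None else environ
    conf_path = Path(conf_path or Path.home() / ".volcengine_ata.conf")
    # interpolation=None so tokens containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(conf_path, encoding="utf-8")  # a missing file is silently skipped

    appid = environ.get("VOLC_ATA_APP_ID") or parser.get(
        "credentials", "appid", fallback=None)
    token = environ.get("VOLC_ATA_TOKEN") or parser.get(
        "credentials", "access_token", fallback=None)
    base_url = environ.get("VOLC_ATA_API_BASE") or parser.get(
        "api", "base_url", fallback=DEFAULT_BASE)

    # Check credentials before any API call, with a clear error message.
    if not appid or not token:
        raise RuntimeError(
            "Missing credentials: set VOLC_ATA_APP_ID and VOLC_ATA_TOKEN, "
            "or fill in [credentials] in ~/.volcengine_ata.conf")
    return appid, token, base_url
```

Passing environ explicitly makes the loader easy to test without touching the real process environment.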
Execution (Python CLI Tool)
A Python CLI tool is provided at ~/.openclaw/workspace/skills/volcengine-ata-subtitle/volc_ata.py.
Quick Examples
# Basic usage: audio + text → SRT subtitle
python3 ~/.openclaw/workspace/skills/volcengine-ata-subtitle/volc_ata.py \
--audio storage/audio.wav \
--text storage/subtitle.txt \
--output storage/subtitles/final.srt
# Specify output format (srt or ass)
python3 ~/.openclaw/workspace/skills/volcengine-ata-subtitle/volc_ata.py \
--audio storage/audio.wav \
--text storage/subtitle.txt \
--output storage/subtitles/final.ass \
--format ass
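If you call the CLI from another Python program, a small helper can assemble the same arguments as the examples above and infer the format from the output file's extension. The helper name build_ata_command is hypothetical, and passing --format srt explicitly assumes the script accepts srt as well as ass, as the "srt or ass" comment above suggests.

```python
from pathlib import Path

# Script location quoted in this document; adjust if your workspace differs.
SCRIPT = Path.home() / ".openclaw/workspace/skills/volcengine-ata-subtitle/volc_ata.py"


def build_ata_command(audio, text, output, fmt=None):
    """Hypothetical helper: build argv for volc_ata.py, inferring --format from the output suffix."""
    if fmt is None:
        fmt = "ass" if Path(output).suffix.lower() == ".ass" else "srt"  # SRT is the default
    if fmt not in ("srt", "ass"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return ["python3", str(SCRIPT),
            "--audio", str(audio),
            "--text", str(text),
            "--output", str(output),
            "--format", fmt]
```

The resulting list can be passed to subprocess.run(cmd, check=True), which raises CalledProcessError if the script exits with a non-zero status.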
Input Requirements
Audio File
- Format: WAV (PCM)
- Sample Rate: 16000 Hz (16kHz)
- Channels: 1 (mono)
- Encoding: 16-bit PCM (pcm_s16le)
Extract from video:
ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav
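Before uploading, you can confirm that a WAV file meets the requirements above with Python's standard wave module. This is a sketch, not the validation volc_ata.py performs; the function name check_wav is hypothetical. Note that wave.open raises wave.Error for non-PCM encodings (for example 32-bit float WAV), which is also a sign the file needs re-encoding.

```python
import wave


def check_wav(path):
    """Raise ValueError unless the file is a 16 kHz, mono, 16-bit PCM WAV."""
    with wave.open(str(path), "rb") as w:
        rate, channels, width = w.getframerate(), w.getnchannels(), w.getsampwidth()

    problems = []
    if rate != 16000:
        problems.append(f"sample rate {rate} Hz (expected 16000)")
    if channels != 1:
        problems.append(f"{channels} channels (expected mono)")
    if width != 2:  # sample width is in bytes; 2 bytes = 16-bit
        problems.append(f"{8 * width}-bit samples (expected 16-bit)")
    if problems:
        raise ValueError(
            f"{path}: " + "; ".join(problems)
            + ". Re-encode with: ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav")
```

Collecting every problem before raising gives one clear error message instead of failing on the first mismatch.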
Text File
- Format: Plain text (UTF-8)
- Structure: One sentence per line
- No punctuation: leave it out; ATA handles this automatically
- No timestamps: Pure text only
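Example (Chinese input; in English the lines read "The owner's alarm didn't go off and he overslept", "The two of us took turns nudging his face with our noses", "He thought it was an earthquake and ran off clutching his pillow"):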
主人闹钟没响睡过头了
我们俩轮流用鼻子拱他脸
他以为地震了抱着枕头就跑
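A script can be brought into this shape by stripping punctuation and blank lines. This is a minimal sketch with a hypothetical function name, normalize_ata_text; it assumes the input already has one sentence per line (it does not split sentences) and removes every Unicode punctuation character, so English contractions and hyphenated words are also collapsed ("don't" becomes "dont").

```python
import unicodedata


def normalize_ata_text(raw: str) -> str:
    """Drop punctuation and blank lines, keeping one sentence per line."""
    lines = []
    for line in raw.splitlines():
        # Unicode punctuation categories all start with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po),
        # which covers full-width CJK marks such as "，" "。" "！".
        cleaned = "".join(ch for ch in line if not unicodedata.category(ch).startswith("P"))
        cleaned = cleaned.strip()
        if cleaned:
            lines.append(cleaned)
    return "\n".join(lines) + "\n"
```

Write the result with Path(...).write_text(text, encoding="utf-8") so the file meets the UTF-8 requirement.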
Output Formats
SRT (SubRip)
1
00:00:00,000 --> 00:00:02,500
First subtitle line

2
00:00:02,500 --> 00:00:05,000
Second subtitle line
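The SRT layout above (sequence number, "start --> end" in HH:MM:SS,mmm, text, blank line) can be produced from a list of timed lines. This sketch assumes the alignment result has already been reduced to (start, end, text) tuples in seconds; the actual ATA response format and units are not described here, and the names srt_timestamp and to_srt are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # round to whole milliseconds first
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues):
    """Build an SRT document from an iterable of (start_s, end_s, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues
```

Rounding to an integer millisecond count before splitting avoids floating-point artifacts such as a timestamp ending in ",999" when ",000" was meant.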
ASS (Advanced SubStation Alpha)
[Script Info]
Title: ATA Subtitles
ScriptType: v4.00+
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:02.50,Default,,0,0,0,,First subtitle line
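ASS timestamps differ from SRT: hours are not zero-padded and the fraction is in centiseconds (H:MM:SS.cc). The sketch below builds one Dialogue line that follows the field order of the Format line shown above. The function names are hypothetical, the default style name "Default" is taken from the example, and the excerpt above omits the [V4+ Styles] section that would define that style.

```python
def ass_timestamp(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    total_cs = round(seconds * 100)
    h, rem = divmod(total_cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h:d}:{m:02d}:{s:02d}.{cs:02d}"


def ass_dialogue(start: float, end: float, text: str, style: str = "Default") -> str:
    """One [Events] line: Layer, Start, End, Style, Name, MarginL/R/V, Effect, Text."""
    # ASS marks a hard line break inside the Text field as \N.
    text = text.replace("\n", r"\N")
    return f"Dialogue: 0,{ass_timestamp(start)},{ass_timestamp(end)},{style},,0,0,0,,{text}"
```

Text is the last field, so commas inside the subtitle text do not need escaping.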
Rules
- Always check that credentials are configured before making API calls.
- Audio must be 16 kHz mono 16-bit PCM WAV; convert it with ffmpeg if necessary.
- Text should be plain: no timestamps, no punctuation.
- Default format: SRT (the most widely compatible).
- Handle errors gracefully and display clear error messages.
Troubleshooting
Invalid Sample Rate
Error: Invalid sample rate, expected 16000Hz
Fix:
ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav
Authorization Failed
Error: Authorization failed
Fix: Check the Authorization header format. It should be Bearer; {token} (note the semicolon after Bearer).
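A common cause of this error is pasting a token that already carries a "Bearer" prefix, or using a space instead of the semicolon. This sketch builds the header in the "Bearer; {token}" form described above and rejects a pre-prefixed token with a clear message. The function name ata_headers is hypothetical, and the sketch sets only the Authorization header because the rest of the request format is not documented here.

```python
def ata_headers(token: str) -> dict:
    """Build the Authorization header in the 'Bearer; {token}' form this document describes."""
    token = token.strip()  # drop stray whitespace or a trailing newline from copy/paste
    if token.lower().startswith("bearer"):
        raise ValueError("Pass the raw access token; the 'Bearer; ' prefix is added here.")
    return {"Authorization": f"Bearer; {token}"}
```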
