Aliyun Qwen ASR
v1.0.0
Use when transcribing non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`)...
MIT-0
Security Scan
OpenClaw
Suspicious (high confidence)
Purpose & Capability
The name, description, endpoints, and bundled script all align with a non-realtime Alibaba Cloud Qwen ASR transcription skill (sync and async flows). However, the skill metadata declares no required environment variables or primary credential even though the SKILL.md and script require a DASHSCOPE_API_KEY (or credentials file). This omission is an inconsistency.
Instruction Scope
The SKILL.md and the script instruct the agent to read/save files under output/aliyun-qwen-asr and to use DASHSCOPE_API_KEY or ~/.alibabacloud/credentials. The script additionally loads .env from the current working directory and from a repo root discovered by searching parent directories for a .git folder, and it will inject any key=value pairs into the process environment if not already present. That behavior is broader than the SKILL.md explicitly documents and could cause unrelated local secrets to be read into the environment.
Install Mechanism
This is an instruction-only skill with a Python helper script and no install spec. No external archives or installers are fetched by the skill itself, which keeps install risk low.
Credentials
Although asking for a DashScope/Alibaba API key is appropriate for the stated purpose, the skill's registry metadata does not declare DASHSCOPE_API_KEY or the local credentials path as required. The script also reads arbitrary .env files and will set environment variables from them; that expands the effective scope of secrets accessed beyond the single API key. The code also honors ALIBABA_CLOUD_PROFILE/ALICLOUD_PROFILE environment variables, which is reasonable, but again not declared in metadata.
Persistence & Privilege
The manifest sets `always: false` (no forced always-on presence). The skill does not request to modify other skills or system-wide agent configs. It writes outputs to a documented output/ directory.
What to consider before installing
This skill mostly does what it says (transcribe audio via Alibaba Cloud Qwen ASR), but there are a few red flags you should consider before installing or running it:
- Missing declared credential: The registry metadata does not list DASHSCOPE_API_KEY (or a primary credential), yet both SKILL.md and the script require it or a ~/.alibabacloud/credentials entry. Treat this as an omission and ensure you supply only a scoped API key.
- .env and repo-dotenv loading: The script will load .env from the current working directory and from a repository root discovered by searching upward for a .git directory, and will populate environment variables for any key=value lines. That can inadvertently read other secrets (database passwords, other API keys). Before running, inspect any .env files in the project and your repo root, or run in an isolated environment with controlled .env.
- Review ~/.alibabacloud/credentials: The script will read this file to extract dashscope_api_key. If you keep multiple credentials or sensitive tokens in that file, consider creating a dedicated profile with only the ASR key.
- Run in a disposable virtualenv/container: Use the suggested venv and run validation (the provided py_compile check) in an isolated environment first. Consider running on a machine that does not contain unrelated secrets.
- Audit the code: The Python helper is short and straightforward; scan it yourself (or have someone you trust do so) before trusting it with private audio.
- Operational precautions: Use a least-privilege DASHSCOPE_API_KEY, set OUTPUT_DIR to a safe location, and avoid running the script from repositories that have sensitive .env files unless you explicitly control them.
Given these inconsistencies between metadata and runtime behavior, I mark the skill as suspicious rather than benign. If the author updates the metadata to declare the required environment variables and documents the .env loading behavior, and you confirm the script only reads the intended files, the risk would be reduced.
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
Category: provider
Model Studio Qwen ASR (Non-Realtime)
Validation
mkdir -p output/aliyun-qwen-asr
python -m py_compile skills/ai/audio/aliyun-qwen-asr/scripts/transcribe_audio.py && echo "py_compile_ok" > output/aliyun-qwen-asr/validate.txt
Pass criteria: command exits 0 and output/aliyun-qwen-asr/validate.txt is generated.
Output And Evidence
- Store transcripts and API responses under `output/aliyun-qwen-asr/`.
- Keep one command log or sample response per run.
Use Qwen ASR for recorded audio transcription (non-realtime), including short audio sync calls and long audio async jobs.
Critical model names
Use one of these exact model strings:
- `qwen3-asr-flash`
- `qwen3-asr-flash-2026-02-10`
- `qwen-audio-asr`
- `qwen3-asr-flash-filetrans`
- `qwen3-asr-flash-filetrans-2025-11-17`
Selection guidance:
- Use `qwen3-asr-flash`, `qwen3-asr-flash-2026-02-10`, or `qwen-audio-asr` for short/normal recordings (sync).
- Use `qwen3-asr-flash-filetrans` or `qwen3-asr-flash-filetrans-2025-11-17` for long-file transcription (async task workflow).
Prerequisites
- Create a virtual environment (no SDK install is required; the script uses only the Python standard library):
python3 -m venv .venv
. .venv/bin/activate
- Set `DASHSCOPE_API_KEY` in the environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
Normalized interface (asr.transcribe)
Request
- `audio` (string, required): public URL or local file path.
- `model` (string, optional): default `qwen3-asr-flash`.
- `language_hints` (array<string>, optional): e.g. `zh`, `en`.
- `sample_rate` (number, optional)
- `vocabulary_id` (string, optional)
- `disfluency_removal_enabled` (bool, optional)
- `timestamp_granularities` (array<string>, optional): e.g. `sentence`.
- `async` (bool, optional): default false for sync models, true for `qwen3-asr-flash-filetrans`.
Response
- `text` (string): normalized transcript text.
- `task_id` (string, optional): present for async submission.
- `status` (string): `SUCCEEDED` or submission status.
- `raw` (object): original API response.
Quick start (official HTTP API)
Sync transcription (OpenAI-compatible protocol):
curl -sS --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3-asr-flash",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}
]
}
],
"stream": false,
"asr_options": {
"enable_itn": false
}
}'
Async long-file transcription (DashScope protocol):
curl -sS --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'X-DashScope-Async: enable' \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}'
Poll task result:
curl -sS --location "https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>" \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
Local helper script
Use the bundled script for URL/local-file input and optional async polling:
python skills/ai/audio/aliyun-qwen-asr/scripts/transcribe_audio.py \
--audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
--model qwen3-asr-flash \
--language-hints zh,en \
--print-response
Long-file mode:
python skills/ai/audio/aliyun-qwen-asr/scripts/transcribe_audio.py \
--audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
--model qwen3-asr-flash-filetrans \
--async \
--wait
Operational guidance
- For local files, use `input_audio.data` (data URI) when a direct URL is unavailable.
- Keep `language_hints` minimal to reduce recognition ambiguity.
- For async tasks, use a 5-20s polling interval with a max retry guard.
- Save normalized outputs under `output/aliyun-qwen-asr/transcripts/`.
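The data-URI path for local files can be built with `base64` from the standard library. The `audio/mpeg` media type here is an assumption for `.mp3` input; adjust it to match your file format:

```python
import base64
from pathlib import Path

def audio_file_to_data_uri(path: str, mime: str = "audio/mpeg") -> str:
    """Encode a local audio file as a data URI suitable for the
    input_audio.data field when no public URL is available."""
    raw = Path(path).read_bytes()
    b64 = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{b64}"
```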
Output location
- Default output: `output/aliyun-qwen-asr/transcripts/`
- Override the base directory with `OUTPUT_DIR`.
Workflow
- Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
- Run one minimal read-only query first to verify connectivity and permissions.
- Execute the target operation with explicit parameters and bounded scope.
- Verify results and save output/evidence files.
References
- `references/api_reference.md`
- `references/sources.md`
- Realtime synthesis is provided by `skills/ai/audio/aliyun-qwen-tts-realtime/`.
Files
