midasheng-audio-text-distance

v1.0.0

Multilingual audio-text retrieval and classification using GLAP (General Language Audio Pretraining). Use when user needs to search/match audio files against...

by Junbo Zhang (@jimbozhang)

Security Scan

VirusTotal: Benign

OpenClaw: Benign (high confidence)

✓ Purpose & Capability

The skill's name and description claim audio-text retrieval via GLAP, and all required artifacts (the SKILL.md examples and scripts/audiosearch.py) do exactly that against the Xiaomi endpoint llmplus.ai.xiaomi.com/dasheng/audio/search. No unrelated binaries, config paths, or credentials are requested.

✓ Instruction Scope

Runtime instructions and the included script only read user-supplied audio files and call the documented remote API (and a metrics endpoint for queue status). They do not read arbitrary system files or environment variables beyond what the user supplies. The SKILL.md and script consistently show network calls to the stated endpoint.

✓ Install Mechanism

This is an instruction-only skill with no install spec and a single small Python script; nothing is downloaded or written to disk by an installer, which minimizes install-time risk.

ℹ Credentials

No environment variables or credentials are requested (proportionate). However, the skill uploads audio files to a third-party endpoint (llmplus.ai.xiaomi.com) without any authentication in the provided examples, so sensitive audio will be transmitted off-host; users should consider privacy and trust of that endpoint before use.

✓ Persistence & Privilege

always is set to false, and the skill does not request persistent system presence or modify other skills or config. It behaves as a normal, non-persistent, user-invoked utility.

Assessment

This skill appears to do what it says: it uploads audio files to a Xiaomi-hosted GLAP search API and returns similarity/classification results. Before installing or using it, consider: (1) Privacy: audio files are sent to https://llmplus.ai.xiaomi.com with no authentication in the examples, so do not upload sensitive or proprietary recordings unless you trust the service and its terms. (2) Network usage: the tool requires outbound network access. (3) Sanity check: test with non-sensitive samples first. (4) Alternatives: if you need on-device processing or encryption, prefer a local model or an API that supports authenticated, private uploads. Note also that SKILL.md lists curl as a requirement while the script uses Python requests, so install curl to run the examples, or make sure Python requests is available to run the included script.

MIT-0

midasheng-audio-text-distance

General Language Audio Pretraining (GLAP) based service for multilingual audio-text retrieval and classification.

1. Trigger

Use this skill when the user wants to:

Match audio files against text descriptions
Classify audio content using natural language queries
Perform zero-shot audio event detection
Search audio by text across languages (50+ languages supported)

2. API Details

Endpoint: POST https://llmplus.ai.xiaomi.com/dasheng/audio/search (multipart form-data)

Parameters:

files: One or more audio files; repeat the field to upload several files
text: Comma-separated text descriptions/labels to match against

3. Usage

Basic: Match audio against text labels

curl -X POST "https://llmplus.ai.xiaomi.com/dasheng/audio/search" \
  -F "files=@audio1.mp3" \
  -F "text=Noise,Speech,A person is speaking"
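
The same request can be sketched in Python. This is a minimal illustration rather than the contents of scripts/audiosearch.py: the helper names build_text_field and search_audio are hypothetical, the requests library is assumed to be installed, and because the response format is not documented here the body is returned as raw text.

```python
from contextlib import ExitStack
from pathlib import Path

API_URL = "https://llmplus.ai.xiaomi.com/dasheng/audio/search"


def build_text_field(labels):
    """Join labels into the comma-separated string sent as the `text` field."""
    cleaned = [label.strip() for label in labels if label.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty label is required")
    return ",".join(cleaned)


def search_audio(paths, labels, timeout=60):
    """POST audio files and labels as multipart form-data (hypothetical helper).

    Returns the raw response body; parsing is left to the caller because the
    response schema is an unknown here.
    """
    import requests  # third-party; imported lazily so build_text_field works without it

    with ExitStack() as stack:
        # Repeat the `files` field once per audio file, as curl does with -F.
        files = [
            ("files", (Path(p).name, stack.enter_context(open(p, "rb"))))
            for p in paths
        ]
        response = requests.post(
            API_URL,
            files=files,
            data={"text": build_text_field(labels)},
            timeout=timeout,
        )
    response.raise_for_status()
    return response.text
```

Since the text field is comma-separated, a label that itself contains a comma would presumably be split into two labels by the service, so labels are best kept comma-free.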

Script usage

python3 scripts/audiosearch.py audio1.mp3 --text "Speech,Music,Noise"
python3 scripts/audiosearch.py --queue   # Check queue status

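4. Queue Status

Query commands

python3 scripts/audiosearch.py --queue
# Or call the API directly:
curl -X POST "https://llmplus.ai.xiaomi.com/metrics?path=/dasheng/audio/search"

Response fields

active: Number of currently active requests
avg_latency_ms: Average processing time (milliseconds)
Estimated wait = active × avg_latency_ms

When to call

When the IM session is about to time out but the search service has not yet returned a result: check the queue status, report it to the user, and ask them to check back later.
When the user later asks about task progress but the service still has not returned a result: fetch the latest queue status and return it to the user.

Status levels

🟢 active=0 or estimated wait <5s → service idle
🟡 estimated wait 5-30s → light queueing
🔴 estimated wait >30s → long queue; suggest retrying later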
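
The wait estimate and the three status levels above can be sketched as a small Python helper. The function names and the returned labels ("idle", "light", "busy") are hypothetical. Only the active and avg_latency_ms fields and the 5s and 30s thresholds come from this section, and treating exactly 5s and exactly 30s as the middle level is an assumption.

```python
def estimate_wait_seconds(active, avg_latency_ms):
    """Estimated wait = active x avg_latency_ms, converted from ms to seconds."""
    return active * avg_latency_ms / 1000.0


def queue_status(active, avg_latency_ms):
    """Map the metrics fields to one of three hypothetical status labels."""
    wait = estimate_wait_seconds(active, avg_latency_ms)
    if active == 0 or wait < 5:
        return "idle"   # green: service is idle
    if wait <= 30:
        return "light"  # yellow: light queueing (boundaries included by assumption)
    return "busy"       # red: long queue, suggest retrying later
```

For example, 4 active requests at an average of 2000 ms each give an estimated wait of 8 seconds, which falls in the middle level.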

5. Supported Audio Formats

Common formats: mp3, wav, flac, ogg, m4a.
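
A client can check a file's extension against this list before uploading. The sketch below is a hypothetical helper, not part of the skill. It looks only at the file name, not the actual encoding, and since the list above is described as common formats, the service may accept others.

```python
from pathlib import Path

# Extensions taken from the list above; assumed to be non-exhaustive.
LISTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}


def is_listed_format(path):
    """Return True if the file extension is one of the listed common formats."""
    return Path(path).suffix.lower() in LISTED_EXTENSIONS
```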

6. Troubleshooting

Low scores across all labels: Try broader descriptions
API request failed: Verify network connectivity
Unsupported format: Convert to mp3 or wav first
