midasheng-audio-text-distance

v1.0.0

Multilingual audio-text retrieval and classification using GLAP (General Language Audio Pretraining). Use when user needs to search/match audio files against...

by Junbo Zhang (@jimbozhang)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The skill's name and description claim audio-text retrieval via GLAP, and the included artifacts (the SKILL.md examples and scripts/audiosearch.py) do exactly that against the Xiaomi endpoint llmplus.ai.xiaomi.com/dasheng/audio/search. No unrelated binaries, config paths, or credentials are requested.
Instruction Scope
Runtime instructions and the included script only read user-supplied audio files and call the documented remote API (and a metrics endpoint for queue status). They do not read arbitrary system files or environment variables beyond what the user supplies. The SKILL.md and script consistently show network calls to the stated endpoint.
Install Mechanism
This is an instruction-only skill with no install spec and a single small Python script; nothing is downloaded or written to disk by an installer, which minimizes install-time risk.
Credentials
No environment variables or credentials are requested (proportionate). However, the skill uploads audio files to a third-party endpoint (llmplus.ai.xiaomi.com) without any authentication in the provided examples, so sensitive audio will be transmitted off-host; users should consider privacy and trust of that endpoint before use.
Persistence & Privilege
always is false; the skill does not request a persistent system presence or modify other skills or configuration. It behaves as a normal, non-persistent, user-invoked utility.
Assessment
This skill appears to do what it says: it uploads audio files to a Xiaomi-hosted GLAP search API and returns similarity/classification results. Before installing or using it, consider: (1) Privacy: audio files are sent to https://llmplus.ai.xiaomi.com with no authentication in the examples, so do not upload sensitive or proprietary recordings unless you trust the service and its terms. (2) Network usage: the tool requires outbound network access. (3) Sanity check: test with non-sensitive samples first. (4) Alternatives: if you need on-device processing or encryption, prefer a local model or an API that supports authenticated, private uploads. Note that SKILL.md lists curl as a requirement while the script uses the Python requests library: install curl to run the examples, or make sure requests is available to run the included script.


License: MIT-0

midasheng-audio-text-distance

A service based on GLAP (General Language Audio Pretraining) for multilingual audio-text retrieval and classification.

1. Trigger

Use this skill when the user wants to:

  • Match audio files against text descriptions
  • Classify audio content using natural language queries
  • Perform zero-shot audio event detection
  • Search audio by text in any language (supports 50+ languages)

2. API Details

Endpoint: POST https://llmplus.ai.xiaomi.com/dasheng/audio/search (multipart form-data)

Parameters:

  • files: One or more audio files; repeat the field to upload several files in one request
  • text: Comma-separated text descriptions/labels to match against

3. Usage

Basic: Match audio against text labels

curl -X POST "https://llmplus.ai.xiaomi.com/dasheng/audio/search" \
  -F "files=@audio1.mp3" \
  -F "text=Noise,Speech,A person is speaking"
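
For callers who prefer Python over curl, the same multipart request can be sketched as below. This is a minimal sketch, not the contents of scripts/audiosearch.py: the helper names build_form and search are hypothetical, the third-party requests library is assumed to be installed, and because the response schema is not documented here the raw response body is returned as text.

```python
from pathlib import Path

API_URL = "https://llmplus.ai.xiaomi.com/dasheng/audio/search"


def build_form(audio_paths, labels):
    """Build the multipart fields: repeated 'files' parts plus one 'text' field.

    Hypothetical helper for illustration only.
    """
    for label in labels:
        # The 'text' field is comma-separated, so a comma inside a label
        # would be read as a separator between two labels.
        if "," in label:
            raise ValueError(f"label must not contain a comma: {label!r}")
    files = [("files", (Path(p).name, Path(p).read_bytes())) for p in audio_paths]
    data = {"text": ",".join(labels)}
    return files, data


def search(audio_paths, labels, timeout=60):
    """POST the audio files and labels; return the raw response body."""
    import requests  # third-party dependency, imported only when sending

    files, data = build_form(audio_paths, labels)
    response = requests.post(API_URL, files=files, data=data, timeout=timeout)
    response.raise_for_status()
    # Assumption: the response schema is not documented, so no parsing is done.
    return response.text
```

Calling search(["audio1.mp3"], ["Noise", "Speech", "A person is speaking"]) would send the same fields as the curl command above. As with curl, the audio leaves the local machine, so test with non-sensitive samples first.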

Script usage

python3 scripts/audiosearch.py audio1.mp3 --text "Speech,Music,Noise"
python3 scripts/audiosearch.py --queue   # Check queue status

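4. Queue Status

Query commands

python3 scripts/audiosearch.py --queue
# Or call the API directly:
curl -X POST "https://llmplus.ai.xiaomi.com/metrics?path=/dasheng/audio/search"

Response fields

  • active: Number of currently active requests
  • avg_latency_ms: Average processing time (milliseconds)
  • Estimated wait time = active × avg_latency_ms (in milliseconds)

When to call

  1. When the IM reply is about to time out but the search service has not yet returned a result: check the queue status, report it to the user, and ask them to check back later.
  2. When the user later asks about task progress and the service still has not returned a result: check the latest queue status and report it to the user.

Status levels

  • 🟢 active=0 or estimated wait <5s → service idle
  • 🟡 estimated wait 5-30s → light queueing
  • 🔴 estimated wait >30s → long queue; suggest retrying later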
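
The wait estimate and the three status levels above can be sketched in a few lines of Python. This is an illustrative sketch: the function names and the returned level strings are hypothetical, and the only inputs assumed are the active and avg_latency_ms fields described above. Note that the product active × avg_latency_ms is in milliseconds, while the thresholds are in seconds.

```python
def estimate_wait_seconds(active, avg_latency_ms):
    """Estimated wait = active × avg_latency_ms, converted from ms to seconds."""
    return active * avg_latency_ms / 1000.0


def queue_status(active, avg_latency_ms):
    """Map queue metrics to one of the three status levels (names are hypothetical)."""
    wait_s = estimate_wait_seconds(active, avg_latency_ms)
    if active == 0 or wait_s < 5:
        return "idle"   # green: service idle
    if wait_s <= 30:
        return "light"  # yellow: light queueing (5-30s)
    return "long"       # red: long queue, suggest retrying later


print(queue_status(12, 1500))  # 12 × 1500 ms = 18 s
```

With 12 active requests and an average latency of 1500 ms, the estimate is 18 seconds, which falls in the 5-30s band.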

5. Supported Audio Formats

Common formats: mp3, wav, flac, ogg, m4a.
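
Checking file extensions before uploading avoids a wasted round trip on files the service is unlikely to accept. The sketch below assumes the five formats listed above are the ones to allow; the list is described as "common formats", so the service may accept others, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: only the formats named in this section are treated as supported.
LISTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}


def is_listed_format(path):
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in LISTED_EXTENSIONS


def split_by_format(paths):
    """Split paths into (uploadable, needs_conversion) by extension only."""
    uploadable = [p for p in paths if is_listed_format(p)]
    needs_conversion = [p for p in paths if not is_listed_format(p)]
    return uploadable, needs_conversion
```

The check looks only at the extension, not the file contents, so a mislabeled file would still pass; files in the second list can be converted to mp3 or wav before upload.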

6. Troubleshooting

  • Low scores across all labels: Try broader descriptions
  • API request failed: Verify network connectivity
  • Unsupported format: Convert to mp3 or wav first
