ControlFoley Audio Generator

v1.0.8

A multi-functional audio generation tool for sound effects (SFX), controllable video-to-audio, and text-to-audio generation.

by Jianxuan Yang (@yjx-research)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for yjx-research/controlfoley-audio-generator.

Install the skill "ControlFoley Audio Generator" (yjx-research/controlfoley-audio-generator) from ClawHub.
Skill page: https://clawhub.ai/yjx-research/controlfoley-audio-generator
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install controlfoley-audio-generator

ClawHub CLI

npx clawhub@latest install controlfoley-audio-generator
Security Scan
Capability signals
Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The skill's name/description (audio SFX, V2A, T2A) align with the included script and API references. Minor inconsistency: the registry metadata lists no required binaries, but SKILL.md and scripts rely on python3 and call curl (subprocess). SKILL.md also mentions ffmpeg as optional. These binaries are reasonable for the stated purpose but should be declared in metadata.
Instruction Scope
SKILL.md and scripts limit actions to submitting tasks to the specified API, polling for status, downloading results, and writing outputs to the chosen output directory. The code checks input file existence and does not read unrelated system files or environment variables. The main runtime behavior is uploading user-provided media/text to the remote API and saving returned files.
Install Mechanism
There is no install spec; this is an instruction-only skill with a bundled Python script. No installers, third-party packages, or arbitrary downloads are performed by the skill itself.
Credentials
The skill declares no environment variables or credentials and the code does not attempt to access secrets. All network communication goes to controlfoley.ai.xiaomi.com (and documented fallback endpoints). No unrelated service credentials are requested.
Persistence & Privilege
The skill is not always-enabled and does not modify other skills or system-wide configuration. It runs on invocation and does not request special persistent privileges.
Assessment
This skill uploads any video, audio, or prompt you provide to a remote service (https://controlfoley.ai.xiaomi.com) and saves returned audio locally. Before installing or using it: (1) confirm you trust the remote endpoint and avoid uploading sensitive or private media, (2) note that the script invokes the local curl binary (and requires python3); SKILL.md also mentions ffmpeg for optional conversions — ensure those tools are installed if needed, (3) review the referenced upstream GitHub/project pages yourself to verify provenance and privacy policy, and (4) if you need an offline or self-hosted workflow, this skill is not suitable because it relies on the remote API.


Latest version: v1.0.8 (9 versions published)
License: MIT-0

A multi-functional audio generation tool powered by the ControlFoley model. It combines video sound-effect (SFX) generation, video background-music composition, and text-to-audio generation.

This tool supports four modes: Video-to-Audio (V2A), Text-Controlled Video-to-Audio (TC-V2A), Audio-Controlled Video-to-Audio (AC-V2A), and Text-to-Audio (T2A).

Basic Info

| Field | Value |
| --- | --- |
| Service Operator | Xiaomi LLM Plus Team |
| API Endpoint | https://controlfoley.ai.xiaomi.com |
| Open Source Repo | https://github.com/xiaomi-research/controlfoley |
| Project Page | https://yjx-research.github.io/ControlFoley_web_page/ |
| Online Demo | https://yjx-research.github.io/ControlFoley_web_page/#try-gen |
| Model Weights | https://huggingface.co/YJX-Xiaomi/ControlFoley/ |
| API Key | Not required |
| Script Path | scripts/foley.py |

Prerequisites

python3 --version   # Python 3.x
curl --version      # curl for API submission
ffmpeg -version     # optional, for audio format conversion
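
The three checks above can also be run from Python before calling the script. This is a minimal sketch, not part of the skill: the helper name missing_tools and the REQUIRED/OPTIONAL tuples are hypothetical and simply mirror the tools listed above.

```python
import shutil

# Tools named in the Prerequisites list; ffmpeg is marked optional there.
REQUIRED = ("python3", "curl")
OPTIONAL = ("ffmpeg",)


def missing_tools(names, which=shutil.which):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for name in missing_tools(REQUIRED):
        print(f"missing required tool: {name}")
    for name in missing_tools(OPTIONAL):
        print(f"missing optional tool: {name}")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses shutil.which.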

Modes

| Mode | Command | Input | Output | Description |
| --- | --- | --- | --- | --- |
| V2A | v2a video.mp4 | Video file | .mp4 + .flac | Generate audio matching the video content |
| TC-V2A | v2a video.mp4 --prompt "text" | Video + text | .mp4 + .flac | Generate audio aligned with text prompts while staying synchronized with the video |
| AC-V2A | v2a video.mp4 --ref-audio ref.wav | Video + reference audio | .mp4 + .flac | Generate audio with timbre matching reference audio while staying synchronized with the video |
| T2A | t2a "prompt" | Text description | .flac | Generate audio from text descriptions |

Usage (CLI version)

1. Text-to-Audio (T2A, default 8s)

python3 scripts/foley.py t2a "dog barking loudly in a park"

2. Video-to-Audio (V2A)

python3 scripts/foley.py v2a input.mp4

3. Text-Controlled Video-to-Audio (TC-V2A)

python3 scripts/foley.py v2a input.mp4 --prompt "footsteps on gravel with birds chirping"

4. Audio-Controlled Video-to-Audio (AC-V2A)

python3 scripts/foley.py v2a input.mp4 --ref-audio reference.wav

5. Specify duration

python3 scripts/foley.py t2a "A mountain stream murmurs, its gentle current lapping against the pebbles." --duration 15

6. Generate multiple candidates

python3 scripts/foley.py t2a "cat purring softly" --count 3

7. Fixed seed (reproducible results)

python3 scripts/foley.py t2a "rain on a tin roof" --seed 42

8. List available models

python3 scripts/foley.py models

Usage (API version)

POST

curl -X POST "https://controlfoley.ai.xiaomi.com/api/v1/v2a/submit" -F "file=@video_path" -F "prompt=footsteps on gravel with birds chirping"

Response:

{"taskId": "xxx", "message": "Task submitted successfully"}
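
The submit call above can be wrapped in Python by shelling out to curl, which is how the security notes describe the bundled script working. The sketch below is an illustration under that assumption, not the code in scripts/foley.py: the function names are hypothetical, and only the endpoint, the two form fields, and the taskId field come from the example above.

```python
import json
import subprocess

API_BASE = "https://controlfoley.ai.xiaomi.com"


def build_submit_command(video_path, prompt=None):
    """Build the curl argv for the submit call shown above."""
    cmd = ["curl", "-sS", "-X", "POST", f"{API_BASE}/api/v1/v2a/submit",
           "-F", f"file=@{video_path}"]
    if prompt:
        # Adding a prompt turns a plain V2A request into TC-V2A.
        cmd += ["-F", f"prompt={prompt}"]
    return cmd


def parse_task_id(response_text):
    """Extract taskId from the JSON body; raise if it is absent."""
    payload = json.loads(response_text)
    task_id = payload.get("taskId")
    if not task_id:
        raise ValueError(f"no taskId in response: {payload!r}")
    return task_id


def submit(video_path, prompt=None):
    """Run curl and return the task ID. This performs a real upload."""
    result = subprocess.run(build_submit_command(video_path, prompt),
                            capture_output=True, text=True, check=True)
    return parse_task_id(result.stdout)
```

Keeping command construction and response parsing separate from the subprocess call lets both be checked without contacting the service.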

GET

1. Available Models

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/models" 

Response:

{"models":[{"name":"ControlFoley","enabled":true}]}

2. Status Inquiry

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status/{taskId}" 

Response, by task state:

  1. success:
{"urls":["{Domain name}/ControlFoley_output/{taskId}/{filename}"],"status":"success","done":true}
  2. processing:
{"status":"processing","done":false}
  3. pending:
{"status":"pending","queue_pos":1,"queue_position":1,"total_queue":2,"done":false}
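
The three status payloads above suggest a simple polling loop: keep asking until done is true, then read urls. The following is a sketch under stated assumptions. The 5-second interval is a guess, the 300-second timeout is borrowed from the "~5 min" figure in Error Handling, and the handling of a done payload whose status is not "success" is hypothetical because no failure payload is documented here. fetch_status is any callable that returns the parsed JSON for a task ID.

```python
import time


def wait_for_result(fetch_status, task_id, interval=5.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the status payload reports done, then return its urls."""
    deadline = clock() + timeout
    while True:
        payload = fetch_status(task_id)
        if payload.get("done"):
            # Assumption: anything other than "success" at this point is a failure.
            if payload.get("status") != "success":
                raise RuntimeError(f"task {task_id} ended with {payload!r}")
            return payload.get("urls", [])
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} not done after {timeout}s")
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop testable without real waiting.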

3. Result Download

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/ControlFoley_output/{taskId}/{filename}" --output ./output.flac

4. Status Inquiry & Result Download

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status_download/{taskId}" --output-dir ./output --output audio.zip

Parameters

T2A (Text-to-Audio)

| Parameter | Description | Default | Example |
| --- | --- | --- | --- |
| prompt | Audio description text (required) | | "dog barking in park" |
| --model | Model ID | ControlFoley | --model ControlFoley |
| --duration | Audio length in seconds (max 30) | 8 | --duration 15 |
| --negative | Negative prompt to exclude unwanted sounds | | --negative "noise, human voice" |
| --cfg | CFG strength (higher = stricter prompt adherence) | 4.5 | --cfg 6.0 |
| --count | Number of variants to generate (1–5) | 1 | --count 3 |
| --seed | Fixed random seed for reproducibility | | --seed 42 |
| -o/--outdir | Output directory | ./output | -o ./my_audio |
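
The limits in the T2A table (duration up to 30 seconds, 1 to 5 variants, a required prompt) can be checked locally before a task is submitted. This is a hypothetical helper, not something the script is known to expose; the lower bound on duration (greater than zero) is an assumption, since the table only states the maximum.

```python
def validate_t2a_args(prompt, duration=8, count=1):
    """Return a list of problems with T2A arguments; empty means OK."""
    errors = []
    if not prompt or not prompt.strip():
        errors.append("prompt is required")
    # Upper bound comes from the table; the lower bound is an assumption.
    if not 0 < duration <= 30:
        errors.append("duration must be greater than 0 and at most 30 seconds")
    if not 1 <= count <= 5:
        errors.append("count must be between 1 and 5")
    return errors
```

For example, validate_t2a_args("cat purring softly", count=3) returns an empty list, while a 45-second duration produces one error message.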

V2A (Video-to-Audio)

| Parameter | Description | Default | Example |
| --- | --- | --- | --- |
| video | Input video path (required) | | input.mp4 |
| --model | Model ID | ControlFoley | --model ControlFoley |
| --prompt | Text prompt to guide audio generation (TC-V2A) | | --prompt "keyboard tapping" |
| --negative | Negative prompt to exclude unwanted sounds | | --negative "music, noise" |
| --ref-audio | Reference audio file for timbre control (AC-V2A) | | --ref-audio reference.wav |
| --cfg | CFG strength | 4.5 | --cfg 7.0 |
| --count | Number of variants to generate (1–5) | 1 | --count 2 |
| --seed | Fixed random seed (not forwarded to API currently) | | --seed 42 |
| -o/--outdir | Output directory | ./output | -o ./results |

Prompt Tips

  • Be specific: "cat footsteps on wooden floor" beats "cat sound"
  • Use negative prompts: --negative "human voice, music, noise" to filter unwanted audio
  • CFG tuning: high CFG (6.0–7.5) for precise control, low CFG (3.0–4.5) for creative freedom

Output & Post-Processing

  • Audio: .flac (44100 Hz, lossless)
  • Video: .mp4 (original video + generated audio track)
  • Results saved to --outdir, paths printed to stdout

Convert to MP3 for sharing:

ffmpeg -i output.flac -codec:a libmp3lame -qscale:a 2 output.mp3
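
When --count produces several .flac files, the same ffmpeg command can be applied to each one. The sketch below assumes results sit directly in the output directory as *.flac files, which this page does not confirm; the function names are hypothetical, and the ffmpeg flags are copied from the command above.

```python
from pathlib import Path
import subprocess


def mp3_command(flac_path):
    """Build the ffmpeg argv that mirrors the command above."""
    src = Path(flac_path)
    return ["ffmpeg", "-i", str(src), "-codec:a", "libmp3lame",
            "-qscale:a", "2", str(src.with_suffix(".mp3"))]


def convert_outdir(outdir="./output"):
    """Convert every .flac directly inside outdir; return the MP3 paths."""
    written = []
    for src in sorted(Path(outdir).glob("*.flac")):
        cmd = mp3_command(src)
        subprocess.run(cmd, check=True)  # raises if ffmpeg exits non-zero
        written.append(cmd[-1])
    return written
```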

Error Handling

| Issue | Cause | Fix |
| --- | --- | --- |
| Internal URL inaccessible | Result URL uses .xiaomi.srv internal domain | Script auto-falls back to /api/v1/v2a/ControlFoley_output/{task_id}/{filename} |
| Queue busy | Task is waiting | Script auto-polls up to ~5 min; check load via curl $API_BASE/health |
| Model unavailable | Model not enabled | Run foley.py models to see available models |
| Task timeout | Service overloaded | Resubmit the task |
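
The first row describes rewriting an internal result URL to the public download path. One way this could look is sketched below; it is an assumption about the behavior, not the script's actual code. The function name and the example hostname node1.xiaomi.srv are hypothetical, while the .xiaomi.srv suffix and the fallback path come from the table.

```python
from urllib.parse import urlparse

API_BASE = "https://controlfoley.ai.xiaomi.com"


def public_download_url(result_url):
    """Map an internal .xiaomi.srv result URL onto the public fallback path."""
    parsed = urlparse(result_url)
    host = parsed.hostname or ""
    if not host.endswith(".xiaomi.srv"):
        return result_url  # already reachable, leave untouched
    marker = "/ControlFoley_output/"
    index = parsed.path.find(marker)
    if index == -1:
        return result_url  # unexpected shape, do not guess
    return f"{API_BASE}/api/v1/v2a{parsed.path[index:]}"


# Hypothetical internal URL, for illustration only.
print(public_download_url("http://node1.xiaomi.srv/ControlFoley_output/t1/a.flac"))
```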

API Reference

See ./references/api-reference.md for full endpoint documentation.

⚠️ Privacy & Security

  • Service Operator: Cloud processing is operated by the Xiaomi LLM Plus Team at https://controlfoley.ai.xiaomi.com
  • Data Upload: V2A/TC-V2A/AC-V2A modes upload the full video file to the remote service for processing. Do not upload videos containing sensitive personal or identifiable information
  • Data Processing: Uploaded videos and audio are used solely for audio generation. Results are returned via URL. Refer to the Xiaomi LLM Plus Team's terms of service for data retention and access control policies
  • No API Key Required: The service requires no authentication — please use it responsibly to avoid unnecessary load
  • Recommendation: Before first use, validate with a small, non-sensitive test clip
