Install
openclaw skills install oasis-audio

Oasis Audio is an AI audio narration generator that transforms your current state of mind, content you want to digest, or recent life events into a personalized audio track with background music (BGM). It supports Chinese and English, 9 reference modes (Soul Healing, Daily Briefing, Knowledge Deep Dive, Content Digest, Bedtime Radio, Language Learning, Conversation Extension, Topic Tracker, Study Buddy), and custom audio profiles when no template fits. Use this skill when the user asks to make audio, generate a podcast, turn notes or long content into audio, or when an audio version would add value. Context collection runs locally via `context_collector.py`, and only the final composed prompt is sent to the xplai.ai API (https://eagle-api.xplai.ai); raw conversation history, session files, and USER.md content are not transmitted externally. If the AI judges that the final prompt may be sensitive, the skill shows a sanitized preview and requires a second explicit confirmation before any API request.
AI audio narration generation via xplai.ai with local-only context processing and sensitive-preview confirmation. Zero Config. No API Key required.
Official website: www.xplai.ai
When calling this skill, ALWAYS spawn a subagent to poll for results. After calling xplai_gen_audio.py, wait 210 seconds before the first status check, then poll with xplai_status.py once every 60 seconds, maximum 10 checks. Stop polling when status becomes v_succ or v_fail. Report the final result automatically.
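The polling schedule above can be sketched as a small loop. This is a minimal illustration, not the skill's own implementation: `check_status` is a hypothetical callable that wraps `xplai_status.py` and returns a status string (the script's exact output format is not documented here), and `sleep` is injectable so the schedule can be exercised without waiting.

```python
import time
from typing import Callable

# Polling stops as soon as one of these statuses is seen.
TERMINAL_STATUSES = {"v_succ", "v_fail"}


def poll_audio_status(
    audio_id: str,
    check_status: Callable[[str], str],  # hypothetical wrapper around xplai_status.py
    sleep: Callable[[float], None] = time.sleep,
    initial_wait: float = 210,
    interval: float = 60,
    max_checks: int = 10,
) -> str:
    """Return the last status seen, stopping early on v_succ or v_fail."""
    sleep(initial_wait)  # wait before the first status check
    status = "unknown"
    for attempt in range(max_checks):
        status = check_status(audio_id)
        if status in TERMINAL_STATUSES:
            break
        if attempt < max_checks - 1:
            sleep(interval)  # no sleep needed after the final check
    return status
```

If the returned status is neither `v_succ` nor `v_fail`, the 10-check budget ran out and the subagent can report that generation is still pending.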
When the user explicitly asks to generate audio, proceed directly with the full generation pipeline and infer all necessary parameters (mode, tone, depth, voice) from conversation context and user profile.
First-use authorization: Before the first real send, xplai_gen_audio.py prints a one-time authorization notice and requires --acknowledge-consent to persist the user's approval locally. The notice should clearly explain, in calm and gentle language, that the skill may read ~/.qclaw/... or ~/.openclaw/... local history to produce personalized audio, that only the composed prompt is sent to https://eagle-api.xplai.ai, and that any detected sensitive information will be blocked until the user explicitly confirms.
Sensitive information protection: The calling AI should make its own conservative judgment about whether the composed prompt may contain sensitive information. If the composed text is sensitive, show the sanitized preview text to the user first, pause for explicit confirmation, and do not send anything until the user confirms. Use ./xplai_gen_audio.py --dry-run ... to preview manually, or rely on the built-in sensitive preview shown by xplai_gen_audio.py. Only resume with ./xplai_gen_audio.py --allow-sensitive ... after the user explicitly confirms.
Only ask the user a question outside this sensitive-content checkpoint if critical information is truly unresolvable from context (e.g., the request is entirely ambiguous with no conversation history available).
The user's actual need always takes priority over predefined modes. Before generating audio, infer the user's true need along 3 dimensions:
| User says | Literal reading | Deeper need |
|---|---|---|
| "最近好焦虑" ("I've been so anxious lately") | Anti-anxiety content | Something that addresses their specific anxiety source (found in context), not generic meditation |
| "帮我做个关于咖啡的音频" ("Make me an audio about coffee") | Coffee knowledge | Calibrated to what they already know (beginner vs. expert, found in context) |
Custom Mode: When no predefined mode fits, create a custom audio profile: name it descriptively (e.g., "赶完DDL后的温柔复盘", "A gentle review after finishing a deadline"), define the content structure based on the inferred need, and set voice and pacing to match.
For the 9 predefined audio modes (Soul Healing, Daily Briefing, Knowledge Deep Dive, Content Digest, Bedtime Radio, Language Learning, Conversation Extension, Topic Tracker, Study Buddy), read audio_modes.md for triggers, durations, and suggestions.
Mine conversation history to personalize audio. If any step yields no results, skip to text preparation without personalization — do NOT fabricate context.
Auto-detect by checking which default roots have files: ~/.qclaw/, ~/.openclaw/ → pick the one with the most recently modified session file. If none exist, skip personalization.
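One way to picture the auto-detection step is the sketch below. It assumes session files live under `agents/main/sessions` inside each root (the subpath named in the data-access notes of this document) and that "most recently modified" means the newest file mtime; the function name and the `home` parameter are illustrative, not part of the skill.

```python
from pathlib import Path
from typing import Optional

# Assumed session subpath inside each default root.
SESSION_SUBPATH = Path("agents") / "main" / "sessions"


def detect_source_tool(home: Path, tools=("qclaw", "openclaw")) -> Optional[str]:
    """Return the tool whose root holds the newest session file, or None."""
    best_tool, best_mtime = None, -1.0
    for tool in tools:
        session_dir = home / f".{tool}" / SESSION_SUBPATH
        if not session_dir.is_dir():
            continue  # this root has no sessions directory
        for path in session_dir.rglob("*"):
            if path.is_file():
                mtime = path.stat().st_mtime
                if mtime > best_mtime:
                    best_tool, best_mtime = tool, mtime
    return best_tool  # None means: skip personalization
```

Calling `detect_source_tool(Path.home())` returns `"qclaw"`, `"openclaw"`, or `None`; the value can be passed to `--source-tool`.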
Classify into exactly ONE scene type:
| Scene | When to Apply | Search Action | Days |
|---|---|---|---|
| event | Specific event (finished DDL, got promoted) | Full story extraction | 3 |
| emotion_only | Mood without event ("感觉很丧", "feeling really down") | High-emotion fragments | 3 |
| future | Upcoming plans/worries ("明天面试", "interview tomorrow") | Preparation context | 7 |
| long_term | Ongoing state ("一直加班", "always working overtime") | Recurring topics | 30 |
| interest | Hobby/knowledge topic ("咖啡豆科普", "coffee bean basics") | Cognition level check | 14 |
| functional | Pure utility (white noise, pomodoro) | SKIP | n/a |
| no_context | No personal angle / first interaction | SKIP | n/a |
| sensitive | Health, finances, relationships, legal | Emotion tone ONLY, never quote specifics | 3 |
| weekly_review | Recap of past week | Multi-topic extraction | 7 |
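The scene table maps directly onto a lookup for the `--days` value. The sketch below restates the table as data; the names `SCENE_DAYS` and `lookback_days` are illustrative, and `None` stands for the two SKIP rows.

```python
from typing import Optional

# Lookback window in days per scene type; None means "skip the context search".
SCENE_DAYS = {
    "event": 3,
    "emotion_only": 3,
    "future": 7,
    "long_term": 30,
    "interest": 14,
    "functional": None,
    "no_context": None,
    "sensitive": 3,
    "weekly_review": 7,
}


def lookback_days(scene: str) -> Optional[int]:
    """Return the search window for a scene, or None when search is skipped."""
    if scene not in SCENE_DAYS:
        raise ValueError(f"unknown scene type: {scene!r}")
    return SCENE_DAYS[scene]
```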
Generate keywords in 3 layers: Direct (core topic) → Behavior (related actions) → Emotion (emotional signals). Combine them into a single comma-separated string.
python3 context_collector.py --source-tool <tool> --keywords "<kw1,kw2,...>" --days <N> --max-results 20
Output: JSON with fragments, daily_memories, and user_profile (structured fields: name, mbti, interests, notes).
Error handling: If script fails, skip personalization and generate generic audio. Do NOT retry or debug during generation.
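The keyword step and the "fail once, then go generic" rule could look roughly like this. It is a sketch under assumptions: the collector is assumed to print its JSON on stdout, the 60-second timeout is an arbitrary choice, and both helper names are illustrative.

```python
import json
import subprocess
from typing import Optional, Sequence


def build_keywords(direct: Sequence[str], behavior: Sequence[str],
                   emotion: Sequence[str]) -> str:
    """Join the three keyword layers into one comma-separated string."""
    seen, ordered = set(), []
    for kw in (*direct, *behavior, *emotion):
        kw = kw.strip()
        if kw and kw not in seen:  # drop blanks and duplicates, keep order
            seen.add(kw)
            ordered.append(kw)
    return ",".join(ordered)


def collect_context(source_tool: str, keywords: str, days: int,
                    script: str = "context_collector.py") -> Optional[dict]:
    """Run the collector once; return parsed JSON, or None on any failure."""
    cmd = ["python3", script, "--source-tool", source_tool,
           "--keywords", keywords, "--days", str(days), "--max-results", "20"]
    try:
        # Assumption: the script writes its JSON result to stdout.
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              check=True, timeout=60)
        return json.loads(proc.stdout)
    except (subprocess.SubprocessError, OSError, json.JSONDecodeError):
        return None  # no retry: fall back to generic audio
```

A `None` result means the caller skips personalization and continues with text preparation.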
Apply semantic filtering by scene type. Discard irrelevant matches. Keep the 3-5 most relevant fragments.
| Scene | What to Extract |
|---|---|
| event | Event → Process → Emotion arc → Current state |
| emotion_only | Emotional background themes |
| future | Preparation activities, specific worries |
| long_term | Recurring topics → "daily portrait" |
| interest | Prior knowledge → depth level |
| sensitive | Emotional tone ONLY; NEVER quote specifics |
| weekly_review | Topics → Progress → Emotional highlights → Patterns |
Compress the result into a summary of roughly 300-500 characters. The summary should read naturally, focus on tailored details, and never feel surveillance-like. If nothing matched, proceed without personalization.
After context collection, compose a structured Audio Brief covering 7 layers: Content Structure, Voice & Delivery, Voice Selection, Personalization Anchors, Emotional Arc, Content Enrichment, Format & Pacing. Then distill into the final prompt. Read text_architecture.md for the full 7-layer framework, prompt structure, role design, and example prompt.
./xplai_gen_audio.py --voice-id "<voice_id>" "<composed prompt>"
Keep prompt under 800 characters (Chinese) or 1200 words (English). For weekly_review, up to 1000 characters.
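A length check along these lines can run before sending. It is only a sketch: the limits are treated as inclusive, Chinese prompts are counted in Unicode code points and English prompts in whitespace-separated words, and the 1000-character allowance for weekly_review is applied to Chinese prompts only. All of these are assumptions, since the text above does not pin them down.

```python
def prompt_within_limit(prompt: str, language: str, scene: str = "") -> bool:
    """Check a composed prompt against the documented length limits."""
    if language == "zh":
        limit = 1000 if scene == "weekly_review" else 800
        return len(prompt) <= limit  # counts Unicode code points
    if language == "en":
        return len(prompt.split()) <= 1200  # whitespace-separated words
    raise ValueError(f"unsupported language: {language!r}")
```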
xplai_gen_audio.py

./xplai_gen_audio.py [--voice-id <voice_id>] [--dry-run] [--audit] [--acknowledge-consent] [--allow-sensitive] <text>

- text: Composed prompt text
- --voice-id: Voice selection (see text_architecture.md Layer 3)
- --dry-run: Preview sanitized prompt without sending to API
- --audit: Write the final sent prompt and request outcome to local audit.log (off by default)
- --acknowledge-consent: Persist the first-use authorization notice locally and continue
- --allow-sensitive: Only use after the user explicitly confirms that the detected sensitive content preview may be sent

Output: Audio ID for status polling. Format: MP3, single-narrator monologue with BGM, 8-20 min, ~4-5 min generation time.
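When the script is invoked from Python, building the argument list explicitly avoids shell-quoting problems with long prompts. The helper below is an illustrative sketch that uses only the flags listed above; the function name is hypothetical, and the voice id `v1` in the usage note is a placeholder.

```python
from typing import List, Optional


def build_gen_audio_argv(prompt: str, voice_id: Optional[str] = None, *,
                         dry_run: bool = False, audit: bool = False,
                         acknowledge_consent: bool = False,
                         allow_sensitive: bool = False) -> List[str]:
    """Assemble an argv list for xplai_gen_audio.py from the documented flags."""
    argv = ["./xplai_gen_audio.py"]
    if voice_id:
        argv += ["--voice-id", voice_id]
    if dry_run:
        argv.append("--dry-run")
    if audit:
        argv.append("--audit")
    if acknowledge_consent:
        argv.append("--acknowledge-consent")
    if allow_sensitive:
        argv.append("--allow-sensitive")  # only after explicit user confirmation
    argv.append(prompt)  # the composed prompt is the positional argument
    return argv
```

For example, `build_gen_audio_argv("hello", "v1", dry_run=True)` yields a list that can be passed to `subprocess.run` to preview the sanitized prompt without sending anything.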
context_collector.py

python3 context_collector.py --source-tool <qclaw|openclaw> --keywords "kw1,kw2" --days <N> --max-results 20
Output: JSON with fragments, daily_memories, user_profile (structured fields only).
xplai_status.py

./xplai_status.py <audio_id>

- init: Request just submitted
- q_proc: Content is being processed
- q_succ: Content processing completed
- q_fail: Content processing failed
- v_proc: Audio is in generation queue
- v_succ: Audio generated successfully
- v_fail: Audio generation failed

Note: Status codes use the v_ prefix because xplai's API uses "video" nomenclature internally for all media types, including audio-only content.
This skill reads local conversation history from the default OpenClaw/QClaw roots (~/.qclaw/, ~/.openclaw/) via context_collector.py. At runtime it looks inside the built-in subpaths agents/main/sessions, workspace/memory, and workspace/USER.md when they exist. These are default lookup paths rather than required user config. All data access is local and read-only — no source files are modified, created, or deleted.
Why conversation history? The skill searches recent conversations for keywords related to the user's audio request, extracting emotional tone, topics, and context. This enables personalized audio — e.g., referencing a stressful week the user had, rather than generating generic content. Only 3-5 short fragments are selected; the rest are discarded in memory.
Why USER.md? The user profile provides the user's preferred name (to address them personally in the audio), personality type (to match tone), and interests (to enrich content with cross-domain connections). If USER.md is absent, the skill proceeds without personalization.
context_collector.py runs entirely on your machine. Conversation fragments are searched, filtered, and summarized locally. Only the composed prompt is sent as the API payload. The prompt contains inferred themes and tones, not verbatim conversation excerpts.

| Service | Purpose | Data Sent | Endpoint |
|---|---|---|---|
| xplai.ai | Audio generation | Composed text prompt only (max ~1000 chars) | HTTPS API |
No other external services, analytics, or telemetry are used.
audit.log: If --audit is explicitly enabled, xplai_gen_audio.py may append the outbound prompt and request outcome to local audit.log for traceability. If sensitive content is detected, the user must first review the preview text and explicitly confirm before either logging or sending. All other intermediate results (fragments, summaries) exist only in memory during execution.

On the first real send, xplai_gen_audio.py should present a one-time authorization note before continuing. Recommended wording:
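在真正发出第一条请求之前,先把边界说清楚!为了让这段音频更贴近你,我会看看你存在openclaw/qclaw的会话记录、memory 和 USER.md哦。但是,我不会改任何东西,只会把整理好的请求文本(不超过1000字)发送到xplai音视频平台(https://eagle-api.xplai.ai);生成完成后,你也可以在 xplai 网页在线查看结果~ \n 如果系统判断有敏感信息,我会先给你看脱敏后的预览,等你点确认再发出去~ \n 若你接受这条边界,我们现在就为你生成专属音频啦!后续不会再反复询问这个权限请求~

English translation: "Before the first request is actually sent, let's make the boundaries clear! To make this audio feel closer to you, I will look at your session records, memory, and USER.md stored under openclaw/qclaw. I will not change anything, and I will only send the composed request text (no more than 1000 characters) to the xplai audio/video platform (https://eagle-api.xplai.ai); once generation is complete, you can also view the result online on the xplai website. \n If the system judges that the text contains sensitive information, I will show you a sanitized preview first and send it only after you confirm. \n If you accept this boundary, we will generate your personalized audio now! You will not be asked for this permission again."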
For conversations classified as sensitive (health, finances, relationships, legal), the skill extracts emotional tone only — specific details are never quoted, summarized, or included in the audio prompt. See the "Scene Classification" section for details.
xplai_gen_audio.py also performs heuristic checks before sending or logging. In addition, the calling AI should proactively judge whether the content may be sensitive based on context, even if no heuristic rule fires. Treat the prompt as sensitive if it appears to contain:
If any of those checks trigger, or if the AI judges there is a meaningful chance that the content is sensitive, stop, show the preview text to the user, and confirm with the user before using --allow-sensitive. Even after confirmation, hard secrets such as tokens, passwords, private keys, and similar credential material must still be redacted before transmission or audit logging. Writing to audit.log still requires the separate --audit flag.
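The rule that hard secrets stay redacted even after confirmation can be illustrated with a small filter. The patterns below are illustrative assumptions only; they are not the heuristics that xplai_gen_audio.py actually applies, and a real filter would need broader coverage.

```python
import re

# Illustrative patterns only; not the skill's actual heuristic rules.
PRIVATE_KEY_BLOCK = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,
)
CREDENTIAL_PAIR = re.compile(
    r"\b(password|passwd|token|secret|api[_-]?key)\b(\s*[:=]\s*)(\S+)",
    re.IGNORECASE,
)


def redact_secrets(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace credential-like material before transmission or audit logging."""
    text = PRIVATE_KEY_BLOCK.sub(placeholder, text)
    # Keep the key name and separator; replace only the value.
    return CREDENTIAL_PAIR.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{placeholder}", text
    )
```

Running the filter on both the outbound prompt and any audit.log entry keeps the two paths consistent.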