Podcast Generation with Microsoft Foundry

v0.1.0

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.

Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (high confidence)
Purpose & Capability
The skill's name/description (podcast audio via Azure OpenAI Realtime) matches the implementation. However, the SKILL.md and referenced code expect AZURE_OPENAI_AUDIO_API_KEY, AZURE_OPENAI_AUDIO_ENDPOINT, and AZURE_OPENAI_AUDIO_DEPLOYMENT (and a settings object) while the registry metadata lists no required environment variables or config paths. That discrepancy is significant: consumers cannot see from metadata that sensitive credentials are needed.
Instruction Scope
The runtime instructions focus on WebSocket to Azure, streaming audio chunks, and PCM→WAV conversion — all within the declared purpose. But the references and example service code also describe fetching content from a database (tags/bookmarks), building prompts from user data, saving audio to a DB, and exposing streaming endpoints. Those broader actions (DB reads/writes and content aggregation) are not surfaced in the skill metadata and expand the scope beyond mere TTS conversion.
Install Mechanism
No install specification is provided; the skill is instruction-only plus a small utility script. Nothing is downloaded or written to disk by an install step, which lowers supply-chain risk.
Credentials
The SKILL.md explicitly requires AZURE_OPENAI_AUDIO_API_KEY, AZURE_OPENAI_AUDIO_ENDPOINT, and AZURE_OPENAI_AUDIO_DEPLOYMENT, but the package metadata declares no required environment variables or config paths. Additionally, the code examples assume access to application settings and a database (db operations and settings.*), which would require further credentials such as a database connection string and possibly cloud storage. The absence of these declarations is a significant transparency gap.
Persistence & Privilege
The skill does not request always:true and contains no install hooks or instructions to modify other skills or system-wide config. It does describe saving audio artifacts to a database in examples, which is normal for this application but should be explicitly declared to users.
What to consider before installing
This skill's code and documentation require an Azure OpenAI Realtime API key, endpoint, and deployment name, and include example code that reads from and writes to a database, but the registry entry does not declare those requirements. Before installing:

  1. Ask the publisher to update the metadata to list the required environment variables (AZURE_OPENAI_AUDIO_API_KEY, AZURE_OPENAI_AUDIO_ENDPOINT, AZURE_OPENAI_AUDIO_DEPLOYMENT) and any database or config dependencies.
  2. Verify where secrets will be stored and supplied; do not paste production Azure keys into unknown skills.
  3. Review and audit any database integration code, and ensure least-privilege credentials (read-only where possible) and proper access controls.
  4. Run the skill in an isolated test environment first and rotate keys after verification.
  5. If you cannot obtain clear metadata or a trusted publisher, treat the skill as untrusted and do not install it with production credentials.


latest: vk97czrjzakybr4rp5wtgaxpkp9809sbb
2.5k downloads · 3 stars · 1 version
Updated 1 month ago
v0.1.0 · MIT-0

Podcast Generation with GPT Realtime Mini

Generate real audio narratives from text content using Azure OpenAI's Realtime API.

Quick Start

  1. Configure environment variables for Realtime API
  2. Connect via WebSocket to Azure OpenAI Realtime endpoint
  3. Send text prompt, collect PCM audio chunks + transcript
  4. Convert PCM to WAV format
  5. Return base64-encoded audio to frontend for playback

Environment Configuration

AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini

Note: Endpoint should NOT include /openai/v1/ - just the base URL.
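
A minimal sketch of how a backend could load these variables and enforce the note above. The load_realtime_config helper name is illustrative and not part of the skill:

import os

def load_realtime_config() -> dict:
    """Read the Realtime API settings and normalize the endpoint."""
    endpoint = os.environ["AZURE_OPENAI_AUDIO_ENDPOINT"].rstrip("/")
    # Strip the /openai/v1 path if it was included by mistake
    if endpoint.endswith("/openai/v1"):
        endpoint = endpoint[: -len("/openai/v1")]
    return {
        "api_key": os.environ["AZURE_OPENAI_AUDIO_API_KEY"],
        "endpoint": endpoint,
        "deployment": os.environ["AZURE_OPENAI_AUDIO_DEPLOYMENT"],
    }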

Core Workflow

Backend Audio Generation

from openai import AsyncOpenAI
import base64
import os

# Credentials come from the environment variables documented above
endpoint = os.environ["AZURE_OPENAI_AUDIO_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_AUDIO_API_KEY"]

# Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"

client = AsyncOpenAI(
    websocket_base_url=ws_url,
    api_key=api_key
)

async def generate_audio(prompt: str) -> tuple[bytes, str]:
    """Narrate the prompt and return (WAV audio bytes, transcript)."""
    audio_chunks = []
    transcript_parts = []

    async with client.realtime.connect(model="gpt-realtime-mini") as conn:
        # Configure for audio-only output
        await conn.session.update(session={
            "output_modalities": ["audio"],
            "instructions": "You are a narrator. Speak naturally."
        })

        # Send text to narrate
        await conn.conversation.item.create(item={
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": prompt}]
        })

        await conn.response.create()

        # Collect streaming events
        async for event in conn:
            if event.type == "response.output_audio.delta":
                audio_chunks.append(base64.b64decode(event.delta))
            elif event.type == "response.output_audio_transcript.delta":
                transcript_parts.append(event.delta)
            elif event.type == "response.done":
                break

    # Convert PCM to WAV (see scripts/pcm_to_wav.py)
    pcm_audio = b''.join(audio_chunks)
    wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
    return wav_audio, "".join(transcript_parts)
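
The pcm_to_wav helper is referenced from scripts/pcm_to_wav.py but not reproduced on this page. A minimal sketch of such a conversion, assuming the 24 kHz, 16-bit, mono PCM format listed under Audio Format below and using Python's standard wave module (the actual script may differ):

import io
import wave

def pcm_to_wav(pcm_data: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian 16-bit PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)       # mono
        wav_file.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)    # 24 kHz
        wav_file.writeframes(pcm_data)
    return buffer.getvalue()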

Frontend Audio Playback

// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
  const bytes = atob(base64);
  const arr = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
  return new Blob([arr], { type: mimeType });
};

const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
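
The skill description mentions a Python FastAPI backend that returns base64-encoded audio to the frontend. A hedged sketch of such an endpoint, assuming the generate_audio coroutine from the Backend Audio Generation example above; the route path and request model are illustrative, not taken from the skill:

import base64

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PodcastRequest(BaseModel):
    prompt: str

@app.post("/api/podcast")  # illustrative route, not defined by the skill
async def create_podcast(request: PodcastRequest):
    # generate_audio is the coroutine from the backend example above
    wav_audio, transcript = await generate_audio(request.prompt)
    return {
        "audio_data": base64.b64encode(wav_audio).decode("ascii"),
        "transcript": transcript,
    }

The audio_data field here matches what the frontend snippet above reads as response.audio_data.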

Voice Options

Voice     Character
alloy     Neutral
echo      Warm
fable     Expressive
onyx      Deep
nova      Friendly
shimmer   Clear

Realtime API Events

  • response.output_audio.delta - Base64 audio chunk
  • response.output_audio_transcript.delta - Transcript text
  • response.done - Generation complete
  • error - Handle with event.error.message
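
The backend example above only breaks its event loop on response.done. A sketch of how the error event could be handled in that same loop; raising a plain RuntimeError here is an illustrative choice:

async for event in conn:
    if event.type == "response.output_audio.delta":
        audio_chunks.append(base64.b64decode(event.delta))
    elif event.type == "response.output_audio_transcript.delta":
        transcript_parts.append(event.delta)
    elif event.type == "error":
        # Surface the API error instead of waiting for a response.done
        # that may never arrive
        raise RuntimeError(f"Realtime API error: {event.error.message}")
    elif event.type == "response.done":
        break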

Audio Format

  • Input: Text prompt
  • Output: PCM audio (24kHz, 16-bit, mono)
  • Storage: Base64-encoded WAV
