# Azure AI Voice Live SDK - API Reference

## Table of Contents
- [connect() Function](#connect-function)
- [VoiceLiveConnection](#voiceliveconnection)
- [SessionResource](#sessionresource)
- [ResponseResource](#responseresource)
- [InputAudioBufferResource](#inputaudiobufferresource)
- [OutputAudioBufferResource](#outputaudiobufferresource)
- [ConversationResource](#conversationresource)
- [ConversationItemResource](#conversationitemresource)
- [TranscriptionSessionResource](#transcriptionsessionresource)
- [WebsocketConnectionOptions](#websocketconnectionoptions)
- [Exceptions](#exceptions)

---

## connect() Function

Creates an async context manager for WebSocket connections.

```python
from azure.ai.voicelive.aio import connect

# Signature sketch. Call it as `async with connect(...) as connection:`;
# the context manager yields a VoiceLiveConnection.
def connect(
    credential: Union[AzureKeyCredential, AsyncTokenCredential],
    endpoint: str,
    api_version: str = "2025-10-01",
    model: Optional[str] = None,
    query: Optional[Mapping[str, Any]] = None,
    headers: Optional[Mapping[str, Any]] = None,
    connection_options: Optional[WebsocketConnectionOptions] = None,
    credential_scopes: Optional[List[str]] = None,
    **kwargs
) -> AsyncContextManager[VoiceLiveConnection]:
    ...
```

### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `credential` | `AzureKeyCredential` or `AsyncTokenCredential` | Yes | Authentication credential |
| `endpoint` | `str` | Yes | Service endpoint URL |
| `api_version` | `str` | No | API version (default: "2025-10-01") |
| `model` | `str` | Conditional | Model identifier (required unless using the Agent scenario) |
| `query` | `Mapping[str, Any]` | No | Additional URL query parameters |
| `headers` | `Mapping[str, Any]` | No | Additional HTTP headers |
| `connection_options` | `WebsocketConnectionOptions` | No | WebSocket transport options |
| `credential_scopes` | `List[str]` | No | OAuth scopes (default: `["https://ai.azure.com/.default"]`) |
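
A minimal connection sketch, assuming the endpoint and API key come from environment variables. The names `VOICELIVE_ENDPOINT`, `VOICELIVE_API_KEY`, `VOICELIVE_API_VERSION`, and `VOICELIVE_MODEL` are placeholders chosen for this example, not names the SDK reads. The helper only assembles keyword arguments from the parameter table above, and `main()` imports the SDK lazily so the helper can be exercised without it.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_connect_kwargs(env: Mapping[str, str]) -> Dict[str, Any]:
    """Collect connect() keyword arguments from an environment-like mapping.

    The variable names are hypothetical; use whatever your deployment provides.
    """
    kwargs: Dict[str, Any] = {
        "endpoint": env["VOICELIVE_ENDPOINT"],  # required
        "api_version": env.get("VOICELIVE_API_VERSION", "2025-10-01"),
    }
    model: Optional[str] = env.get("VOICELIVE_MODEL")
    if model:  # left out when using the Agent scenario
        kwargs["model"] = model
    return kwargs


async def main() -> None:
    # Imported here so the helper above works without the SDK installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.voicelive.aio import connect

    credential = AzureKeyCredential(os.environ["VOICELIVE_API_KEY"])
    async with connect(credential=credential, **build_connect_kwargs(os.environ)) as connection:
        event = await connection.recv()  # first event sent by the server
        print(type(event).__name__)
```

Run it with `asyncio.run(main())` once the environment variables are set.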

---

## VoiceLiveConnection

The main connection class. It exposes one resource accessor per protocol area, plus low-level methods for sending and receiving events.

### Properties

| Property | Type | Description |
|----------|------|-------------|
| `session` | `SessionResource` | Session configuration |
| `response` | `ResponseResource` | Response management |
| `input_audio_buffer` | `InputAudioBufferResource` | Audio input buffer |
| `output_audio_buffer` | `OutputAudioBufferResource` | Audio output buffer |
| `conversation` | `ConversationResource` | Conversation items |
| `transcription_session` | `TranscriptionSessionResource` | Transcription config |

### Methods

```python
async def recv() -> ServerEvent:
    """Receive and parse the next typed server event."""

async def recv_bytes() -> bytes:
    """Receive raw bytes from the connection."""

async def send(event: Union[Mapping[str, Any], ClientEvent]) -> None:
    """Send an event to the server."""

async def close(*, code: int = 1000, reason: str = "") -> None:
    """Close the WebSocket connection."""

async def __aiter__() -> AsyncIterator[ServerEvent]:
    """Iterate over server events until connection closes."""
```
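
The iteration protocol above lends itself to a small dispatch loop. The sketch below assumes each server event exposes a `type` attribute that can be used as a dictionary key; that attribute, the `FakeConnection` stand-in, and the event type strings are illustrative and not taken from this reference.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, Callable, Dict, List


class FakeConnection:
    """Hypothetical stand-in for VoiceLiveConnection that replays canned events."""

    def __init__(self, events: List[Any]) -> None:
        self._events = events

    async def __aiter__(self) -> AsyncIterator[Any]:
        for event in self._events:
            yield event


async def dispatch(connection: Any, handlers: Dict[Any, Callable[[Any], None]]) -> int:
    """Route each event to the handler registered for its type.

    Returns the number of events seen before the connection closed.
    Assumes events carry a `type` attribute (not confirmed by this reference).
    """
    seen = 0
    async for event in connection:
        seen += 1
        handler = handlers.get(getattr(event, "type", None))
        if handler is not None:
            handler(event)
    return seen
```

With a real connection, the same function would be called as `await dispatch(connection, handlers)` inside the `async with connect(...)` block.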

---

## SessionResource

Manage session configuration.

### Methods

```python
async def update(
    *,
    session: Union[Mapping[str, Any], RequestSession],
    event_id: Optional[str] = None
) -> None:
    """Update session configuration."""
```

### RequestSession Fields

| Field | Type | Description |
|-------|------|-------------|
| `instructions` | `str` | System prompt for the model |
| `modalities` | `List[Modality]` | `["text"]`, `["audio"]`, or `["text", "audio"]` |
| `voice` | `Voice` | Voice for audio output |
| `input_audio_format` | `InputAudioFormat` | Audio input format |
| `output_audio_format` | `OutputAudioFormat` | Audio output format |
| `turn_detection` | `TurnDetection` | Voice activity detection (VAD) configuration, or `None` for manual turn handling |
| `tools` | `List[Tool]` | Function tools |
| `tool_choice` | `ToolChoice` | `"auto"`, `"none"`, `"required"`, or specific function |
| `temperature` | `float` | Model temperature (0.6-1.2) |
| `max_response_output_tokens` | `int` or `"inf"` | Max tokens per response |
| `input_audio_transcription` | `AudioInputTranscriptionOptions` | Transcription settings |

---

## ResponseResource

Manage model responses.

### Methods

```python
async def create(
    *,
    response: Optional[Union[ResponseCreateParams, Mapping[str, Any]]] = None,
    event_id: Optional[str] = None,
    additional_instructions: Optional[str] = None
) -> None:
    """Create a response (trigger inference)."""

async def cancel(
    *,
    response_id: Optional[str] = None,
    event_id: Optional[str] = None
) -> None:
    """Cancel an in-progress response."""
```

### ResponseCreateParams Fields

| Field | Type | Description |
|-------|------|-------------|
| `modalities` | `List[Modality]` | Override session modalities |
| `instructions` | `str` | Override session instructions |
| `voice` | `Voice` | Override session voice |
| `temperature` | `float` | Override temperature |
| `max_response_output_tokens` | `int` | Override max tokens |
| `conversation` | `str` | `"auto"` or `"none"` |

---

## InputAudioBufferResource

Manage audio input buffer.

### Methods

```python
async def append(
    *,
    audio: str,  # Base64-encoded audio
    event_id: Optional[str] = None
) -> None:
    """Append audio data to the input buffer."""

async def commit(
    *,
    event_id: Optional[str] = None
) -> None:
    """Commit the buffer as a user message."""

async def clear(
    *,
    event_id: Optional[str] = None
) -> None:
    """Clear the input buffer without committing."""
```
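
Since `append()` takes Base64 text rather than raw bytes, captured audio has to be chunked and encoded first. The sketch below shows one way to do that; `RecordingBuffer` is a hypothetical stand-in for `connection.input_audio_buffer`, the 4800-byte chunk size is arbitrary, and the explicit `commit()` is assumed to apply to manual turn handling.

```python
import asyncio
import base64
from typing import Any, Iterator, List


def pcm_chunks(pcm: bytes, chunk_bytes: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most chunk_bytes bytes."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


async def stream_audio(buffer: Any, pcm: bytes, chunk_bytes: int = 4800, commit: bool = True) -> int:
    """Base64-encode and append each chunk, then optionally commit.

    Returns the number of chunks appended.
    """
    sent = 0
    for chunk in pcm_chunks(pcm, chunk_bytes):
        await buffer.append(audio=base64.b64encode(chunk).decode("ascii"))
        sent += 1
    if commit:
        await buffer.commit()
    return sent


class RecordingBuffer:
    """Hypothetical stand-in that records calls instead of sending them."""

    def __init__(self) -> None:
        self.appended: List[str] = []
        self.committed = False

    async def append(self, *, audio: str, event_id: Any = None) -> None:
        self.appended.append(audio)

    async def commit(self, *, event_id: Any = None) -> None:
        self.committed = True
```

With a live connection, `buffer` would be `connection.input_audio_buffer`.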

---

## OutputAudioBufferResource

Manage audio output buffer.

### Methods

```python
async def clear(
    *,
    event_id: Optional[str] = None
) -> None:
    """Clear pending audio output (for interrupts)."""
```
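
An interrupt (barge-in) handler typically combines this call with `response.cancel()`. The sketch below assumes your event loop has already detected that the user started speaking; the triggering event is not covered in this reference. The order of the two calls is a design choice, and the fake classes exist only to make the example self-contained.

```python
import asyncio
from typing import Any, List


async def handle_interrupt(connection: Any) -> None:
    """Stop the in-progress response, then drop audio that is still queued."""
    await connection.response.cancel()
    await connection.output_audio_buffer.clear()


class FakeResource:
    """Hypothetical resource that logs every method called on it."""

    def __init__(self, name: str, log: List[str]) -> None:
        self._name = name
        self._log = log

    def __getattr__(self, method: str) -> Any:
        async def call(**kwargs: Any) -> None:
            self._log.append(f"{self._name}.{method}")

        return call


class FakeConnection:
    """Hypothetical connection exposing only the two resources used here."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.response = FakeResource("response", self.log)
        self.output_audio_buffer = FakeResource("output_audio_buffer", self.log)
```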

---

## ConversationResource

Manage conversation state.

### Properties

| Property | Type | Description |
|----------|------|-------------|
| `item` | `ConversationItemResource` | Item operations |

---

## ConversationItemResource

Create, retrieve, delete, and truncate conversation items. Accessed through `connection.conversation.item`.

### Methods

```python
async def create(
    *,
    item: Union[ConversationRequestItem, Mapping[str, Any]],
    previous_item_id: Optional[str] = None,
    event_id: Optional[str] = None
) -> None:
    """Create a new conversation item."""

async def delete(
    *,
    item_id: str,
    event_id: Optional[str] = None
) -> None:
    """Delete a conversation item."""

async def retrieve(
    *,
    item_id: str,
    event_id: Optional[str] = None
) -> None:
    """Retrieve item details (server responds with event)."""

async def truncate(
    *,
    item_id: str,
    audio_end_ms: int,
    content_index: int,
    event_id: Optional[str] = None
) -> None:
    """Truncate an item's audio at audio_end_ms (milliseconds)."""
```
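
`truncate()` needs a position in milliseconds, while a playback loop usually tracks bytes. The conversion below is a sketch that assumes 16-bit mono PCM at 24 kHz; those defaults are assumptions for the example, so substitute the values of your configured output format.

```python
def played_ms(
    bytes_played: int,
    sample_rate_hz: int = 24000,  # assumed output sample rate
    bytes_per_sample: int = 2,    # assumed 16-bit samples
    channels: int = 1,            # assumed mono
) -> int:
    """Convert a count of played PCM bytes to whole milliseconds (rounded down)."""
    if bytes_played < 0:
        raise ValueError("bytes_played must be non-negative")
    frames = bytes_played // (bytes_per_sample * channels)
    return frames * 1000 // sample_rate_hz
```

The result would feed a call such as `await connection.conversation.item.truncate(item_id=item_id, audio_end_ms=played_ms(n), content_index=0)`, where `item_id`, `n`, and the content index come from your own playback state.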

### ConversationRequestItem Types

```python
# User/Assistant/System message
{
    "type": "message",
    "role": "user",  # or "assistant" / "system"
    "content": [
        {"type": "input_text", "text": "..."},
        {"type": "input_audio", "audio": "base64..."},
        {"type": "text", "text": "..."},
        {"type": "audio", "audio": "base64...", "transcript": "..."}
    ]
}

# Function call (from model)
{
    "type": "function_call",
    "call_id": "...",
    "name": "function_name",
    "arguments": "{...}"
}

# Function output (from client)
{
    "type": "function_call_output",
    "call_id": "...",
    "output": "{...}"
}
```
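
The shapes above can be produced with small builders. This sketch covers the two items a client most often creates: a user text message and a function output. The helper names are hypothetical, and serializing the result with `json.dumps` follows the string-valued `output` field shown above.

```python
import json
from typing import Any, Dict


def user_text_item(text: str) -> Dict[str, Any]:
    """Build a user message item with a single input_text part."""
    return {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": text}],
    }


def function_output_item(call_id: str, result: Any) -> Dict[str, str]:
    """Build a function_call_output item; the result is sent as a JSON string."""
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": json.dumps(result),
    }
```

Either item is passed as `await connection.conversation.item.create(item=...)`; calling `await connection.response.create()` afterwards triggers inference.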

---

## TranscriptionSessionResource

Configure input transcription.

### Methods

```python
async def update(
    *,
    session: Mapping[str, Any],
    event_id: Optional[str] = None
) -> None:
    """Update transcription session configuration."""
```

---

## WebsocketConnectionOptions

Transport configuration for the WebSocket connection.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `compression` | `bool` or `int` | `None` | Enable per-message compression |
| `max_msg_size` | `int` | 4 MB | Maximum message size |
| `heartbeat` | `float` | `30` | Keep-alive ping interval (seconds) |
| `autoclose` | `bool` | `True` | Auto-close on close frame |
| `autoping` | `bool` | `True` | Auto-respond to pings |
| `receive_timeout` | `float` | `None` | Message receive timeout |
| `close_timeout` | `float` | `None` | Close handshake timeout |
| `handshake_timeout` | `float` | `None` | Connection establishment timeout |
| `vendor_options` | `Mapping` | `None` | Implementation-specific options |
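
A misspelled option name is easy to miss, so it can help to validate options against the field list before connecting. The sketch below assumes `connection_options` accepts a mapping keyed by the field names in the table; that is an assumption about the type, and the helper itself is not part of the SDK.

```python
from typing import Any, Dict

# Field names copied from the table above.
KNOWN_OPTIONS = frozenset({
    "compression",
    "max_msg_size",
    "heartbeat",
    "autoclose",
    "autoping",
    "receive_timeout",
    "close_timeout",
    "handshake_timeout",
    "vendor_options",
})


def make_connection_options(**options: Any) -> Dict[str, Any]:
    """Return the options as a dict, rejecting names not listed in the table."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise TypeError(f"unknown connection option(s): {sorted(unknown)}")
    return dict(options)
```

For example, `make_connection_options(heartbeat=15.0, receive_timeout=60.0)` yields a mapping that could be passed as `connection_options=` to `connect()`.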

---

## Exceptions

```python
# Note: importing this name shadows Python's built-in ConnectionError in the importing module.
from azure.ai.voicelive.aio import ConnectionError, ConnectionClosed

class ConnectionError(AzureError):
    """Base WebSocket connection error."""

class ConnectionClosed(ConnectionError):
    """WebSocket connection was closed."""
    code: int      # Close code
    reason: str    # Close reason
```

### Common Close Codes

| Code | Meaning |
|------|---------|
| 1000 | Normal closure |
| 1001 | Going away |
| 1006 | Abnormal closure |
| 1008 | Policy violation |
| 1011 | Server error |
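
Close codes can drive a reconnect decision. The policy below is an assumption made for illustration, not SDK behavior: it treats 1001, 1006, and 1011 as transient, treats 1000 and 1008 as final, and uses capped exponential backoff between attempts.

```python
# Codes treated as transient in this sketch (a policy choice, not SDK behavior).
RETRYABLE_CLOSE_CODES = frozenset({1001, 1006, 1011})


def should_reconnect(code: int) -> bool:
    """Return True when the close code suggests a retry may succeed."""
    return code in RETRYABLE_CLOSE_CODES


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2**attempt, limited to cap seconds."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap, base * (2 ** attempt))
```

In practice this would sit in an `except ConnectionClosed as exc:` handler: check `should_reconnect(exc.code)`, then `await asyncio.sleep(backoff_seconds(attempt))` before calling `connect()` again.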