# ASR Transcription Pipeline

> **Role:** Defines the TOS upload protocol and the Seed-ASR-2.0 submit/poll API for speech-to-text.
> Load at: Step 3 (transcribing audio). Skip entirely if the video has no speech.
> This document does NOT replace execution: always run the actual ASR pipeline, and never fabricate transcripts.

## TOS Audio Upload

### SDK
- Uses `tos` Python SDK (TosClientV2)
- Install: `pip install tos`

### Connection
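```python
from tos import TosClientV2

# TOS_ACCESS_KEY, TOS_SECRET_KEY and TOS_REGION come from the deployment config.
client = TosClientV2(
    ak=TOS_ACCESS_KEY,
    sk=TOS_SECRET_KEY,
    endpoint=f"tos-{TOS_REGION}.volces.com",
    region=TOS_REGION,
)
```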

### Upload
- Key: `outfit-video/asr/{uuid}.wav`
- Content-Type: `audio/wav`
- Retry: up to 2 times with 1s delay
- Runs in thread pool (put_object is blocking)
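
The upload rules above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `put_with_retry`, `upload_wav`, and the `bucket` argument are hypothetical names, "up to 2 times" is read as two retries after the first attempt, and `asyncio.to_thread` (Python 3.9+) stands in for whatever thread pool the pipeline uses.

```python
import asyncio
import time
import uuid


def put_with_retry(client, bucket, key, data, retries=2, delay=1.0):
    """Blocking upload: one initial attempt plus up to `retries` retries."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return client.put_object(
                bucket, key, content=data, content_type="audio/wav"
            )
        except Exception as exc:  # assumption: narrow to SDK errors in real code
            last_error = exc
            if attempt < retries:
                time.sleep(delay)  # fixed 1s pause between attempts
    raise last_error


async def upload_wav(client, bucket, data):
    """Upload WAV bytes off the event loop and return the object key."""
    key = f"outfit-video/asr/{uuid.uuid4()}.wav"
    # put_object blocks, so run it in a worker thread.
    await asyncio.to_thread(put_with_retry, client, bucket, key, data)
    return key
```

Passing the client in as an argument keeps the retry logic independent of how the TOS connection is created.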

### Presigned URL
- Method: GET
- Expires: 3600 seconds (1 hour)
- Returns signed URL for ASR access
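
A hedged sketch of the presigning step: it assumes the `tos` SDK's `pre_signed_url` method and the `signed_url` attribute on its result, which should be checked against the installed SDK version. `presign_get_url` and the `bucket` argument are hypothetical names.

```python
def presign_get_url(client, bucket, key, http_get, expires=3600):
    """Return a time-limited GET URL that the ASR service can fetch.

    `http_get` is the SDK's GET method constant (assumed to be
    tos.HttpMethodType.Http_Method_Get); it is passed in so this
    helper does not import the SDK itself.
    """
    out = client.pre_signed_url(http_get, bucket, key, expires=expires)
    return out.signed_url
```

A call might look like `presign_get_url(client, bucket, key, tos.HttpMethodType.Http_Method_Get)`, with the returned URL used as `audio.url` in the submit request.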

## Seed-ASR-2.0 Transcription

### Base URL
`https://openspeech.bytedance.com/api/v3/auc/bigmodel`

### Submit Request
```
POST /submit
Headers:
  Content-Type: application/json
  x-api-key: {ASR_ACCESS_TOKEN}
  X-Api-Resource-Id: volc.seedasr.auc
  X-Api-Request-Id: {uuid}
  X-Api-Sequence: -1
Body:
  user.uid: "outfit-video"
  audio.url: {presigned_tos_url}
  audio.format: "wav"
  request:
    model_name: "bigmodel"
    enable_itn: true
    enable_punc: true
    enable_ddc: true
    show_utterances: true
    enable_speaker_info: true
    enable_emotion_detection: true
    enable_gender_detection: true
```
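
As an illustration, the submit headers and body can be assembled as below. The sketch assumes the dotted keys (`user.uid`, `audio.url`, `audio.format`) denote nested JSON objects and that header values are sent as strings; `build_submit` is a hypothetical helper name.

```python
import uuid


def build_submit(access_token, audio_url, request_id=None):
    """Return (headers, body) for POST /submit."""
    request_id = request_id or str(uuid.uuid4())
    headers = {
        "Content-Type": "application/json",
        "x-api-key": access_token,
        "X-Api-Resource-Id": "volc.seedasr.auc",
        "X-Api-Request-Id": request_id,
        "X-Api-Sequence": "-1",
    }
    body = {
        # Assumption: "user.uid" etc. map to nested objects.
        "user": {"uid": "outfit-video"},
        "audio": {"url": audio_url, "format": "wav"},
        "request": {
            "model_name": "bigmodel",
            "enable_itn": True,
            "enable_punc": True,
            "enable_ddc": True,
            "show_utterances": True,
            "enable_speaker_info": True,
            "enable_emotion_detection": True,
            "enable_gender_detection": True,
        },
    }
    return headers, body
```

Returning the headers alongside the body lets the caller keep the generated request ID for the later poll requests.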

### Poll Request
```
POST /query
Headers: same as submit (reuse the same X-Api-Request-Id), WITHOUT X-Api-Sequence
Body: {} (empty JSON object)
```
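
A minimal sketch of building the poll request with the standard library, assuming the submit headers are available as a dict. `build_query_request` is a hypothetical helper; it only constructs the request and does not send it.

```python
import urllib.request

BASE_URL = "https://openspeech.bytedance.com/api/v3/auc/bigmodel"


def build_query_request(submit_headers):
    """Build POST /query from the submit headers, minus X-Api-Sequence."""
    headers = {
        name: value
        for name, value in submit_headers.items()
        if name.lower() != "x-api-sequence"
    }
    return urllib.request.Request(
        BASE_URL + "/query",
        data=b"{}",  # empty JSON object as the body
        headers=headers,
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)`, and the status read from the response's `X-Api-Status-Code` header.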

### Status Codes (X-Api-Status-Code header)
- 20000000: completed -> extract result.text
- 20000001: processing, keep polling
- 20000002: queued, keep polling
- 20000003: silent audio, no transcript
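
The status handling above could be expressed as a small dispatcher. This is a sketch under assumptions: the header value arrives as a string, the response body is parsed JSON containing `result.text`, and any code not listed here is treated as an error. `interpret_status` is a hypothetical name.

```python
def interpret_status(status_code, payload):
    """Map X-Api-Status-Code to (state, transcript_or_None)."""
    code = int(status_code)  # header values arrive as strings
    if code == 20000000:
        return "done", payload["result"]["text"]
    if code in (20000001, 20000002):
        return "pending", None  # keep polling
    if code == 20000003:
        return "silent", None  # no speech, so no transcript
    # Assumption: unlisted codes are failures.
    raise RuntimeError(f"ASR request failed with status {code}")
```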

### Polling Parameters
- Max wait: 120 seconds
- Poll interval: 3 seconds
- Network retry: exponential backoff, max 3 retries
- Backoff formula: `interval * 2^(retry-1)`, with `retry` starting at 1 so the first retry waits one `interval`
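
One way to combine these parameters into a polling loop is sketched below. The assumptions: `query` is a hypothetical callable that performs one `/query` call and returns `(status_code, payload)`, network failures surface as `OSError` (which covers `urllib.error.URLError` and `ConnectionError`), `interval` in the backoff formula is the poll interval, and the retry counter resets after any successful call.

```python
import time


def backoff_delay(interval, retry):
    """Delay before network retry number `retry` (1-indexed)."""
    return interval * 2 ** (retry - 1)


def poll_until_done(query, interval=3.0, max_wait=120.0, max_retries=3,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until a transcript, silence, an error, or the deadline."""
    deadline = clock() + max_wait
    retry = 0
    while clock() < deadline:
        try:
            code, payload = query()
        except OSError:
            retry += 1
            if retry > max_retries:
                raise  # network retries exhausted
            sleep(backoff_delay(interval, retry))
            continue
        retry = 0  # assumption: a successful call resets the counter
        if code == 20000000:
            return payload["result"]["text"]
        if code == 20000003:
            return None  # silent audio, no transcript
        if code not in (20000001, 20000002):
            raise RuntimeError(f"ASR request failed with status {code}")
        sleep(interval)
    raise TimeoutError(f"ASR result not ready after {max_wait} seconds")
```

With the default 3-second interval, the three network retries wait 3, 6, and 12 seconds. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.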