# Zvukogram SSML: Agent Reference (Complete)

Source (official): https://zvukogram.com/node/ssml/

This reference is a **practical, agent-readable** summary of Zvukogram SSML behavior, including Zvukogram-specific extensions and voice limitations. Treat it as the canonical contract for what we generate in podcast pipelines.

See also:
- `say-as` patterns & templates: `references/say-as.md`
- pronunciation workflow (`+`, `<sub>`, `<phoneme>`): `references/pronunciation-patterns.md`
- podcast-oriented patterns: `references/podcast-examples.md`

## 0) General rules

- SSML is **XML**. Tags must be well-formed.
- Most tags come in opening and closing pairs: `<tag>...</tag>`.
- Self-closing example: `<break time="500ms"/>`.
- Some voices may ignore certain tags/attributes (see per-tag notes).
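
The well-formedness rule can be checked mechanically before any text is sent to TTS. The sketch below is one possible approach, not part of the Zvukogram docs: it wraps the fragment in a throwaway `<root>` element (an assumption made only so that mixed text and tags parse as a single XML document) and uses Python's standard `xml.etree.ElementTree`. The function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the SSML fragment parses as XML once wrapped in a dummy root."""
    try:
        # The <root> wrapper exists only for parsing; it is never sent to the TTS API.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False
```

Note that a bare `&` in the text also fails this check, because it must be written as `&amp;` in XML.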

## 1) Safety / formatting contract for pipelines

When producing TTS-ready text for Zvukogram:

- ✅ Allowed: plain text + SSML tags listed in this doc.
- ❌ Forbidden: arbitrary XML/HTML/JSON/YAML structures (e.g. `<div>...</div>`), markdown tables/code blocks, or any non-SSML markup.
- If you include SSML, **only** use the supported tags and attributes below.
- If your downstream runtime does not support wrapper tags like `<speak>` or `<voice>`, strip only those wrappers rather than blindly deleting all tags. Useful inline tags such as `break`, `say-as`, `sub`, `prosody`, `phoneme`, and `emphasis` should be preserved when the runtime supports them.
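
Stripping only the wrappers, as the last rule describes, might look like the sketch below. It assumes the wrapper tags in question are `speak` and `voice`; the constant `WRAPPER_TAGS` and the function `strip_wrappers` are hypothetical names, and the tuple should be adjusted to whatever the downstream runtime actually rejects.

```python
import re

# Assumption: these are the wrapper tags the runtime cannot handle.
WRAPPER_TAGS = ("speak", "voice")

# Matches opening and closing wrapper tags, with or without attributes.
_WRAPPER_RE = re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(WRAPPER_TAGS))


def strip_wrappers(ssml: str) -> str:
    """Remove wrapper tags only; inline tags such as break or say-as are kept."""
    return _WRAPPER_RE.sub("", ssml).strip()
```

The content inside the wrappers is left untouched, so inline tags survive intact.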

## 2) Supported tags (overview)

### Pauses
- `<break time="..."/>`: pause

### Substitutions / aliases
- `<sub alias="...">TEXT</sub>`: replace how TEXT is spoken

### Prosody / intonation
- `<prosody ...>...</prosody>`
- `<emphasis level="...">...</emphasis>`

### Pronunciation (expert)
- `<phoneme alphabet="ipa" ph="...">...</phoneme>`

### `say-as` interpret-as (main formatting tool)
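- `<say-as interpret-as="spell-out">...</say-as>`
- `<say-as interpret-as="cardinal">...</say-as>`
- `<say-as interpret-as="ordinal">...</say-as>`
- `<say-as interpret-as="fraction">...</say-as>`
- `<say-as interpret-as="date">...</say-as>`
- `<say-as interpret-as="time">...</say-as>`
- `<say-as interpret-as="telephone">...</say-as>`
- `<say-as interpret-as="currency">...</say-as>`
- `<say-as interpret-as="money">...</say-as>`
- `<say-as interpret-as="bleep">...</say-as>`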

---

## 3) Tag details

### 3.1 `<break>`: pauses

**Syntax:**
```xml
<break time="500ms"/>
<break time="2s"/>
```
- `time` can be in `ms` or `s`.

Notes:
- Multiple pauses can be placed sequentially.

Source: https://zvukogram.com/node/pausa/
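
For pipelines that compute pause lengths numerically, a small helper can emit the tag in either unit. This is a minimal sketch that assumes the standard SSML form `<break time="..."/>`; the function name `break_tag` is hypothetical.

```python
def break_tag(duration_ms: int) -> str:
    """Build a pause tag, using whole seconds when possible and milliseconds otherwise."""
    if duration_ms <= 0:
        raise ValueError("pause duration must be positive")
    if duration_ms % 1000 == 0:
        return f'<break time="{duration_ms // 1000}s"/>'
    return f'<break time="{duration_ms}ms"/>'
```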

---

### 3.2 `<sub>`: alias / substitution

**Use when:** you need consistent pronunciation (brands, acronyms, names).

**Syntax:**
```xml
<sub alias="...">OpenAI</sub>
```
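
To keep pronunciation consistent across a whole script, the substitution can be applied from a shared dictionary. The sketch below assumes plain-text input without existing markup; `apply_aliases` and the spoken form used in the usage line are hypothetical illustrations, not values from the Zvukogram docs.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text: str, aliases: dict[str, str]) -> str:
    """Wrap every known term in <sub alias="...">; XML-escape everything else."""
    terms = sorted(aliases, key=len, reverse=True)  # prefer the longest match
    if not terms:
        return escape(text)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))
        term = m.group(0)
        out.append(f"<sub alias={quoteattr(aliases[term])}>{escape(term)}</sub>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)
```

For example, `apply_aliases("Модель от OpenAI", {"OpenAI": "опен эй ай"})` wraps only the brand name. The match is a plain substring match, so very short terms can also hit inside longer words.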

---

### 3.3 `<prosody>`: pitch / rate / volume

**Important:** Zvukogram explicitly warns that `<prosody>` works best on **a whole sentence**. If you wrap only a single word in the middle of a sentence, you may get unwanted pauses around the tag.

**Syntax examples:**
```xml
<prosody pitch="+6st">Это пример.</prosody>
<prosody rate="150%">Быстрее на 50%.</prosody>
<prosody pitch="+20Hz">Некоторые голоса поддерживают Hz.</prosody>
```

Allowed attribute families (varies by voice):
- `pitch`: semitones (`+6st`), percent (`-20%`), constants (`x-low|low|medium|high|x-high|default`), sometimes `Hz`.
- `rate`: constants (`x-slow|slow|medium|fast|x-fast|default`), percent styles (`+70%`, `50%`, `150%`).
- `volume`: `dB` (`-15dB`, `+10dB`), constants (e.g. `silent`, `low`, `high`, ...), sometimes percent (`+50%`).

Source: https://zvukogram.com/node/prosody/
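
Because the tag behaves best around whole sentences, a generator can wrap sentence by sentence instead of word by word. This is a rough sketch under that assumption; `prosody_sentences` is a hypothetical helper, and the sentence splitter is deliberately naive (it splits after `.`, `!`, `?` or `…` followed by whitespace, so abbreviations such as "т. е." will be split incorrectly).

```python
import re
from xml.sax.saxutils import escape, quoteattr


def prosody_sentences(text: str, **attrs: str) -> str:
    """Wrap each sentence of plain text in its own prosody element."""
    sentences = [s for s in re.split(r"(?<=[.!?…])\s+", text.strip()) if s]
    attr_str = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return " ".join(f"<prosody{attr_str}>{escape(s)}</prosody>" for s in sentences)
```

Attribute values are passed through unchecked, so the caller remains responsible for choosing values the selected voice accepts.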

---

### 3.4 `<emphasis>`: simple expressiveness

**Syntax:**
```xml
<emphasis level="strong">А сегодня тепло и солнечно!</emphasis>
```

`level` values:
- `strong` (louder + slower)
- `moderate` (default)
- `reduced` (softer + faster)
- `none`

Source: https://zvukogram.com/node/emphasis/

---

### 3.5 `<phoneme>`: IPA phonemes (expert)

**Use when:** you need precise pronunciation/stress and `+` stress marks are not available/insufficient for a chosen voice.

**Syntax:**
```xml
<phoneme alphabet="ipa" ph="...">Никитенко</phoneme>
```

Notes:
- Stress mark uses the IPA symbol `ˈ`.
- For Russian, unstressed vowels often require different IPA symbols (Zvukogram doc explains `ə`, `ɐ`, etc.).

Source: https://zvukogram.com/node/mfa/

---

## 4) `say-as` interpret-as modes

### 4.1 spell-out / verbatim / characters

**Spell-out (letters):**
```xml
<say-as interpret-as="spell-out">банан</say-as>
<say-as interpret-as="spell-out">ООО</say-as>
```

Notes:
- Different voices may read abbreviations differently.
- Some voices support `verbatim` or `characters` as alternatives.

Source: https://zvukogram.com/node/spell-out/

---

### 4.2 cardinal (quantity)

**Syntax:**
```xml
<say-as interpret-as="cardinal">5</say-as>
```

Use when you need to force "сколько?" (quantity) rather than ordinal.

Advanced voices support a grammatical `format="GENDER_CASE"` attribute (examples in source). Numbers up to the billions are spoken; trillions may not be.

Source: https://zvukogram.com/node/cardinal/

---

### 4.3 ordinal (order)

**Syntax:**
```xml
Возьми <say-as interpret-as="ordinal">3</say-as> ящик слева
```

Advanced voices support a grammatical `format="GENDER_CASE"` attribute (examples in source).

Source: https://zvukogram.com/node/ordinal/

---

### 4.4 fraction (fractions)

**Syntax:**
```xml
<say-as interpret-as="fraction">1/2</say-as>
<say-as interpret-as="fraction">3+1/2</say-as>
```

Notes:
- `3+1/2` means "три целых и одна вторая" (no spaces).
- Not supported by all voices.

Source: https://zvukogram.com/node/fraction/

---

### 4.5 date

**Basic voices (W3C style):**
```xml
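<say-as interpret-as="date" format="dmy" detail="1">5/7/24</say-as>
<say-as interpret-as="date" format="ymd" detail="1">1945.05.09</say-as>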
```

Rules:
- Separators: `-`, `/`, `.`
- `format` is one of: `dmy`, `mdy`, `ymd`, `ym`, `my`, `md`, `dm`, `d`, `m`, `y`
- Keep `detail="1"` for this mode.

Not supported by some voices (explicitly listed in source).

**Advanced voices (case + template):**
```xml
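<say-as interpret-as="date" format="genitive" detail="d-m-y">25-1-2000</say-as>
<say-as interpret-as="date" format="nominative" detail="m-y">02-2000</say-as>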
```

- `format` becomes grammatical case (`nominative|genitive|dative|accusative|ablative|prepositional`).
- `detail` becomes template (e.g. `d-m-y`, `d-m-yw`, `m-y`, `m-yw`, etc.).
- `y` includes word “год/года”; `yw` suppresses it.
- In advanced mode, VALUE must use `-` as a separator.

Source: https://zvukogram.com/node/date/
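
The two date modes share attribute names but give them different meanings, which is easy to mix up in generated output. The sketch below encodes the rules listed above as a guard. It assumes the attribute layout `interpret-as="date"` plus `format` and `detail`; the function name `date_tag` is hypothetical, and template values are not validated.

```python
BASIC_FORMATS = {"dmy", "mdy", "ymd", "ym", "my", "md", "dm", "d", "m", "y"}
CASES = {"nominative", "genitive", "dative", "accusative", "ablative", "prepositional"}


def date_tag(value: str, fmt: str, detail: str = "1") -> str:
    """Build a date say-as tag, rejecting combinations the rules above rule out."""
    if fmt in BASIC_FORMATS:
        # Basic mode: format is the field order and detail stays "1".
        if detail != "1":
            raise ValueError("basic mode expects detail='1'")
    elif fmt in CASES:
        # Advanced mode: format is a grammatical case and detail is a template.
        if detail == "1":
            raise ValueError("advanced mode expects a template such as 'd-m-y' in detail")
        if "/" in value or "." in value:
            raise ValueError("advanced mode requires '-' as the separator")
    else:
        raise ValueError(f"unknown format: {fmt!r}")
    return f'<say-as interpret-as="date" format="{fmt}" detail="{detail}">{value}</say-as>'
```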

---

### 4.6 time

**Syntax:**
```xml
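<say-as interpret-as="time">13:45</say-as>
<say-as interpret-as="time">4:50</say-as>
<say-as interpret-as="time">4:50am</say-as>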
```

Source: https://zvukogram.com/node/time/

---

### 4.7 telephone

**Syntax:**
```xml
<say-as interpret-as="telephone">88005557778</say-as>
<say-as interpret-as="telephone">+7 (495) 600-35-56</say-as>
```

Notes:
- If you format number with spaces/dashes manually, voices often read it correctly even without `telephone`.
- When using separators, groups should be <= 3 digits or an error may occur.
- Not supported by some voices (explicitly listed in source).

Source: https://zvukogram.com/node/telephone/
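
The group-length rule for separated numbers can be pre-checked before the tag is generated. This is a minimal sketch that treats any run of digits between non-digit characters as one group; the function name `phone_groups_ok` is hypothetical, and an unseparated number is accepted as-is.

```python
import re


def phone_groups_ok(number: str) -> bool:
    """Return True if the number has no separators or every digit group has at most 3 digits."""
    groups = re.findall(r"\d+", number)
    if len(groups) <= 1:
        return True  # no separators, so the group rule does not apply
    return all(len(group) <= 3 for group in groups)
```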

---

### 4.8 currency + money

**currency (general):**
```xml
<say-as interpret-as="currency">99.9 USD</say-as>
<say-as interpret-as="currency">10.5 EUR</say-as>
```

**money (advanced voices, with cases):**
```xml
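<say-as interpret-as="money" format="nominative">21</say-as>
<say-as interpret-as="money" format="genitive">21,15</say-as>
<say-as interpret-as="money" format="dative">10</say-as>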
```

Notes:
- `money` supports grammatical `format` cases similar to date.
- Supported currencies for `money` are limited (see source).

Source: https://zvukogram.com/node/currency/

---

### 4.9 bleep / expletive (censorship)

**Syntax:**
```xml
Это <say-as interpret-as="bleep">цензурное слово</say-as>
```

Notes:
- `interpret-as="expletive"` is also accepted; effect is the same.
- Bleep duration matches the spoken duration of the censored chunk.

Source: https://zvukogram.com/node/expletive/

---

## 5) Voice support: important exceptions (Zvukogram-specific)

Zvukogram uses different underlying engines; some voices ignore or break on some tags.

Key groups mentioned in official docs:

### “Advanced voices” (cases/gender templates)
These voices are repeatedly called out as “advanced” for `cardinal/ordinal/date/telephone/money`:
- Наталья, Борислав, Марфа, Тарас, Александра, Сергей

### Fraction support
Only the following Russian voices support `fraction` (per source):
- Елена, Карина, Дмитрий, Анна, Борис, Катя, Денис, Дарья, Даниил, Светлана, Екатерина, Бот Татьяна, Бот Максим

### currency / expletive support
`currency` support list (per source):
- Карина, Дмитрий, Анна, Борис, Катя, Денис, Дарья, Даниил, Светлана, Екатерина

`bleep/expletive` support list (per source):
- Карина, Дмитрий, Анна, Борис, Катя, Денис, бот Максим, бот Татьяна

### date exceptions
Voices explicitly listed as NOT supporting the basic `date` mode:
- Алена, Филипп, Оксана, Джейн, Омаж, Захар, Эрмил, Мартын

### telephone exceptions
Voices explicitly listed as NOT supporting `telephone` (except a narrow `+XXXXXXXX` case):
- Филипп, Эрмил, Захар, Алена, Оксана
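
The support lists in this section can be turned into a lookup so a pipeline can decide per voice whether to emit a given `say-as` mode. The sketch below copies the names from the lists above; the function `voice_supports`, the mode keys, and the name normalization (case-insensitive, `ё` treated as `е`) are assumptions for illustration. Modes with no list here are treated as unrestricted, and the `date` entry refers to the basic date mode only.

```python
def _norm(name: str) -> str:
    """Normalize a voice name: trim, lowercase, and treat 'ё' as 'е'."""
    return name.strip().casefold().replace("ё", "е")


def _names(csv: str) -> set[str]:
    return {_norm(n) for n in csv.split(",")}


# Modes that only the listed voices support.
SUPPORTED_ONLY = {
    "fraction": _names("Елена, Карина, Дмитрий, Анна, Борис, Катя, Денис, Дарья, "
                       "Даниил, Светлана, Екатерина, Бот Татьяна, Бот Максим"),
    "currency": _names("Карина, Дмитрий, Анна, Борис, Катя, Денис, Дарья, Даниил, "
                       "Светлана, Екатерина"),
    "bleep": _names("Карина, Дмитрий, Анна, Борис, Катя, Денис, бот Максим, бот Татьяна"),
}

# Modes that the listed voices do NOT support.
UNSUPPORTED = {
    "date": _names("Алена, Филипп, Оксана, Джейн, Омаж, Захар, Эрмил, Мартын"),
    "telephone": _names("Филипп, Эрмил, Захар, Алена, Оксана"),
}


def voice_supports(voice: str, mode: str) -> bool:
    """Best-effort check of whether a voice supports a say-as mode, per the lists above."""
    mode = "bleep" if mode == "expletive" else mode
    v = _norm(voice)
    if mode in SUPPORTED_ONLY:
        return v in SUPPORTED_ONLY[mode]
    if mode in UNSUPPORTED:
        return v not in UNSUPPORTED[mode]
    return True  # no restriction recorded in this reference
```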

---

## 6) Quick validation checklist (before calling TTS)

- [ ] Text is plain text + supported SSML only.
- [ ] XML tags are well-formed.
- [ ] No `<speak>` / `<voice>` wrappers are assumed (the API may not support them; multi-voice is done by fragmenting).
- [ ] `prosody` wraps whole sentences (avoid wrapping single mid-sentence words).
- [ ] `date/time/telephone/fraction/currency` tags are only used with voices that support them.
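
The "supported SSML only" item of the checklist can be automated with a tag allowlist. The sketch below assumes the supported set is exactly the six tags covered in this reference; `SUPPORTED_TAGS` and `unsupported_tags` are hypothetical names. Malformed input raises `xml.etree.ElementTree.ParseError`, which doubles as the well-formedness check.

```python
import xml.etree.ElementTree as ET

# Assumption: the tags documented in this reference are the full allowlist.
SUPPORTED_TAGS = {"break", "sub", "prosody", "emphasis", "phoneme", "say-as"}


def unsupported_tags(fragment: str) -> list[str]:
    """Return the sorted names of tags in the fragment that are not on the allowlist."""
    # The <root> wrapper exists only for parsing and is excluded from the result.
    root = ET.fromstring(f"<root>{fragment}</root>")
    found = {el.tag for el in root.iter() if el is not root}
    return sorted(found - SUPPORTED_TAGS)
```

An empty list means the fragment uses only documented tags; attribute values and per-voice support still need their own checks.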