Browser Voice Gateway
Browser voice gateway plugin for OpenClaw
Install
openclaw plugins install clawhub:@local/browser-voice-gateway
browser-voice-gateway is a browser and mobile client plugin for OpenClaw.
It gives OpenClaw a dedicated HTTPS web UI for:
- live voice on phones and browsers
- turn-based voice using STT/TTS
- remote text chat
- conversation history and continuity
- OpenClaw tool use from the browser
It is designed so the user can open a web page on another device, authenticate that browser once, and use OpenClaw without exposing long-term provider API keys to the browser.
Screenshots (images not included in this text version):
- Voice home, Coast Glass theme
- OpenAI Realtime live response
- Voice mode selector
- Conversation history
- Settings drawer
- Voice home, Studio Slate theme
Start Here
This plugin includes several documents for different audiences:
- INSTALL.md: quick install guide for most users
- README.md: full install, architecture, runtime flow, storage, UI, and operational notes
- BROWSER_VOICE_GATEWAY_AI_CONTEXT.md: AI handoff document that explains the plugin in a dense, model-friendly format
- BROWSER_VOICE_GATEWAY_WEB_INTEGRATION.md: for developers who want to keep this plugin's backend but build their own web UI around it
What This Plugin Includes
Currently implemented features:
- OpenAI Realtime
  - browser-native WebRTC live voice
- OpenAI Whisper
  - turn-based: audio in, audio out
  - uses whisper-1 for STT
  - uses the OpenClaw default text model for the reasoning step
  - uses tts-1 for spoken playback
- Gemini Live
  - plugin-side live bridge for mobile reliability
  - the browser connects to the plugin over your HTTPS origin
  - the plugin connects to Gemini Live
- Browser text chat page
  - uses the OpenClaw default text model
- Browser tool bridge
  - browser sessions can use OpenClaw tools under the global OpenClaw tools policy
- Conversation history
  - full transcript storage
  - device-aware continuity
  - history browsing
- Conversation summarization for voice sessions
  - on voice session end
  - on demand for older voice conversations
- Dedicated diagnostics
  - session log page
  - tool trace page
  - in-page logs drawer
- Theme support
  - Coast Glass
  - Studio Slate
Current Issues
- Gemini Live: on some phones, Gemini playback can sound louder than expected, and hardware volume steps may feel coarse while Gemini is speaking
  - this appears to be related to the current streamed Web Audio playback path
  - there is currently no separate in-app output gain control
What Stays Inside OpenClaw
This plugin keeps the important control plane inside OpenClaw.
OpenClaw owns:
- long-term provider API keys
- browser trust/auth
- conversation metadata
- transcript history
- summary generation
- tool policy
- tool execution
The browser does not store or paste real OpenAI or Gemini API keys.
The browser only receives short-lived session/bootstrap material when needed.
Provider Keys And Ephemeral Credentials
This plugin expects the long-term provider keys to already exist inside OpenClaw.
Required providers:
- openai
- google
The real keys are resolved server-side from OpenClaw runtime auth/key-store support. They are not copied into the browser UI and they are not pasted by the user into the web page.
OpenAI
For OpenAI Realtime, the plugin does this:
- resolves the real OpenAI API key from OpenClaw
- calls OpenAI:
POST /v1/realtime/client_secrets
- returns the short-lived Realtime client secret payload to the browser
- the browser uses that short-lived secret to open the WebRTC session
This is implemented in:
Gemini
For Gemini Live, the plugin does this:
- resolves the real Google API key from OpenClaw
- uses the Google server-side SDK to mint a short-lived Gemini auth token
- returns that token to the plugin-side Gemini bridge
- the bridge uses that short-lived token when opening the Gemini Live provider socket
This is implemented in:
Why Ephemeral Credentials Are Used
This is the more secure browser pattern.
Why:
- the long-term provider keys stay inside OpenClaw
- the browser gets short-lived credentials with limited lifetime
- if a short-lived credential leaks, the exposure window is much smaller
- the plugin can constrain what the short-lived credential is for
- trust, history, tools, and key custody stay on the OpenClaw side
In short:
- OpenClaw holds the real keys
- the browser gets temporary session/bootstrap credentials only
- OpenClaw remains the control plane
Gemini SDK Usage
The plugin does use Google SDK code, but only on the server side for ephemeral token minting.
What is true today:
- the plugin uses the Google server-side SDK inside gemini-provider.ts to create the ephemeral token
- the plugin does not use the Gemini browser SDK as the active live runtime path
- the active live runtime path is the plugin-side WebSocket bridge in gemini-live-bridge.ts
Why the browser SDK is not the live runtime path here:
- the direct browser Gemini path was unreliable on iPhone/WebKit
- the browser-side Gemini SDK path also caused browser dependency/runtime issues earlier during implementation
- the plugin-side bridge gave the more reliable mobile behavior
So the current runtime design is:
- browser -> plugin over your HTTPS origin
- plugin -> Gemini Live with a short-lived token
not:
- browser -> Gemini SDK live session directly
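For illustration, here is a TypeScript sketch of the token lifetimes a server might request when minting a Gemini ephemeral token. `buildGeminiTokenConfig` and the 30-minute and 1-minute values are assumptions, not the plugin's actual settings. The field names follow the ephemeral-token options described in Google's `@google/genai` documentation, where such a config is passed as `ai.authTokens.create({ config })` and the returned token's `name` is then used in place of the long-term key. Check the current SDK docs before relying on these details.

```typescript
// Hypothetical sketch: the config object a server could pass to authTokens.create().
// Lifetimes are illustrative assumptions, not this plugin's values.
interface EphemeralTokenConfig {
  uses: number;
  expireTime: string;           // the token stops working entirely after this
  newSessionExpireTime: string; // a new Live session must be opened before this
  httpOptions: { apiVersion: string };
}

function buildGeminiTokenConfig(
  now: Date = new Date(),
  sessionMinutes = 30,
  startWindowMinutes = 1,
): EphemeralTokenConfig {
  // Offset from "now" in minutes, as an ISO-8601 string.
  const at = (minutes: number) => new Date(now.getTime() + minutes * 60_000).toISOString();
  return {
    uses: 1, // one Live session per token keeps the exposure window small
    expireTime: at(sessionMinutes),
    newSessionExpireTime: at(startWindowMinutes),
    httpOptions: { apiVersion: "v1alpha" }, // ephemeral tokens are documented under v1alpha
  };
}
```

Keeping `uses` at 1 with a short start window matches the goal stated above: a leaked token is only useful briefly and only once.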
Provider And Mode Overview
Voice Page
The main page is:
/browser-voice/
Voice modes available there:
- OpenAI Realtime
- OpenAI Whisper
- Gemini Live
Chat Page
The separate chat page is:
/browser-voice/chat
The chat page is intentionally simpler:
- text only
- no mode selector
- no record button
- uses the OpenClaw default model from openclaw.json
- currently expects that default model to be an OpenAI model
Install
Read This First About Paths
The JSON examples in this README use Linux absolute paths because this plugin was built and tested on Linux.
Use the same structure on your own machine, but change the path to match where your OpenClaw data directory lives.
The important idea is:
- this plugin folder must live inside your OpenClaw plugins directory
- the plugins.load.paths entry in openclaw.json must point at that folder
Linux example:
/home/<you>/.openclaw/plugins/browser-voice-gateway
macOS example:
/Users/<you>/.openclaw/plugins/browser-voice-gateway
Windows / WSL example:
- if you run OpenClaw inside WSL, use the Linux path inside WSL, for example:
/home/<you>/.openclaw/plugins/browser-voice-gateway
- if you run OpenClaw somewhere else, use that environment's actual OpenClaw directory
If your OpenClaw directory is not under .openclaw, use the real directory for your installation. The examples here are only examples of shape.
1. Put The Plugin In The Plugins Directory
Place this directory at:
/home/<you>/.openclaw/plugins/browser-voice-gateway/
If you are packaging it for GitHub, the plugin directory should contain only the plugin source and docs. It should not contain scratch files or unused debug assets.
2. Install Plugin Dependencies
From inside the plugin directory, run:
cd /home/<you>/.openclaw/plugins/browser-voice-gateway
npm install
This plugin now owns its own Gemini-related dependencies. A fresh download should not rely on gemini-provisioner being present just to get @google/genai or ws.
3. Change The Browser Access Code
Do this before exposing the plugin to any real network.
The example access code in this plugin is generic. You should replace it with your own value in openclaw.json.
The setting is:
plugins.entries.browser-voice-gateway.config.browserAccessCode
Example:
"browserAccessCode": "replace-this-with-your-own-secret-code"
Important:
- do not keep the generic example value
- this is the code users enter in the browser to trust that browser
- it is not an OpenAI key
- it is not a Gemini key
- it is not the OpenClaw gateway token
Practical length rule from the current code:
- keep it at 256 characters or fewer
There is no stricter format requirement in the plugin code beyond being a non-empty string, so you can use letters, numbers, and punctuation if you want.
4. Make Sure OpenClaw Has Provider Auth
This plugin expects OpenClaw auth/key-store support for:
- openai
- google
The plugin resolves provider keys through OpenClaw runtime auth. The browser does not ask the user for provider API keys.
For the current text-chat, Whisper reasoning, and summary path, the OpenClaw default model should be an OpenAI model.
5. Edit openclaw.json
File:
~/.openclaw/openclaw.json
You need to update:
- tools
- plugins.allow
- plugins.load.paths
- plugins.entries.browser-voice-gateway
The plugin gets loaded in two places:
- plugins.load.paths: tells OpenClaw where the plugin folder lives on disk
- plugins.entries.browser-voice-gateway: tells OpenClaw to enable this plugin and what config to pass into it
Exact Example
This is a safe example of the relevant shape. You do not need to replace your whole config with this, but your file needs equivalent blocks.
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai/gpt-4o-2024-11-20"
      }
    }
  },
  "tools": {
    "profile": "full",
    "deny": [
      "sessions_spawn",
      "sessions_send",
      "agents_list"
    ]
  },
  "plugins": {
    "allow": [
      "browser-voice-gateway"
    ],
    "load": {
      "paths": [
        "/home/<you>/.openclaw/plugins/browser-voice-gateway"
      ]
    },
    "entries": {
      "browser-voice-gateway": {
        "enabled": true,
        "config": {
          "enabled": true,
          "routeBase": "/browser-voice",
          "browserAccessCode": "replace-this-with-your-own-secret-code",
          "browserSessionTtlHours": 720,
          "defaultProvider": "openai",
          "openaiModel": "gpt-4o-mini-realtime-preview",
          "openaiVoice": "alloy",
          "geminiModel": "gemini-2.5-flash-native-audio-preview-12-2025",
          "sessionKey": "agent:main:browser-voice",
          "serve": {
            "enabled": true,
            "bind": "0.0.0.0",
            "port": 19443,
            "publicHost": "YOUR-LAN-IP-OR-HOSTNAME",
            "autoSelfSigned": true
          }
        }
      }
    }
  }
}
If the brackets feel confusing, use this rule:
- do not replace the whole file unless you already know what you are doing
- only add or update the specific blocks shown below
- keep the surrounding braces and commas from your existing file intact
Important:
- openclaw.json must stay valid JSON
- every { must have a matching }
- every [ must have a matching ]
- objects at the same level usually need commas between them
- one missing comma or closing brace can break OpenClaw startup or plugin loading
The safest way to edit it is:
- find the existing top-level plugins object
- add or update only the shown fields inside that object
- make sure the object still closes exactly once
- save the file and restart OpenClaw
If you already have other plugins configured, do not delete their blocks. Add browser-voice-gateway next to them inside the existing plugins.entries object.
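Because a single missing brace can stop OpenClaw from starting, it can help to check the file before restarting. The following is a minimal TypeScript sketch for Node.js. `checkPluginWiring` is a hypothetical helper and is not part of OpenClaw or this plugin. It uses strict `JSON.parse`, matching the "valid JSON" rule above, and confirms that the allow list, load path, and entry all refer to browser-voice-gateway.

```typescript
// Hypothetical pre-restart check for openclaw.json; not part of OpenClaw or this plugin.
// Returns human-readable problems; an empty array means the wiring looks consistent.
const PLUGIN_ID = "browser-voice-gateway";

function checkPluginWiring(rawJson: string): string[] {
  let cfg: any;
  try {
    cfg = JSON.parse(rawJson); // strict JSON: a missing comma or brace throws here
  } catch (err) {
    return [`invalid JSON: ${(err as Error).message}`];
  }

  const problems: string[] = [];
  const plugins = cfg?.plugins ?? {};

  if (!Array.isArray(plugins.allow) || !plugins.allow.includes(PLUGIN_ID)) {
    problems.push(`plugins.allow does not list "${PLUGIN_ID}"`);
  }

  // Accept both / and \ separators and ignore a trailing separator.
  const paths = plugins.load?.paths;
  const hasPath =
    Array.isArray(paths) &&
    paths.some(
      (p: unknown) =>
        typeof p === "string" && p.replace(/[\\/]+$/, "").split(/[\\/]/).pop() === PLUGIN_ID,
    );
  if (!hasPath) problems.push(`plugins.load.paths has no folder named ${PLUGIN_ID}`);

  const entry = plugins.entries?.[PLUGIN_ID];
  if (entry?.enabled !== true) problems.push(`plugins.entries.${PLUGIN_ID}.enabled is not true`);

  // Mirrors the documented rules: non-empty, at most 256 characters, not the example value.
  const code = entry?.config?.browserAccessCode;
  if (typeof code !== "string" || code.length === 0 || code.length > 256) {
    problems.push("browserAccessCode must be a non-empty string of at most 256 characters");
  } else if (code === "replace-this-with-your-own-secret-code") {
    problems.push("browserAccessCode is still the example value");
  }
  return problems;
}
```

In practice you would pass it the result of `readFileSync(configPath, "utf8")` from `node:fs` and fix anything it reports before running `openclaw gateway restart`.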
Where Each Block Goes
tools
This is a top-level block.
"tools": {
  "profile": "full",
  "deny": [
    "sessions_spawn",
    "sessions_send",
    "agents_list"
  ]
}
This plugin reads the same global OpenClaw tools policy. That means browser sessions inherit the normal tool profile, while the denied tools above stay blocked.
plugins.allow
Inside the top-level plugins object:
"plugins": {
  "allow": [
    "browser-voice-gateway"
  ]
}
plugins.load.paths
Inside the same plugins object:
"plugins": {
  "load": {
    "paths": [
      "/home/<you>/.openclaw/plugins/browser-voice-gateway"
    ]
  }
}
This is the line that tells OpenClaw where the plugin folder is.
If you are on another machine, this path should be the absolute path to your own browser-voice-gateway folder.
plugins.entries
Inside the same plugins object:
"plugins": {
  "entries": {
    "browser-voice-gateway": {
      "enabled": true,
      "config": {
        "enabled": true,
        "routeBase": "/browser-voice",
        "browserAccessCode": "replace-this-with-your-own-secret-code",
        "browserSessionTtlHours": 720,
        "defaultProvider": "openai",
        "openaiModel": "gpt-4o-mini-realtime-preview",
        "openaiVoice": "alloy",
        "geminiModel": "gemini-2.5-flash-native-audio-preview-12-2025",
        "sessionKey": "agent:main:browser-voice",
        "serve": {
          "enabled": true,
          "bind": "0.0.0.0",
          "port": 19443,
          "publicHost": "YOUR-LAN-IP-OR-HOSTNAME",
          "autoSelfSigned": true
        }
      }
    }
  }
}
This is the block that actually enables the plugin and gives it its runtime config.
If you already have other plugin entries, add browser-voice-gateway alongside them inside the same plugins.entries object.
What The Important Plugin Config Fields Mean
browserAccessCode
- one-time trust code for the browser
- not an OpenAI key
- not a Gemini key
- not the OpenClaw gateway token
browserSessionTtlHours
- how long the trusted browser cookie lasts
defaultProvider
- first voice mode shown when the page loads
openaiModel
- OpenAI live voice model for OpenAI Realtime
openaiVoice
- OpenAI voice name used for live and STT/TTS playback
geminiModel
- Gemini live model used by the Gemini bridge
sessionKey
- base session key prefix for browser voice conversations
- actual per-conversation keys become:
agent:main:browser-voice:<conversationId>
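A small TypeScript sketch of the documented key shape, `<sessionKey>:<conversationId>`. The helper names are hypothetical. Using `randomUUID` from `node:crypto` for new conversation ids is an assumption about one reasonable choice, not a description of how the plugin generates ids.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: builds the per-conversation key <sessionKey>:<conversationId>.
function sessionKeyFor(baseKey: string, conversationId: string): string {
  if (conversationId.length === 0) throw new Error("conversationId is required");
  return `${baseKey.replace(/:+$/, "")}:${conversationId}`; // avoid "::" if the base ends in ":"
}

// Hypothetical helper: a fresh conversation id (assumed UUID) and its derived session key.
function newConversation(baseKey: string): { conversationId: string; sessionKey: string } {
  const conversationId = randomUUID();
  return { conversationId, sessionKey: sessionKeyFor(baseKey, conversationId) };
}
```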
serve.bind
- 0.0.0.0 makes the HTTPS server reachable from other devices on the network
serve.publicHost
- LAN IP or hostname the phone/browser should use
serve.port
- HTTPS port for the browser UI
serve.autoSelfSigned
- auto-generates a self-signed cert if no cert/key files are present
Start It
After installing dependencies and updating openclaw.json, restart the gateway:
openclaw gateway restart
Then open:
https://<publicHost>:<port>/browser-voice/
Example:
https://<your-host-or-ip>:19443/browser-voice/
HTTPS And Mobile Trust
This plugin serves HTTPS because mobile browsers require secure contexts for microphone access.
Important distinction:
- HTTPS transport means the connection is encrypted
- trusted HTTPS means the phone/browser actually trusts the certificate
If the certificate is self-signed, the page may still say Not Secure, or the browser may behave inconsistently with microphone/audio access, especially on iPhone.
A self-signed certificate is useful for development, but it is not the same thing as a trusted production certificate.
Browser Guidance
Recommended first test browser:
- Chrome on mobile
Not recommended for initial testing:
- Safari on iPhone
Why Safari is not recommended first:
- Safari is stricter about certificate trust and secure-context behavior
- Safari is less forgiving when the certificate is self-signed or not fully trusted
- microphone and media behavior can fail earlier there
Important note:
- Chrome on iPhone still uses WebKit underneath
- so it is not fully separate from Safari behavior
- it was still the more practical test surface during this plugin build
Silent Mode And iPhone Audio
For OpenAI Whisper, iPhone silent mode matters.
What was observed during testing:
- STT/TTS responses were being generated correctly
- browser playback could still be inaudible on iPhone when the phone was in silent mode
So if OpenAI Whisper appears to respond in text but you hear nothing on iPhone:
- turn silent mode off
- raise volume
- test again
This was a browser/device playback issue, not a failed STT/TTS response generation issue.
If The Certificate Is Not Trusted
If the phone/browser does not trust the certificate:
- the page may say Not Secure
- microphone access may fail or behave inconsistently
- live audio behavior may degrade
- browser security behavior can differ between Safari and Chrome
The What Is Trusted HTTPS? button in Settings explains this in the UI.
In practice:
- self-signed local HTTPS may still work for testing
- untrusted HTTPS can still break microphone access or other secure browser features
- iPhone Safari is not recommended for this plugin
- iPhone Chrome can work, but trusted HTTPS is still strongly recommended
- cloud-hosted OpenClaw installs have not been fully validated with this plugin yet
Exact Runtime Flow
This section is the actual plugin flow, not a hand-wavy overview.
Browser Trust Flow
- User opens the browser page.
- User enters the browser access code once.
- Plugin validates the access code against:
plugins.entries.browser-voice-gateway.config.browserAccessCode
- Plugin creates a trusted browser record.
- Plugin sets an HttpOnly cookie.
- Trusted browser data is stored at:
~/.openclaw/browser-voice/trusted-browsers.json
After that, the same browser usually does not need the code again until the cookie expires or is cleared.
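The validation and cookie steps can be sketched in TypeScript for Node.js as follows. The function names, the cookie name `bvg_trust`, and the cookie path are assumptions for illustration only. Hashing both values before `timingSafeEqual` is a common way to compare secrets in constant time, because that function requires buffers of equal length.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of the one-time trust step; not the plugin's actual code.
// Hashing both sides yields equal-length buffers, which timingSafeEqual requires.
function accessCodeMatches(submitted: string, configured: string): boolean {
  if (configured.length === 0 || configured.length > 256) return false; // documented limits
  const a = createHash("sha256").update(submitted, "utf8").digest();
  const b = createHash("sha256").update(configured, "utf8").digest();
  return timingSafeEqual(a, b);
}

// On success, return a new random browser id plus a Set-Cookie header value.
function trustBrowser(
  submitted: string,
  configured: string,
  ttlHours: number,
): { browserId: string; setCookie: string } | null {
  if (!accessCodeMatches(submitted, configured)) return null;
  const browserId = randomBytes(32).toString("base64url");
  const maxAgeSeconds = Math.round(ttlHours * 3600); // browserSessionTtlHours -> seconds
  const setCookie = [
    `bvg_trust=${browserId}`, // hypothetical cookie name
    "HttpOnly",               // page scripts cannot read the trust cookie
    "Secure",                 // only sent over HTTPS
    "SameSite=Strict",
    "Path=/browser-voice",    // assumes the example routeBase
    `Max-Age=${maxAgeSeconds}`,
  ].join("; ");
  return { browserId, setCookie };
}
```

With the example `browserSessionTtlHours` of 720, the cookie's `Max-Age` works out to 2,592,000 seconds (30 days).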
OpenAI Realtime Flow
- Browser starts or resumes a conversation with /api/session/start.
- Plugin creates or resolves a conversation record.
- Browser requests /api/bootstrap/openai.
- Plugin resolves the real OpenAI API key from OpenClaw auth.
- Plugin calls OpenAI Realtime client-secret creation and receives a short-lived client secret.
- Browser opens WebRTC directly to OpenAI.
- Browser streams mic audio.
- OpenAI streams audio back.
- Transcript events are mirrored back into OpenClaw history.
- On session end, the plugin summarizes the voice conversation and stores title + summary.
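The mint step can be sketched in TypeScript for Node 18+, which provides a global `fetch`. The endpoint is the one named above. The request body `{ session: { type: "realtime", model } }` is based on OpenAI's Realtime client-secret documentation as understood here, and should be checked against the current API reference. The helper names are hypothetical.

```typescript
// Hypothetical sketch of the server-side mint step (Node 18+, global fetch).
// Body shape is an assumption based on OpenAI's Realtime client-secret docs.
interface MintRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildClientSecretRequest(apiKey: string, model: string): MintRequest {
  return {
    url: "https://api.openai.com/v1/realtime/client_secrets",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // long-term key: used server-side only
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ session: { type: "realtime", model } }),
    },
  };
}

async function mintRealtimeSecret(apiKey: string, model: string): Promise<unknown> {
  const { url, init } = buildClientSecretRequest(apiKey, model);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`client secret mint failed: HTTP ${res.status}`);
  return res.json(); // only this short-lived payload is forwarded to the browser
}
```

Keeping the request builder separate from the network call makes it easy to check that the long-term key goes only into the server-side `Authorization` header.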
OpenAI Whisper Flow
This is the turn-based audio mode.
- User selects OpenAI Whisper.
- User taps Start.
- Browser opens a persistent local playback context and keeps the conversation alive on the page.
- Main button becomes turn control:
  - tap to record
  - tap again to send
- Browser uploads audio to /api/chat/turn with mode: openai_stt_tts.
- Plugin:
  - transcribes with whisper-1
  - runs the text turn with the OpenClaw default model
  - synthesizes reply audio with tts-1
- Browser shows the assistant text and plays returned speech.
- End explicitly closes the Whisper session on the page.
- On session end, the plugin summarizes the voice conversation and stores title + summary.
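The server side of one openai_stt_tts turn can be pictured as three stages run in order. This TypeScript sketch injects each stage as a function so it stays provider-agnostic. `runSttTtsTurn` and `TurnStages` are hypothetical names. In the plugin, the stages would correspond to whisper-1, the OpenClaw default model, and tts-1.

```typescript
// Hypothetical sketch of one openai_stt_tts turn as three injected stages.
interface TurnStages {
  transcribe(audio: Uint8Array): Promise<string>;     // STT, e.g. whisper-1
  reason(userText: string): Promise<string>;          // OpenClaw default text model
  synthesize(replyText: string): Promise<Uint8Array>; // TTS, e.g. tts-1
}

interface TurnResult {
  userText: string;
  replyText: string;
  replyAudio: Uint8Array;
}

async function runSttTtsTurn(audio: Uint8Array, stages: TurnStages): Promise<TurnResult> {
  if (audio.byteLength === 0) throw new Error("empty recording");
  const userText = (await stages.transcribe(audio)).trim();
  if (!userText) throw new Error("no speech recognized"); // skip reasoning on silence
  const replyText = await stages.reason(userText);
  const replyAudio = await stages.synthesize(replyText);
  return { userText, replyText, replyAudio }; // text is shown, audio is played back
}
```

Returning both the reply text and the reply audio fits the iPhone silent-mode note above: the text still arrives even when playback is inaudible.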
Gemini Live Flow
The Gemini path intentionally avoids a direct browser-to-Gemini WebSocket, because that path was unreliable on iPhone/WebKit.
The implemented shape is:
- Browser starts or resumes a conversation with /api/session/start.
- Browser opens a same-origin WebSocket to:
/browser-voice/ws/gemini-live
- Plugin resolves the real Google API key from OpenClaw auth.
- Plugin mints a short-lived Gemini token.
- Plugin opens the Gemini Live provider socket with that short-lived token.
- Plugin relays audio, text, tool calls, and transcription between browser and Gemini.
- Plugin now injects prior context in two ways:
- setup/system instruction
- explicit initial text turn after setup completes
- Plugin persists Gemini transcripts into OpenClaw history.
- On session end, the plugin summarizes the voice conversation and stores title + summary.
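The relay and context-injection steps above can be sketched as a pure routing function in TypeScript. The message field names (`setupComplete`, `serverContent` with `modelTurn`, `inputTranscription`, `outputTranscription`, and `turnComplete`, `toolCall.functionCalls`, and `clientContent`) follow the public Gemini Live WebSocket protocol as understood here. The `BridgeAction` type and `routeGeminiMessage` are hypothetical and are not the code in gemini-live-bridge.ts.

```typescript
// Hypothetical routing step for a plugin-side Gemini Live bridge.
type BridgeAction =
  | { kind: "sendToGemini"; payload: unknown }
  | { kind: "audioToBrowser"; base64: string; mimeType: string }
  | { kind: "transcript"; role: "user" | "assistant"; text: string }
  | { kind: "toolCall"; id: string; name: string; args: unknown }
  | { kind: "turnComplete" };

function routeGeminiMessage(msg: any, contextText: string | null): BridgeAction[] {
  const actions: BridgeAction[] = [];

  // After setup, inject prior context as an explicit text turn.
  // turnComplete: false adds the context without asking the model to reply yet.
  if (msg.setupComplete !== undefined && contextText) {
    actions.push({
      kind: "sendToGemini",
      payload: {
        clientContent: {
          turns: [{ role: "user", parts: [{ text: contextText }] }],
          turnComplete: false,
        },
      },
    });
  }

  const sc = msg.serverContent;
  if (sc) {
    // Streamed model audio arrives as base64 inlineData parts.
    for (const part of sc.modelTurn?.parts ?? []) {
      if (part.inlineData?.data) {
        actions.push({ kind: "audioToBrowser", base64: part.inlineData.data, mimeType: part.inlineData.mimeType ?? "" });
      }
    }
    // Transcriptions are the text a bridge would persist into history.
    if (sc.inputTranscription?.text) actions.push({ kind: "transcript", role: "user", text: sc.inputTranscription.text });
    if (sc.outputTranscription?.text) actions.push({ kind: "transcript", role: "assistant", text: sc.outputTranscription.text });
    if (sc.turnComplete) actions.push({ kind: "turnComplete" });
  }

  // Tool calls are handed off; a bridge would later answer with a toolResponse message.
  for (const call of msg.toolCall?.functionCalls ?? []) {
    actions.push({ kind: "toolCall", id: call.id, name: call.name, args: call.args ?? {} });
  }
  return actions;
}
```

Keeping the routing pure, and leaving the socket I/O to the caller, means the same function can be exercised without a live Gemini connection.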
Where History Is Saved
There are three important storage layers.
1. Trusted Browser Records
Stored at:
~/.openclaw/browser-voice/trusted-browsers.json
This tracks:
- trusted browser id
- browser label
- last seen time
- expiry
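A TypeScript sketch of how such a record could be modeled, with an expiry check and a "last seen" refresh. The field names and the ISO-8601 timestamp format are assumptions based on the list above, not the actual schema of trusted-browsers.json.

```typescript
// Hypothetical model of one trusted-browsers.json entry; field names are assumptions.
interface TrustedBrowserRecord {
  id: string;
  label: string;
  lastSeenAt: string; // ISO-8601
  expiresAt: string;  // ISO-8601
}

function isStillTrusted(rec: TrustedBrowserRecord, now: Date = new Date()): boolean {
  const expires = Date.parse(rec.expiresAt);
  return Number.isFinite(expires) && expires > now.getTime(); // unparsable dates count as expired
}

// Drop expired records and refresh lastSeenAt for the browser that just made a request.
function touchAndPrune(
  records: TrustedBrowserRecord[],
  seenId: string,
  now: Date = new Date(),
): TrustedBrowserRecord[] {
  return records
    .filter((r) => isStillTrusted(r, now))
    .map((r) => (r.id === seenId ? { ...r, lastSeenAt: now.toISOString() } : r));
}
```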
2. Conversation Metadata
Stored at:
~/.openclaw/browser-voice/conversations.json
This tracks:
- conversation id
- title
- summary
- provider
- browser ownership
- device/shared mode
- timestamps
- preview
- linked OpenClaw session file
3. Full Transcript History
Stored in OpenClaw session files under:
~/.openclaw/agents/main/sessions/
The plugin mirrors transcript entries into the same OpenClaw session-style storage used by the rest of the system.
This full transcript is the source of truth.
How Continuity Works
Continuity is device-aware by default.
Latest
- loads the most recent conversation for this browser/device
- does not open the history panel by itself
Browse History
- opens the history panel
- lets the user select any accessible saved conversation
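A TypeScript sketch of the difference between Latest and Browse History, using a record modeled loosely on the conversation metadata fields listed earlier. Two interpretations here are assumptions: that a "shared" conversation is visible to every trusted browser, and that Latest only considers the current browser's own conversations. All names are hypothetical.

```typescript
// Hypothetical conversation record; fields are modeled on conversations.json, not copied from it.
interface ConversationMeta {
  id: string;
  title?: string;
  summary?: string;
  provider: string;
  ownerBrowserId: string;
  mode: "device" | "shared"; // assumption: shared = visible to every trusted browser
  updatedAt: string;         // ISO-8601
}

function canAccess(c: ConversationMeta, browserId: string): boolean {
  return c.mode === "shared" || c.ownerBrowserId === browserId;
}

// Latest: the most recently updated conversation owned by this browser/device.
function latestForBrowser(all: ConversationMeta[], browserId: string): ConversationMeta | undefined {
  let best: ConversationMeta | undefined;
  for (const c of all) {
    if (c.ownerBrowserId !== browserId) continue;
    if (!best || Date.parse(c.updatedAt) > Date.parse(best.updatedAt)) best = c;
  }
  return best;
}

// Browse History: every accessible conversation, newest first.
function browsableFor(all: ConversationMeta[], browserId: string): ConversationMeta[] {
  return all
    .filter((c) => canAccess(c, browserId))
    .sort((x, y) => Date.parse(y.updatedAt) - Date.parse(x.updatedAt));
}
```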
Visible History vs Model History
These are intentionally not the same thing.
Visible UI history:
- loads the full saved thread into Live Response when a conversation is selected
Model context injection:
- sends the stored summary
- plus recent raw turns
- not the entire transcript
This keeps the UI useful for the human while keeping provider context smaller and cleaner.
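A TypeScript sketch of the model-side context: the stored summary plus the last few real turns, rather than the whole transcript. The default of 6 recent turns, the wording of the context lines, and the `synthetic` flag are assumptions. The flag reflects that the summary step described below reads only the non-synthetic transcript.

```typescript
// Hypothetical sketch: stored summary + the last N non-synthetic turns, never the full transcript.
interface Turn {
  role: "user" | "assistant";
  text: string;
  synthetic?: boolean; // e.g. injected context; excluded from what is replayed
}

function buildContextText(summary: string | undefined, transcript: Turn[], recentTurns = 6): string {
  const real = transcript.filter((t) => !t.synthetic);
  // Guard 0: slice(-0) would return the whole array.
  const recent = recentTurns > 0 ? real.slice(-recentTurns) : [];
  const lines: string[] = [];
  if (summary && summary.trim()) lines.push(`Summary of earlier conversation: ${summary.trim()}`);
  if (recent.length > 0) {
    lines.push("Most recent turns:");
    for (const t of recent) lines.push(`${t.role === "user" ? "User" : "Assistant"}: ${t.text}`);
  }
  return lines.join("\n");
}
```

The size of this context is bounded by the summary length plus N turns, so it stays small even when the visible thread in Live Response is long.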
Summaries
Voice sessions use summary generation.
Text chat does not currently use this summarization flow.
Automatic Summary Generation
On voice session end, the plugin:
- reads the full non-synthetic transcript
- sends it to the OpenClaw default text model path
- asks for a reply with a Title: line and a Summary: line
- stores:
- title
- summary up to 8 sentences
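A TypeScript sketch of turning such a reply into a stored title and summary, capped at 8 sentences. The parsing rules are assumptions. In particular, the sentence split is a naive split on `.`, `!`, or `?` and will mis-handle abbreviations.

```typescript
// Hypothetical parser for a "Title: ... / Summary: ..." reply; not the plugin's code.
function parseTitleSummary(
  reply: string,
  maxSentences = 8,
): { title: string; summary: string } | null {
  const title = /^\s*Title:\s*(.+)$/im.exec(reply)?.[1]?.trim();
  const rawSummary = /^\s*Summary:\s*([\s\S]+)$/im.exec(reply)?.[1]?.trim();
  if (!title || !rawSummary) return null; // reply did not follow the requested shape

  // Naive sentence split: runs of text ending in . ! or ?, plus any unterminated tail.
  const sentences = (rawSummary.replace(/\s+/g, " ").match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return { title, summary: sentences.slice(0, maxSentences).join(" ") };
}
```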
Manual Summary Generation
In Browse History, the user can use:
Summarize Selected
That is for older voice conversations that were created before summary support existed or for any conversation that needs to be refreshed manually.
What Summary Is Used For
The stored summary is used for:
- better history display
- better continuity when reopening a voice conversation
Web UI Guide
Main Voice Page
URL:
/browser-voice/
Controls:
Voice Mode
- OpenAI Realtime
- OpenAI Whisper
- Gemini Live
Chat
- opens the separate text chat page
Latest
- loads the most recent conversation for this device
New
- queues a new conversation
- next voice start uses a fresh conversation id
Browse History
- opens the history panel
Live Response
- shows the full visible thread for the currently selected conversation
- also shows current response text as it streams
Settings
- opens browser auth, diagnostics, theme, and session controls
Logs
- opens the in-page debug drawer if enabled
OpenAI Realtime Button Behavior
- idle blue button
- turns live/red during active realtime session
- tap once to start
- tap again to end
OpenAI Whisper Button Behavior
Whisper has a different interaction model.
Before session start:
- main button shows Start
After start:
- main button becomes turn control
- tap to record
- tap again to send
- End appears on the other side of the status bubble
Play Last Reply
- replays the last spoken Whisper response
End
- explicitly closes the current Whisper session on the page
Chat Page
URL:
/browser-voice/chat
Behavior:
- always text-only
- always uses the OpenClaw default model
- Latest, New, and Browse History behave the same way as on the voice page
Diagnostics
Separate pages:
- /browser-voice/trace
- /browser-voice/session-log
In-page diagnostics:
- floating Logs button on the voice page
- controlled by Log Display in Settings
Themes
The plugin includes:
- Coast Glass
- Studio Slate
Theme selection is stored in browser local storage and applies to both the voice and chat pages.
What Uses The OpenClaw Default Model
The OpenClaw default model from:
agents.defaults.model.primary
is used for:
- browser text chat
- Whisper STT/TTS text reasoning step
- voice conversation summarization
It is not used for:
- OpenAI Realtime media transport
- Gemini Live media transport
Current important limitation:
- browser text chat, Whisper reasoning, and summary generation currently expect the OpenClaw default model to be an OpenAI model
- if the default model is not an OpenAI model, those paths are not fully supported yet
Tools
Browser sessions use the OpenClaw tool bridge.
The plugin exposes:
- openclaw_tool
- write_file
Actual tool availability still follows the global OpenClaw tools policy.
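A TypeScript sketch of how a bridge call could be checked against the global deny list. It models only the deny list, because the contents of a profile such as "full" are defined by OpenClaw, not by this plugin. The shape of a `BridgeCall` and the helper name are hypothetical.

```typescript
// Hypothetical guard in front of an openclaw_tool bridge call; the call shape is an assumption.
interface ToolsPolicy {
  profile: string;
  deny: string[];
}

interface BridgeCall {
  tool: string; // underlying OpenClaw tool the browser session wants to run
  args: Record<string, unknown>;
}

type GuardResult = { ok: true } | { ok: false; error: string };

function guardBridgeCall(call: BridgeCall, policy: ToolsPolicy): GuardResult {
  // Browser sessions inherit the global policy, so a denied tool stays blocked here too.
  if (policy.deny.includes(call.tool)) {
    return { ok: false, error: `tool "${call.tool}" is denied by the global tools policy` };
  }
  return { ok: true };
}
```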
Known Operational Notes
- iPhone silent mode can still affect browser audio playback.
- Self-signed HTTPS is not the same as trusted HTTPS.
- Safari on iPhone is not the recommended first test browser.
- Chrome on iPhone still uses WebKit underneath.
- Cloud-hosted OpenClaw deployments were not fully validated during this build.
- Local-network and self-hosted use were the primary tested path.
- The gateway may warn if the systemd service token is stale.
- if needed:
openclaw gateway install --force
Minimal Test Checklist
- Restart the gateway.
- Open the voice page.
- Authenticate once with the browser access code.
- Start OpenAI Realtime and confirm live voice works.
- Start OpenAI Whisper and confirm turn-based voice works.
- Start Gemini Live and confirm context continuity works on a summarized conversation.
- Open Browse History.
- Select a conversation and confirm full history loads into Live Response.
- End a voice session and confirm title + summary update.
- Use Summarize Selected on an older voice conversation and confirm it updates.
License
This project is intended to be source-available for personal and noncommercial use.
Repository files:
Practical summary:
- noncommercial use is governed by the included license
- commercial use requires a separate commercial license from the repository owner
If you publish this on GitHub, do not use GitHub's auto-generated license picker for this repo. Create the repo with No license and commit the included LICENSE file yourself.
