Install
openclaw skills install openclaw-dx
Diagnose, fix, and document openclaw gateway issues. Use when the gateway is stuck, not starting, crash-looping, or rejecting connections. Covers both main (port 18789) and vesper profile (port 18999) gateways.
Run these in parallel to assess state:
# 1. What's listening?
lsof -i :18789 -i :18999 2>/dev/null | grep LISTEN
# 2. Process health (memory, CPU, uptime)
ps -o pid,rss,pcpu,lstart,etime -p $(lsof -iTCP:18789 -sTCP:LISTEN -t 2>/dev/null | head -1)  # listener PID only; use 18999 for vesper
# 3. Recent errors
tail -30 ~/.openclaw/logs/gateway.err.log
# 4. Recent activity
tail -20 ~/.openclaw/logs/gateway.log
# 5. Channel status
openclaw channels status
# 6. Version
openclaw --version
# 7. Pending device pairings
openclaw devices list --json | head -20
# 8. Model config + fallback chain (use affected profile's config dir)
# Main: ~/.openclaw/openclaw.json | Vesper: ~/.openclaw-vesper/openclaw.json
cat ~/.openclaw/openclaw.json | python3 -c "import sys,json; print(json.dumps(json.load(sys.stdin)['agents']['defaults']['model'], indent=2))"
# 9. Per-agent auth token status + expiry check
# Main: ~/.openclaw/agents/main/agent/auth-profiles.json
# Vesper: ~/.openclaw-vesper/agents/main/agent/auth-profiles.json
python3 -c "
import sys,json,time
data=json.load(open('$HOME/.openclaw/agents/main/agent/auth-profiles.json'))
now=time.time()*1000
for k,v in data.get('profiles',{}).items():
    exp=v.get('expires',0)
    expired='EXPIRED' if exp and exp<now else 'valid'
    has_token='yes' if v.get('access') or v.get('token') else 'NO'
    print(f'{k}: type={v.get(\"type\",\"?\")} token={has_token} expires={expired}')
"
# 10. Memory search / QMD (use --profile if vesper)
openclaw memory status
# 11. Check OPENCLAW_GATEWAY_TOKEN env var (multi-profile foot-gun)
echo "OPENCLAW_GATEWAY_TOKEN=${OPENCLAW_GATEWAY_TOKEN:-unset}"
# 12. Session token counts (context overflow check)
for dir in ~/.openclaw ~/.openclaw-vesper; do
f="$dir/agents/main/sessions/sessions.json"
[ -f "$f" ] && echo "=== $(basename $dir) ===" && python3 -c "
import json
data=json.load(open('$f'))
for k,v in data.items():
    t=v.get('contextTokens',0)
    pct=t/200000*100
    flag=' ⚠️ BLOATED' if pct>75 else ''
    print(f' {k}: {t:,} tokens ({pct:.0f}%){flag}')
"
done
# 13. Verify plist profile alignment
grep OPENCLAW_STATE_DIR ~/Library/LaunchAgents/ai.openclaw.gateway.plist
grep OPENCLAW_STATE_DIR ~/Library/LaunchAgents/ai.openclaw.vesper.plist
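The triage checks above only read state, so they can run concurrently. The following is a minimal Python 3.9+ sketch. TRIAGE is a hypothetical subset of the numbered steps, and run_check/run_triage are illustrative helpers, not part of the openclaw CLI.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subset of the triage steps above; add the rest as needed.
TRIAGE = {
    "listening": "lsof -i :18789 -i :18999 2>/dev/null | grep LISTEN",
    "errors": "tail -30 ~/.openclaw/logs/gateway.err.log",
    "channels": "openclaw channels status",
    "version": "openclaw --version",
}


def run_check(cmd: str, timeout: float = 30.0) -> str:
    """Run one shell command; return stdout+stderr, or a TIMEOUT marker."""
    try:
        proc = subprocess.run(cmd, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"TIMEOUT after {timeout}s"
    return (proc.stdout + proc.stderr).strip()


def run_triage(checks: dict[str, str], timeout: float = 30.0) -> dict[str, str]:
    """Run every check concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max(len(checks), 1)) as pool:
        futures = {name: pool.submit(run_check, cmd, timeout)
                   for name, cmd in checks.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    for name, output in run_triage(TRIAGE).items():
        print(f"=== {name} ===\n{output}\n")
```

The per-command timeout bounds the total wait. One hung check, such as channels status against an unresponsive gateway, cannot stall the whole triage.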
Symptom: All models failed (N): followed by per-provider errors. May also appear as "The model has crashed without additional information. (Exit code: null)"
Diagnosis: Check the full error chain — each attempt cycles primary → fallback1 → fallback2. All must fail for the user to see an error. Common error signatures per provider:
- The AI service is temporarily overloaded (transient, or stale token)
- OAuth token refresh failed for openai-codex or refresh_token_reused (expired access token + consumed refresh token)
- No API key found for provider "google" (provider never configured in auth-profiles.json)
- AttributeError: 'list' object has no attribute 'swapaxes' (model inference bug)
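The "all must fail" rule is easiest to see as a loop over the chain. This is an illustrative sketch, not OpenClaw's implementation. complete_with_fallbacks, AllModelsFailed and fake_call are hypothetical names, and the message format only mimics the All models failed (N): symptom.

```python
from typing import Callable


class AllModelsFailed(Exception):
    """Raised only after every model in the chain has failed."""


def complete_with_fallbacks(chain: list[str], call_model: Callable[[str], str]) -> str:
    """Try primary, then each fallback in order; return the first success."""
    errors: list[str] = []
    for model in chain:
        try:
            return call_model(model)
        except Exception as exc:  # record the error, move on to the next model
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed(f"All models failed ({len(errors)}): " + "; ".join(errors))


# Hypothetical provider stub: only one model answers, the rest are "overloaded".
def fake_call(model: str) -> str:
    if model == "anthropic/claude-sonnet-4-6":
        return "ok"
    raise RuntimeError("temporarily overloaded")
```

One healthy fallback therefore masks a broken primary. The error only surfaces once every provider is broken, so check each provider individually rather than stopping at the first error line.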
Fix: Identify which providers are broken and fix each:
# Check fallback config (use affected profile's config dir)
cat ~/.openclaw/openclaw.json | python3 -c "import sys,json; print(json.dumps(json.load(sys.stdin)['agents']['defaults']['model'], indent=2))"
# Check per-agent auth tokens + expiry
python3 -c "
import sys,json,time
data=json.load(open('$HOME/.openclaw/agents/main/agent/auth-profiles.json'))
now=time.time()*1000
for k,v in data.get('profiles',{}).items():
    exp=v.get('expires',0)
    expired='EXPIRED' if exp and exp<now else 'valid'
    has_token='yes' if v.get('access') or v.get('token') else 'NO'
    print(f'{k}: type={v.get(\"type\",\"?\")} token={has_token} expires={expired}')
"
For OAuth expiry (OpenAI Codex): openclaw configure (interactive re-auth). Add --profile vesper if vesper.
For missing provider keys (Google etc.): openclaw agents add <provider> or remove unconfigured providers from fallback chain.
Prevention: Ensure all providers in the fallback chain are actually configured. Use same-provider fallbacks (e.g. different Anthropic models) instead of cross-provider for predictable failure modes.
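To check this prevention rule mechanically, compare the providers referenced by the chain with the keys in the per-agent auth-profiles.json. The sketch below makes two assumptions: agents.defaults.model has a primary string and a fallbacks list of provider/model IDs, and auth profile keys look like <provider>:default. Provider names in model IDs may not match auth keys exactly (the compaction check elsewhere in this skill strips -cli), so treat hits as leads, not verdicts. Helper names are hypothetical.

```python
import json
from pathlib import Path


def chain_providers(model_cfg: dict) -> list[str]:
    """Providers named by primary + fallbacks ('provider/model' IDs), in order, deduplicated."""
    models = [model_cfg.get("primary", "")] + list(model_cfg.get("fallbacks", []))
    providers: list[str] = []
    for model in models:
        provider = model.split("/", 1)[0]
        if provider and provider not in providers:
            providers.append(provider)
    return providers


def unconfigured_providers(model_cfg: dict, auth_profiles: dict) -> list[str]:
    """Providers in the chain with no '<provider>:...' key in auth-profiles.json."""
    configured = {key.split(":", 1)[0] for key in auth_profiles.get("profiles", {})}
    return [p for p in chain_providers(model_cfg) if p not in configured]


if __name__ == "__main__":
    root = Path.home() / ".openclaw"  # use ~/.openclaw-vesper for vesper
    cfg_path = root / "openclaw.json"
    auth_path = root / "agents/main/agent/auth-profiles.json"
    if cfg_path.exists() and auth_path.exists():
        model_cfg = json.loads(cfg_path.read_text())["agents"]["defaults"]["model"]
        print(unconfigured_providers(model_cfg, json.loads(auth_path.read_text())))
```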
Symptom: Crash loop with Unhandled promise rejection: Error: An API error occurred: token_expired (expired Slack token)
Fix:
# Disable the channel
# Edit ~/.openclaw/openclaw.json: channels.slack.enabled → false AND plugins.entries.slack.enabled → false
openclaw gateway start
# Then rotate token at api.slack.com and re-enable
Symptom: Gateway start blocked: set gateway.mode=local (current: unset)
Fix: Restore from backup:
ls -la ~/.openclaw/openclaw.json.bak*
# Find the largest/most recent backup with full config
cp ~/.openclaw/openclaw.json.bak-XXXX ~/.openclaw/openclaw.json
openclaw doctor --fix
openclaw gateway start
Symptom: Gateway won't start, references old PID
Fix:
ls ~/.openclaw/gateway.*.lock
cat ~/.openclaw/gateway.*.lock # check PID
kill -0 <pid> # verify dead: should fail with "No such process"
rm ~/.openclaw/gateway.*.lock
openclaw gateway start
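The kill -0 step works because signal 0 checks that a process exists without delivering anything. The sketch below runs the same test in Python. Helper names are hypothetical, and the sketch does not parse the lock file, because this skill only says the file contains a PID.

```python
import os


def pid_alive(pid: int) -> bool:
    """Equivalent of `kill -0 <pid>`: signal 0 checks existence, sends nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: a lock naming this PID is stale
    except PermissionError:
        return True   # process exists but is owned by another user
    return True


def stale_pids(pids: list[int]) -> list[int]:
    """Filter PIDs read from gateway.*.lock files down to the dead ones."""
    return [pid for pid in pids if not pid_alive(pid)]
```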
Symptom: unauthorized: device token mismatch or pairing required
Fix:
openclaw devices list --json # check for pending requests
openclaw devices approve "<requestId>" --password "$OPENCLAW_GATEWAY_PASSWORD"
# Or rotate existing device:
openclaw devices rotate --device <id> --role operator --password "$OPENCLAW_GATEWAY_PASSWORD"
Symptom: unauthorized: gateway password mismatch
Fix: Sync passwords across profiles. All profiles should use $OPENCLAW_GATEWAY_PASSWORD to match the env var in shell rc (~/.bashrc or ~/.zshrc).
Symptom: Gateway listening but not responding, RSS exceeds Critical threshold (see Memory Thresholds)
Fix:
openclaw gateway stop
sleep 2
kill -9 <pid> # if still lingering
launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist
Symptom: Config invalid: plugins.entries.X: plugin not found
Fix: Remove the stale plugin entry from ~/.openclaw/openclaw.json, then openclaw gateway start.
Symptom: Port 18789 is already in use or multiple gateway PIDs
Fix:
ps aux | grep openclaw-gateway | grep -v grep
kill <orphan-pids>
openclaw gateway start
Symptom: Crash loop with plugins: plugin: plugin manifest requires configSchema
Diagnosis: A plugin in ~/.openclaw/extensions/ (auto-discovered) or plugins.load.paths has an openclaw.plugin.json without the required configSchema field. Run openclaw doctor --fix — the "Plugin diagnostics" section names the offending manifest.
Fix: Add empty configSchema to the plugin manifest:
"configSchema": {
"type": "object",
"additionalProperties": false,
"properties": {}
}
Then restart: launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist
Prevention: All plugin manifests require configSchema in 2026.3.2+, even if empty. Run openclaw doctor after creating custom plugins before restarting.
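With several custom plugins, the manifest fix can be scripted. This sketch assumes each plugin lives at ~/.openclaw/extensions/<plugin-id>/openclaw.plugin.json. Helper names are hypothetical, and rewriting the file normalizes its formatting. Still run openclaw doctor afterwards.

```python
import json
from pathlib import Path


def empty_config_schema() -> dict:
    """The minimal schema from the fix above (a fresh dict on every call)."""
    return {"type": "object", "additionalProperties": False, "properties": {}}


def ensure_config_schema(manifest: dict) -> bool:
    """Add an empty configSchema if missing; return True if the manifest changed."""
    if "configSchema" in manifest:
        return False
    manifest["configSchema"] = empty_config_schema()
    return True


def fix_extensions(ext_dir: Path) -> list[Path]:
    """Patch each <plugin-id>/openclaw.plugin.json under ext_dir that lacks configSchema."""
    changed: list[Path] = []
    for path in sorted(ext_dir.glob("*/openclaw.plugin.json")):
        manifest = json.loads(path.read_text())
        if ensure_config_schema(manifest):
            path.write_text(json.dumps(manifest, indent=2) + "\n")
            changed.append(path)
    return changed
```

For example, fix_extensions(Path.home() / ".openclaw" / "extensions") returns the manifests it patched, which tells you which plugins need a restart check.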
Symptom: CLI commands fail with json.decoder.JSONDecodeError or Unexpected token. Gateway may still run (it was started before the edit) but CLI/TUI can't parse the config to connect.
Diagnosis: Someone hand-edited openclaw.json and introduced unquoted keys, trailing commas, or other invalid JSON.
Fix: Validate and fix the JSON:
python3 -c "import json; json.load(open('$HOME/.openclaw/openclaw.json'))"
# Fix the reported line — common issues: unquoted keys, trailing commas
Prevention: Use openclaw configure or a JSON-aware editor. After manual edits, validate with the python one-liner above.
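The one-liner above prints a full traceback. The sketch below reports just the line and column to fix, using the lineno, colno and msg attributes of Python's json.JSONDecodeError. The helper name is hypothetical.

```python
import json
from pathlib import Path


def json_error(text: str):
    """Return (line, column, message) for the first JSON syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.msg
    return None


if __name__ == "__main__":
    cfg = Path.home() / ".openclaw" / "openclaw.json"  # or ~/.openclaw-vesper/openclaw.json
    if cfg.exists():
        err = json_error(cfg.read_text())
        print("valid JSON" if err is None else f"line {err[0]}, column {err[1]}: {err[2]}")
```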
Symptom: CLI/TUI fails with gateway token missing (set gateway.remote.token to match gateway.auth.token)
Diagnosis: The gateway uses gateway.auth.mode: "token" but gateway.remote.token is not set. The CLI reads remote.token to authenticate — without it, all connections are rejected.
Fix: Add gateway.remote.token matching gateway.auth.token:
# In openclaw.json, inside the "gateway" section:
"remote": {
"token": "<same value as gateway.auth.token>"
}
Then restart the gateway.
Note: Any profile using gateway.auth.mode: "token" needs gateway.remote.token set. Profiles using password auth ($OPENCLAW_GATEWAY_PASSWORD) are not affected.
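A quick consistency check for one profile's parsed openclaw.json follows as a sketch. It assumes the key layout described above (gateway.auth.mode, gateway.auth.token, gateway.remote.token). The helper name is hypothetical.

```python
def remote_token_problem(cfg: dict):
    """Return a problem description for one profile's config, or None if consistent."""
    gateway = cfg.get("gateway", {})
    auth = gateway.get("auth", {})
    if auth.get("mode") != "token":
        return None  # password auth is not affected
    remote_token = gateway.get("remote", {}).get("token")
    if not remote_token:
        return "gateway.remote.token is missing"
    if remote_token != auth.get("token"):
        return "gateway.remote.token does not match gateway.auth.token"
    return None
```

Load each profile's openclaw.json with json.load and pass the resulting dict; None means the CLI and gateway agree.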
Symptom: Gateway down, port not listening, launchctl print says service not found. Error log shows config change requires gateway restart followed by restart failure.
Diagnosis: Multiple config.patch calls (e.g., from an agent using the gateway tool) changed gateway.auth.* or other restart-requiring keys. Each patch triggers a deferred restart. The restart mechanism fails with one of:
- spawnSync launchctl ETIMEDOUT
- Bootstrap failed: 5: Input/output error
The gateway falls back to in-process restart, becomes unstable, eventually receives SIGTERM, and the LaunchAgent is left unloaded.
Fix:
launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist
Prevention:
- Batch changes to gateway.auth.* into a single config.patch call to minimize restart triggers
- Use KeepAlive with ThrottleInterval=30 in the plist
- When an agent uses the config.patch action, combine auth + plugin + compaction changes into one call
Symptom: unauthorized: gateway token mismatch on openclaw --profile vesper (or any non-default profile), even when gateway.auth.token and gateway.remote.token match in the profile's config. Main profile works fine.
Diagnosis: The CLI resolves auth tokens with this precedence:
process.env.OPENCLAW_GATEWAY_TOKEN > gateway.remote.token (config)
If OPENCLAW_GATEWAY_TOKEN is set in shell rc (~/.zshrc/~/.bashrc) to main's token, the CLI sends main's token to ALL profiles, including vesper — which has its own gateway.auth.token.
Fix: Sync all profiles to use the same gateway auth token (matching $OPENCLAW_GATEWAY_TOKEN):
# Check env var
echo $OPENCLAW_GATEWAY_TOKEN
# In each profile's openclaw.json, set gateway.auth.token AND gateway.remote.token to match
Alternative: Remove OPENCLAW_GATEWAY_TOKEN from shell rc and rely solely on config-file tokens. Then each profile can have independent tokens.
Prevention: When using OPENCLAW_GATEWAY_TOKEN env var with multi-profile setups, all profiles must use the same auth token value. The env var is profile-agnostic.
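The precedence rule can be modeled in a few lines to predict a mismatch before running the CLI. This is only a sketch. Helper names are hypothetical, and treating an empty OPENCLAW_GATEWAY_TOKEN as unset is an assumption about the CLI's behavior.

```python
import os
from typing import Mapping, Optional


def effective_token(profile_cfg: dict, env: Optional[Mapping[str, str]] = None):
    """Return (token, source) the CLI would send: env var first, then gateway.remote.token."""
    env = os.environ if env is None else env
    env_token = env.get("OPENCLAW_GATEWAY_TOKEN")
    if env_token:  # assumption: an empty value counts as unset
        return env_token, "env"
    return profile_cfg.get("gateway", {}).get("remote", {}).get("token"), "config"


def will_mismatch(profile_cfg: dict, env: Optional[Mapping[str, str]] = None) -> bool:
    """True if the token sent differs from the gateway.auth.token this profile expects."""
    token, _ = effective_token(profile_cfg, env)
    return token != profile_cfg.get("gateway", {}).get("auth", {}).get("token")
```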
Symptom: unauthorized: gateway token mismatch on main profile. Main gateway appears to be running (port listening) but uses wrong config. Vesper commands may work against main's port.
Diagnosis: The ai.openclaw.gateway.plist was overwritten (likely by openclaw --profile vesper gateway install or agent config patches) to point all env vars to vesper's state dir. Check:
grep OPENCLAW_STATE_DIR ~/Library/LaunchAgents/ai.openclaw.gateway.plist
# Should show ~/.openclaw, NOT ~/.openclaw-vesper
grep OPENCLAW_PROFILE ~/Library/LaunchAgents/ai.openclaw.gateway.plist
# Should NOT be present (main is the default profile)
Fix: Edit the plist to restore correct profile paths:
- OPENCLAW_STATE_DIR → ~/.openclaw
- OPENCLAW_CONFIG_PATH → ~/.openclaw/openclaw.json
- Remove the OPENCLAW_PROFILE key (main doesn't need it)
- StandardOutPath → ~/.openclaw/logs/gateway.log
- StandardErrorPath → ~/.openclaw/logs/gateway.err.log
- OPENCLAW_GATEWAY_PORT → 18789
- OPENCLAW_LAUNCHD_LABEL → ai.openclaw.gateway
Then restart: launchctl bootout gui/501/ai.openclaw.gateway && launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist
Prevention: After running openclaw --profile <name> gateway install, verify BOTH plists still point to the correct profiles. The install command may overwrite the default profile's plist.
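The two grep checks can be extended to more of the keys in the list above with Python's standard plistlib. This sketch assumes the variables sit under the standard launchd EnvironmentVariables key. EXPECTED covers only a few keys, with values taken from the list above and the profile table, and the helper names are hypothetical.

```python
import os
import plistlib
from pathlib import Path

# None means "must be absent". Values from the main-profile list and the profile table.
EXPECTED = {
    "ai.openclaw.gateway": {
        "OPENCLAW_STATE_DIR": "~/.openclaw",
        "OPENCLAW_CONFIG_PATH": "~/.openclaw/openclaw.json",
        "OPENCLAW_GATEWAY_PORT": "18789",
        "OPENCLAW_PROFILE": None,
    },
    "ai.openclaw.vesper": {
        "OPENCLAW_STATE_DIR": "~/.openclaw-vesper",
        "OPENCLAW_GATEWAY_PORT": "18999",
    },
}


def _norm(value):
    """Expand ~ and drop a trailing slash so path spellings compare equal."""
    return os.path.expanduser(value).rstrip("/") if isinstance(value, str) else value


def plist_drift(plist: dict, expected: dict) -> dict:
    """Map each mismatched key to (actual, expected)."""
    env = plist.get("EnvironmentVariables", {})
    return {key: (env.get(key), want) for key, want in expected.items()
            if _norm(env.get(key)) != _norm(want)}


if __name__ == "__main__":
    agents = Path.home() / "Library" / "LaunchAgents"
    for label, expected in EXPECTED.items():
        path = agents / f"{label}.plist"
        if path.exists():
            with path.open("rb") as fh:
                drift = plist_drift(plistlib.load(fh), expected)
            print(label, "OK" if not drift else drift)
```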
Symptom: Messages received but never responded to. Bot shows "typing" for ~2 minutes then stops (typing TTL reached). Same behavior across ALL providers (Anthropic, Codex, Gemini). Switching providers does not help.
Diagnosis: The agent session has grown too large (>90% of context window). Compaction times out, blocking the message processing lane indefinitely.
# Check session token counts
cat ~/.openclaw/agents/main/sessions/sessions.json | python3 -c "
import sys,json
data=json.load(sys.stdin)
for k,v in data.items():
    tokens=v.get('contextTokens',0)
    pct=tokens/200000*100
    flag=' ⚠️ BLOATED' if pct>75 else ''
    print(f'{k}: {tokens:,} tokens ({pct:.0f}%){flag}')
"
# Check for compaction timeout in logs
grep -i 'timed out during compaction\|embedded run timeout' ~/.openclaw/logs/gateway.err.log | tail -5
# Check for typing TTL (message received but agent stuck)
grep 'typing TTL reached' ~/.openclaw/logs/gateway.log | tail -5
Key log signatures:
- embedded run timeout: runId=... timeoutMs=600000 — compaction timed out at 600s
- using current snapshot: timed out during compaction — session remains at bloated size
- typing TTL reached — bot received message, started typing, but agent never responded
Fix: Reset the bloated session:
# 1. Find the session transcript
ls -la ~/.openclaw/agents/main/sessions/*.jsonl
# 2. Rename transcript to trigger reset
TIMESTAMP=$(date -u +%Y-%m-%dT%H-%M-%S.000Z)
mv ~/.openclaw/agents/main/sessions/<session-id>.jsonl \
~/.openclaw/agents/main/sessions/<session-id>.jsonl.reset.$TIMESTAMP
# 3. Remove session entry from sessions.json
python3 -c "
import json
path='$HOME/.openclaw/agents/main/sessions/sessions.json'
data=json.load(open(path))
# Delete the bloated session entry (e.g., 'agent:main:main')
del data['agent:main:main']
json.dump(data, open(path, 'w'), indent=2)
"
# 4. Restart gateway
launchctl bootout gui/501/ai.openclaw.gateway && launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist
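Steps 2 and 3 can also be done in one Python helper. This sketch swaps in an atomic write (temp file plus os.replace), so an interrupted run can't leave a truncated sessions.json. Function names are hypothetical. You supply the session key (for example agent:main:main) and the transcript ID yourself.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def atomic_write_json(path: Path, data) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh, indent=2)
    os.replace(tmp, path)


def reset_session(sessions_dir: Path, session_key: str, session_id: str) -> Path:
    """Archive <session_id>.jsonl and drop session_key from sessions.json."""
    # Same timestamp shape as: date -u +%Y-%m-%dT%H-%M-%S.000Z
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S.000Z")
    transcript = sessions_dir / f"{session_id}.jsonl"
    archived = transcript.with_name(f"{transcript.name}.reset.{stamp}")
    transcript.rename(archived)
    index = sessions_dir / "sessions.json"
    sessions = json.loads(index.read_text())
    sessions.pop(session_key, None)  # tolerate an already-removed entry
    atomic_write_json(index, sessions)
    return archived
```

Then restart the gateway with the launchctl command from step 4.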
Prevention:
- Ensure contextPruning is configured on ALL profiles: { "mode": "cache-ttl", "ttl": "1h", "keepLastAssistants": 5 }
- Use /reset for long-running persistent sessions (weekly)
- Remember: typing TTL reached + provider-agnostic failures = session bloat, not provider issue
Symptom: Messages received but never responded to. No errors in logs. No lane wait exceeded. No typing TTL reached for the stuck call. Gateway process healthy, event loop alive (cron timers firing). Channel status shows in: Xm ago but out: Ym ago where Y >> X.
Diagnosis: The agent's API call to the LLM provider hung indefinitely: it neither returned nor timed out. The lane is blocked, but either the wait has not yet reached the lane wait exceeded logging threshold, or that log line already fired and the call is still stuck.
# 1. Check channel in/out gap (indicates processing is stuck)
openclaw --profile <profile> channels status
# 2. Check last session transcript entry — look for unanswered tool results
tail -5 ~/.openclaw-vesper/agents/main/sessions/*.jsonl | python3 -c "
import sys, json
for line in sys.stdin:
    line = line.strip()
    if not line: continue
    try:
        d = json.loads(line)
        msg = d.get('message',{})
        role = msg.get('role','?')
        ts = d.get('timestamp','')
        if role == 'toolResult':
            tool = msg.get('toolName','?')
            print(f'{ts} toolResult({tool}) — agent should respond but may be stuck')
        elif role == 'user':
            print(f'{ts} user message — waiting for agent response')
    except Exception:  # skips tail's '==> file <==' headers and malformed lines
        pass
"
# 3. Verify event loop is alive (cron still firing)
grep 'cron: timer armed' /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | tail -1
# 4. Check compaction model auth (misconfigured = compaction fails silently)
python3 -c "
import json, os
c=json.load(open('$HOME/.openclaw-vesper/openclaw.json'))
model=c.get('agents',{}).get('defaults',{}).get('compaction',{}).get('model','not set')
print(f'Compaction model: {model}')
provider=model.split('/')[0] if '/' in model else model
# Check if provider has auth
for d in ['.openclaw', '.openclaw-vesper']:
    p=os.path.expanduser(f'~/{d}/agents/main/agent/auth-profiles.json')
    if os.path.exists(p):
        data=json.load(open(p))
        found=[k for k in data.get('profiles',{}) if provider.replace('-cli','') in k]
        print(f' {d}: auth={\"yes\" if found else \"NO - MISCONFIGURED\"} ({found})')
"
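The in/out gap from step 1 can be turned into a rule of thumb. This sketch assumes status lines contain fragments like in: 2m ago and out: 40m ago; the exact output format is an assumption. The 5-minute threshold comes from this failure mode's prevention notes, and the helper names are hypothetical.

```python
import re

# Assumed format: status lines contain fragments like "in: 2m ago" / "out: 40m ago".
_AGO = re.compile(r"\b(in|out):\s*(\d+(?:\.\d+)?)([smhd])\s+ago")
_TO_MIN = {"s": 1 / 60, "m": 1, "h": 60, "d": 1440}


def parse_ages(line: str) -> dict[str, float]:
    """Return {'in': minutes, 'out': minutes} for whichever fragments are present."""
    return {kind: float(n) * _TO_MIN[unit] for kind, n, unit in _AGO.findall(line)}


def looks_stuck(line: str, threshold_min: float = 5.0) -> bool:
    """True when the last reply ('out') is older than the last inbound message by > threshold."""
    ages = parse_ages(line)
    if "in" not in ages or "out" not in ages:
        return False
    return ages["out"] - ages["in"] > threshold_min
```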
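Distinguishing from #15 (Context Overflow): context overflow logs timed out during compaction and affects ALL providers; a hung call leaves no errors in the logs.
Fix: Restart the stuck gateway (vesper shown; use ai.openclaw.gateway for main):
launchctl bootout gui/501/ai.openclaw.vesper && launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.vesper.plist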
Prevention:
- Verify the compaction model's provider has auth configured (e.g. compaction model google-gemini-cli set but no gemini auth profile)
- Reduce timeoutSeconds from 1800 (30m) to 600 (10m) to detect hung calls faster
- If channel status out lags in by >5 minutes, the agent is stuck
- typing TTL reached + no subsequent sendMessage within 5 minutes = stuck, not slow
Symptom: Messages processed but context degrades over time. Agent "forgets" recent context or gives less coherent responses. No visible errors to the user — gateway keeps running.
Diagnosis:
grep -i "timed out during compaction" ~/.openclaw/logs/gateway.err.log | tail -10
grep "compaction-safeguard" ~/.openclaw/logs/gateway.err.log | tail -10
Log signatures:
- [agent/embedded] using current snapshot: timed out during compaction runId=... sessionId=...
- [compaction-safeguard] Compaction safeguard: new content uses X% of context; dropped N older chunk(s) (M messages) to fit history budget.
Root cause: The compaction model (configured in agents.defaults.compaction.model) is timing out. For example, google-gemini-cli/gemini-2.5-flash has OAuth latency + API rate limits that cause frequent timeouts. The gateway falls back to "safeguard" mode: dropping old message chunks without generating a proper summary. This preserves recent context but loses historical continuity — the agent gradually loses older conversation history.
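The sketch below makes the safeguard behavior concrete: drop the oldest chunks until the history fits. It is not OpenClaw's code. Chunking, count_tokens, word_count and the budget are hypothetical stand-ins.

```python
from typing import Callable, Sequence


def fit_history(chunks: Sequence[list[str]], count_tokens: Callable[[list[str]], int],
                budget: int) -> tuple[list[list[str]], int]:
    """Drop whole chunks, oldest first, until the rest fits the budget (no summary is made)."""
    kept = list(chunks)
    dropped = 0
    while kept and sum(count_tokens(c) for c in kept) > budget:
        kept.pop(0)  # oldest history goes first; recent context survives
        dropped += 1
    return kept, dropped


def word_count(chunk: list[str]) -> int:
    """Toy token counter: whitespace-separated words."""
    return sum(len(message.split()) for message in chunk)
```

A working compaction model would replace the dropped chunks with a summary instead of discarding them. That difference is the lost historical continuity described above.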
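Distinguishing from #15 (Context Overflow): here messages are still processed and answered, while context overflow stops responses entirely; both can log timed out during compaction.
Fix: Switch the compaction model.
# Check current compaction model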
cat ~/.openclaw/openclaw.json | python3 -c "import sys,json; c=json.load(sys.stdin); print(c.get('agents',{}).get('defaults',{}).get('compaction',{}).get('model','not set (using primary model)'))"
# Switch to Sonnet (proven reliable for compaction)
# Use gateway config.patch or edit openclaw.json directly:
# agents.defaults.compaction.model = "anthropic/claude-sonnet-4-6"
Good compaction model choices:
- anthropic/claude-sonnet-4-6 — reliable, fast, good summarization
- anthropic/claude-sonnet-4-20250514 — same tier
- google-gemini-cli/gemini-2.5-flash — cheapest but prone to timeout (OAuth + rate limits)
Prevention: Check for timed out during compaction in err.log periodically.
Symptom: Gateway starts, runs for ~2 minutes, receives SIGTERM, LaunchAgent left unloaded. Repeats on every manual bootstrap. No config.patch visible in gateway logs (it happens too fast or logs rotate). Updating to the latest version does not fix it.
Diagnosis: The agent's session preserves context about a pending config change (e.g., "fix auth order array"). On each restart, the agent resumes that intent and immediately issues a config.patch on auth.* keys, triggering a restart cascade. The gateway dies, gets re-bootstrapped, and the cycle repeats because the session still has the patching intent.
# 1. Check if sessions are bloated (session context preserves patching intent)
python3 -c "
import json
data=json.load(open('$HOME/.openclaw/agents/main/sessions/sessions.json'))
bloated=[k for k,v in data.items() if v.get('contextTokens',0)>=150000]
print(f'Total: {len(data)}, Bloated: {len(bloated)}')
for k in bloated[:5]: print(f' {k}')
"
# 2. Check if plist was regenerated after update
ls -la ~/Library/LaunchAgents/ai.openclaw.gateway.plist
openclaw --version
# If plist mtime is before the update, it's stale
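Distinguishing from #12 (Config Patch Restart Cascade): #12 shows config change requires gateway restart in the error log; here no config.patch is visible, the loop repeats on every manual bootstrap, and updating does not help.
Fix: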
# 1. Regenerate plist for current version
openclaw gateway install
# 2. Prune bloated sessions (breaks the loop by clearing patching intent)
python3 -c "
import json
path='$HOME/.openclaw/agents/main/sessions/sessions.json'
data=json.load(open(path))
healthy={k:v for k,v in data.items() if v.get('contextTokens',0)<150000}
print(f'Pruning {len(data)-len(healthy)} bloated, keeping {len(healthy)} healthy')
json.dump(healthy, open(path, 'w'), indent=2)
"
# 3. Bootstrap
launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist
Prevention:
- Always run openclaw gateway install after openclaw update to regenerate the plist
- Don't let agents patch auth.* keys without human confirmation
Symptom: Both main AND vesper gateways go down at the same time. No reboot. No config changes. LaunchAgents left unloaded for both.
Diagnosis: OpenClaw auto-updated to a new version. The package was replaced hours earlier but the deferred restart fired later, sending SIGTERM to all running gateway processes within seconds of each other. The kickstart -k fix from v2026.3.12 does NOT prevent this — auto-update restarts still leave LaunchAgents unloaded.
# 1. Confirm simultaneous SIGTERM
grep 'signal SIGTERM received' ~/.openclaw/logs/gateway.log | tail -1
grep 'signal SIGTERM received' ~/.openclaw-vesper/logs/gateway.log | tail -1
# If timestamps are within ~10 seconds → auto-update
# 2. Check version changed
openclaw --version
ls -la /opt/homebrew/lib/node_modules/openclaw/package.json # mtime = update time
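The ~10-second comparison can be automated once the two SIGTERM lines are extracted. This sketch assumes each log line starts with an ISO-8601 timestamp such as 2026-03-14T03:12:05.120Z; the real log format is an assumption. Helper names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed log prefix: ISO-8601 timestamp, optionally with ms/us fraction and Z/offset.
_ISO = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d{6}|\.\d{3})?(?:Z|[+-]\d{2}:\d{2})?")


def sigterm_time(line: str) -> Optional[datetime]:
    """Parse the first ISO-8601 timestamp in a 'signal SIGTERM received' log line."""
    match = _ISO.search(line)
    if not match:
        return None
    return datetime.fromisoformat(match.group(0).replace("Z", "+00:00"))


def simultaneous(a: datetime, b: datetime, window_s: float = 10.0) -> bool:
    """True if both gateways got SIGTERM within window_s seconds (suggests auto-update)."""
    return abs((a - b).total_seconds()) <= window_s
```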
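Distinguishing from other failures: both gateways receive SIGTERM within ~10 seconds of each other, with no reboot and no config change, and the package mtime points to an update.
Fix: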
# 1. Prune bloated sessions for all profiles
for dir in ~/.openclaw ~/.openclaw-vesper; do
f="$dir/agents/main/sessions/sessions.json"
[ -f "$f" ] && python3 -c "
import json
path='$f'
data=json.load(open(path))
healthy={k:v for k,v in data.items() if v.get('contextTokens',0)<150000}
pruned=len(data)-len(healthy)
if pruned:
    json.dump(healthy, open(path, 'w'), indent=2)
    print(f'$(basename $dir): pruned {pruned}')
"
done
# 2. Regenerate plists for new version
openclaw gateway install --force
openclaw --profile vesper gateway install --force
# 3. Verify plist alignment (failure mode #14)
grep OPENCLAW_STATE_DIR ~/Library/LaunchAgents/ai.openclaw.gateway.plist
grep OPENCLAW_STATE_DIR ~/Library/LaunchAgents/ai.openclaw.vesper.plist
# 4. Restart both
launchctl bootout gui/501/ai.openclaw.gateway 2>/dev/null; launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist
launchctl bootout gui/501/ai.openclaw.vesper 2>/dev/null; launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.vesper.plist
Prevention:
- Disable update.autoInstall in openclaw.json
- Run openclaw gateway install --force for all profiles after every update
Memory Thresholds:
| RSS | Status | Action |
|---|---|---|
| < 500MB | Healthy | None |
| 500MB-1.5GB | Elevated | Monitor |
| 1.5GB-2.5GB | High | Schedule restart |
| > 2.5GB | Critical | Restart now |
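The sketch below maps a ps reading onto the table. macOS ps -o rss reports kilobytes. The 1.5GB and 2.5GB boundaries are read as 1536MB and 2560MB, which assumes binary units. The helper name is hypothetical.

```python
# (upper bound in MB, status, action) from the table above; 1.5GB/2.5GB read as 1536/2560MB.
_BANDS = [(500, "Healthy", "None"),
          (1536, "Elevated", "Monitor"),
          (2560, "High", "Schedule restart")]


def classify_rss(rss_kb: int) -> tuple[str, str]:
    """Classify a `ps -o rss` value (kilobytes on macOS) into (status, action)."""
    rss_mb = rss_kb / 1024
    for upper_mb, status, action in _BANDS:
        if rss_mb < upper_mb:
            return status, action
    return "Critical", "Restart now"
```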
The gateway runs on Node.js and defaults to ~4GB max old space. For long-running gateways or heavy plugin loads, increase via --max-old-space-size in the LaunchAgent plist's ProgramArguments:
<string>--max-old-space-size=16384</string>
Insert it after the node binary path and before the entry JS file. Current state:
- --max-old-space-size=16384 (16GB) — set to handle QMD/memory-search workloads
To add or change it, edit the plist directly and reload:
# Edit the plist
nano ~/Library/LaunchAgents/ai.openclaw.gateway.plist
# Reload
launchctl bootout gui/501/ai.openclaw.gateway && launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist
If the gateway OOMs before hitting the RSS thresholds above, this is likely the fix.
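Editing ProgramArguments by hand is error-prone. The sketch below uses Python's plistlib to insert the flag at index 1, or replace an existing value there. It assumes ProgramArguments[0] is the node binary, as the placement rule above implies. plistlib rewrites the file's formatting, so back the plist up first. Helper names are hypothetical.

```python
import plistlib


def set_heap_flag(plist: dict, size_mb: int) -> dict:
    """Put --max-old-space-size right after the node binary, replacing any existing value."""
    args = [a for a in plist["ProgramArguments"]
            if not a.startswith("--max-old-space-size=")]
    args.insert(1, f"--max-old-space-size={size_mb}")  # index 0 assumed to be node
    plist["ProgramArguments"] = args
    return plist


def update_plist_file(path: str, size_mb: int) -> None:
    """Load, patch, and rewrite a LaunchAgent plist (formatting is normalized)."""
    with open(path, "rb") as fh:
        plist = plistlib.load(fh)
    with open(path, "wb") as fh:
        plistlib.dump(set_heap_flag(plist, size_mb), fh)
```

For example, call update_plist_file with the expanded path of ~/Library/LaunchAgents/ai.openclaw.gateway.plist and 16384, then reload with the bootout/bootstrap pair above.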
| Profile | Config | State | Port |
|---|---|---|---|
| main | ~/.openclaw/openclaw.json | ~/.openclaw/ | 18789 |
| vesper | ~/.openclaw-vesper/openclaw.json | ~/.openclaw-vesper/ | 18999 |
Plugin auto-discovery paths (scanned on startup, no config entry needed):
- ~/.openclaw/extensions/<plugin-id>/ — per-profile custom plugins
- plugins.load.paths — explicitly loaded extensions
Gateway auth:
- Token: $OPENCLAW_GATEWAY_TOKEN (env var in shell rc — ~/.zshrc/~/.bashrc). Takes precedence over gateway.remote.token in config. Profile-agnostic — all profiles must use the same token when this env var is set.
- Password: $OPENCLAW_GATEWAY_PASSWORD (env var in shell rc — ~/.zshrc/~/.bashrc)
- Token precedence: $OPENCLAW_GATEWAY_TOKEN > gateway.remote.token (config)
- gateway.controlUi.dangerouslyDisableDeviceAuth: true — only bypasses Control UI, not CLI/TUI
Top-level openclaw.json auth.profiles declares profile type/mode only — no tokens.
Actual tokens live in per-agent auth profile files:
# Main profile
~/.openclaw/agents/main/agent/auth-profiles.json
~/.openclaw/agents/codex/agent/auth-profiles.json
# Vesper profile
~/.openclaw-vesper/agents/main/agent/auth-profiles.json
~/.openclaw-vesper/agents/codex/agent/auth-profiles.json
Each has profiles.<provider>:default with access/refresh/expires for OAuth, or token for API keys.
The expires field is epoch milliseconds — compare to Date.now() or time.time()*1000 to check expiry.
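The same expiry check can be written as a reusable function. This sketch uses the field names above (access, token, and expires in epoch milliseconds). Treating a token without expires as an API key is an assumption, and the helper name is hypothetical.

```python
import time
from typing import Optional


def token_state(profile: dict, now_ms: Optional[float] = None) -> str:
    """Summarize one profiles.<provider>:default entry; 'expires' is epoch milliseconds."""
    now_ms = time.time() * 1000 if now_ms is None else now_ms
    if not (profile.get("access") or profile.get("token")):
        return "no token"
    expires = profile.get("expires")
    if not expires:
        return "no expiry (API key)"  # assumption: API-key profiles carry no expires
    remaining_min = (expires - now_ms) / 60_000
    return "expired" if remaining_min <= 0 else f"valid for {remaining_min:.0f} min"
```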
Fresh Anthropic setup tokens: ~/clawd/inbox/2026-03-03-anthropic-setup-tokens
openclaw doctor --fix removes token fields from top-level auth.profiles in openclaw.json (schema change). This does NOT affect per-agent auth profiles — those still use token as the field name. If doctor runs and removes tokens from the top-level config, the gateway still works because it reads from per-agent files at runtime.
Symptom: OAuth token refresh failed for openai-codex or refresh_token_reused — the access token expired and the refresh token is single-use/already consumed.
Diagnosis: Check expires field in auth-profiles.json — if epoch ms is in the past, access token is expired. If refresh also fails, full re-auth needed.
Fix: Interactive re-auth: openclaw configure (add --profile vesper if vesper profile).
Symptom: No API key found for provider "<provider>" with auth store path shown.
Diagnosis: The model fallback chain references a provider that was never set up in auth-profiles.json.
Fix: Either configure the provider (openclaw agents add <provider>) or remove it from the fallback chain in openclaw.json → agents.defaults.model.fallbacks.
Check memory search status as part of triage when the agent isn't responding correctly:
openclaw --profile vesper memory status
Key config: agents.defaults.memorySearch.enabled in openclaw.json — if false, the memory_search/memory_get tools won't register even if listed in tools.alsoAllow.
Enabling requires a gateway restart (hot-reload picks up the config but tool registration needs restart).
qmd runs on Node.js (#!/usr/bin/env node), NOT Bun. The sqlite-vec extension loads fine under Node's better-sqlite3. Previous reports of "sqlite-vec/Bun" issues are a red herring for OpenClaw users.
If qmd embed hangs or fails:
# 1. Check Homebrew SQLite is installed
brew list sqlite
# 2. Rebuild better-sqlite3 if needed
npm rebuild better-sqlite3 --build-from-source
# Note: npm v11 warns about --build-from-source but the flag still works (cosmetic warning)
# 3. Check embedding status
qmd status # Shows pending embedding count
# 4. Force re-embedding of all content
qmd embed -f
# 5. Update collections that may have new files
qmd update <collection> # e.g., tool-heuristics collections after adding files
Key commands for memory search triage:
- qmd status — shows collections, document counts, pending embeds
- qmd embed — process pending embeddings (runs incrementally)
- qmd embed -f — force re-embed everything (nuclear option)
- qmd update <collection> — re-scan collection source for new/changed files
| Profile | LaunchAgent plist | Stop + Start |
|---|---|---|
| main | ~/Library/LaunchAgents/ai.openclaw.gateway.plist | openclaw gateway stop && launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist |
| vesper | ~/Library/LaunchAgents/ai.openclaw.vesper.plist | openclaw --profile vesper gateway stop && launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.vesper.plist |
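If gateway start says "Gateway service not loaded", use launchctl bootstrap directly. gui/501 is the GUI domain for UID 501, the default UID of the first macOS account; if id -u prints a different value, substitute gui/$(id -u).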
After fixing any issue:
- openclaw channels status — all channels should show "running"
- Check process health: ps -o pid,rss,pcpu,etime -p $(lsof -iTCP:18789 -sTCP:LISTEN -t | head -1)
- Document the incident in ~/clawd/inbox/YYYY-MM-DD-<description>.md using this template:
# Incident: <Title> — YYYY-MM-DD
## Summary
<1-2 sentences>
## Symptoms
- <what the user saw>
## Root Cause
<what went wrong and why>
## Fix
<what was done>
## Config Changes
| File | Change |
|------|--------|
## Prevention
<how to avoid next time>
Run after any openclaw version bump:
openclaw --version
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.pre-upgrade
openclaw doctor --fix
# CRITICAL: Regenerate plist for new version (update does NOT do this automatically)
openclaw gateway install
# For vesper too:
openclaw --profile vesper gateway install
# Verify plists weren't cross-contaminated (failure mode #14)
grep OPENCLAW_STATE_DIR ~/Library/LaunchAgents/ai.openclaw.gateway.plist
grep OPENCLAW_STATE_DIR ~/Library/LaunchAgents/ai.openclaw.vesper.plist
openclaw devices list --json | jq '.pending'
# Approve any pending pairings
openclaw channels status
Prefix all commands with --profile vesper:
openclaw --profile vesper channels status
openclaw --profile vesper gateway start
openclaw --profile vesper doctor --fix