Agent Observability - Open the Black Box

Provides full observability for OpenClaw agents via dashboards, decision logs, failure traces, and weekly compliance scoring for effective monitoring and deb...

Install

openclaw skills install agent-observability

Agent Observability

What Gets Installed

File                    | Purpose                                  | Location
throughput-dashboard.js | Weekly productivity metrics              | scripts/
decision-audit.js       | Append-only decision log with reasoning  | lib/
failure-tracer.js       | Captures traces when quality score < 7   | lib/
drift-guard-auto.js     | Weekly INTENT.md compliance scoring      | scripts/

Installation

Step 1 — Copy files

WORKSPACE="${OPENCLAW_WORKSPACE:-$(pwd)}"

# Create the target directories first; cp fails if they do not exist
mkdir -p "$WORKSPACE/scripts" "$WORKSPACE/lib"

cp references/throughput-dashboard.js  "$WORKSPACE/scripts/"
cp references/decision-audit.js        "$WORKSPACE/lib/"
cp references/failure-tracer.js        "$WORKSPACE/lib/"
cp references/drift-guard-auto.js      "$WORKSPACE/scripts/"

Or manually copy each file from the references/ directory in this skill.

Step 2 — Add to heartbeat/cron (weekly)

In your heartbeat or weekly cron script:

node "$WORKSPACE/scripts/throughput-dashboard.js" "$WORKSPACE"
node "$WORKSPACE/scripts/drift-guard-auto.js" "$WORKSPACE"
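
If the weekly job is itself a Node script, the two commands above can be wrapped so that a failure in one does not skip the other. This is a minimal sketch, not part of the skill: `runWeekly`, `runWithNode`, and the run order are assumptions; only the script names and the workspace argument come from this README.

```javascript
// Hypothetical weekly wrapper; not shipped with the skill.
const { spawnSync } = require('node:child_process');
const path = require('node:path');

// Script names come from the install table; the order is an assumption.
const WEEKLY_SCRIPTS = ['throughput-dashboard.js', 'drift-guard-auto.js'];

// Default runner: execute one script with the current Node binary
// and report whether it exited with status 0.
function runWithNode(scriptPath, workspace) {
  const result = spawnSync(process.execPath, [scriptPath, workspace], {
    stdio: 'inherit',
  });
  return result.status === 0;
}

// Run every weekly script, continue past failures, and return the
// names of the scripts that failed.
function runWeekly(workspace, run = runWithNode) {
  const failed = [];
  for (const name of WEEKLY_SCRIPTS) {
    const scriptPath = path.join(workspace, 'scripts', name);
    if (!run(scriptPath, workspace)) failed.push(name);
  }
  return failed;
}
```

Calling `runWeekly(process.env.OPENCLAW_WORKSPACE || process.cwd())` from the heartbeat returns an empty array when both scripts succeed. The `run` parameter exists only so the wrapper can be exercised without spawning processes.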

Step 3 — Wire decision-audit into high-stakes decisions

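const { logDecision } = require('./lib/decision-audit');

// Assumption: resolve the workspace root the same way Step 1 resolves WORKSPACE
const workspaceRoot = process.env.OPENCLAW_WORKSPACE || process.cwd();

logDecision({
  task_type: 'code_generation',
  decision: 'spawn CoderAgent',
  reasoning_summary: 'Multi-file edit blocks chat >5s',
  session_channel: 'discord'  // optional
}, workspaceRoot);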

Step 4 — Wire failure-tracer into quality validation (optional)

The failure tracer does not run on its own. Call it after scoring subagent output so that a trace is written whenever the score falls below 7:

const { captureFailureTrace } = require('./lib/failure-tracer');

// Call after scoring any subagent output
if (qualityScore < 7) {
  captureFailureTrace('AgentLabel-task', qualityScore, agentOutput, workspaceRoot);
}
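
The threshold check can be kept in one place so every call site uses the same cutoff and a tracing error never interrupts the main task. The sketch below is illustrative only: `traceIfLowQuality` and `QUALITY_THRESHOLD` are hypothetical names, and the capture function is passed in as a parameter so the helper does not assume anything about how `captureFailureTrace` behaves internally.

```javascript
// Cutoff taken from the "quality score < 7" rule in this README.
const QUALITY_THRESHOLD = 7;

// Hypothetical helper: calls `capture` only for low scores and
// swallows capture errors so tracing cannot break the caller.
// Returns true when a trace was captured, false otherwise.
function traceIfLowQuality(capture, label, score, output, workspaceRoot) {
  if (typeof score !== 'number' || Number.isNaN(score)) {
    throw new TypeError('score must be a number');
  }
  if (score >= QUALITY_THRESHOLD) return false;
  try {
    capture(label, score, output, workspaceRoot);
    return true;
  } catch (err) {
    console.warn(`trace capture failed for ${label}: ${err.message}`);
    return false;
  }
}
```

In a workspace with the skill installed, the first argument would be `captureFailureTrace` from `./lib/failure-tracer`.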

Reading the Data

Path                                  | Contents
memory/dashboards/YYYY-MM-DD.md       | Weekly throughput snapshot
memory/drift-reports/YYYY-MM-DD.md    | Drift compliance report
memory/decisions-audit.jsonl          | Full decision log (JSONL)
memory/traces/[label]-[timestamp].json | Failure traces

Query examples

# Recent decisions
tail -20 memory/decisions-audit.jsonl | jq .

# All failure traces
ls memory/traces/

# Latest drift report
cat "memory/drift-reports/$(ls memory/drift-reports/ | tail -1)"
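
The same "latest report" lookup can be done from Node. Because the report names follow the YYYY-MM-DD.md pattern, a plain string sort is also a date sort. This is a small sketch under that assumption; `latestDatedFile` is a hypothetical helper, not something the skill exports.

```javascript
// Matches report names of the form YYYY-MM-DD.md and nothing else.
const DATED_REPORT = /^\d{4}-\d{2}-\d{2}\.md$/;

// Return the most recent dated file name, or null if there is none.
// ISO-style dates sort chronologically when sorted as strings.
function latestDatedFile(fileNames) {
  const dated = fileNames.filter((name) => DATED_REPORT.test(name)).sort();
  return dated.length > 0 ? dated[dated.length - 1] : null;
}
```

Typical use would be `latestDatedFile(fs.readdirSync('memory/drift-reports'))`, with a check for `null` when no report has been written yet.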

Tool Descriptions

throughput-dashboard.js

Aggregates weekly metrics: tasks routed, subagents spawned, estimated cost, quality ratio, routing distribution. Reads from session-metrics.js (if installed) and drift-guard-auto.js. Degrades gracefully if data sources are missing — every section is independent.
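
One common way to get this kind of graceful degradation is to build each section in its own try/catch, so a missing data source only blanks its own section. The sketch below illustrates the pattern and is not taken from the script's source; `renderDashboard` and the section shape are assumptions.

```javascript
// Each section is { title, build }, where build() returns the section
// body as a string or throws if its data source is unavailable.
function renderDashboard(sections) {
  const lines = [];
  for (const { title, build } of sections) {
    lines.push(`## ${title}`);
    try {
      lines.push(build());
    } catch (err) {
      // A failing section is reported in place; the others still render.
      lines.push(`_unavailable: ${err.message}_`);
    }
    lines.push('');
  }
  return lines.join('\n');
}
```

With this structure, a section whose `build` throws because `session-metrics.js` is absent leaves the remaining sections untouched.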

decision-audit.js

Append-only JSONL log at memory/decisions-audit.jsonl. Each entry: { id, ts, task_type, decision, reasoning_summary, outcome, session_channel }. Use updateOutcome(id, 'success', workspaceRoot) to close the loop after a decision resolves.
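
To see which decisions still need `updateOutcome`, the log can be parsed and reduced to one record per `id`. This README does not say whether outcome updates rewrite a line or append a new one, so the sketch below merges all lines that share an `id`, with later lines taking precedence, which works in either case. `parseJsonl`, `latestById`, and `pendingDecisions` are hypothetical helpers.

```javascript
// Parse JSONL text into objects, skipping blank or malformed lines.
function parseJsonl(text) {
  const entries = [];
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue;
    try {
      entries.push(JSON.parse(trimmed));
    } catch {
      // Ignore a corrupt line rather than failing the whole read.
    }
  }
  return entries;
}

// Merge entries that share an id; fields on later lines win.
function latestById(entries) {
  const byId = new Map();
  for (const entry of entries) {
    byId.set(entry.id, { ...byId.get(entry.id), ...entry });
  }
  return [...byId.values()];
}

// Decisions whose outcome is still missing or empty.
function pendingDecisions(entries) {
  return latestById(entries).filter(
    (e) => e.outcome === undefined || e.outcome === null || e.outcome === ''
  );
}
```

For example, `pendingDecisions(parseJsonl(fs.readFileSync('memory/decisions-audit.jsonl', 'utf8')))` lists the decisions that have not been closed out.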

failure-tracer.js

Captures a trace when a subagent's quality score is below 7. Writes structured JSON to memory/traces/. Each trace includes tool call sequence hints, an output snippet, and an inferred failure reason. Use the traces for a post-mortem on why a subagent underperformed.

drift-guard-auto.js

Scores recent agent outputs against behavioral rules (sycophancy, social cushioning, unprompted explanations, hallucination hedges). Reads INTENT.md for custom criteria if the file is present. Writes a weekly report to memory/drift-reports/.
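
To make the idea of rule-based compliance scoring concrete, here is a toy version that flags outputs matching simple phrase patterns and reports the share of clean outputs. It is only an illustration: the patterns, the `complianceScore` name, and the scoring formula are all assumptions, and the real script may score very differently.

```javascript
// Hypothetical rules; the phrases are examples, not the skill's criteria.
const RULES = [
  { name: 'sycophancy', pattern: /\b(great question|you're absolutely right)\b/i },
  { name: 'social cushioning', pattern: /\b(i hope this helps|happy to help)\b/i },
];

// Score = fraction of outputs that trigger no rule. Also counts how
// many outputs triggered each rule. Returns score null for no outputs.
function complianceScore(outputs, rules = RULES) {
  if (outputs.length === 0) return { score: null, violations: {} };
  const violations = {};
  let clean = 0;
  for (const text of outputs) {
    const hits = rules.filter((rule) => rule.pattern.test(text));
    if (hits.length === 0) clean += 1;
    for (const rule of hits) {
      violations[rule.name] = (violations[rule.name] || 0) + 1;
    }
  }
  return { score: clean / outputs.length, violations };
}
```

Custom criteria, such as those read from INTENT.md, would correspond to extra entries passed in through the `rules` argument in this sketch.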

References

  • references/throughput-dashboard.js — Full script implementation
  • references/decision-audit.js — Full lib implementation
  • references/failure-tracer.js — Full lib implementation
  • references/drift-guard-auto.js — Full script implementation