Agent Causal

MCP Tools

Agent Causal Decision Tool helps you and your AI agents answer one question from experiment data: should we ship this change, keep running the test, or roll it back? Returns structured JSON decisions, key statistics, and audit trails from A/B tests (frequentist + Bayesian), DiD, cohort/segment analysis, and sequential early stopping.

Install

openclaw skills install agent-causal

Agent Causal Decision Tool

A causal decision and audit tool for AI agents. Evaluate product changes using A/B testing, Difference-in-Differences, and sequential early stopping.

Source: https://github.com/ZhuMorris/agent-causal-decision-tool

What is this?

Agent Causal Decision Tool helps you and your AI agents answer one question from experiment data: "should we ship this change, keep running the test, or roll it back?" It takes in simple A/B or rollout summaries and returns a structured JSON decision, key statistics, and an audit record you can store or review later.

Rather than being a full experimentation platform, it is a decision engine. You bring the data (from your logs, BI tool, or CSV); it handles the stats, decision logic, and audit trail.

Why it exists

In many teams, experiment decisions happen in ad hoc spreadsheets or dashboards. People glance at lift, argue about whether the sample size is enough, and sometimes ship features based on noisy or biased results. Agents make this worse if they are wired to react to any small uplift they see.

This tool wraps a few standard methods into one consistent, agent‑friendly interface:

Easy-mode dispatcher (decide) — no need to know which statistical method to use. Paste your numbers and it auto-selects A/B, Bayesian, DiD, or planning from your input fields.
Frequentist A/B testing for classic "control vs variant" questions.
Bayesian A/B testing when you want answers like "there is a 93% chance B is better than A" instead of only p‑values.
Difference‑in‑differences (DiD) for quasi‑experiments like staged rollouts or region‑based launches where you cannot randomize perfectly.
Cohort / segment breakdown when an aggregate result is inconclusive — you can slice by user segment to find hidden signals, with Benjamini-Hochberg correction for 4+ segments.
Planning and power checks so you can see if a test is realistic before you start it.
Decision audit so humans can see what the agent did, why it did it, and how strong the evidence really was.
External connectors — pull experiment data directly from PostHog, normalize it, and run a decision in one step. No manual export needed.

The goal is not to replace your analytics stack, but to give agents a small, reliable decision block they can call inside workflows.

When to use it

Use this tool whenever you or your agents have experiment or rollout results and need a decision you can defend:

You ran an A/B test and want to know whether to ship, keep running, or reject the variant.
You're not sure which method to use — let decide auto-detect from your numbers.
You ran an A/B test and it was inconclusive — you want to know if a specific user segment is driving (or diluting) the effect.
You rolled out a feature to one region or cohort first and want a DiD estimate of impact compared to a similar control group.
You prefer a Bayesian summary ("95% chance B is better; expected lift 3–5%") to drive thresholds in automated workflows.
You need an audit trail with experiment period, traffic size, assumptions, thresholds, and warnings so product, data, or risk teams can review agent decisions later.
You want to plan an experiment (sample size, minimum detectable effect, expected duration) or compare current results to previous experiments to see which wins are robust.
Your experiment data lives in PostHog — you want to fetch, normalize, and decide without any manual CSV export.

Security Model

This skill runs as a local CLI tool only. After one-time setup, it requires no network access.

Setup (one-time, before first use):

# Download the release tarball — no git clone needed
curl -sL https://github.com/ZhuMorris/agent-causal-decision-tool/archive/refs/tags/v0.10.2.tar.gz -o agent-causal.tar.gz
tar -xzf agent-causal.tar.gz
pip install agent-causal-decision-tool-0.10.2/ -q

After installation, the agent-causal command is available locally. The tool performs only local statistical calculations.

When network access occurs: The PostHog connector (agent-causal connect posthog) makes outbound HTTPS requests to your PostHog instance only when explicitly invoked — never automatically. If you do not use the connector, no outbound network access is needed at any point.

No runtime network access during analysis. The decision engine, audit, and cohort analysis do not make outbound requests. Tools used: exec (for running the agent-causal CLI commands you specify). Commands are fully hardcoded with no user-supplied strings interpolated into shell execution. PostHog token scope: Use a read-only API token with minimal scopes. Do not use tokens with write or admin permissions. Credential handling: PostHog API credentials are read from env vars (POSTHOG_API_KEY/POSTHOG_PROJECT_ID) or a local ~/.posthogrc file — never hardcoded or logged.

Setup

Before using this skill, install the tool (one-time):

git clone https://github.com/ZhuMorris/agent-causal-decision-tool.git ~/clawd/agent-causal-decision-tool
pip install ~/clawd/agent-causal-decision-tool -q

After this, agent-causal is available as a local command. No further network access is required.

Agent-Native Actions (JSON-RPC 2.0)

For AI agent integrations, Agent Causal exposes a JSON-RPC 2.0 API over both stdio and HTTP.

Stdio mode (for OpenClaw, Codex, Claude Code)

python -m src.api stdio

HTTP mode

python -m src.api http --port 8000

Actions

Action	Description
`decide`	Easy-mode dispatcher — auto-selects A/B, Bayesian, DiD, or planning from your input fields
`decide_ab`	Frequentist A/B test (`mode: frequentist`) or Bayesian (`mode: bayesian`)
`decide_rollout`	DiD for staged rollouts / quasi-experiments
`plan_test`	Experiment planning (sample size, MDE, feasibility)
`audit_result`	Full audit of a stored result by ID
`save_result`	Persist a decision result to SQLite history
`get_result`	Retrieve a stored result by ID
`compare_results`	Compare multiple stored experiments
`connect`	Fetch experiment data from an external connector (e.g. PostHog)

Request format

{
  "jsonrpc": "2.0",
  "method": "decide_ab",
  "params": {
    "mode": "frequentist",
    "input": {
      "control_conversions": 100,
      "control_total": 5000,
      "variant_conversions": 130,
      "variant_total": 5000
    }
  },
  "id": 1
}

Unified response schema

{
  "decision": "ship|keep_running|reject|escalate",
  "recommended_next_action": "Deploy variant — statistical significance achieved with positive lift.",
  "selected_method": "ab_test|bayesian_ab|did|planning",
  "selection_reason": "Why this method was chosen",
  "confidence": "high|medium|low",
  "effect_summary": "Estimated lift: +30.00% (positive)",
  "warnings": [{"code": "LOW_TRAFFIC", "message": "...", "severity": "info"}],
  "limitations": ["No multiple testing correction applied"],
  "audit_summary": "ab_test: Decision",
  "source_metadata": {"connector": "langsmith", "dataset_id": "ds-001"},
  "internal_result": { ... }
}

Error format

{
  "code": "VALIDATION_ERROR",
  "message": "Invalid A/B test inputs",
  "data": {
    "details": [{"field": "control_total", "issue": "must be >= 1"}],
    "request_id": null
  }
}

Commands

Easy-mode Dispatcher (`decide`)

Don't know which method you need? decide auto-detects from your input fields — no need to pick the right command:

# A/B test (auto-detected)
PYTHONPATH=. python3 -m src.cli decide --control 100/5000 --variant 130/5000
PYTHONPATH=. python3 -m src.cli decide --control 100/5000 --variant 130/5000 --format text

# Bayesian A/B (--bayesian flag)
PYTHONPATH=. python3 -m src.cli decide --control 100/5000 --variant 130/5000 --bayesian

# DiD / staged rollout (auto-detected from pre/post treated fields)
PYTHONPATH=. python3 -m src.cli decide --pre-control 1000 --post-control 1200 --pre-treated 200 --post-treated 280

# Experiment planning (auto-detected from --baseline + --mde)
PYTHONPATH=. python3 -m src.cli decide --baseline 0.05 --mde 10 --traffic 10000

Auto-detection matrix:

You provide...	It runs...
`--control` + `--variant`	Frequentist A/B
`--control` + `--variant` + `--bayesian`	Bayesian A/B
`--pre-control` + `--post-control` + `--pre-treated` + `--post-treated`	DiD (Difference-in-Differences)
`--baseline` + `--mde`	Experiment planning

JSON-RPC: {"jsonrpc":"2.0","method":"decide","params":{...fields...},"id":"1"} — same auto-detection over stdio or HTTP.

Experiment Planning (ab_plan)

Estimate required sample size, duration, and feasibility before running an experiment:

cd ~/clawd/agent-causal-decision-tool
PYTHONPATH=. python3 -m src.cli plan --baseline 0.02 --mde 5 --traffic 5000

Parameters:

--baseline (required): Baseline conversion rate (e.g., 0.02 for 2%)
--mde (required): Minimum detectable effect as % lift (e.g., 5 for 5% lift)
--traffic (required): Daily traffic per arm
--confidence (default 0.95): Confidence level
--power (default 0.8): Statistical power
--allocation: equal (default) or custom
--allocation-ratio: Custom ratio when allocation=custom (e.g., 0.3/0.7)
--format: json (default) or text

Planning output:

{
  "mode": "planning",
  "recommendation": {
    "decision": "feasible|slow|not_recommended",
    "confidence": "high|medium|low",
    "summary": "..."
  },
  "planning": {
    "required_sample_per_arm": 182934,
    "total_required": 365868,
    "estimated_days": 36.6,
    "feasibility": "slow",
    "allocation_used": {"control": 0.5, "variant": 0.5}
  },
  "warnings": [...]
}

Feasibility thresholds:

feasible: ≤14 days
slow: 15–60 days
not_recommended: >60 days

A/B Test Analysis (Frequentist)

cd ~/clawd/agent-causal-decision-tool
PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000

Parameters:

--control: Control group conversions/total (e.g., 100/5000)
--variant: Variant group conversions/total (e.g., 130/5000)
--name: Variant name (optional, default: variant_1)
--format: Output format json (default) or text

Sequential / Early Stopping (optional):

--sequential/--no-sequential: Enable sequential early stopping evaluation
--experiment-start, --experiment-end: ISO 8601 timestamps for runtime calculation
--min-runtime-days (default 7): Minimum days before early stop is considered
--min-sample-per-arm (default 2000): Minimum sample per arm before early stop
--early-stop-p (default 0.01): p-value threshold for early stop
--max-runtime-days: Hard cap; escalates if exceeded without strong result

Trigger logic: Both min-runtime AND min-sample-per-arm must be met, AND p-value below --early-stop-p. Max runtime exceeded always escalates.

PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000

Example Output:

{
  "schema_version": "0.8.0",
  "mode": "ab_test",
  "recommendation": {
    "decision": "ship",
    "confidence": "medium",
    "summary": "Variant performs 30.00% better (p=0.0454). Ship it.",
    "primary_metricLift": 30.0,
    "p_value": 0.045361
  },
  "statistics": {
    "control_rate": 0.02,
    "variant_rate": 0.026,
    "relative_lift_pct": 30.0,
    "z_score": 2.0013,
    "p_value": 0.045361,
    "lift_ci_95": [0.000124, 0.011876],
    "relative_lift_ci_95": [0.619, 59.381]
  },
  "traffic_stats": {
    "control_size": 5000,
    "variant_size": 5000,
    "total_size": 10000
  },
  "warnings": [],
  "next_steps": ["Deploy variant", "Monitor over time for regression"],
  "audit": {
    "decision_path": [
      {"step": "Input validation", "passed": true},
      {"step": "Traffic check", "passed": true},
      {"step": "Conversion rate calculation", "passed": true},
      {"step": "Statistical significance test", "passed": true},
      {"step": "Effect size check", "passed": true},
      {"step": "Decision", "passed": true}
    ]
  }
}

Bayesian A/B Test

cd ~/clawd/agent-causal-decision-tool
PYTHONPATH=. python3 -m src.cli bayes --control 100/5000 --variant 130/5000

Uses Beta-Binomial conjugate model with Jeffreys prior.

Prior: Beta(0.5, 0.5) — uninformative
Posterior: Beta(α + successes, β + failures)
Decision via Monte Carlo simulation (20k samples)
Thresholds: P(variant wins) ≥ 0.95 → ship, ≤ 0.05 → reject

Parameters:

--control, --variant: Conversions/total (same as ab)
--name: Variant name
--format: json (default) or text
--samples: Monte Carlo samples (default: 20000)

Example output:

{
  "schema_version": "0.8.0",
  "timestamp": "2026-05-06T13:00:00.000Z",
  "mode": "bayesian_ab",
  "recommendation": {
    "decision": "ship",
    "confidence": "medium",
    "summary": "Variant wins with P(better)=0.976. Median lift=30.10%. Ship.",
    "primary_metricLift": 30.10,
    "p_value": 0.9758
  },
  "statistics": {
    "control_rate_observed": 0.0200,
    "variant_rate_observed": 0.0260,
    "relative_lift_pct": 30.00,
    "posterior_control": {"alpha": 100.5, "beta": 4900.5, "mean": 0.0201},
    "posterior_variant": {"alpha": 130.5, "beta": 4870.5, "mean": 0.0261},
    "p_variant_wins": 0.9758,
    "p_control_wins": 0.0239,
    "p_tie": 0.0003,
    "lift_median_pct": 30.10,
    "lift_95ci_pct": [0.20, 69.15],
    "expected_lift_hdi_95": [0.0004, 0.0014],
    "relative_lift_hdi_95": [2.00, 70.00],
    "monte_carlo_samples": 20000,
    "prior_used": {"alpha": 0.5, "beta": 0.5, "type": "Jeffreys"}
  },
  "traffic_stats": {
    "control_size": 5000,
    "variant_size": 5000,
    "total_size": 10000
  },
  "warnings": [],
  "next_steps": ["Deploy variant", "Monitor for regression"],
  "audit": {
    "experiment_type": "bayesian_ab",
    "thresholds_applied": {"ship": 0.95, "reject": 0.05},
    "assumptions": ["Independent observations between groups", "No selection bias in group assignment", "Jeffrey's prior is appropriate for conversion rates"],
    "limitations": ["Monte Carlo simulation has finite sampling error", "No multiple testing correction applied"]
  },
  "inputs": {
    "control_conversions": 100,
    "control_total": 5000,
    "variant_conversions": 130,
    "variant_total": 5000
  }
}

Access fields via . attribute (e.g. result.recommendation.decision) or .model_dump() for dict. Serialize with .model_dump_json().

When to use Bayesian vs Frequentist:

DiD Analysis

cd ~/clawd/agent-causal-decision-tool
PYTHONPATH=. python3 -m src.cli did --pre-control 1000 --post-control 1100 --pre-treated 900 --post-treated 1150

Parameters:

--pre-control: Control group metric before treatment
--post-control: Control group metric after treatment
--pre-treated: Treated group metric before treatment
--post-treated: Treated group metric after treatment
--n-bootstrap (default 2000, range 500–10000): Number of bootstrap resamples for DiD CI

Cohort / Segment Breakdown

When an aggregate A/B or DiD result is inconclusive, break down results by user segment to find hidden signals:

cd ~/clawd/agent-causal-decision-tool
PYTHONPATH=. python3 -m src.cli cohort-breakdown --file segments.json

Input format (JSON):

{
  "experiment_id": "checkout-v3",
  "metric": "conversion_rate",
  "prior_result_id": "dec_20260501_001",
  "prior_decision": "wait",
  "segments": [
    {
      "segment_name": "new_users",
      "segment_definition_note": "Users registered within last 30 days",
      "control_conversions": 21,
      "control_total": 1000,
      "variant_conversions": 67,
      "variant_total": 1000
    },
    {
      "segment_name": "returning_users",
      "segment_definition_note": "Users registered more than 30 days ago",
      "control_conversions": 220,
      "control_total": 4000,
      "variant_conversions": 228,
      "variant_total": 4000
    }
  ]
}

Input format (CSV alternative):

segment_name,segment_definition_note,arm,conversions,total
new_users,Users registered within last 30 days,control,21,1000
new_users,Users registered within last 30 days,variant,67,1000
returning_users,Users registered more than 30 days ago,control,220,4000
returning_users,Users registered more than 30 days ago,variant,228,4000

Parameters:

--file: Path to JSON or CSV segment file
--json: JSON input string (alternative to --file)
--format: Output format json (default) or text
--save: Save result to experiment history

Multiple comparison correction:

4+ segments: Benjamini-Hochberg FDR correction applied automatically
5+ segments: Also offers Bonferroni as alternative via --method bonferroni

Example output:

{
  "method": "experiment_cohort_breakdown",
  "cohort_decision_override": true,
  "cohort_override_reason": "Strong positive signal in 'new_users' (lift=219.0%, adj-p=0.0000) contradicts aggregate decision 'wait'",
  "interaction_flag": false,
  "segments": [
    {
      "segment_name": "new_users",
      "control_rate": 0.021,
      "variant_rate": 0.067,
      "relative_lift_pct": 219.05,
      "p_value_raw": 0.0000,
      "p_value_adjusted": 0.0000,
      "decision": "strongly_positive",
      "priority_rank": 1
    }
  ],
  "priority_ranking": [
    {"rank": 1, "segment": "new_users", "rationale": "Strong positive effect (lift=219.1%, adj-p=0.0000). Highest priority."}
  ],
  "summary": "new_users drives the effect. 1 segment(s) positive.",
  "recommended_next_action": "targeted_rollout"
}

When to use cohort breakdown:

Aggregate A/B result is keep_running or escalate — segment analysis may reveal a hidden signal
One segment is strongly positive while another is strongly negative (interaction flag)
You want to ship only to specific segments rather than all users

Key features:

Per-segment two-proportion z-test with 95% confidence
Benjamini-Hochberg FDR correction for 4+ segments (controls false-discovery rate)
Priority ranking by absolute lift magnitude
Cohort decision override: fires when a segment contradicts the aggregate decision
Interaction flag: triggered when segments show opposing strongly-significant directions

Decision Audit

Reconstruct and explain a previous decision:

# Save result to file
PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000 > /tmp/result.json

# Audit it (human-readable)
PYTHONPATH=. python3 -m src.cli audit /tmp/result.json --format text

# Audit with experiment maturity assessment
PYTHONPATH=. python3 -m src.cli audit /tmp/result.json --maturity

Maturity assessment (with --maturity flag):

Scores experiments 0–100 across 8 checks
Labels: mature (≥90), adequate (≥70), immature (≥50), inadequate (<50)
Checks: decision path completeness, critical warnings, limitations documented, traffic sufficiency, confidence level, step documentation

Example audit output:

-- DECISION PATH --
1. Input validation [✓]
   control_total: 5000, variant_total: 5000
2. Traffic check [✓]
   control_size: 5000, min_required: 1000
3. Conversion rate calculation [✓]
   control_rate: 0.02, variant_rate: 0.026
4. Statistical significance test [✓]
   p_value: 0.045361, alpha: 0.05
5. Effect size check [✓]
   lift_pct: 30.0, threshold: 1
6. Decision [✓]
   decision: ship, confidence: medium

-- FINAL DECISION --
  Decision: SHIP

Experiment History & Persistence

All commands support --save to persist results to local SQLite history:

# Run and save in one step
PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000 --save
PYTHONPATH=. python3 -m src.cli did --pre-control 1000 --post-control 1100 --pre-treated 900 --post-treated 1150 --save
PYTHONPATH=. python3 -m src.cli plan --baseline 0.02 --mde 5 --traffic 5000 --save

History commands:

# List recent experiments
PYTHONPATH=. python3 -m src.cli history
PYTHONPATH=. python3 -m src.cli history --mode ab_test --limit 10

# Compare multiple experiments by ID
PYTHONPATH=. python3 -m src.cli compare 1 2 3

# Save a prior JSON result file to history
PYTHONPATH=. python3 -m src.cli save /tmp/result.json --name "checkout-v3-test"

History output example:

ID    Date       Mode       Decision   Lift     P-value  Summary
--------------------------------------------------------------------
3     2026-04-30 did        ship       16.67    -        Treatment effect is 150.00...
2     2026-04-30 ab_test    escalate   6.25     0.6947   Results not conclusive...
1     2026-04-30 ab_test    ship       30.00    0.0454   Variant performs 30.00%...

Compare output example:

EXPERIMENT COMPARISON
==================================================
Experiments compared: 3

Summary by decision:
  SHIP: 2 experiment(s)
  ESCALATE: 1 experiment(s)

Summary by mode:
  ab_test: 2 experiment(s)
  did: 1 experiment(s)

Lift summary: max=30.00%, min=6.25%, avg=17.64% (3 experiments)

Attention needed: 2 experiments recommend ship. Review if they test the same metric.

Persistence:

SQLite DB stored at ~/.agent-causal/history.db
All experiment modes supported: ab_test, did, planning
Full raw JSON preserved for audit reconstruction
Filter by mode, limit results, name experiments for later reference

Decision Reference

Decision	Meaning	When
`ship`	Deploy variant	p < 0.05 AND positive lift
`keep_running`	Continue experiment	p < 0.3, trending positive
`reject`	Do not deploy	p < 0.05 AND negative lift
`escalate`	Needs human review	Not conclusive or critical warnings
`targeted_rollout`	Ship to specific segment only	Strong signal in one segment, aggregate inconclusive
`full_rollout`	Ship to all users	All segments positive
`abandon_segment`	Do not ship to specific segment	Strong negative in one segment despite aggregate ship
`confirm_rejection`	Confirm abandonment	All segments negative

Python API

import sys
sys.path.insert(0, '~/clawd/agent-causal-decision-tool')

from src.ab_test import calculate_ab

result = calculate_ab({
    "control_conversions": 100,
    "control_total": 5000,
    "variant_conversions": 130,
    "variant_total": 5000
})

if result.recommendation.decision == "ship":
    # Deploy variant
    pass

Warnings & Limitations

LOW_TRAFFIC: Sample size below 1000 per group
SMALL_EFFECT: Lift < 1%, may not be practically significant
INCONCLUSIVE: Result not statistically significant or strong enough to act on
NOT_SIGNIFICANT: Far from significant; consider stopping the experiment
BORDERLINE_P_VALUE: p-value between 0.05 and 0.10 — weak evidence, not conclusive
CORRECTION_CONSERVATIVE: Multiple comparison correction applied; may increase false negatives
SEQUENTIAL_EARLY_STOP: Experiment stopped early via sequential testing; interpret with caution
SEQUENTIAL_CONDITIONS_NOT_MET: Early stopping conditions not met; normal decision applied
MAX_RUNTIME_EXCEEDED: Hard runtime cap exceeded without strong result; escalating
ZERO_BASELINE: Pre-period values cannot be zero for reliable DiD
PARALLEL_TRENDS_VIOLATED: Control and treated groups show very different pre-to-post ratios (critical)
PARALLEL_TRENDS_WEAK: Ratios diverge somewhat; monitor closely
BOTH_GROUPS_GREW: Both groups grew; cannot separate treatment effect from time trend
AGGREGATE_DATA: Analysis on aggregated data; use individual-level data for robust inference
AGGREGATE_DATA_DID: DiD result with high caution; not equivalent to a randomized experiment
SINGLE_PRE_PERIOD: Only one pre-period observation; parallel trends cannot be assessed
SMALL_SAMPLE: Sample size small; estimates unreliable
IMBALANCED_GROUPS: Treatment/control group sizes very different; may bias DiD estimate
LARGE_EFFECT_SMALL_SAMPLE: Large effect estimate from small sample; prioritize replication
PARALLEL_TRENDS_NO_DATA: No pre-period count provided; parallel trends cannot be assessed
BOOTSTRAP_CI_UNRELIABLE: Bootstrap CI not computed — count < 100 or zero baseline
BOOTSTRAP_CI_WIDE: Bootstrap CI range > 2×|DiD estimate|; point estimate uncertain
DID_CI_CROSSES_ZERO: Bootstrap CI crosses zero; effect direction uncertain
SLOW_EXPERIMENT: Estimated duration > 30 days; seasonal effects may confound results
INFEASIBLE_EXPERIMENT: Duration too long; consider DiD instead
SMALL_MDE: MDE very small; may require impossibly large sample
BASELINE_VERY_LOW: Baseline rate < 0.5%; estimations may be unreliable
BASELINE_NEAR_ZERO: Baseline rate < 0.1%; do not run experiment without careful review
PRIOR_DOMINATES: Very low total traffic; Jeffreys prior dominates posterior; interpret with caution
CREDIBLE_INTERVAL_WIDE: Bayesian credible interval is very wide; estimate uncertain

Schema Contract

The tool exposes a versioned schema contract for agent consumption:

cd ~/clawd/agent-causal-decision-tool
PYTHONPATH=. python3 -m src.cli schema

This prints schema.json — a wrapper containing schema_version, schema_coverage (ab, did, plan, bayes), schema_coverage_pending (cohort), severity_contract, and definitions (JSON Schema from Pydantic models).

All output models include schema_version field injected from package metadata — never hardcoded.

Location

GitHub: https://github.com/ZhuMorris/agent-causal-decision-tool
Local: ~/clawd/agent-causal-decision-tool/

Dependencies

Python 3.9+
click >= 8.1.0
scipy >= 1.11.0
numpy >= 1.26.0
pydantic >= 2.10.0

External Connectors

Fetch experiment data directly from external sources. The connect action normalizes external data into the internal experiment schema before running a decision.

PostHog

# Health check (validates credentials, no data fetched)
PYTHONPATH=. python3 -m src.cli connect posthog --dry-run

# Fetch experiment and print normalized data
PYTHONPATH=. python3 -m src.cli connect posthog --experiment-id <id>

# Fetch and run through decision workflow automatically
PYTHONPATH=. python3 -m src.cli connect posthog --experiment-id <id> --decide

# JSON-RPC call
{"jsonrpc":"2.0","method":"connect","params":{"source":"posthog","experiment_id":"<id>"},"id":"1"}

Environment / config:

POSTHOG_API_KEY + POSTHOG_PROJECT_ID env vars, OR
~/.posthogrc with api_key, project_id, instance_url fields

Connector result schema:

{
  "data": { "control_conversions": 120, "control_total": 5000, "variant_conversions": 145, "variant_total": 5000 },
  "source_metadata": { "connector": "posthog", "experiment_id": "...", "fetch_timestamp": "..." },
  "warnings": []
}

Errors:

INSUFFICIENT_DATA — experiment found but missing required fields
ConnectorAuthError — invalid/missing API key
ConnectorNotFoundError — experiment not found

Agent Causal

Install

Agent Causal Decision Tool

What is this?

Why it exists

When to use it

Security Model

Setup

Agent-Native Actions (JSON-RPC 2.0)

Stdio mode (for OpenClaw, Codex, Claude Code)

HTTP mode

Actions

Request format

Unified response schema

Error format

Commands

Easy-mode Dispatcher (decide)

Experiment Planning (ab_plan)

A/B Test Analysis (Frequentist)

Bayesian A/B Test

DiD Analysis

Cohort / Segment Breakdown

Decision Audit

Experiment History & Persistence

Decision Reference

Python API

Warnings & Limitations

Schema Contract

Location

Dependencies

External Connectors

PostHog

Related skills

Easy-mode Dispatcher (`decide`)