Agent Causal Decision Tool

v0.2.0

Causal decision and audit tool for AI agents. Run A/B tests and Difference-in-Differences analysis with structured JSON output, decision paths, and audit tra...

0· 40· 2 versions· 0 current· 0 all-time· Updated 3h ago· MIT-0
byMorris Zhu@zhumorris

Agent Causal Decision Tool

A causal decision and audit tool for AI agents. Evaluate product changes using A/B testing and Difference-in-Differences methods.

Source: https://github.com/ZhuMorris/agent-causal-decision-tool

Setup

Before using this skill, install the tool:

# Clone the repository (if not already present)
git clone https://github.com/ZhuMorris/agent-causal-decision-tool.git ~/clawd/agent-causal-decision-tool 2>/dev/null || true

# Install dependencies
pip install click scipy numpy pydantic -q

# Navigate to the tool directory
cd ~/clawd/agent-causal-decision-tool

Alternatively, install as a Python package:

pip install git+https://github.com/ZhuMorris/agent-causal-decision-tool.git -q

Commands

A/B Test Analysis

cd ~/clawd/agent-causal-decision-tool
PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000

Parameters:

  • --control: Control group conversions/total (e.g., 100/5000)
  • --variant: Variant group conversions/total (e.g., 130/5000)
  • --name: Variant name (optional, default: variant_1)
  • --format: Output format json (default) or text

Example:

PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000

Example Output:

{
  "version": "1.0",
  "mode": "ab_test",
  "recommendation": {
    "decision": "ship",
    "confidence": "medium",
    "summary": "Variant performs 30.00% better (p=0.0454). Ship it.",
    "primary_metricLift": 30.0,
    "p_value": 0.045361
  },
  "statistics": {
    "control_rate": 0.02,
    "variant_rate": 0.026,
    "relative_lift_pct": 30.0,
    "z_score": 2.0013,
    "p_value": 0.045361,
    "confidence_interval_95": [0.000124, 0.011876]
  },
  "traffic_stats": {
    "control_size": 5000,
    "variant_size": 5000,
    "total_size": 10000
  },
  "warnings": [],
  "next_steps": ["Deploy variant", "Monitor over time for regression"],
  "audit": {
    "decision_path": [
      {"step": "Input validation", "passed": true},
      {"step": "Traffic check", "passed": true},
      {"step": "Conversion rate calculation", "passed": true},
      {"step": "Statistical significance test", "passed": true},
      {"step": "Effect size check", "passed": true},
      {"step": "Decision", "passed": true}
    ]
  }
}

DiD Analysis

cd ~/clawd/agent-causal-decision-tool
PYTHONPATH=. python3 -m src.cli did --pre-control 1000 --post-control 1100 --pre-treated 900 --post-treated 1150

Parameters:

  • --pre-control: Control group metric before treatment
  • --post-control: Control group metric after treatment
  • --pre-treated: Treated group metric before treatment
  • --post-treated: Treated group metric after treatment

Decision Audit

Reconstruct and explain a previous decision:

# Save result to file
PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000 > /tmp/result.json

# Audit it (human-readable)
PYTHONPATH=. python3 -m src.cli audit /tmp/result.json --format text

# Audit it (JSON)
PYTHONPATH=. python3 -m src.cli audit /tmp/result.json --format json

Example audit output:

-- DECISION PATH --
1. Input validation [✓]
   control_total: 5000, variant_total: 5000
2. Traffic check [✓]
   control_size: 5000, min_required: 1000
3. Conversion rate calculation [✓]
   control_rate: 0.02, variant_rate: 0.026
4. Statistical significance test [✓]
   p_value: 0.045361, alpha: 0.05
5. Effect size check [✓]
   lift_pct: 30.0, threshold: 1
6. Decision [✓]
   decision: ship, confidence: medium

-- FINAL DECISION --
  Decision: SHIP

Decision Reference

DecisionMeaningWhen
shipDeploy variantp < 0.05 AND positive lift
keep_runningContinue experimentp < 0.3, trending positive
rejectDo not deployp < 0.05 AND negative lift
escalateNeeds human reviewNot conclusive or critical warnings

Python API

import sys
sys.path.insert(0, '~/clawd/agent-causal-decision-tool')

from src.ab_test import calculate_ab

result = calculate_ab({
    "control_conversions": 100,
    "control_total": 5000,
    "variant_conversions": 130,
    "variant_total": 5000
})

if result.recommendation.decision == "ship":
    # Deploy variant
    pass

Warnings & Limitations

  • LOW_TRAFFIC: Sample size below 1000 per group
  • SMALL_EFFECT: Lift < 1%, may not be practically significant
  • AGGREGATE_DATA: DiD performed on aggregated data (use individual-level data for robust inference)
  • TRENDS_DIVERGE: DiD parallel trends assumption may not hold

Location

Dependencies

  • Python 3.9+
  • click >= 8.1.0
  • scipy >= 1.11.0
  • numpy >= 1.24.0
  • pydantic >= 2.0.0

Version tags

latestvk975ntegce7farpc0pp3t7vf2x85rkp1

Runtime requirements

Binspython3, git, pip