Install
openclaw skills install deciqai-bayesian-reasoning
Activate when: user says 'Bayesian', 'prior', 'posterior', 'base rate', 'likelihood ratio', or 'update my belief'; someone treats 'the evidence is consistent with X' as proof of X; a high-stakes decision rests on interpreting a test result, security alert, fraud flag, A/B result, or hiring signal; base rates are being ignored in favor of a vivid story.
Do NOT activate when: the decision is genuinely deterministic and probabilities do not apply; there is no data or domain knowledge to anchor a prior (Bayes amplifies information, it does not create it from nothing).
Bayes' theorem: Posterior odds = Prior odds × Likelihood ratio. The strength of belief after evidence equals the strength before, multiplied by how diagnostic the evidence is.
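For readers who want to see where the odds form comes from, here is a short derivation. It applies the standard form of Bayes' theorem to H and to not-H (written ¬H) and divides one by the other, so the shared P(E) term cancels. The notation is an assumption layered on the skill's own wording.

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}, \qquad
P(\neg H \mid E) = \frac{P(E \mid \neg H)\,P(\neg H)}{P(E)}

\underbrace{\frac{P(H \mid E)}{P(\neg H \mid E)}}_{\text{posterior odds}}
= \underbrace{\frac{P(H)}{P(\neg H)}}_{\text{prior odds}}
\times
\underbrace{\frac{P(E \mid H)}{P(E \mid \neg H)}}_{\text{likelihood ratio}}
```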
This skill applies Bayesian discipline where people reason about probabilities informally. The failures follow predictable patterns: ignoring the base rate (the prior), confusing P(E|H) with P(H|E) (the prosecutor's fallacy), over-updating on vivid confirming evidence, and treating correlated evidence as independent.
Composes with probabilistic-thinking (Bayes is the operational engine), critical-thinking (formalizes considering alternatives), logical-fallacies (prosecutor's fallacy and base-rate neglect), and first-principles (the prior is bedrock).
Not when: genuinely deterministic; no data to anchor a prior; cost of formal update exceeds the value of being more right.
In Coach mode, respond one step at a time. Each [WAIT] is a hard stop — output that step's question and nothing more.
[WAIT — do not advance until user responds]
Step 1 — Name hypothesis and alternatives. H vs. not-H must be exhaustive.
Step 2 — Anchor the prior before evidence. P(H) = base rate in the relevant population. Most failures happen here. Examples: disease prevalence (<0.1% for rare conditions); historical fraction of great hires (20-40%); alerts that proved real (<5%); Series A → $1B outcomes (~5%).
Step 3 — Estimate the likelihood ratio. LR = P(E|H) / P(E|not-H). LR > 1 supports H; LR < 1 supports not-H; LR ≈ 1 is non-diagnostic. If you cannot articulate P(E|not-H), you have half the story.
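Step 3 can be sketched in a few lines of Python. The function names and the `tolerance` band around 1 are illustrative assumptions, not part of the skill, and the 99% / 1% inputs are hypothetical test characteristics.

```python
def likelihood_ratio(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return LR = P(E|H) / P(E|not-H)."""
    if not 0.0 <= p_e_given_h <= 1.0:
        raise ValueError("P(E|H) must be in [0, 1]")
    if not 0.0 < p_e_given_not_h <= 1.0:
        raise ValueError("P(E|not-H) must be in (0, 1]")
    return p_e_given_h / p_e_given_not_h


def interpret(lr: float, tolerance: float = 0.1) -> str:
    """Classify an LR; `tolerance` is an arbitrary band treated as 'about 1'."""
    if abs(lr - 1.0) <= tolerance:
        return "non-diagnostic"
    return "supports H" if lr > 1.0 else "supports not-H"


# Hypothetical test: fires 99% of the time when H is true, 1% when it is not.
lr = likelihood_ratio(0.99, 0.01)
print(round(lr, 2), interpret(lr))
```

Requiring both arguments is deliberate: the function cannot be called without an explicit P(E|not-H), which is the half of the story that usually goes missing.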
Step 4 — Compute the posterior. Prior odds × LR = Posterior odds → convert back to probability. Example: 0.1% prevalence, LR = 99 → posterior ≈ 9%. A "highly accurate" test on a rare disease still gives 91% chance of no disease on a positive result.
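The arithmetic in Step 4 can be checked with a minimal sketch. `posterior_probability` is a hypothetical helper name; the inputs are the 0.1% prevalence and LR = 99 from the example above.

```python
def posterior_probability(prior: float, lr: float) -> float:
    """Update a prior probability with a likelihood ratio via the odds form."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if lr < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    prior_odds = prior / (1.0 - prior)       # probability -> odds
    posterior_odds = prior_odds * lr         # Bayes' theorem in odds form
    return posterior_odds / (1.0 + posterior_odds)  # odds -> probability


p = posterior_probability(prior=0.001, lr=99.0)
print(f"{p:.1%}")  # prints 9.0%
```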
Step 5 — Check evidence dependence. Correlated evidence (three witnesses from the same source) should be treated as ~one piece, not compounded.
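To see how much dependence matters, the sketch below compares naively compounding three witness reports against counting them as one piece of evidence. The prior of 10% and the per-witness LR of 4 are made-up numbers, and counting fully correlated evidence exactly once is a simplifying assumption; partially correlated evidence would land somewhere in between.

```python
from math import prod


def posterior_from_lrs(prior: float, lrs: list) -> float:
    """Multiply prior odds by each LR; valid only if the LRs are independent."""
    odds = prior / (1.0 - prior) * prod(lrs)
    return odds / (1.0 + odds)


prior = 0.10        # hypothetical prior
witness_lr = 4.0    # hypothetical LR for a single witness report

# Naive: treats three reports from the same source as independent.
naive = posterior_from_lrs(prior, [witness_lr] * 3)

# Conservative: same source, so count the three reports as one piece.
pooled = posterior_from_lrs(prior, [witness_lr])

print(f"naive={naive:.2f} pooled={pooled:.2f}")
```

With these numbers the naive update lands near 0.88 while the pooled one stays near 0.31, which is the overstatement Step 5 warns about.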
Step 6 — Commit and act. State the posterior number, the action threshold, and what next evidence would most move it.
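One way to make Step 6 concrete is sketched below. Both helpers are hypothetical. Ranking candidate evidence by the absolute log of its expected LR is a rough heuristic assumed here for illustration, since it ignores how likely each result is to occur. The candidate names and LR values are invented.

```python
from math import log
from typing import Mapping


def decide(posterior: float, threshold: float) -> str:
    """Compare the committed posterior against a pre-stated action threshold."""
    return "act" if posterior >= threshold else "hold"


def most_informative(candidates: Mapping[str, float]) -> str:
    """Pick the candidate whose expected LR is farthest from 1 on a log scale."""
    return max(candidates, key=lambda name: abs(log(candidates[name])))


# Hypothetical candidates: name -> LR expected if that evidence comes in.
next_evidence = {
    "second independent test": 50.0,
    "colleague's opinion": 1.5,
    "rule-out scan": 0.05,
}

print(decide(posterior=0.09, threshold=0.5))
print(most_informative(next_evidence))
```

The log scale treats an LR of 0.05 and an LR of 20 as equally strong, so disconfirming evidence is ranked on the same footing as confirming evidence.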
# Bayesian Update: <decision>
H / not-H:
Prior P(H): Source:
Evidence E:
P(E|H): P(E|not-H): LR:
Posterior P(H|E): Interpretation:
Independence check:
Decision threshold: Action: Next evidence:
→ Method in Action: Sally Clark Case (1999)
A "pack" bundles the most common prior and LR patterns for a domain.
| Setting | Common prior | Common LR failure |
|---|---|---|
| Medical screening | Population prevalence | Treating sensitivity as posterior |
| Security alert triage | Fraction of alerts that were real | "Matches signature" = "is the threat" |
| Hiring | Historical fraction of great hires in this role | One strong interview = "great candidate" |
| A/B test | Prior probability the change has a real effect | Reading "p < 0.05" as proof of an effect, with no prior |
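As an illustration of the A/B row, the sketch below treats "the test came out significant" as the evidence E, with P(E|H) equal to the test's power and P(E|not-H) equal to its significance level. That framing is a common simplification assumed here, not something the pack specifies. The 10% prior, 80% power, and 5% alpha are hypothetical.

```python
def posterior_real_effect(prior: float, power: float, alpha: float) -> float:
    """P(effect is real | significant result), under the assumptions above."""
    lr = power / alpha                      # P(sig | real) / P(sig | no effect)
    odds = prior / (1.0 - prior) * lr
    return odds / (1.0 + odds)


# Hypothetical: 10% of tested changes truly work, 80% power, alpha = 0.05.
p_real = posterior_real_effect(prior=0.10, power=0.80, alpha=0.05)
print(f"{p_real:.2f}")  # prints 0.64
```

Under these assumed numbers a significant result leaves roughly a one-in-three chance that the change does nothing, far from the 5% that a bare "p < 0.05" suggests.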
Contribute a pack for your domain — see the template at the repo root.
→ Sources: references/sources.md
Note — [D] = designed upfront | [O] = observed in real use. [O] entries are more valuable.
| Fake move | Reality |
|---|---|
| [D] "This looks just like X" | Not enough. How often does it look like X when it isn't? Without LR, you have rhetoric. |
| [D] Ignoring base rate because "this case feels different" | The base rate captures everything you don't know. "Feels different" is already included. |
| [D] Confusing P(E|H) with P(H|E) | The prosecutor's fallacy. They can differ by orders of magnitude. |
| [D] Treating correlated evidence as independent | Multiplying correlated likelihoods overstates the update. Identify common causes. |
| [D] Updating intuitively without numbers | Intuitive updates are miscalibrated — too strong on confirming, too weak on disconfirming. |
| [D] Picking the prior after seeing the evidence | Hindsight contamination. Commit to the prior before evidence. |
| [D] "I'm doing a Bayesian update" without naming P(H), P(E|H), P(E|not-H) | Then you are not. You are using the word. |
| [D] Absence of evidence = evidence of absence | Depends on P(E|H) and P(E|not-H). If evidence is rarely observable, absence is weak. |
| [D] Posterior keeps drifting toward H every round without calibration check | Either H is increasingly likely or there is a confirmation-bias leak. |
| To add [O] entries: paste a real failure instance here after each production use | Description of what happened |
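The "absence of evidence" row can be made concrete with a small sketch. Not observing E is itself evidence, with its own likelihood ratio built from the complements of P(E|H) and P(E|not-H). The helper name and the probabilities are illustrative assumptions.

```python
def lr_of_absence(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """LR for NOT observing E: P(not-E|H) / P(not-E|not-H)."""
    if not 0.0 <= p_e_given_h <= 1.0:
        raise ValueError("P(E|H) must be in [0, 1]")
    if not 0.0 <= p_e_given_not_h < 1.0:
        raise ValueError("P(E|not-H) must be in [0, 1)")
    return (1.0 - p_e_given_h) / (1.0 - p_e_given_not_h)


# E is rarely observable even when H is true: absence barely moves the odds.
rarely_seen = lr_of_absence(p_e_given_h=0.05, p_e_given_not_h=0.01)

# E almost always shows up when H is true: absence is strong evidence against H.
usually_seen = lr_of_absence(p_e_given_h=0.95, p_e_given_not_h=0.01)

print(f"{rarely_seen:.2f} {usually_seen:.2f}")  # prints 0.96 0.05
```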
Part of deciqAI Knowledge Skills: open-source thinking skills that make rigor executable for AI agents. Built by deciqAI · github.com/deciqAI · Contributions welcome.