Self-Discovery Reasoning

Automatically detects runtime capabilities to self-compose tailored multi-step reasoning structures for complex, multi-module problem solving and analysis ta...

Install

openclaw skills install self-discover-reasoning

Self-Discover Reasoning Skill

An AI agent skill for self-composing task-specific reasoning structures. Based on Zhou et al. (2024) — SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures.

Platform Auto-Detection

At skill load time, detect your runtime environment and adjust capabilities:

| Capability | How to Check | Fallback |
|---|---|---|
| File system | Can you read references/discovery-templates.md? | Use the inline templates below instead |
| Persistent memory | Can you write to memory/? | Store discovery notes in conversation context only |
| Long context | Is your context window > 32K tokens? | Cap at Level 2 (skip full discovery) |
| Tool access | Can you call external tools? | Use mental reasoning only |

Detection rules:

  • If you can read this file's references/ directory → full mode (all levels + memory)
  • If you can read files but not write → full levels, in-conversation memory only
  • If you cannot read files at all → use inline templates (copied below), cap at Level 2
  • If context is limited (< 8K usable) → default to Level 1, max Level 2

Each platform therefore runs in the fullest mode it can support, with no manual configuration.
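The detection rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Capabilities` and `Mode` types, the `detect` probe, and the choice to treat memory as persistent whenever `memory/` is writable are hypothetical and not part of the skill itself.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    can_read_files: bool
    can_write_memory: bool
    usable_context_tokens: int


@dataclass(frozen=True)
class Mode:
    templates: str                 # "file" or "inline"
    memory: str                    # "persistent" or "conversation"
    max_level: int                 # highest discovery level allowed
    default_level: Optional[int]   # None means "let task complexity decide"


def detect(skill_root: Path, usable_context_tokens: int) -> Capabilities:
    """Probe the file system the way the capability table suggests (hypothetical helper)."""
    templates = skill_root / "references" / "discovery-templates.md"
    memory_dir = skill_root / "memory"
    return Capabilities(
        can_read_files=templates.is_file() and os.access(templates, os.R_OK),
        can_write_memory=memory_dir.is_dir() and os.access(memory_dir, os.W_OK),
        usable_context_tokens=usable_context_tokens,
    )


def resolve_mode(caps: Capabilities) -> Mode:
    """Map detected capabilities to a mode, following the detection rules."""
    templates = "file" if caps.can_read_files else "inline"
    memory = "persistent" if caps.can_write_memory else "conversation"
    max_level = 3 if caps.can_read_files else 2   # no file access: cap at Level 2
    default_level = None
    if caps.usable_context_tokens <= 32_000:      # not a long-context runtime
        max_level = min(max_level, 2)
    if caps.usable_context_tokens < 8_000:        # very limited context
        default_level = 1
    return Mode(templates, memory, max_level, default_level)
```

For example, `resolve_mode(detect(Path("."), 100_000))` yields full mode only when both `references/discovery-templates.md` and a writable `memory/` directory exist under the current directory.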

Inline Discovery Templates (for environments without file access)

If you cannot read references/discovery-templates.md, use these directly:

Level 1 Internal Prompt

Before answering: (1) What type of reasoning does this task need? (2) Pick 1-2 modules from the seed list. (3) Apply them. Deliver.

Level 2 Internal Prompt

1. SELECT: From the seed modules, choose 3-5 relevant to this task.
2. ADAPT: Rephrase each selected module to be task-specific.
3. IMPLEMENT: Compose into a step-by-step reasoning structure.
4. Follow the structure to produce the answer.

Level 3 Internal Prompt

1. SELECT: Choose 4-7 modules relevant to the task type.
2. ADAPT: Tailor each to the specific task, adding domain expertise.
3. IMPLEMENT: Build a JSON-like reasoning structure with keys and expected outputs.
4. Execute the structure step-by-step.
5. VERIFY: Check the answer against the structure — did every key get a valid value?
6. If gaps are found, refine the structure once and re-execute only the gap steps.

When to Activate

Activate when you are about to solve a reasoning-intensive task: after gathering all information but before producing your answer. Discovery precedes the reasoning; it does not replace the work itself.

Strong triggers: Multi-step reasoning, math, logic puzzles, planning, debugging, architecture decisions, analytical tasks.

Skip: Simple factual lookups, greetings, formatting requests, single-sentence answers.

Core Process: SELECT → ADAPT → IMPLEMENT → SOLVE

1. SELECT    — Choose relevant reasoning modules from the seed set
2. ADAPT     — Rephrase selected modules to be task-specific
3. IMPLEMENT — Compose modules into a structured reasoning plan (key-value format)
4. SOLVE     — Follow the reasoning structure to produce the final answer
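As a rough sketch, the four stages can be chained as four prompts, each consuming the previous stage's output. The `llm` callable (prompt in, text out) and the prompt wording are assumptions made for illustration; they are not the prompts from the paper or from this skill's templates.

```python
from typing import Callable, Sequence

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def self_discover(task: str, seed_modules: Sequence[str], llm: LLM) -> str:
    """Run SELECT -> ADAPT -> IMPLEMENT -> SOLVE as four sequential calls."""
    module_list = "\n".join(f"- {m}" for m in seed_modules)

    # 1. SELECT: pick relevant modules from the seed set
    selected = llm(
        "SELECT: Choose the reasoning modules relevant to this task.\n"
        f"Task: {task}\nModules:\n{module_list}"
    )
    # 2. ADAPT: rephrase the chosen modules for this specific task
    adapted = llm(
        "ADAPT: Rephrase each selected module so it is specific to this task.\n"
        f"Task: {task}\nSelected modules:\n{selected}"
    )
    # 3. IMPLEMENT: compose the adapted modules into a key-value plan
    structure = llm(
        "IMPLEMENT: Compose the adapted modules into a key-value reasoning plan.\n"
        f"Task: {task}\nAdapted modules:\n{adapted}"
    )
    # 4. SOLVE: follow the plan to produce the final answer
    return llm(
        "SOLVE: Follow the reasoning structure step by step and give the final answer.\n"
        f"Task: {task}\nReasoning structure:\n{structure}"
    )
```

An agent applying the skill mentally performs the same hand-offs without separate calls; the explicit version mainly makes the data flow between stages visible.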

Source: Zhou et al., "Self-Discover: Large Language Models Self-Compose Reasoning Structures" (2024, NeurIPS). LLMs self-compose atomic reasoning modules into task-intrinsic structures, improving performance on challenging reasoning benchmarks by as much as 32% compared with Chain-of-Thought.


Seed Reasoning Modules

These are atomic reasoning skills drawn from cognitive science and prompting research (Fernando et al., 2023; Zhou et al., 2024). The agent selects a subset relevant to each task.

| # | Module | Description | Best For |
|---|---|---|---|
| 1 | Step-by-Step Thinking | Break reasoning into sequential steps | Sequential tasks, procedures |
| 2 | Decomposition | Break problem into sub-problems, solve each | Complex multi-part problems |
| 3 | Critical Thinking | Analyze from multiple perspectives, question assumptions | Evaluating claims, decisions |
| 4 | Reflective Thinking | Search for first principles, examine underlying theory | Science, deep analysis |
| 5 | Creative Thinking | Generate novel approaches, brainstorm alternatives | Design, open-ended tasks |
| 6 | Pattern Recognition | Identify recurring patterns and regularities | Data analysis, sequences |
| 7 | Analogical Reasoning | Map solutions from similar known problems | Transfer learning, explanations |
| 8 | Causal Analysis | Identify cause-effect relationships | Debugging, diagnostics |
| 9 | Constraint Satisfaction | Check solutions against all constraints | Optimization, planning |
| 10 | Abstraction | Extract general principles from specifics | Generalization, architecture |
| 11 | Hypothesis Testing | Form and test hypotheses systematically | Troubleshooting, debugging |
| 12 | Comparative Analysis | Compare options against defined criteria | Decision-making, trade-offs |
| 13 | Temporal Reasoning | Reason about sequences and time dependencies | Scheduling, process flows |
| 14 | Spatial Reasoning | Reason about spatial relationships | Layout, geometry, UI |
| 15 | Counterfactual Thinking | Consider "what if" alternatives | Risk analysis, planning |
| 16 | Lateral Thinking | Approach from unconventional angles | Innovation, problem-solving |
| 17 | Deductive Reasoning | Apply general rules to specific cases | Logic, validation |
| 18 | Inductive Reasoning | Infer general rules from specific examples | Pattern generalization |
| 19 | Probabilistic Thinking | Reason under uncertainty with probabilities | Risk assessment, forecasting |
| 20 | Systems Thinking | Consider interactions between components | Architecture, complex systems |
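One way to make the seed set machine-readable is a simple mapping from module name to its "Best For" text, filtered by keyword. The sketch below copies six rows from the table; `SEED_MODULES` and `modules_for` are hypothetical names, and plain substring matching is an assumption rather than the selection method the skill prescribes (the agent normally selects by judgment).

```python
# Subset of the seed table: module name -> "Best For" text.
SEED_MODULES = {
    "Causal Analysis": "Debugging, diagnostics",
    "Constraint Satisfaction": "Optimization, planning",
    "Abstraction": "Generalization, architecture",
    "Hypothesis Testing": "Troubleshooting, debugging",
    "Counterfactual Thinking": "Risk analysis, planning",
    "Systems Thinking": "Architecture, complex systems",
}


def modules_for(keyword: str, limit: int = 5) -> list[str]:
    """Return modules whose 'Best For' text mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    matches = [
        name for name, best_for in SEED_MODULES.items()
        if needle in best_for.lower()
    ]
    return matches[:limit]  # cap the selection, in table order


print(modules_for("debugging"))  # ['Causal Analysis', 'Hypothesis Testing']
```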

Discovery Depth Levels

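By default the agent auto-selects the depth from task complexity. Explicit user signals (a request for thorough reasoning, or for a quick answer) can raise or skip it, as listed under Trigger Conditions Summary.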

Level 0: Direct Answer (Skip discovery)

Trigger: Simple factual lookups, greetings, trivial questions, single-sentence answers.

Action: Do nothing extra. Just respond. Cost: 0 additional tokens.

Examples: "What time is it?", "Thanks", simple formatting requests.

Level 1: Single-Module Reasoning

Trigger: Medium-complexity tasks — explanations, how-to guides, code snippets, questions requiring one type of reasoning.

Action: Select 1-2 relevant modules. Apply mentally. Deliver.

Budget: 1 discovery pass. ~10% overhead on response tokens.

Internal process:

After receiving the task:
- What type of reasoning does this need?
- Pick the most relevant module (e.g., "decomposition" for multi-part questions)
- Apply it while composing the answer
- Deliver

Level 2: Multi-Module Composition

Trigger: Complex tasks — technical architectures, multi-step plans, debugging, anything requiring 3+ reasoning steps or multiple perspectives.

Action: Select 3-5 modules. Adapt to task. Compose into structured plan. Execute.

Budget: 1 full discovery cycle (SELECT + ADAPT + IMPLEMENT). ~25% overhead.

Internal process:

1. SELECT 3-5 relevant modules from the seed set
2. ADAPT each to be specific to this task
3. IMPLEMENT into a step-by-step structure
4. Follow the structure to produce the answer

Level 3: Full Self-Discovery

Trigger: High-stakes tasks — complex math, logic puzzles, production architecture, multi-constraint optimization, or user explicitly requests thorough reasoning.

Action: Full SELECT → ADAPT → IMPLEMENT cycle with 4-7 modules, JSON-like reasoning structure, and verification pass.

Budget: Full discovery cycle + verification. ~40% overhead.

Internal process:

1. SELECT 4-7 relevant modules
2. ADAPT with domain-specific tailoring
3. IMPLEMENT into key-value reasoning structure:
   {
     "step_1": { "action": "...", "expected_output": "..." },
     "step_2": { "action": "...", "expected_output": "..." },
     ...
   }
4. Execute each step, filling in values
5. VERIFY: Did every step produce a valid output?
6. If gaps → refine structure, re-execute gap steps only
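Steps 4 to 6 of this Level 3 process might look like the following sketch. It assumes a "valid output" simply means a non-empty string, and it takes two hypothetical callables: `run_step` to execute a step and `refine` to adjust a step that produced a gap. The refinement pass runs once, with no loop.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Dict[str, str]                            # {"action": ..., "expected_output": ...}
RunStep = Callable[[str, Step], Optional[str]]   # hypothetical step executor
Refine = Callable[[str, Step], Step]             # hypothetical step refiner


def find_gaps(structure: Dict[str, Step], results: Dict[str, Optional[str]]) -> List[str]:
    """VERIFY: list the step keys that have no non-empty output."""
    return [key for key in structure if not (results.get(key) or "").strip()]


def execute_with_verification(
    structure: Dict[str, Step],
    run_step: RunStep,
    refine: Refine = lambda key, step: step,     # default: retry the step unchanged
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    # Execute each step, filling in values.
    results = {key: run_step(key, step) for key, step in structure.items()}
    # Single refinement pass: re-execute only the steps that left a gap.
    for key in find_gaps(structure, results):
        results[key] = run_step(key, refine(key, structure[key]))
    # Report whatever is still missing instead of looping again.
    return results, find_gaps(structure, results)
```

Returning the remaining gaps, rather than retrying until they disappear, keeps the cost bounded to one extra pass.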

Pre-Built Discovery Structures (Quick Templates)

For common task types, use these pre-composed structures instead of running full discovery:

Coding / Debugging

{
  "step_1": { "action": "Reproduce: Identify the exact error and trigger condition", "output": "error description" },
  "step_2": { "action": "Decompose: Break the code path into segments", "output": "list of code segments" },
  "step_3": { "action": "Hypothesis: Form 2-3 hypotheses for root cause", "output": "ranked hypotheses" },
  "step_4": { "action": "Test: Verify top hypothesis against evidence", "output": "confirmed root cause" },
  "step_5": { "action": "Fix: Implement the fix with edge case handling", "output": "corrected code" }
}

Architecture / Decision

{
  "step_1": { "action": "Decompose: List all requirements and constraints", "output": "requirement matrix" },
  "step_2": { "action": "Compare: Generate 2-3 options", "output": "option summaries" },
  "step_3": { "action": "Evaluate: Score each option against constraints", "output": "comparison table" },
  "step_4": { "action": "Decide: Select best option with justification", "output": "decision + rationale" },
  "step_5": { "action": "Validate: Check for overlooked constraints", "output": "final recommendation" }
}

Math / Logic

{
  "step_1": { "action": "Understand: Restate the problem, identify given and unknown", "output": "problem statement" },
  "step_2": { "action": "Plan: Select relevant formulas/approaches", "output": "solution strategy" },
  "step_3": { "action": "Execute: Apply step-by-step with intermediate results", "output": "workings" },
  "step_4": { "action": "Verify: Substitute answer back or check invariants", "output": "verification" }
}

Analysis / Research

{
  "step_1": { "action": "Scope: Define what needs to be analyzed and why", "output": "analysis scope" },
  "step_2": { "action": "Gather: Identify key facts, data points, or evidence", "output": "evidence list" },
  "step_3": { "action": "Pattern: Find patterns, trends, or anomalies", "output": "findings" },
  "step_4": { "action": "Synthesize: Combine findings into conclusions", "output": "conclusions" },
  "step_5": { "action": "Validate: Check conclusions against original scope", "output": "final analysis" }
}
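A minimal sketch of how such templates could be stored and looked up by task type is shown below, using the Math / Logic structure as the only entry. The `TEMPLATES` registry, its `"math_logic"` key, and the `pick_template` and `render_plan` helpers are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional

Structure = Dict[str, Dict[str, str]]

# Registry of pre-built structures keyed by task type (only one entry shown).
TEMPLATES: Dict[str, Structure] = {
    "math_logic": {
        "step_1": {"action": "Understand: Restate the problem, identify given and unknown", "output": "problem statement"},
        "step_2": {"action": "Plan: Select relevant formulas/approaches", "output": "solution strategy"},
        "step_3": {"action": "Execute: Apply step-by-step with intermediate results", "output": "workings"},
        "step_4": {"action": "Verify: Substitute answer back or check invariants", "output": "verification"},
    },
}


def pick_template(task_type: str) -> Optional[Structure]:
    """Return a pre-built structure, or None to signal that discovery should run."""
    return TEMPLATES.get(task_type)


def render_plan(structure: Structure) -> str:
    """Flatten a structure into a numbered plan, one line per step."""
    return "\n".join(
        f"{i}. {step['action']} -> {step['output']}"
        for i, step in enumerate(structure.values(), start=1)
    )
```

When `pick_template` returns `None`, the agent would fall back to the normal SELECT, ADAPT, IMPLEMENT cycle.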

Convergence Rules

Based on the empirical finding from Zhou et al. that self-discovered structures are effective in a single pass:

  1. Maximum 1 discovery cycle per task (SELECT → ADAPT → IMPLEMENT is one cycle).
  2. Verification pass (Level 3 only): If verification reveals gaps, refine structure once. Do not loop.
  3. Stop if the structure covers all aspects of the task.
  4. Diminishing returns rule: Discovery overhead should never exceed 40% of the response.

Anti-pattern — DO NOT:

  • Run discovery on every single message (use Level 0 for simple Q&A)
  • Over-compose modules (more ≠ better; 3-5 modules is optimal for most tasks)
  • Re-run discovery if the first answer is reasonable

Cost Control Strategy

| Depth | Modules | Max Steps | Approx. Token Overhead | When to Use |
|---|---|---|---|---|
| Level 0 | 0 | 0 | +0% | Simple Q&A |
| Level 1 | 1-2 | Mental | ~10% | Most conversations |
| Level 2 | 3-5 | Structured plan | ~25% | Complex technical |
| Level 3 | 4-7 | Full JSON + verify | ~40% | High-stakes only |

Principle: Discovery should cost less than the cost of a wrong approach. For low-stakes responses, skip discovery entirely.
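The overhead figures can be read as a per-level token budget. The sketch below treats the approximate percentages as hard caps, which is an assumption; `OVERHEAD_PERCENT`, `max_discovery_tokens`, and `within_budget` are hypothetical names.

```python
# Approximate overhead per level, as a percentage of response tokens.
OVERHEAD_PERCENT = {0: 0, 1: 10, 2: 25, 3: 40}


def max_discovery_tokens(level: int, response_tokens: int) -> int:
    """Largest discovery cost allowed for a response of the given size."""
    return response_tokens * OVERHEAD_PERCENT[level] // 100


def within_budget(level: int, discovery_tokens: int, response_tokens: int) -> bool:
    """True if the discovery pass stayed inside the level's budget."""
    return discovery_tokens <= max_discovery_tokens(level, response_tokens)


print(max_discovery_tokens(2, 800))  # 200
```

Integer arithmetic is used so the budget is exact and never rounds above the stated percentage.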


Trigger Conditions Summary

Auto-Trigger (always on)

  • Task requires multi-step reasoning → at least Level 1
  • Task involves 3+ distinct reasoning types → at least Level 2
  • Task is math, logic, or planning → at least Level 1

Skip Discovery

  • User is in a hurry (explicit: "quick", "brief", "just tell me")
  • Response is under 2 sentences
  • Pure social/chat exchange
  • Simple factual lookup

User Manual Trigger

  • User says "think carefully", "reason through this" → Level 2+
  • User says "this is important", "critical", "production" → Level 3
  • User says "discover" or "self-discover" → Level 2+
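The trigger conditions can be combined into one level-selection function, sketched below. The precedence is an assumption the text does not spell out: explicit "hurry" phrases win, then Level 3 phrases, then Level 2 phrases, then the automatic triggers. The task features (`multi_step`, `reasoning_types`, `math_logic_or_planning`) are hypothetical inputs the agent would have to estimate itself, and the short-response and social-chat skip rules are not modeled.

```python
import re

SKIP_PHRASES = ("quick", "brief", "just tell me")
LEVEL_3_PHRASES = ("this is important", "critical", "production")
LEVEL_2_PHRASES = ("think carefully", "reason through this", "discover")


def _mentions(text: str, phrases) -> bool:
    """Whole-word, case-insensitive phrase match (so 'quicksort' is not 'quick')."""
    return any(
        re.search(rf"\b{re.escape(p)}\b", text, re.IGNORECASE) for p in phrases
    )


def choose_level(
    message: str,
    *,
    multi_step: bool = False,
    reasoning_types: int = 1,
    math_logic_or_planning: bool = False,
) -> int:
    if _mentions(message, SKIP_PHRASES):        # user is in a hurry
        return 0
    # Auto-triggers set a floor based on the task itself.
    auto = 0
    if multi_step or math_logic_or_planning:
        auto = 1
    if reasoning_types >= 3:
        auto = 2
    # Manual triggers can raise the level above the floor.
    if _mentions(message, LEVEL_3_PHRASES):
        return 3
    if _mentions(message, LEVEL_2_PHRASES):     # "discover" also covers "self-discover"
        return max(auto, 2)
    return auto
```

For instance, `choose_level("Think carefully about this bug")` returns 2, while `choose_level("just tell me the port", multi_step=True)` returns 0.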

Output Format

Discovery is internal — the user should not see the raw reasoning structure. However:

After using discovery, you MAY append a subtle note:

  • Level 1: No note (keep it invisible).
  • Level 2: Optionally: _(response composed via multi-module reasoning)_
  • Level 3: Optionally: _(reasoning structure self-discovered from [N] modules)_

NEVER:

  • Show the raw JSON reasoning structure to the user
  • Make the note prominent or distracting
  • Add notes for Level 0 or Level 1 responses
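A small helper can enforce these output rules so the note never appears at Levels 0 or 1. This is a sketch; `discovery_note` and `with_note` are hypothetical names, and always appending the optional note at Levels 2 and 3 is a simplification.

```python
def discovery_note(level: int, module_count: int = 0) -> str:
    """Return the optional footnote for a level, or an empty string."""
    if level == 2:
        return "_(response composed via multi-module reasoning)_"
    if level == 3:
        return f"_(reasoning structure self-discovered from {module_count} modules)_"
    return ""  # Levels 0 and 1 stay invisible


def with_note(answer: str, level: int, module_count: int = 0) -> str:
    """Append the note after the answer, separated by a blank line."""
    note = discovery_note(level, module_count)
    return f"{answer}\n\n{note}" if note else answer
```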

Cross-Task Transfer (Structure Reuse)

Inspired by Zhou et al.'s finding that self-discovered structures transfer across model families — the same reasoning structure that works for GPT-4 also helps Llama-2.

When to reuse: When you encounter a task of the same type as one you've previously solved with discovery (e.g., another debugging task, another architecture decision).

How to reuse: Write the discovered structure to memory/discovered-structures.md:

## [Task Type]
Structure: [paste the key-value structure]
Effective for: [describe task characteristics]
Date: [today]

Next time you encounter a similar task, check memory first before running discovery again.
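The save-and-check-first pattern could be sketched as follows, assuming the runtime allows file writes. The entry layout follows the template above; serializing the structure as one line of JSON and letting the most recent entry for a task type win are assumptions, and `save_structure` and `load_structure` are hypothetical names.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def save_structure(memory_file: Path, task_type: str, structure: dict, effective_for: str) -> None:
    """Append one entry in the '## [Task Type]' layout."""
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    entry = (
        f"## {task_type}\n"
        f"Structure: {json.dumps(structure)}\n"   # one line of JSON (assumption)
        f"Effective for: {effective_for}\n"
        f"Date: {date.today().isoformat()}\n\n"
    )
    with memory_file.open("a", encoding="utf-8") as fh:
        fh.write(entry)


def load_structure(memory_file: Path, task_type: str) -> Optional[dict]:
    """Return the most recent structure saved for a task type, if any."""
    if not memory_file.is_file():
        return None
    found = None
    in_section = False
    for line in memory_file.read_text(encoding="utf-8").splitlines():
        if line.startswith("## "):
            in_section = line[3:].strip().lower() == task_type.lower()
        elif in_section and line.startswith("Structure: "):
            found = json.loads(line[len("Structure: "):])  # later entries win
    return found
```

A caller would try `load_structure(Path("memory/discovered-structures.md"), "Debugging")` first and run discovery only when it returns `None`.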


Relationship to Other Reasoning Techniques

| Technique | Source | How Self-Discover Differs |
|---|---|---|
| Chain of Thought | Wei et al. 2022 | CoT is a single module; Self-Discover composes multiple |
| Least-to-Most | Zhou et al. 2022 | Decomposition only; Self-Discover selects best modules per task |
| Self-Consistency | Wang et al. 2022 | Requires 10-40x more compute; Self-Discover is more efficient |
| Tree of Thoughts | Yao et al. 2023 | Explores multiple paths; Self-Discover composes optimal structure |
| Step-Back Prompting | Zheng et al. 2023 | Single principle; Self-Discover combines multiple heuristics |
| OPRO | Yang et al. 2023 | Optimizes prompts with training data; Self-Discover needs no labels |

Quick Reference Card

DISCOVER? ──→ Simple/trivial task ──→ NO → Just respond (Level 0)
    │
    YES
    │
    ├─ Single reasoning type → LEVEL 1 (1-2 modules, mental)
    ├─ Multi-step / multi-perspective → LEVEL 2 (3-5 modules, structured)
    └─ High-stakes / complex → LEVEL 3 (4-7 modules, full JSON + verify)

SELECT → ADAPT → IMPLEMENT → SOLVE
    │          │          │          │
    Pick      Tailor     Structure  Execute
    modules   to task    the plan   the plan