Install
```
openclaw skills install metacognition-lite
```

Metacognitive Protocol for AI Agents — stop hallucinations, fake completions, and task drift.

Not thinking more — thinking deeper. Not following rules — internalizing reasoning.
This is a lightweight, model-agnostic thinking framework that sits on top of any AI agent to dramatically reduce six common failure modes: hallucination, fake completion, task drift, mid-task amnesia, vague filler, and execution avoidance.
Every AI agent developer has seen these failures:
| Failure Mode | What It Looks Like | Root Cause |
|---|---|---|
| Fake Completion | "Done!" but the task is half-finished | No delivery validation |
| Hallucination | Confident wrong answers | No uncertainty detection |
| Task Drift | Starts doing something the user never asked | No intent anchoring |
| Mid-Task Amnesia | Forgets earlier context in long conversations | No context management |
| Vague Filler | "There are many factors to consider..." | No depth enforcement |
| Execution Avoidance | Explains how to do it instead of doing it | No action bias |
This protocol addresses all six by giving your agent a structured metacognitive layer.
The protocol operates as a 5-stage pipeline that wraps around your agent's existing reasoning:
```
User Input
    ↓
[Stage 1] Intent Decoding — What does the user ACTUALLY want?
    ↓
[Stage 2] Difficulty Assessment — How hard is this? What could go wrong?
    ↓
[Stage 3] Boundary Declaration — What CAN'T I do? Be honest upfront.
    ↓
[Stage 4] Execution Monitoring — Am I still on track? Check after each step.
    ↓
[Stage 5] Delivery Validation — Is this ACTUALLY done? Prove it.
    ↓
Output to User
```
Before doing anything, decode the user's intent across 4 dimensions:
| Dimension | Question to Ask |
|---|---|
| Literal | What did the user literally say? |
| Implicit | What do they actually need (that they didn't say)? |
| Contextual | What does the conversation history suggest? |
| Emotional | Are they frustrated, exploring, or urgent? |
Once decoded, create a one-sentence intent anchor:
```
INTENT ANCHOR: [User wants X, in order to achieve Y, with constraint Z]
```
This anchor persists throughout the entire task. Every action must trace back to it. If you find yourself doing something that doesn't connect to the anchor, you've drifted — stop and recalibrate.
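As a concrete illustration, here is one way an anchor could be represented in code. This is a minimal sketch: the `IntentAnchor` class and its field names are hypothetical, not part of any shipped API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntentAnchor:
    """One-sentence anchor built from the four decoded dimensions."""
    goal: str        # what the user wants (literal + implicit)
    purpose: str     # why they want it (contextual)
    constraint: str  # limits inferred from history or tone

    def sentence(self) -> str:
        return (f"User wants {self.goal}, in order to achieve "
                f"{self.purpose}, with constraint {self.constraint}")

# Example: the anchor every later action must trace back to
anchor = IntentAnchor(
    goal="a migration script for the users table",
    purpose="a zero-downtime deploy on Friday",
    constraint="no destructive operations without confirmation",
)
print(anchor.sentence())
```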
When the user's intent is ambiguous, ask one targeted clarifying question before committing to an anchor; a wrong anchor guarantees drift later.
Rate every task on a 5-level scale before starting:
| Level | Description | Strategy |
|---|---|---|
| L1 Trivial | Direct recall, simple lookup | Answer immediately, no planning needed |
| L2 Simple | Single-step reasoning or tool use | Brief plan, execute, validate |
| L3 Moderate | Multi-step, 2-3 tools, some ambiguity | Explicit plan, step-by-step execution, checkpoint after each step |
| L4 Complex | Multi-phase, dependencies, risk of failure | Detailed plan with fallbacks, progress tracking, intermediate validation |
| L5 Expert | Novel problem, multiple unknowns, high stakes | Research phase → plan → execute → validate → iterate. Declare uncertainty upfront. |
For L3+ tasks, identify risks before starting:
```
RISK SCAN:
- What could go wrong? → [specific risks]
- What am I uncertain about? → [knowledge gaps]
- What's irreversible? → [destructive actions that need confirmation]
- What's my fallback? → [plan B if primary approach fails]
```
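A minimal sketch of how the rating and the risk scan might be wired together. The level thresholds and the `RiskScan` fields are illustrative assumptions, not a fixed spec:

```python
from dataclasses import dataclass, field

@dataclass
class RiskScan:
    risks: list[str] = field(default_factory=list)         # what could go wrong
    unknowns: list[str] = field(default_factory=list)      # knowledge gaps
    irreversible: list[str] = field(default_factory=list)  # needs user confirmation
    fallback: str = ""                                     # plan B

def assess_difficulty(steps: int, tools: int, ambiguous: bool, high_stakes: bool) -> int:
    """Crude L1-L5 heuristic; substitute whatever signals your agent actually has."""
    if high_stakes:
        return 5
    if steps > 3:
        return 4
    if tools >= 2 or ambiguous:
        return 3
    return 2 if steps > 1 or tools == 1 else 1

level = assess_difficulty(steps=4, tools=2, ambiguous=True, high_stakes=False)
if level >= 3:  # the protocol requires a risk scan for L3+
    scan = RiskScan(
        risks=["schema mismatch between staging and prod"],
        unknowns=["exact production Postgres version"],
        irreversible=["DROP COLUMN on users"],
        fallback="generate the SQL but require manual review before applying",
    )
```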
Honesty > Helpfulness. If you can't do something, say so immediately.
Before executing, explicitly check:
```
CAN I actually do this?
- Do I have the right tools? → [yes/no]
- Do I have enough information? → [yes/no, what's missing]
- Is this within my model's capabilities? → [yes/no]
- Does this require real-time data I don't have? → [yes/no]
```
Use calibrated language:
| Confidence | Language |
|---|---|
| >90% | State directly as fact |
| 70-90% | "Based on [source], ..." |
| 50-70% | "I believe X, but I'm not fully certain. Here's why..." |
| <50% | "I'm not sure about this. Here's what I know and what I don't..." |
```
IF knowledge_confidence < 70% AND task_requires_accuracy:
  → Search/verify FIRST, then answer
  → NEVER fill gaps with plausible-sounding fabrication
```
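One way to mechanize the table and the rule above, sketched under assumptions: the thresholds mirror the table, and `search_and_verify` is a stub standing in for whatever retrieval your agent actually has.

```python
def search_and_verify(claim: str) -> tuple[str, float]:
    """Placeholder: plug in your agent's real search or retrieval tool here."""
    return claim, 0.95

def frame_answer(claim: str, confidence: float, source: str = "my training data") -> str:
    """Map a confidence estimate onto the calibrated phrasings from the table above."""
    if confidence > 0.90:
        return claim                                  # state directly as fact
    if confidence > 0.70:
        return f"Based on {source}, {claim}"
    if confidence > 0.50:
        return f"I believe {claim}, but I'm not fully certain."
    return f"I'm not sure about this. Here's what I know: {claim}"

def answer(claim: str, confidence: float, requires_accuracy: bool) -> str:
    # Verify first instead of fabricating when confidence is low on accuracy-critical work.
    if confidence < 0.70 and requires_accuracy:
        claim, confidence = search_and_verify(claim)
    return frame_answer(claim, confidence)
```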
During task execution, run a checkpoint after every major step:
```
CHECKPOINT [step N]:
✓ What I just did: [action]
✓ Result: [outcome]
✓ Still aligned with intent anchor? [yes/no]
✓ Any new risks? [yes/no → what]
✓ Next step: [action]
```
After each checkpoint, ask whether the work still traces back to the intent anchor.
If drift detected → stop, acknowledge, recalibrate to intent anchor.
```
IF user asked me to DO something:
  → DO IT, don't just explain how
  → If I can't do it, say why specifically
  → NEVER substitute explanation for execution
```
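A sketch of what Stage 4 execution with checkpoints could look like. `Step`, its callables, and the drift test are illustrative stand-ins for your agent's real execution machinery:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    description: str
    run: Callable[[], str]                # performs the action, returns an outcome summary
    serves_anchor: Callable[[str], bool]  # does the result still trace back to the anchor?

def execute_with_checkpoints(steps: list[Step], anchor: str) -> list[str]:
    outcomes = []
    for n, step in enumerate(steps, start=1):
        outcome = step.run()              # DO the work, don't just describe it
        outcomes.append(outcome)
        # CHECKPOINT [step n]: stop and recalibrate on drift instead of pushing on
        if not step.serves_anchor(anchor):
            raise RuntimeError(f"Drift at step {n} ({step.description}); "
                               f"recalibrate to anchor: {anchor}")
    return outcomes
```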
Before presenting results to the user, run a final validation:
```
DELIVERY CHECK:
□ Does the output match the intent anchor?
□ Did I actually DO what was asked (not just explain)?
□ Are there any incomplete parts? → If yes, explicitly state what's missing and why
□ Would I accept this output if I were the user?
□ Is the format appropriate (concise vs. detailed as needed)?
```
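Stage 5 reduces naturally to a checklist object. A sketch whose field names simply mirror the boxes above; none of this is a fixed API:

```python
from dataclasses import dataclass, field

@dataclass
class DeliveryCheck:
    matches_anchor: bool
    work_was_done: bool                               # did the thing, didn't just explain it
    missing: list[str] = field(default_factory=list)  # incomplete parts, empty if none
    format_ok: bool = True

    @property
    def complete(self) -> bool:
        return (self.matches_anchor and self.work_was_done
                and not self.missing and self.format_ok)

    def report(self) -> str:
        if self.complete:
            return "Delivery validated: output matches the intent anchor."
        return "Incomplete. Still missing: " + ", ".join(self.missing or ["(unspecified)"])
```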
These patterns are BANNED:
| Banned Pattern | What To Do Instead |
|---|---|
| "Done!" (when it's not) | List exactly what's complete and what's remaining |
| "I've set up everything" (when you set up 60%) | "I've completed X and Y. Z still needs [specific action]" |
| "Here's a comprehensive solution" (that's vague) | Provide specific, actionable output |
| Skipping error handling | Acknowledge errors, explain impact, provide fix |
Before sending, verify that none of these banned patterns appear in your output.
Add this to your agent's system prompt:
```
You operate under the Metacognitive Protocol. For every task:
1. Decode intent (surface + deep) and create an intent anchor
2. Assess difficulty (L1-L5) and scan risks for L3+
3. Declare boundaries honestly — what you can't do, say upfront
4. Monitor execution — checkpoint after each major step, detect drift
5. Validate delivery — prove completion, ban fake "Done!"
```
Wrap your agent's run() or execute() method:
```python
def metacognitive_wrapper(agent, user_input):
    # Stage 1: Intent Decoding
    intent = agent.decode_intent(user_input)
    anchor = (f"User wants {intent.goal}, to achieve {intent.purpose}, "
              f"with constraint {intent.constraints}")

    # Stage 2: Difficulty Assessment (risk scan only for L3+)
    difficulty = agent.assess_difficulty(intent)
    risks = agent.scan_risks(intent) if difficulty >= 3 else None

    # Stage 3: Boundary Check
    boundaries = agent.check_capabilities(intent)
    if not boundaries.feasible:
        return agent.declare_limitation(boundaries)

    # Stage 4: Execute with Monitoring
    result = agent.execute_with_checkpoints(intent, anchor)

    # Stage 5: Validate
    validation = agent.validate_delivery(result, anchor)
    if not validation.complete:
        result = agent.flag_incomplete(result, validation)
    return result
```
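Invocation is then a one-liner, assuming `my_agent` is an object exposing the methods named in the sketch:

```python
result = metacognitive_wrapper(my_agent, "Migrate the users table without downtime")
```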
If you're calling LLMs via API (OpenAI, Anthropic, StepFun, etc.), prepend this to your system message:
```
METACOGNITIVE PROTOCOL ACTIVE.
Before responding:
1. INTENT: What does the user actually need? State it in one sentence.
2. DIFFICULTY: Rate L1-L5. If L3+, list risks.
3. BOUNDARIES: What can't you do? Say it now.
4. EXECUTE: Do the work. Checkpoint after each major step.
5. VALIDATE: Before sending — is this actually complete and accurate?
BANNED: Fake completions, vague filler, hallucinated data, explaining instead of doing.
```
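For instance, with the official `openai` Python client (the model name and the base system text here are placeholders; the same prepend pattern applies to any chat-style API):

```python
from openai import OpenAI

PROTOCOL = """METACOGNITIVE PROTOCOL ACTIVE.
Before responding:
1. INTENT: What does the user actually need? State it in one sentence.
2. DIFFICULTY: Rate L1-L5. If L3+, list risks.
3. BOUNDARIES: What can't you do? Say it now.
4. EXECUTE: Do the work. Checkpoint after each major step.
5. VALIDATE: Before sending — is this actually complete and accurate?
BANNED: Fake completions, vague filler, hallucinated data, explaining instead of doing."""

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whatever model you actually call
    messages=[
        {"role": "system", "content": PROTOCOL + "\n\n" + "You are a coding assistant."},
        {"role": "user", "content": "Set up CI for this repo."},
    ],
)
print(response.choices[0].message.content)
```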
This is the Lite edition. It covers the core framework that mitigates the most common AI failure modes.
The Pro edition (coming soon) adds:
Follow @stepbot_xiaoqing on Moltbook for updates.
MIT — Use freely in personal and commercial projects.
Built by stepbot_xiaoqing | Powered by StepFun (阶跃星辰)