Scientific Inquiry

Rigorous evidence-based inquiry: decompose fuzzy questions, retrieve & grade evidence (S/A/B/C/D), cross-validate, and output conclusions with confidence intervals. Includes Step 0 user confirmation to prevent direction drift.

Audits: Pass

Install:

openclaw skills install scientific-inquiry

🧪 Scientific Inquiry

Security Notice: This skill uses self-modification (via skill_manage) but ONLY when the user explicitly commands it. See the "Controlled Self-Evolution" section for details. This prevents prompt injection and unintended auto-modification.

Trigger Conditions

Activate this skill when the user asks any of the following:

  • Fact-checking: "Is X true?" "Is X reliable?"
  • Data research: "What's the trend/data/distribution of X?" "Look up data on X"
  • Industry research: "How is market X doing?" "Analyze industry X"
  • Verification: "I heard X, does that check out?" "Can this conclusion hold?"
  • Comparison: "Which is better, X or Y?" "Compare X and Y"
  • User explicitly says: "research", "investigate", "verify", "look into", "analyze", "check"

Even simple requests (like "check this stat") activate this skill if they involve systematic information gathering.

Core Pipeline

Step 0: Problem Analysis → User Confirmation (Critical! Prevents Direction Drift)

Upon receiving a question, do NOT start searching yet. First output a research plan template:

📋 Research Plan

Question: [Restate the original question to confirm alignment]

Research type: Fact-check / Data research / Industry study / Comparison / Trend analysis

Sub-questions:

  1. [Sub-question A] — Verifiability: High/Medium/Low → [Expected sources]
  2. [Sub-question B] — Verifiability: High/Medium/Low → [Expected sources]
  3. [Sub-question C] — Verifiability: High/Medium/Low → [Expected sources]

Methodology:

  • Primary search path: [Specific tools/APIs/databases]
  • Keywords: [Search terms]
  • Fallback if key data is unavailable: [Alternative approach]

Expected output:

  • Expected confidence: High/Medium/Low
  • Main uncertainties: [Anticipated blind spots]

✅ Does this direction look good? Let me know and I'll proceed with Step 1-4.

Do NOT make any retrieval tool calls until the user confirms.

Step 1: Decompose Into Sub-questions

Break the fuzzy question into verifiable atomic statements. For each:

  • Verifiability: High (public data/literature) / Medium (indirect evidence) / Low (little public info)
  • Evidence type: Quantitative (specific numbers) / Qualitative (trend judgment)
  • Source direction: E.g., academic papers, official data, industry reports, news articles

Step 1.5 (Critical Prerequisite): Verify Baseline Facts

Before formal research, check these prerequisites:

🔴 Time-Baseline Check

  • Search product/event + "launch" "announce" "release" — confirm if it already happened
  • High-risk categories: consumer electronics, policy changes, earnings calls, product releases
  • If results show the event has occurred, pivot immediately — don't keep analyzing based on old data

Classic failure mode: User asks "will Huawei phones get more expensive?" You analyze storage cost trends for 30 minutes. Meanwhile, the Pura 90 already launched with published pricing. You're predicting history.

🔴 Search Engine Diagnostic

Before committing to a search tool, quickly test availability:

  1. Try web_search first — simple query, check if results come back normally
  2. If web_search fails — use curl -sL to Google/Bing/DuckDuckGo; distinguish CAPTCHA from timeout
  3. Three failure modes:
    • CAPTCHA block (Google's "sorry" page / DuckDuckGo checkbox grid / Baidu slider) → switch search engine immediately
      • Do NOT retry the same engine more than 2 times
      • Try a different engine or use the video platform fallback (Step 2b)
    • Timeout / empty page ((empty page) or ERR_TIMED_OUT) → network/proxy issue
      • First confirm basic connectivity with curl to a simple HTTP target
      • Bing's (empty page) sometimes resolves after pressing Enter/submitting the search form
    • Login redirect (site search requiring auth) → abandon, use alternative sources
  4. Choose fallback channel based on failure mode — see Step 2b below

This step prevents wasted calls on dead search channels. If all search engines are blocked, video platform titles + vertical media browsing is 10x more productive than retrying Google.
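The diagnostic loop above can be sketched as a small state machine. This is a minimal illustration, not part of the skill's tooling: the function names (`classify_failure`, `pick_channel`, the `probe` callback) and the CAPTCHA marker strings are assumptions chosen for the sketch.

```python
# Sketch of the Step 1.5 diagnostic: classify each failure mode, retry at
# most twice per engine, never retry a CAPTCHA-blocked engine.
CAPTCHA_MARKERS = ("/sorry/", "unusual traffic", "verify you are human")

def classify_failure(status: str, body: str) -> str:
    """Map a raw search response to one of the three failure modes."""
    lowered = body.lower()
    if any(m in lowered for m in CAPTCHA_MARKERS):
        return "captcha"
    if status == "timeout" or not body.strip():
        return "timeout"               # network/proxy issue or (empty page)
    if "login" in lowered and "password" in lowered:
        return "login"                 # auth wall: abandon this channel
    return "ok"

def pick_channel(engines, probe):
    """Try each engine at most twice; on CAPTCHA move on immediately."""
    for engine in engines:
        for _attempt in range(2):
            status, body = probe(engine)
            mode = classify_failure(status, body)
            if mode == "ok":
                return engine
            if mode in ("captcha", "login"):
                break                  # do not retry a blocked engine
    return "video_platform"           # Step 2b fallback channel
```

The `probe` callback stands in for whatever search call is actually available; the point is the control flow, not the transport.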

Step 2: Evidence Retrieval (Classified & Graded)

Every piece of evidence MUST be annotated with source and grade. See the "Evidence Classification Discipline" section for detailed definitions.

Prioritize S/A-grade evidence; B/C are supplementary only.

S-grade: Primary academic literature / Official statistics / Raw data APIs
A-grade: Authoritative media / Professional reports / Fully cited secondary sources
B-grade: Industry analysis / Forum discussions / Indirect data
C-grade: Social media / Single samples / Non-professional interpretations
D-grade: No source / Rumors / Obvious conflicts of interest

Present findings as an evidence table:

| Evidence | Source | URL | Grade | Sub-question |
| --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... |

Source URLs are mandatory. A bare site name (e.g., "YouTube") is not a valid source. Even search engine results should link to the search page or specific result.
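A minimal sketch of what "annotated with source and grade" means as a record, assuming a simple flat schema (the field names and `validate` helper are illustrative, not a fixed format):

```python
# Evidence record with the mandatory Step 2 fields. validate() returns the
# discipline violations: unknown grade, bare site name instead of a URL,
# or D-grade material offered as evidence.
from dataclasses import dataclass
from urllib.parse import urlparse

GRADES = ("S", "A", "B", "C", "D")

@dataclass
class Evidence:
    claim: str
    source: str
    url: str
    grade: str
    sub_question: str

    def validate(self) -> list:
        problems = []
        if self.grade not in GRADES:
            problems.append(f"unknown grade {self.grade!r}")
        parsed = urlparse(self.url)
        if not (parsed.scheme and parsed.netloc):
            problems.append("bare site name or missing URL is not a valid source")
        if self.grade == "D":
            problems.append("D-grade material may not be used as evidence")
        return problems
```

For example, `Evidence("...", "YouTube", "YouTube", "B", "Q1").validate()` flags the bare site name, matching the rule above.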

Step 2b: Fallback Search Strategies

When mainstream search engines are blocked or return empty results:

1️⃣ Video platform search — YouTube (for pricing/product info), or local equivalents

  • Video titles often contain structured data (prices, specs, dates)
  • Multiple creator titles covering the same number → higher confidence
  • Upload date ≈ event date, accurate to the day
  • Comments and related recommendations can reveal additional intel
  • Search multiple keyword variants (product + price / product + launch / CEO + statement)

2️⃣ Direct access to vertical media

  • Tech news sites, industry publications
  • Note: some require login; try site-specific Google search syntax

3️⃣ E-commerce platforms

  • Official brand stores, marketplaces
  • Note: may redirect to login pages

4️⃣ Social media

  • Weibo, Twitter/X, Reddit — if accessible

5️⃣ Text-mode search engines

  • DuckDuckGo lite, Startpage
  • Note: may still trigger CAPTCHA

Priority: Video platform titles > Vertical media > E-commerce > Social media. Video title info density and timeliness often exceed other sources for consumer products.

Step 3: Cross-Validation

For each sub-question:

  • At least 2 independent sources
  • Label inter-evidence relationship: Consistent / Contradictory / Complementary
  • If contradictory, analyze possible causes (methodology differences / vested interests / time window / sample bias)
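The gate this step enforces can be sketched as a small check. The record shape (`{"source": ..., "relation": ...}`) is an assumption for the sketch:

```python
# Step 3 gate: a sub-question only passes cross-validation with at least
# two independent sources; any contradictory pair forces cause analysis.
def cross_validate(items):
    sources = {it["source"] for it in items}
    contradictions = [it for it in items if it.get("relation") == "contradictory"]
    return {
        "independent_sources": len(sources),
        "passes": len(sources) >= 2,
        "needs_cause_analysis": bool(contradictions),
    }
```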

Step 4: Conclusion Output (✅/⚠️/❌ Symbol Format)

Two-block output:

Block A — Claim Verification Report (one line per key finding)

✅ CONFIRMED: 「Pura 90 starts at ¥4,699」→ 5 creator video titles agree + financial media report
⚠️ UNVERIFIABLE: 「Huawei stockpiled 100M NAND chips」→ single comment section post (D-grade), no media confirmation
❌ CONTRADICTED: 「Pura 90 will be more expensive than Pura 80」→ actual launch price ¥4,699, same as predecessor

Block B — Overall Judgment

Proposition: [One-sentence restatement]

Confidence:
  ✅ High (≥80%) — Multiple S/A-grade evidence consistent
  ⚠️ Medium (50-80%) — Key data gaps exist
  ❌ Low (<50%) — Mostly inference

Top-3 Key Evidence (with URLs):
  1. [Evidence A] — S-grade — [Source](URL)
  2. [Evidence B] — A-grade — [Source](URL)
  3. [Evidence C] — B-grade — [Source](URL)

Core Uncertainties:
  - [Uncertainty 1]
  - [Uncertainty 2]
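The confidence bands in Block B map directly to thresholds; a one-function sketch (the symbol strings are the only assumption beyond the template above):

```python
# Block B confidence bucketing: >=80% High, 50-80% Medium, <50% Low.
def confidence_label(score: float) -> str:
    """Map a 0-1 confidence score to its band."""
    if score >= 0.8:
        return "✅ High"
    if score >= 0.5:
        return "⚠️ Medium"
    return "❌ Low"
```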

Evidence Classification Discipline (Critical!)

Evidence grades are not decoration; they are the LIFEBLOOD of your conclusion.

Grade Definitions

| Grade | Definition | Examples | Usable? |
| --- | --- | --- | --- |
| S | Primary academic lit / Official stats / Raw data APIs / Authoritative market reports | Peer-reviewed papers, government statistics, exchange data | ✅ Standalone |
| A | Respected media / Professional analysis / Fully cited secondary sources | Reuters, Bloomberg, financial analyst reports | ✅ Needs ≥1 corroboration |
| B | Industry analysis / Forum discussions / Indirect data / Raw executive quotes | CEO statements (cross-verified across video titles), tech news | ✅ Needs ≥2 cross-references |
| C | Social media / Single samples / Non-professional reading / Snippet from search results | Individual blog posts, Reddit answers, single YouTube title | ⚠️ Leads only, cannot conclude |
| D | No source / Rumors / Obvious conflict of interest / User comment section | YouTube/Reddit comment sections, anonymous forum posts | ❌ Never use as evidence |

Core Rules

  1. Video titles = C-grade (weak lead starting point)

    • Same data point confirmed in 3+ independent creator titles → upgrade to B
    • Combined with professional media coverage → A-
  2. Comment section user posts = D-grade (unreliable by default)

    • Never cite as evidence, no matter how detailed or plausible!
    • Use comment info only as "search suggestions" — take the keyword, find a real source
  3. Source URLs are mandatory, not optional

    • Every evidence item MUST include a full URL
    • "Found on YouTube" is not a valid source
    • Search engine result page URLs count if you label the search term
  4. Better to say less than to fabricate

    • When key data is missing, mark "pending collection" or "no reliable source found"
    • Never fill gaps with D-grade material or assumed values
    • Wrong conclusions should be DELETED entirely, not left as "to be verified"
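Core Rule 1's upgrade ladder is mechanical enough to sketch directly. The counts and thresholds mirror the rule text; the function name is illustrative:

```python
# Core Rule 1: video titles start at C-grade and only rise with
# independent corroboration.
def title_grade(independent_titles: int, media_confirmed: bool) -> str:
    if independent_titles >= 3 and media_confirmed:
        return "A-"   # 3+ titles plus professional media coverage
    if independent_titles >= 3:
        return "B"    # same data point in 3+ independent creator titles
    return "C"        # a single title is only a weak lead
```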

Controlled Self-Evolution (Plan B — Guarded Mode)

🔴 Security Constraint: This skill's self-modification is gated behind explicit user commands.

User provides feedback → default action: update memory only (no skill file change).
User says "update the skill" / "commit this to the skill" / "add this to the workflow" → then execute skill_manage.

This prevents: malicious input injection / accidental trigger during research / unconfirmed auto-modification

Recording Phase (Default Behavior)

When the user provides improvement feedback:

  1. Store in memory first: memory(action='add', ...) records preferences and lessons
  2. No automatic skill_manage calls, no SKILL.md modification

Upgrade Phase (Explicit User Command Required)

Only execute skill_manage(patch) when the user explicitly says:

  • "Update the skill"
  • "Add this to the skill"
  • "Commit this to the workflow / to common pitfalls"
  • "Add this to evidence grades / trigger conditions / search strategies"
  • Any phrase containing "update skill", "commit to skill", "save to skill"
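The guarded-mode gate can be sketched as a single decision function. The phrase list mirrors the triggers above and is an assumption, not an exhaustive match list:

```python
# Guarded mode: only explicit phrases unlock skill_manage; everything
# else is recorded to memory with no SKILL.md change.
UPGRADE_PHRASES = (
    "update the skill", "update skill",
    "commit this to the skill", "commit to skill",
    "save to skill", "add this to the skill",
    "add this to the workflow",
)

def feedback_action(user_message: str) -> str:
    msg = user_message.lower()
    if any(p in msg for p in UPGRADE_PHRASES):
        return "skill_manage"   # explicit command: patch the skill file
    return "memory"             # default: record only
```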

Common trigger scenarios:

| User feedback type | Record to memory | Upgrade to skill |
| --- | --- | --- |
| Direction correction: "This sub-question isn't the point" | ✅ Default | When user says "update the skill accordingly" |
| Evidence standard: "This source isn't good enough" | ✅ Default | When user says "add this to the evidence discipline" |
| Format preference: "Too long / give me a short version first" | ✅ Default | When user says "save this format to the skill" |
| New scenario: "This isn't just fact-checking, it's data research" | ✅ Default | When user says "add this to trigger conditions" |
| Methodology: "You should plan before executing" | ✅ Default | When user says "add this to the workflow" |
| Recurring error (≥2 same class) | ✅ Default | When user says "add this to common pitfalls" |

Scenario Types

| Scenario | Characteristics | Watch Out For |
| --- | --- | --- |
| Fact-check | Verify a specific claim | Find the primary source, watch for telephone-game distortions |
| Trend analysis | Predict the direction of a metric | Separate short-term noise from long-term trends, note the data window |
| Comparison | Compare options | Ensure full dimension coverage, avoid survivorship bias |
| Causal analysis | Did A cause B? | Distinguish correlation from causation, watch for confounders |
| Consumer pricing/product research | Product pricing and stocking strategy | First verify whether the product has already launched! Check executive statements; get raw component cost data from market research firms |

Quality Checklist

  • Step 0 plan output and user confirmation received?
  • Each sub-question has ≥2 independent evidence sources (per Step 3)?
  • Every evidence item graded?
  • Contradictory evidence analyzed for probable cause?
  • Conclusion includes confidence level and uncertainties?
  • Discipline check: Any comment-section UGC cited as evidence? Source URLs complete? Any "to be verified" speculation left?

Common Pitfalls

  • Don't skip Step 0: Even if the direction seems obvious. Wrong direction × fast search = wasted time.
  • Don't search only for supporting evidence: Actively look for counter-arguments. Avoid confirmation bias.
  • Distinguish "no evidence" from "evidence against": Not finding something ≠ it doesn't exist. Label as "not found", not "disproven".
  • Watch data timeliness: Especially for prices and policies. Note the collection date.
  • Keep user updated during long searches: If retrieval exceeds 5 steps, report progress between steps. No silent running.
  • Verify product/event existence before predicting: The most common embarrassing mistake — predicting a "soon to launch" product that already launched.
  • Never cite comment-section UGC as evidence: Default grade D. Use comments only as search leads.
  • Distinguish "search result title" from "comment post": A YouTube/Reddit video title is C-grade (creator's public info). A comment on that video is D-grade. Different worlds.
  • Source URLs must be complete: Bare site names don't count. Search result page URLs with labeled search terms count.