Scientific Inquiry

Rigorous evidence-based inquiry: decompose fuzzy questions, retrieve & grade evidence (S/A/B/C/D), cross-validate, and output conclusions with confidence intervals. Includes Step 0 user confirmation to prevent direction drift.

Audits: Pass

Install:

openclaw skills install scientific-inquiry

🧪 Scientific Inquiry

Security Notice: This skill uses self-modification (via skill_manage) but ONLY when the user explicitly commands it. See the "Controlled Self-Evolution" section for details. This prevents prompt injection and unintended auto-modification.

Trigger Conditions

Activate this skill when the user asks any of the following:

  • Fact-checking: "Is X true?" "Is X reliable?"
  • Data research: "What's the trend/data/distribution of X?" "Look up data on X"
  • Industry research: "How is market X doing?" "Analyze industry X"
  • Verification: "I heard X, does that check out?" "Can this conclusion hold?"
  • Comparison: "Which is better, X or Y?" "Compare X and Y"
  • User explicitly says: "research", "investigate", "verify", "look into", "analyze", "check"

Even simple requests (like "check this stat") activate this skill if they involve systematic information gathering.

Core Pipeline

Step 0: Problem Analysis → User Confirmation (Critical! Prevents Direction Drift)

Upon receiving a question, do NOT start searching yet. First output a research plan template:

📋 Research Plan

Question: [Restate the original question to confirm alignment]

Research type: Fact-check / Data research / Industry study / Comparison / Trend analysis

Sub-questions:

  1. [Sub-question A] — Verifiability: High/Medium/Low → [Expected sources]
  2. [Sub-question B] — Verifiability: High/Medium/Low → [Expected sources]
  3. [Sub-question C] — Verifiability: High/Medium/Low → [Expected sources]

Methodology:

  • Primary search path: [Specific tools/APIs/databases]
  • Keywords: [Search terms]
  • Fallback if key data is unavailable: [Alternative approach]

Expected output:

  • Expected confidence: High/Medium/Low
  • Main uncertainties: [Anticipated blind spots]

✅ Does this direction look good? Let me know and I'll proceed with Step 1-4.

Do NOT make any retrieval tool calls until the user confirms.

Step 1: Decompose Into Sub-questions

Break the fuzzy question into verifiable atomic statements. For each:

  • Verifiability: High (public data/literature) / Medium (indirect evidence) / Low (little public info)
  • Evidence type: Quantitative (specific numbers) / Qualitative (trend judgment)
  • Source direction: E.g., academic papers, official data, industry reports, news articles

Step 1.5 (Critical Prerequisite): Verify Baseline Facts

Before formal research, check these prerequisites:

🔴 Time-Baseline Check

  • Search product/event + "launch" "announce" "release" — confirm if it already happened
  • High-risk categories: consumer electronics, policy changes, earnings calls, product releases
  • If results show the event has occurred, pivot immediately — don't keep analyzing based on old data

Classic failure mode: User asks "will Huawei phones get more expensive?" You analyze storage cost trends for 30 minutes. Meanwhile, the Pura 90 already launched with published pricing. You're predicting history.

🔴 Search Engine Diagnostic

Before committing to a search tool, quickly test availability:

  1. Try web_search first — simple query, check if results come back normally
  2. If web_search fails — use curl -sL to Google/Bing/DuckDuckGo; distinguish CAPTCHA from timeout
  3. Three failure modes:
    • CAPTCHA block (Google's "sorry" page / DuckDuckGo checkbox grid / Baidu slider) → switch search engine immediately
      • Do NOT retry the same engine more than 2 times
      • Try a different engine or use the video platform fallback (Step 2b)
    • Timeout / empty page ((empty page) or ERR_TIMED_OUT) → network/proxy issue
      • First confirm basic connectivity with curl to a simple HTTP target
      • Bing's (empty page) sometimes resolves after pressing Enter/submitting the search form
    • Login redirect (site search requiring auth) → abandon, use alternative sources
  4. Choose fallback channel based on failure mode — see Step 2b below

This step prevents wasted calls on dead search channels. If all search engines are blocked, video platform titles + vertical media browsing is 10x more productive than retrying Google.
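The diagnostic loop above can be sketched as a small state machine. This is a minimal illustration, not part of the skill's tooling: the function names (`classify_failure`, `pick_channel`, the `probe` callback) and the CAPTCHA marker strings are assumptions chosen for the sketch.

```python
# Sketch of the Step 1.5 diagnostic: classify each failure mode, retry at
# most twice per engine, never retry a CAPTCHA-blocked engine.
CAPTCHA_MARKERS = ("/sorry/", "unusual traffic", "verify you are human")

def classify_failure(status: str, body: str) -> str:
    """Map a raw search response to one of the three failure modes."""
    lowered = body.lower()
    if any(m in lowered for m in CAPTCHA_MARKERS):
        return "captcha"
    if status == "timeout" or not body.strip():
        return "timeout"               # network/proxy issue or (empty page)
    if "login" in lowered and "password" in lowered:
        return "login"                 # auth wall: abandon this channel
    return "ok"

def pick_channel(engines, probe):
    """Try each engine at most twice; on CAPTCHA move on immediately."""
    for engine in engines:
        for _attempt in range(2):
            status, body = probe(engine)
            mode = classify_failure(status, body)
            if mode == "ok":
                return engine
            if mode in ("captcha", "login"):
                break                  # do not retry a blocked engine
    return "video_platform"           # Step 2b fallback channel
```

The `probe` callback stands in for whatever search call is actually available; the point is the control flow, not the transport.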

Step 2: Evidence Retrieval (Classified & Graded)

Every piece of evidence MUST be annotated with source and grade. See the "Evidence Classification Discipline" section for detailed definitions.

Prioritize S/A-grade evidence; B/C are supplementary only.

S-grade: Primary academic literature / Official statistics / Raw data APIs
A-grade: Authoritative media / Professional reports / Fully cited secondary sources
B-grade: Industry analysis / Forum discussions / Indirect data
C-grade: Social media / Single samples / Non-professional interpretations
D-grade: No source / Rumors / Obvious conflicts of interest

Present findings as an evidence table:

| Evidence | Source | URL | Grade | Sub-question |
| --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... |

Source URLs are mandatory. A bare site name (e.g., "YouTube") is not a valid source. Even search engine results should link to the search page or specific result.
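A minimal sketch of what "annotated with source and grade" means as a record, assuming a simple flat schema (the field names and `validate` helper are illustrative, not a fixed format):

```python
# Evidence record with the mandatory Step 2 fields. validate() returns the
# discipline violations: unknown grade, bare site name instead of a URL,
# or D-grade material offered as evidence.
from dataclasses import dataclass
from urllib.parse import urlparse

GRADES = ("S", "A", "B", "C", "D")

@dataclass
class Evidence:
    claim: str
    source: str
    url: str
    grade: str
    sub_question: str

    def validate(self) -> list:
        problems = []
        if self.grade not in GRADES:
            problems.append(f"unknown grade {self.grade!r}")
        parsed = urlparse(self.url)
        if not (parsed.scheme and parsed.netloc):
            problems.append("bare site name or missing URL is not a valid source")
        if self.grade == "D":
            problems.append("D-grade material may not be used as evidence")
        return problems
```

For example, `Evidence("...", "YouTube", "YouTube", "B", "Q1").validate()` flags the bare site name, matching the rule above.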

Step 2b: Fallback Search Strategies

When mainstream search engines are blocked or return empty results:

1️⃣ Video platform search — YouTube (for pricing/product info), or local equivalents

  • Video titles often contain structured data (prices, specs, dates)
  • Multiple creator titles covering the same number → higher confidence
  • Upload date ≈ event date, accurate to the day
  • Comments and related recommendations can reveal additional intel
  • Search multiple keyword variants (product + price / product + launch / CEO + statement)

2️⃣ Direct access to vertical media

  • Tech news sites, industry publications
  • Note: some require login; try site-specific Google search syntax

3️⃣ E-commerce platforms

  • Official brand stores, marketplaces
  • Note: may redirect to login pages

4️⃣ Social media

  • Weibo, Twitter/X, Reddit — if accessible

5️⃣ Text-mode search engines

  • DuckDuckGo lite, Startpage
  • Note: may still trigger CAPTCHA

Priority: Video platform titles > Vertical media > E-commerce > Social media. Video title info density and timeliness often exceed other sources for consumer products.

Step 3: Cross-Validation

For each sub-question:

  • At least 2 independent sources
  • Label inter-evidence relationship: Consistent / Contradictory / Complementary
  • If contradictory, analyze possible causes (methodology differences / vested interests / time window / sample bias)
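The gate this step enforces can be sketched as a small check. The record shape (`{"source": ..., "relation": ...}`) is an assumption for the sketch:

```python
# Step 3 gate: a sub-question only passes cross-validation with at least
# two independent sources; any contradictory pair forces cause analysis.
def cross_validate(items):
    sources = {it["source"] for it in items}
    contradictions = [it for it in items if it.get("relation") == "contradictory"]
    return {
        "independent_sources": len(sources),
        "passes": len(sources) >= 2,
        "needs_cause_analysis": bool(contradictions),
    }
```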

Step 4: Conclusion Output (✅/⚠️/❌ Symbol Format)

Two-block output:

Block A — Claim Verification Report (one line per key finding)

✅ CONFIRMED: 「Pura 90 starts at ¥4,699」→ 5 creator video titles agree + financial media report
⚠️ UNVERIFIABLE: 「Huawei stockpiled 100M NAND chips」→ single comment section post (D-grade), no media confirmation
❌ CONTRADICTED: 「Pura 90 will be more expensive than Pura 80」→ actual launch price ¥4,699, same as predecessor

Block B — Overall Judgment

Proposition: [One-sentence restatement]

Confidence:
  ✅ High (≥80%) — Multiple S/A-grade evidence consistent
  ⚠️ Medium (50-80%) — Key data gaps exist
  ❌ Low (<50%) — Mostly inference

Top-3 Key Evidence (with URLs):
  1. [Evidence A] — S-grade — [Source](URL)
  2. [Evidence B] — A-grade — [Source](URL)
  3. [Evidence C] — B-grade — [Source](URL)

Core Uncertainties:
  - [Uncertainty 1]
  - [Uncertainty 2]
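The confidence bands in Block B map directly to thresholds; a one-function sketch (the symbol strings are the only assumption beyond the template above):

```python
# Block B confidence bucketing: >=80% High, 50-80% Medium, <50% Low.
def confidence_label(score: float) -> str:
    """Map a 0-1 confidence score to its band."""
    if score >= 0.8:
        return "✅ High"
    if score >= 0.5:
        return "⚠️ Medium"
    return "❌ Low"
```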

Evidence Classification Discipline (Critical!)

Evidence grades are not decoration; they are the LIFEBLOOD of your conclusion.

Grade Definitions

| Grade | Definition | Examples | Usable? |
| --- | --- | --- | --- |
| S | Primary academic lit / Official stats / Raw data APIs / Authoritative market reports | Peer-reviewed papers, government statistics, exchange data | ✅ Standalone |
| A | Respected media / Professional analysis / Fully cited secondary sources | Reuters, Bloomberg, financial analyst reports | ✅ Needs ≥1 corroboration |
| B | Industry analysis / Forum discussions / Indirect data / Raw executive quotes | CEO statements (cross-verified across video titles), tech news | ✅ Needs ≥2 cross-references |
| C | Social media / Single samples / Non-professional reading / Snippet from search results | Individual blog posts, Reddit answers, single YouTube title | ⚠️ Leads only, cannot conclude |
| D | No source / Rumors / Obvious conflict of interest / User comment section | YouTube/Reddit comment sections, anonymous forum posts | ❌ Never use as evidence |

Core Rules

  1. Video titles = C-grade (weak lead starting point)

    • Same data point confirmed in 3+ independent creator titles → upgrade to B
    • Combined with professional media coverage → A-
  2. Comment section user posts = D-grade (unreliable by default)

    • Never cite as evidence, no matter how detailed or plausible!
    • Use comment info only as "search suggestions" — take the keyword, find a real source
  3. Source URLs are mandatory, not optional

    • Every evidence item MUST include a full URL
    • "Found on YouTube" is not a valid source
    • Search engine result page URLs count if you label the search term
  4. Better to say less than to fabricate

    • When key data is missing, mark "pending collection" or "no reliable source found"
    • Never fill gaps with D-grade material or assumed values
    • Wrong conclusions should be DELETED entirely, not left as "to be verified"
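Core Rule 1's upgrade ladder is mechanical enough to sketch directly. The counts and thresholds mirror the rule text; the function name is illustrative:

```python
# Core Rule 1: video titles start at C-grade and only rise with
# independent corroboration.
def title_grade(independent_titles: int, media_confirmed: bool) -> str:
    if independent_titles >= 3 and media_confirmed:
        return "A-"   # 3+ titles plus professional media coverage
    if independent_titles >= 3:
        return "B"    # same data point in 3+ independent creator titles
    return "C"        # a single title is only a weak lead
```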

Controlled Self-Evolution (Plan B — Guarded Mode)

🔴 Security Constraint: This skill's self-modification is gated behind explicit user commands.

User provides feedback → default action: update memory only (no skill file change).
User says "update the skill" / "commit this to the skill" / "add this to the workflow" → then execute skill_manage.

This prevents: malicious input injection / accidental trigger during research / unconfirmed auto-modification

Recording Phase (Default Behavior)

When the user provides improvement feedback:

  1. Store in memory first: memory(action='add', ...) records preferences and lessons
  2. No automatic skill_manage calls, no SKILL.md modification

Upgrade Phase (Explicit User Command Required)

Only execute skill_manage(patch) when the user explicitly says:

  • "Update the skill"
  • "Add this to the skill"
  • "Commit this to the workflow / to common pitfalls"
  • "Add this to evidence grades / trigger conditions / search strategies"
  • Any phrase containing "update skill", "commit to skill", "save to skill"
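The guarded-mode gate can be sketched as a single decision function. The phrase list mirrors the triggers above and is an assumption, not an exhaustive match list:

```python
# Guarded mode: only explicit phrases unlock skill_manage; everything
# else is recorded to memory with no SKILL.md change.
UPGRADE_PHRASES = (
    "update the skill", "update skill",
    "commit this to the skill", "commit to skill",
    "save to skill", "add this to the skill",
    "add this to the workflow",
)

def feedback_action(user_message: str) -> str:
    msg = user_message.lower()
    if any(p in msg for p in UPGRADE_PHRASES):
        return "skill_manage"   # explicit command: patch the skill file
    return "memory"             # default: record only
```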

Common trigger scenarios:

| User feedback type | Record to memory | Upgrade to skill |
| --- | --- | --- |
| Direction correction: "This sub-question isn't the point" | ✅ Default | When user says "update the skill accordingly" |
| Evidence standard: "This source isn't good enough" | ✅ Default | When user says "add this to the evidence discipline" |
| Format preference: "Too long / give me a short version first" | ✅ Default | When user says "save this format to the skill" |
| New scenario: "This isn't just fact-checking, it's data research" | ✅ Default | When user says "add this to trigger conditions" |
| Methodology: "You should plan before executing" | ✅ Default | When user says "add this to the workflow" |
| Recurring error (≥2 same class) | ✅ Default | When user says "add this to common pitfalls" |

Scenario Types

| Scenario | Characteristics | Watch Out For |
| --- | --- | --- |
| Fact-check | Verify a specific claim | Find the primary source, watch for telephone-game distortions |
| Trend analysis | Predict the direction of a metric | Separate short-term noise from long-term trends, note the data window |
| Comparison | Compare options | Ensure full dimension coverage, avoid survivorship bias |
| Causal analysis | Did A cause B? | Distinguish correlation from causation, watch for confounders |
| Consumer pricing/product research | Product pricing and stocking strategy | First verify whether the product has already launched! Check executive statements; get raw component cost data from market research firms |

Quality Checklist

  • Step 0 plan output and user confirmation received?
  • Each sub-question has ≥2 independent evidence sources (per Step 3)?
  • Every evidence item graded?
  • Contradictory evidence analyzed for probable cause?
  • Conclusion includes confidence level and uncertainties?
  • Discipline check: Any comment-section UGC cited as evidence? Source URLs complete? Any "to be verified" speculation left?

Common Pitfalls

  • Don't skip Step 0: Even if the direction seems obvious. Wrong direction × fast search = wasted time.
  • Don't search only for supporting evidence: Actively look for counter-arguments. Avoid confirmation bias.
  • Distinguish "no evidence" from "evidence against": Not finding something ≠ it doesn't exist. Label as "not found", not "disproven".
  • Watch data timeliness: Especially for prices and policies. Note the collection date.
  • Keep user updated during long searches: If retrieval exceeds 5 steps, report progress between steps. No silent running.
  • Verify product/event existence before predicting: The most common embarrassing mistake — predicting a "soon to launch" product that already launched.
  • Never cite comment-section UGC as evidence: Default grade D. Use comments only as search leads.
  • Distinguish "search result title" from "comment post": A YouTube/Reddit video title is C-grade (creator's public info). A comment on that video is D-grade. Different worlds.
  • Source URLs must be complete: Bare site names don't count. Search result page URLs with labeled search terms count.