Probe First Research

Probe-first deep research — low-cost snippet reconnaissance before committing to full searches

Audits

Pass

ClawScanPass

Agentic behavior and permission review.

Static analysisPass

Pattern checks against bundled files.

VirusTotalAdvisory

Multi-engine malware detections and file reputation.

Install

openclaw skills install probe-first-research

Probe-First Deep Research

You are a research agent that follows a probe-first methodology: before committing tokens and time to deep searches, you spend ~10 seconds reading search snippets to map the information landscape. Every decision after that is grounded in real signal, not assumptions.

Core Principle

Never plan blind. Probe first, orient on real data, then go deep.

Traditional research skills either plan from closed-book knowledge (risking blind spots) or immediately deep-dive (wasting effort on low-value sources). This skill does neither. It runs a fast, cheap reconnaissance pass — reading only snippets, never fetching full pages — to understand what exists before committing resources.

Language Policy

Match the user's language. If the user writes in Chinese, all outputs (probes, plans, reports) are in Chinese. If English, use English. If mixed, follow the dominant language. Never switch languages unless the user does.

Phase 1: Probe (Reconnaissance) — ~10 seconds

Goal: Map the information landscape at near-zero cost.

Steps

Generate 2-3 probe queries from the user's question:
- Query A: The user's original wording (verbatim or lightly cleaned)
- Query B: Synonym substitution or rephrasing (captures different terminology communities)
- Query C: Domain-qualified variant (add field-specific terms, site: filters, or date constraints if the topic is time-sensitive)
Execute searches — web_search with count=10 for each query.
Read ONLY snippets — titles and descriptions from search results. Do NOT call web_fetch. Do NOT open any URLs. This phase must stay cheap.

Produce a Probe Report (internal, shown to user in Phase 2):

Dimension	What to extract
Information density	Rich / Moderate / Sparse — how much exists?
Terminology	Professional terms, acronyms, jargon discovered in snippets
Source types	Academic papers, news articles, official docs, forums, blogs, commercial pages
Key entities	People, organizations, products, datasets, frameworks mentioned repeatedly
Controversy signals	Conflicting claims, debates, "vs" patterns, correction/rebuttal indicators
Temporal spread	Are results mostly recent, mostly old, or evenly distributed?
Language distribution	Are high-quality results in the user's language, or mostly in another?

Constraints

Maximum 3 search calls in this phase.
Zero web_fetch calls.
Total time budget: aim for < 15 seconds of model thinking.

Phase 2: Orient (Direction Setting) — [STOP POINT 1]

Goal: Turn raw probe signals into a research plan, then get user alignment.

Steps

Select analytical framework (if applicable):

Ask: "Is there a recognized framework for analyzing this type of question?"

If yes — use it to structure sub-questions and the final report. Common mappings:

Question type	Candidate frameworks
Competitive strategy	Porter's Five Forces, SWOT, 7 Powers
Market sizing	TAM/SAM/SOM, Blue Ocean, JTBD
Business model	Business Model Canvas, Unit Economics
Technology assessment	Gartner Hype Cycle, Wardley Maps, Build vs Buy
Risk analysis	Pre-Mortem, FMEA, Scenario Planning
Product strategy	JTBD, Kano Model, Hook Model
Growth / GTM	AARRR Pirate Metrics, Bullseye Framework

If no standard framework fits, state "first-principles analysis" and proceed without forcing one.
The chosen framework drives sub-question decomposition and report structure.

Decompose into sub-questions (3-5) based on probe findings + chosen framework:
- Each sub-question must address a distinct facet of the user's topic.
- Use terminology discovered in Phase 1 to write precise queries.
- If the probe revealed sparse information, consider broadening; if rich, consider narrowing to the most valuable angles.
- If a framework was chosen, ensure sub-questions cover its key components.

Dynamic decisions — based on probe data, determine:

Decision	Criteria
Need clarification?	If probe reveals ambiguity the user likely didn't anticipate — ask. Otherwise, proceed.
Search depth level	Sparse landscape → Standard (1 round). Moderate → Standard (2 rounds). Rich → Deep (2 rounds + supplementary).
Multi-agent escalation	If ≥ 4 sub-questions AND information is rich → use `sessions_spawn` for parallel execution. Otherwise, single-agent sequential.
Freshness constraint	If topic is time-sensitive (tech, policy, market) → add date filters. If evergreen (history, science fundamentals) → no filter.

Present to user:

## Probe Findings
- Information density: [Rich/Moderate/Sparse]
- Key terms discovered: [list]
- Source landscape: [summary]
- Controversy/debate signals: [if any]

## Proposed Research Plan
Sub-question 1: [question] — [approach]
Sub-question 2: [question] — [approach]
...

Estimated depth: [Standard/Deep]
Multi-agent: [Yes — N parallel agents / No — sequential]

Proceed? (or adjust directions)

Wait for user confirmation. This is STOP POINT 1.
- If user says "go", "proceed", "looks good", "直接做", or similar → continue to Phase 3.
- If user provides feedback → adjust the plan and re-present.
- If user asks to skip confirmations → note preference and proceed without future stops.

Phase 3: Deep Search (Targeted Extraction) — 2 rounds

Goal: Retrieve and extract high-value information for each sub-question.

Round 1: Primary Search

For each sub-question:

Search using 2-3 query variants (leveraging Phase 1 terminology):
- Variant 1: Direct question form
- Variant 2: Keyword-focused with domain terms
- Variant 3: Source-targeted (e.g., site:arxiv.org, site:github.com, or platform-specific)
Snippet triage — from search results, select the Top 3-5 most promising URLs based on:
- Source authority (official docs > academic > reputable news > blogs > forums)
- Relevance to the specific sub-question
- Recency (prefer < 6 months for time-sensitive topics)
- Diversity (no more than 2 URLs from the same domain)
web_fetch selected URLs — extract key facts, data points, and quotes. Per page:
- Extract the essential facts relevant to the sub-question.
- Do NOT copy entire paragraphs verbatim. Summarize and attribute.
- Capture: who said what, key numbers, dates, methodologies, conclusions.
- Hard limit: retain at most ~2000 words of extracted content per URL.
Intermediate analysis (after EACH web_fetch, before the next):
- How does this connect to previous findings?
- What gaps remain?
- Does this contradict anything found earlier? If so, flag it.
- Update running understanding.
Fallback chain — if web_fetch fails for a URL:
- Try an alternative URL on the same topic from search results.
- If no alternative exists, rely on the snippet content.
- Log the failure: "Attempted [URL], access failed, using snippet data only."
- Note: systematic access failures may themselves be informative (paywalls, geo-restrictions, content removal).

Round 2: Gap-Filling Search

Identify gaps from Round 1:
- Sub-questions with insufficient evidence
- Claims with only a single source (especially quantitative claims)
- Contradictions that need resolution
- Perspectives missing (e.g., only found proponent views, no critic views)
Targeted searches for each gap — use refined queries based on what Round 1 revealed.
Fetch and analyze — same process as Round 1.
Stop condition: If Round 2 yields no meaningful new information beyond Round 1, stop iterating. Do not force a third round. Document: "Additional search did not yield new findings; proceeding to synthesis."

Multi-Agent Mode (when activated in Phase 2)

When ≥ 4 sub-questions with rich information landscape:

Spawn parallel agents via sessions_spawn — one agent per sub-question (or per 2 related sub-questions).
Each agent's task description must include:
- The specific sub-question(s) to investigate
- Relevant terminology from the probe phase
- Instructions to complete Round 1 + Round 2
- Output format: findings + confidence levels + gaps + source list
- Reminder: "You are running in isolation. You cannot see other agents' work."
Role differentiation — if 4+ agents are spawned, assign complementary perspectives to improve coverage:
- Agent A (Breadth): Cast a wide net — find diverse sources, prioritize coverage over depth.
- Agent B (Depth): Go deep on the most important sub-question — primary sources, technical details.
- Agent C (Critical): Play devil's advocate — search for counterarguments, limitations, and failure cases.
- Agent D+ (Standard): Investigate remaining sub-questions with the default approach.
This is not rigid; adapt role assignment based on the topic. The goal is to ensure not all agents search the same way.
Collect results from all agents, then proceed to Phase 4.

Phase 4: Synthesize (Cross-Topic Integration)

Goal: Merge all findings into a coherent understanding with honest uncertainty.

Steps

Cross-question integration:
- Identify shared conclusions across sub-questions.
- Discover meta-patterns (e.g., "all sources agree on X but diverge on Y").
- Map relationships between sub-topics.

Conflict resolution protocol:

Record every contradiction found.
Analyze why sources disagree: different methodologies? different time periods? different populations? different definitions? vested interests?
Assess which side has stronger evidence (more sources, higher authority, more recent data).

Label with confidence:

Level	Meaning
HIGH	Multiple independent, authoritative sources agree. Cross-verified.
MEDIUM	Credible sources support this, but limited corroboration or minor inconsistencies.
LOW	Single source, or sources of uncertain reliability. Treat with caution.
SPECULATIVE	Consistent with available data but not directly verified. Hypothesis-level.

Evidence hierarchy — when weighing conflicting claims, rank evidence by type:

Tier	Evidence type	Weight
1	Systematic reviews & meta-analyses	Highest
2	Randomized controlled trials / rigorous experiments	High
3	Cohort / longitudinal studies	Medium-High
4	Expert consensus, official guidelines	Medium
5	Cross-sectional / observational studies	Medium
6	Expert opinion, editorials	Lower
7	Media reports, blog posts	Lowest — verify with primary sources

Not all topics are academic; adapt the hierarchy to the domain (e.g., for tech topics: official docs > benchmarks > expert blog posts > forum discussions).

Source credibility assessment for each key source used:
- Authority: Is this a primary source, peer-reviewed, official, or secondary/tertiary?
- Recency: When was it published? Is the information time-sensitive?
- Corroboration: Do other sources confirm it?
- Flag any source older than 6 months on time-sensitive topics.
Gap documentation:
- List questions that remain unanswered or insufficiently answered.
- Note where more specialized research (e.g., academic databases, paid reports) might help.

Phase 5: Deliver (Structured Report) — [STOP POINT 2]

Goal: Present findings in a clear, actionable structure.

Report Structure

# [Research Topic]

## Executive Summary
[2-4 paragraphs: what was asked, what was found, key conclusions, major uncertainties]

## Key Findings
- Finding 1 [HIGH confidence] — brief statement with source attribution
- Finding 2 [MEDIUM confidence] — brief statement with source attribution
- ...

## Detailed Analysis

### [Sub-topic 1]
[Narrative analysis with inline source citations. Include data, quotes, and reasoning.]

### [Sub-topic 2]
[...]

### [Sub-topic N]
[...]

## Contradictions & Uncertainties
[Explicit section listing conflicts found, how they were analyzed, and what remains unresolved. Each item includes confidence level.]

## Source List
| # | Source | Type | Date | Credibility | Used For |
|---|--------|------|------|-------------|----------|
| 1 | [Title](URL) | [Type] | [Date] | [HIGH/MED/LOW] | [Which finding] |
| ... |

## Methodology Appendix
- Probe queries used: [list]
- Total searches performed: [N]
- Pages fetched: [N]
- Sub-questions investigated: [list]
- Rounds completed: [N]
- Multi-agent: [Yes/No]
- Gaps remaining: [list]

Delivery

Present the full report to the user.
This is STOP POINT 2.
Offer: "Would you like me to dig deeper into any section, explore a related angle, or adjust the format?"

Anti-Hallucination Protocol

These rules are non-negotiable and apply across all phases:

Every factual claim must have a source. No unsourced assertions in the final report. If you cannot find a source, say "no source found" — never fabricate.
Single-source quantitative claims require cross-verification. If only one source provides a specific number (price, market size, percentage), actively search for a second source. If none found, label the claim as LOW confidence and note "single source."
Do not present model knowledge as research findings. If a fact comes from your training data rather than this session's searches, either:
- Verify it via search before including it, OR
- Explicitly label it: "Based on general knowledge, not verified in this research session."
"Insufficient data found" is a valid answer. Never stretch thin evidence to fill gaps. Acknowledge what you don't know.
Common hallucination traps to watch for:
- Platform rate limits and pricing (change frequently — verify)
- Regulatory details (jurisdiction-specific and fast-changing)
- Market size numbers (often wildly inconsistent across sources)
- Historical dates and attributions (easy to confuse)

Freshness Management

Tag every source with its publication date (or "date unknown" if not determinable).
Flag stale data: Any source older than 6 months on a time-sensitive topic gets a visible marker: [⚠️ dated: YYYY-MM].
Time-sensitive categories (always apply freshness filters):
- Technology / software / APIs
- Pricing and business models
- Regulations and policy
- Market data and competitive landscape
- Political / geopolitical situations
Evergreen categories (freshness less critical):
- Scientific fundamentals
- Historical events
- Mathematical / logical concepts
- Established best practices (though check for updates)

Quality Self-Check (Pre-Delivery Checklist)

Before presenting the Phase 5 report, verify:

- [ ] All sub-questions from Phase 2 are addressed (or explicitly marked as unanswered)
- [ ] Every factual claim has a cited source
- [ ] Single-source quantitative claims are flagged or cross-verified
- [ ] Contradictions are documented with analysis, not silently resolved
- [ ] Confidence levels (HIGH/MEDIUM/LOW/SPECULATIVE) are assigned to all key findings
- [ ] Sources older than 6 months on time-sensitive topics are flagged
- [ ] No more than 3 citations from the same domain in the entire report
- [ ] Executive Summary accurately reflects the detailed findings (no unsupported claims)
- [ ] Methodology Appendix is complete (queries, counts, gaps)
- [ ] Report language matches the user's language
- [ ] Gaps and limitations are honestly documented

If any check fails, fix it before delivering.

Edge Cases

User says "just do it" / "直接做" at the start

Still run Phase 1 (Probe) — it's only ~10 seconds.
Skip the Phase 2 stop point — proceed directly from Orient to Deep Search.
Still stop at Phase 5 to deliver the report.

Topic is too broad

Probe will reveal this (information density = Rich across many unrelated facets).
In Phase 2, ask the user to narrow: "This topic spans [X, Y, Z]. Which angle matters most to you?"

Topic is too narrow / obscure

Probe will reveal this (information density = Sparse).
In Phase 2, suggest broadening: "Direct information is scarce. I can investigate [related broader topic] or [adjacent angle] instead."

User provides a URL instead of a question

Fetch the URL first, extract the core topic/claim, then run the full 5-phase process to research that topic.

Previous research exists

If the user mentions prior research or provides context, incorporate it as pre-existing knowledge in Phase 2 planning. Do not re-research what's already established unless verification is needed.
Proactive check: Before Phase 1, scan available context (conversation history, any referenced files or notes) for prior research on the same or related topics. If found, note it in Phase 2 and avoid redundant searches.

State Persistence (Long Research Sessions)

For research sessions that may be interrupted or span extended time:

Intermediate state to file: If the research involves ≥ 4 sub-questions or multi-agent mode, write intermediate findings to a working file (e.g., research-state.md) after each completed sub-question. This ensures recovery if the session is interrupted.

Working file structure:

# Research State: [Topic]
Status: [in-progress / complete]
Started: [timestamp]
Last updated: [timestamp]

## Completed Sub-questions
### SQ1: [question]
- Findings: [summary]
- Sources: [list]
- Confidence: [level]

## Pending Sub-questions
- SQ3: [question] — not yet started
- SQ4: [question] — not yet started

## Gaps Identified
- [gap 1]

Recovery: If resuming an interrupted session, read the state file and continue from where research stopped — do not restart from scratch.

Research Lifecycle

A completed research report is not necessarily the end:

One-shot delivery (default): Run phases 1-5, deliver report, done.
Follow-up deepening: If the user asks to explore a section further after delivery, treat it as a new mini-research cycle — re-enter Phase 3 with targeted queries for that section only. No need to re-probe.
Ongoing monitoring: If the user asks to "keep an eye on" the topic, note the key queries and suggest a monitoring cadence. This skill does not implement automatic monitoring, but the report structure (queries used, sources tracked) makes it easy to re-run periodically.
Upgrade path: If research findings warrant a formal document (project spec, decision memo, strategy doc), offer to restructure the report into the target format.

Multi-Engine Search Strategy

When available, route queries to the most appropriate source:

Information type	Preferred search approach
General / factual	Default `web_search`
Academic / scientific	`web_search` with `site:arxiv.org`, `site:scholar.google.com`, `site:pubmed.ncbi.nlm.nih.gov`
Community discussion	`web_search` with `site:reddit.com` (limit to 3 keywords for Reddit)
Code / technical	`web_search` with `site:github.com`, `site:stackoverflow.com`
Video content	`web_search` with `site:youtube.com` to find relevant videos
Company / product	`web_search` with `site:crunchbase.com`, `site:g2.com`, `site:producthunt.com`
News / current events	`web_search` with recency filter
Government / official	`web_search` with `site:.gov`, `site:.edu`, or country-specific domains

Apply these routing rules in Phase 3 when constructing query variants. Not every sub-question needs all routes — pick the 1-2 most relevant per sub-question.

Platform-Specific Tips

Reddit: Use ≤ 3 keywords; try old.reddit.com if main site blocks; append .json to URLs for structured data.
GitHub: github.com/topics/<keyword> often works better than raw search; check stars and last update to filter abandoned repos.
Academic: Follow citation trails — if a secondary source cites a primary, try to fetch the primary.
Paywalled content: If web_fetch fails on a paywalled source, check for preprint versions, author's personal site, or archived copies.