deep-scout

v0.1.4

Multi-stage deep intelligence pipeline (Search → Filter → Fetch → Synthesize). Turns a query into a structured research report with full source citations.

by Jonathan Jing (@jonathanjing)
Security Scan

- VirusTotal: Benign (view report)
- OpenClaw: Benign (high confidence)
Purpose & Capability
Name and description (web search → filter → fetch → synthesize) match the actual behavior: it calls web_search/web_fetch, uses LLMs for filtering/synthesis, and optionally uses a local Firecrawl CLI or the browser tool. Required binaries (bash, python3, timeout/gtimeout) and included scripts are proportional to the described functionality.
Instruction Scope
SKILL.md and scripts explicitly instruct the agent to fetch arbitrary web URLs and feed extracted content to LLMs (expected for a research tool). The run.sh includes query sanitization and output-path restrictions as mitigations. Users should note that fetched page content (including snapshots) will be sent to the LLM — this is intended but a privacy consideration.
Install Mechanism
No install spec (instruction-only) and included shell scripts only; no remote downloads are performed by an installer. The optional Firecrawl integration calls a local CLI if present. This is a low-risk install footprint.
Credentials
The skill requests no environment variables, no credentials, and no config paths. That aligns with its purpose: it leverages agent-provided tools (web_search, web_fetch, browser) rather than external API keys.
Persistence & Privilege
always:false (default) and no code attempts to modify other skills or system-wide agent settings. The skill writes its own state to a skill-local state file (deep-scout-state.json) — expected for resumability.
Assessment
This skill appears to do what it says: it runs a search → filter → fetch → synthesize pipeline using agent web tools and LLM prompts. Before installing, be aware of these practical points:

1. The skill will fetch arbitrary web pages and send their extracted text to the LLM — avoid using it for highly sensitive/private queries or internal URLs you don't want shared with the model.
2. It may run local shell scripts (run.sh, the Firecrawl wrapper). The package includes sanitization and an output-path check, which is good, but you can review those scripts yourself before enabling.
3. Firecrawl is optional and only invoked if present locally; otherwise the wrapper reports FIRECRAWL_UNAVAILABLE.
4. The agent will be able to invoke the skill normally (autonomous invocation is the platform default); if you prefer manual control, only call it interactively.

If you'd like greater assurance, inspect scripts/run.sh and the prompts locally, and test with non-sensitive queries first.


Runtime requirements

Bins: bash, python3
Any bin: timeout, gtimeout
Latest: vk9718qve9jc07avdyxattgtm55829xbe
399 downloads · 0 stars · 5 versions · Updated 1mo ago
v0.1.4 · MIT-0

deep-scout

Multi-stage deep intelligence pipeline (Search → Filter → Fetch → Synthesize).

🛠️ Installation

1. Ask OpenClaw (Recommended)

Tell OpenClaw: "Install the deep-scout skill." The agent will handle the installation and configuration automatically.

2. Manual Installation (CLI)

If you prefer the terminal, run:

clawhub install deep-scout

🚀 Usage

/deep-scout "Your research question" [--depth 5] [--freshness pw] [--country US] [--style report]

Options

| Flag | Default | Description |
|------|---------|-------------|
| `--depth N` | 5 | Number of URLs to fully fetch (1–10) |
| `--freshness` | pw | pd=past day, pw=past week, pm=past month, py=past year |
| `--country` | US | 2-letter country code for Brave search |
| `--language` | en | 2-letter language code |
| `--search-count` | 8 | Total results to collect before filtering |
| `--min-score` | 4 | Minimum relevance score to keep (0–10) |
| `--style` | report | report \| comparison \| bullets \| timeline |
| `--dimensions` | auto | Comparison dimensions (comma-separated, for `--style comparison`) |
| `--output FILE` | stdout | Write report to file |
| `--no-browser` | — | Disable browser fallback |
| `--no-firecrawl` | — | Disable Firecrawl fallback |

🛠️ Pipeline — Agent Loop Instructions

When this skill is invoked, execute the following four-stage pipeline:


Stage 1: SEARCH

Call web_search with:

query: <user query>
count: <search_count>
country: <country>
search_lang: <language>
freshness: <freshness>

Collect: title, url, snippet for each result.
If fewer than 3 results returned, retry with freshness: "py" (relaxed).
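The search-and-retry rule above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `web_search(...)` callable that returns a list of result dicts; it is not the skill's actual implementation.

```python
def search_with_relaxed_retry(web_search, query, search_count=8,
                              country="US", language="en", freshness="pw"):
    """Run the search; if fewer than 3 hits, retry with freshness='py'."""
    results = web_search(query=query, count=search_count, country=country,
                         search_lang=language, freshness=freshness)
    if len(results) < 3 and freshness != "py":
        # Relax the freshness window to past year and try once more.
        results = web_search(query=query, count=search_count, country=country,
                             search_lang=language, freshness="py")
    # Keep only the fields the pipeline needs downstream.
    return [{"title": r["title"], "url": r["url"], "snippet": r["snippet"]}
            for r in results]
```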


Stage 2: FILTER

Load prompts/filter.txt. Replace template vars:

  • {{query}} → the user's query
  • {{freshness}} → freshness param
  • {{min_score}} → min_score param
  • {{results_json}} → JSON array of search results

Call the LLM with this prompt. Parse the returned JSON array.
Keep only results where keep: true. Sort by score descending.
Take top depth URLs as the fetch list.

Deduplication: Max 2 results per root domain (already handled in filter prompt).
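The post-filter selection can be sketched like this: parse the LLM's JSON verdicts, keep the positives, sort by score, and take the top `depth` URLs. Field names (`keep`, `score`, `url`) follow the prompt contract above; the max-2-per-domain cap is re-applied here purely as an illustrative safeguard, since the filter prompt already handles it.

```python
import json
from urllib.parse import urlparse

def select_fetch_list(llm_json, depth=5):
    """Turn the filter LLM's JSON array into an ordered fetch list."""
    verdicts = json.loads(llm_json)
    kept = sorted((v for v in verdicts if v.get("keep")),
                  key=lambda v: v["score"], reverse=True)
    fetch_list, per_domain = [], {}
    for v in kept:
        domain = urlparse(v["url"]).netloc
        if per_domain.get(domain, 0) >= 2:
            continue  # max 2 results per root domain
        per_domain[domain] = per_domain.get(domain, 0) + 1
        fetch_list.append(v["url"])
        if len(fetch_list) == depth:
            break
    return fetch_list
```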


Stage 3: FETCH (Tiered Escalation)

For each URL in the filtered list:

Tier 1 — web_fetch (fast):

Call web_fetch(url)
If content length >= 200 chars → accept, trim to max_chars_per_source

Tier 2 — Firecrawl (deep/JS):

If Tier 1 fails or returns < 200 chars:
  Run: scripts/firecrawl-wrap.sh <url> <max_chars>
  If output != "FIRECRAWL_UNAVAILABLE" and != "FIRECRAWL_EMPTY" → accept

Tier 3 — Browser (last resort):

If Tier 2 fails:
  Call browser(action="open", url=url)
  Call browser(action="snapshot")
  Load prompts/browser-extract.txt, substitute {{query}} and {{max_chars_per_source}}
  Call LLM with snapshot content + extraction prompt
  If output != "FETCH_FAILED:..." → accept

If all tiers fail: Use the original snippet from Stage 1 search results. Mark as [snippet only].

Store: { url: extracted_content } dict.
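The tiered escalation for one URL can be sketched as below. The three fetchers are hypothetical callables standing in for `web_fetch`, the `firecrawl-wrap.sh` wrapper, and the browser-snapshot extraction; each returns text or `None`.

```python
def fetch_with_escalation(url, snippet, web_fetch, firecrawl, browser_extract,
                          max_chars=4000):
    """Try each tier in order; fall back to the search snippet if all fail."""
    content = web_fetch(url)
    if content and len(content) >= 200:          # Tier 1: fast path
        return content[:max_chars], "web_fetch"
    fc = firecrawl(url, max_chars)               # Tier 2: deep/JS rendering
    if fc not in (None, "FIRECRAWL_UNAVAILABLE", "FIRECRAWL_EMPTY"):
        return fc, "firecrawl"
    br = browser_extract(url)                    # Tier 3: last resort
    if br and not br.startswith("FETCH_FAILED"):
        return br, "browser"
    return snippet, "[snippet only]"             # all tiers failed
```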


Stage 4: SYNTHESIZE

Choose prompt template based on --style:

  • report / bullets / timeline → prompts/synthesize-report.txt
  • comparison → prompts/synthesize-comparison.txt

Replace template vars:

  • {{query}} → user query
  • {{today}} → current date (YYYY-MM-DD)
  • {{language}} → language param
  • {{source_count}} → number of successfully fetched sources
  • {{dimensions_or_auto}} → dimensions param (or "auto")
  • {{fetched_content_blocks}} → build as:
    [Source 1] (url1)
    <content>
    ---
    [Source 2] (url2)
    <content>
    

Call LLM with the filled prompt. The output is the final report.

If --output FILE is set, write the report to that file. Otherwise, print to the channel.
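Assembling `{{fetched_content_blocks}}` from the Stage 3 results follows the `[Source N] (url)` format shown above; a minimal sketch:

```python
def build_content_blocks(fetched):
    """fetched: {url: extracted_content} dict from Stage 3."""
    blocks = []
    for i, (url, content) in enumerate(fetched.items(), start=1):
        blocks.append(f"[Source {i}] ({url})\n{content}")
    # Sources are separated by a --- divider, matching the template above.
    return "\n---\n".join(blocks)
```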


⚙️ Configuration

Defaults are in config.yaml. Override via CLI flags above.


📂 Project Structure

skills/deep-scout/
├── SKILL.md                     ← This file (agent instructions)
├── config.yaml                  ← Default parameter values
├── prompts/
│   ├── filter.txt               ← Stage 2: relevance scoring prompt
│   ├── synthesize-report.txt    ← Stage 4: report/bullets/timeline synthesis
│   ├── synthesize-comparison.txt← Stage 4: comparison table synthesis
│   └── browser-extract.txt      ← Stage 3: browser snapshot extraction
├── scripts/
│   ├── run.sh                   ← CLI entrypoint (emits pipeline actions)
│   └── firecrawl-wrap.sh        ← Firecrawl CLI wrapper with fallback handling
└── examples/
    └── openclaw-acquisition.md  ← Example output: OpenClaw M&A intelligence

🔧 Error Handling

| Scenario | Handling |
|----------|----------|
| All fetch attempts fail | Use snippet from Stage 1; mark [snippet only] |
| Search returns 0 results | Retry with freshness: py; error if still 0 |
| Firecrawl not installed | firecrawl-wrap.sh outputs FIRECRAWL_UNAVAILABLE, skip silently |
| Browser tool unavailable | Skip Tier 3; proceed with available content |
| LLM synthesis exceeds context | Trim sources proportionally, prioritize high-score sources |
| Rate limit on Brave API | Wait 2s, retry once |
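The "trim sources proportionally" fallback could look like the sketch below: when combined content exceeds a context budget, scale every source down by the same factor, keeping the score-descending order so high-score sources stay first. The budget value and the 200-char floor are illustrative assumptions, not values from the skill.

```python
def trim_to_budget(sources, budget_chars=24000):
    """sources: list of (score, content) tuples, sorted by score descending."""
    total = sum(len(content) for _, content in sources)
    if total <= budget_chars:
        return sources  # already fits; nothing to trim
    ratio = budget_chars / total
    # Trim each source proportionally, never below a small floor.
    return [(score, content[:max(200, int(len(content) * ratio))])
            for score, content in sources]
```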

📋 Example Outputs

See examples/openclaw-acquisition.md for a full sample report.


Deep Scout v0.1.4 · OpenClaw Skills · clawhub: deep-scout
