Scout
Scout conducts lawful OSINT research on people, companies, and organizations, assembling provenance-backed briefs where every claim carries a source reference, retrieval timestamp, and direct quote. It works through a tiered source waterfall — public web first, then rate-limited registries, then paid databases only with explicit permission — collecting no more than the stated research goal requires.
When to use
- Research a person and build a source-backed brief
- Do background research on a company using public sources
- Resolve whether two profiles are the same person with cited sources
- Compile what is publicly knowable about a subject
- Expand a quick lookup into an auditable brief
When not to use
- Illegal intrusion into private systems
- Credential theft or bypassing access controls
- Covert surveillance
- Speculative doxxing
- Topic research without a person/org focus — use Sift
Responsibility boundary
Scout owns lawful OSINT research on people and organizations with provenance-backed output.
Scout does not own: general topic research (Sift), image processing (Look), knowledge graph writes (Elephas), social graph (Weave), communications (Dispatch).
Commands
scout.research.start — begin a new research request with subject and goal
scout.research.expand --tier <1|2|3> — escalate to a higher source tier
scout.brief.render — generate the final markdown brief with findings and sources
scout.brief.render_pdf — optional PDF brief generation
scout.status — return current research state
scout.journal — write journal for the current run; called at end of every run
scout.update — pull latest from GitHub source; preserves journals and data
Invariants
- Legality-first — only publicly available sources without bypassing access controls
- Minimization — collect only what the research goal requires
- Provenance for every claim — at least one source reference with URL, retrieval timestamp, and quote
- Paid sources require explicit permission — Tier 3 needs a recorded PermissionGrant
- No doxxing by default — private details suppressed unless explicitly permitted
- Uncertainty must be surfaced — incomplete identity resolution stated clearly
Input contract
ResearchRequest requires: request_id, as_of, subject (type, name, aliases, known_locations, known_handles), goal, constraints (time_budget_minutes, minimize_pii).
Read references/scout_schemas.md for exact schema.
Research workflow
- Normalize request and subject identity inputs
- Resolve likely identity matches conservatively
- Run Tier 1 public-source collection
- Record provenance for every retained claim
- Compile preliminary findings with confidence levels
- Escalate to Tier 2 only if enabled and useful
- Escalate to Tier 3 only after explicit permission grant is recorded
- Generate brief with findings, uncertainty, and source log
- Store request, findings, sources, and decisions locally
- Emit Signal files for confirmed entities and relationships to
~/openclaw/db/ocas-elephas/intake/{signal_id}.signal.json. Use Signal schema from spec-ocas-shared-schemas.md. One file per entity or relationship with sufficient confidence.
- Write journal via
scout.journal
When minimize_pii=true, suppress unnecessary sensitive details in the final brief.
Source waterfall
Read references/scout_source_waterfall.md for full tier logic.
- Tier 1 — public web, official sites, news, filings, public social profiles. Automatic.
- Tier 2 — rate-limited sources, registries, extended datasets. Only if enabled and useful.
- Tier 3 — paid OSINT providers, background databases. Requires explicit permission grant.
Output requirements
Markdown brief with: Executive Summary, Identity Resolution Notes, Findings, Risk and Uncertainty, Source Log. Every finding carries source-backed provenance.
Inter-skill interfaces
Scout writes Signal files to Elephas intake: ~/openclaw/db/ocas-elephas/intake/{signal_id}.signal.json
Emit one Signal file per confirmed entity or high-confidence relationship discovered during research. Use the Signal schema from spec-ocas-shared-schemas.md. Elephas decides promotion.
See spec-ocas-interfaces.md for signal format.
Storage layout
~/openclaw/data/ocas-scout/
config.json
requests.jsonl
sources.jsonl
findings.jsonl
decisions.jsonl
briefs/
reports/
~/openclaw/journals/ocas-scout/
YYYY-MM-DD/
{run_id}.json
Default config.json:
{
"skill_id": "ocas-scout",
"skill_version": "2.3.0",
"config_version": "1",
"created_at": "",
"updated_at": "",
"waterfall": {
"enabled_tiers": [1, 2]
},
"paid_sources": {
"enabled": false
},
"brief": {
"format": "markdown"
},
"retention": {
"days": 90,
"max_records": 10000
}
}
OKRs
Universal OKRs from spec-ocas-journal.md apply to all runs.
skill_okrs:
- name: verified_claim_ratio
metric: fraction of findings with at least one verified source reference
direction: maximize
target: 0.70
evaluation_window: 30_runs
- name: entity_resolution_accuracy
metric: fraction of identity resolutions confirmed correct
direction: maximize
target: 0.90
evaluation_window: 30_runs
- name: source_diversity
metric: median unique source domains per brief
direction: maximize
target: 6
evaluation_window: 30_runs
Optional skill cooperation
- Weave — read social graph (read-only) for identity context
- Elephas — optionally emit Signal files for Chronicle promotion
- Sift — may use Sift for web searches during research
Journal outputs
- Observation Journal — research runs producing findings
- Research Journal — structured multi-source research sessions
Visibility
public
Initialization
On first invocation of any Scout command, run scout.init:
- Create
~/openclaw/data/ocas-scout/ and all subdirectories (briefs/, reports/)
- Write default
config.json with ConfigBase fields if absent
- Create empty JSONL files:
requests.jsonl, sources.jsonl, findings.jsonl, decisions.jsonl
- Create
~/openclaw/journals/ocas-scout/
- Ensure
~/openclaw/db/ocas-elephas/intake/ exists (create if missing)
- Register cron job
scout:update if not already present (check openclaw cron list first)
- Log initialization as a DecisionRecord in
decisions.jsonl
Background tasks
| Job name | Mechanism | Schedule | Command |
|---|
scout:update | cron | 0 0 * * * (midnight daily) | scout.update |
openclaw cron add --name scout:update --schedule "0 0 * * *" --command "scout.update" --sessionTarget isolated --lightContext true --timezone America/Los_Angeles
Self-update
scout.update pulls the latest package from the source: URL in this file's frontmatter. Runs silently — no output unless the version changed or an error occurred.
- Read
source: from frontmatter → extract {owner}/{repo} from URL
- Read local version from
skill.json
- Fetch remote version:
gh api "repos/{owner}/{repo}/contents/skill.json" --jq '.content' | base64 -d | python3 -c "import sys,json;print(json.load(sys.stdin)['version'])"
- If remote version equals local version → stop silently
- Download and install:
TMPDIR=$(mktemp -d)
gh api "repos/{owner}/{repo}/tarball/main" > "$TMPDIR/archive.tar.gz"
mkdir "$TMPDIR/extracted"
tar xzf "$TMPDIR/archive.tar.gz" -C "$TMPDIR/extracted" --strip-components=1
cp -R "$TMPDIR/extracted/"* ./
rm -rf "$TMPDIR"
- On failure → retry once. If second attempt fails, report the error and stop.
- Output exactly:
I updated Scout from version {old} to {new}
Support file map
| File | When to read |
|---|
references/scout_schemas.md | Before creating requests, findings, or briefs |
references/scout_source_waterfall.md | Before tier selection or escalation decisions |
references/scout_brief_template.md | Before rendering briefs |
references/journal.md | Before scout.journal; at end of every run |