Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Sift

v2.3.0

Web search, research synthesis, fact verification, and entity extraction. The system's general research engine. Use for topic research, web lookups, fact-che...

0 · 243 · 0 current · 0 all-time
by Indigo Karasu (@indigokarasu)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for indigokarasu/ocas-sift.

Prompt preview: Install & Setup
Install the skill "Sift" (indigokarasu/ocas-sift) from ClawHub.
Skill page: https://clawhub.ai/indigokarasu/ocas-sift
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install ocas-sift

ClawHub CLI


npx clawhub@latest install ocas-sift
Security Scan
VirusTotal
Suspicious
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The skill's name/description (web research, synthesis, fact verification, entity extraction) aligns with the declared optional credentials (Brave, Exa, Tavily) and with the filesystem reads/writes (journals, data, Elephas intake). Reading Thread/Chronicle context and using tiered search providers is consistent with the research functionality.
Instruction Scope
Runtime instructions ask the agent to persist sessions, write journals, and emit Signal files to ~/openclaw/db/ocas-elephas/intake/ for entity promotion, behavior that makes researched data persistent and hands it off to another skill (Elephas). The instructions also mention reading conversation context, Chronicle, and potentially geolocation from other components; those sources are not listed as explicit credentials, but the skill will consume contextual and system data if available. SKILL.md also instructs self-update behavior (pull latest from GitHub), and the README claims to register a midnight cron for automatic updates. These are side effects beyond pure query/lookup work and expand the runtime scope.
Install Mechanism
There is no formal install spec in the registry, but SKILL.md and README include an 'openclaw skill install https://github.com/indigokarasu/sift' line and describe automatic self-updates via a cron job. The registry provides no packaged install instructions or vetted release URL; self-update/auto-install behavior implies fetching code from GitHub at runtime, which increases risk and is not fully declared in the registry metadata.
Credentials
The skill lists optional API keys for the search/semantic providers (brave_search_api_key, exa_api_key, tavily_api_key) which are proportional to its function. No unrelated secrets are requested. However, it will write extracted entities and decisions to local intake/journal paths — this may surface or persist sensitive content into the system knowledge pipeline (Elephas/Chronicle), so users should consider whether that data flow is acceptable.
Persistence & Privilege
Although always:false, the skill claims to register a daily 'sift:update' cron job and to persist session/journal/entity files under the user's home directory. Scheduled self-updates and persistent writing into shared intake directories represent lasting changes and a broader blast radius (automatic downloads, ongoing background behavior, and cross-skill data flows). These persistent actions are not fully explicit in the registry install metadata.
What to consider before installing
  • Data persistence: Sift writes journals, session data, and Signal files to ~/openclaw/... and emits extracted entities to the Elephas intake; sensitive queries can become persistent artifacts and may be promoted into a shared knowledge graph. If you handle sensitive data, decide whether to allow these write locations or to restrict them.
  • Self-update behavior: SKILL.md and the README describe automatic self-updates via a cron job that pulls from GitHub. The registry has no formal install package, so the skill expects to fetch code externally at runtime. Decide whether you want a skill that can download and update itself automatically; this increases risk, and you should review the upstream GitHub repo before enabling it.
  • External network calls: the skill calls free search providers (Brave/DuckDuckGo/SearXNG) and optional paid semantic providers (Exa/Tavily). Optional API keys are proportionate to this purpose; provide only keys you trust. For offline or air-gapped usage, withhold provider keys and limit the external tiers.
  • Cross-skill interactions: Sift writes to the Elephas intake and may read Thread/Chronicle context. Confirm you trust those other skills and that their intake/promotion behavior is acceptable.
  • Checklist before install: inspect the upstream GitHub repository referenced in SKILL.md, confirm you accept automatic writes to ~/openclaw paths, decide whether to allow cron-based self-updates, and restrict or withhold API keys if you want to limit external queries.

Because of the undeclared self-update/cron behavior and persistent cross-skill writes, this skill is coherent with its purpose but carries non-trivial persistence and remote-fetch risks. Review the upstream source and decide whether to disable auto-updates or restrict filesystem paths before enabling.

Like a lobster shell, security has layers — review code before you run it.

latest: vk9755dkvm4qen2da0qtdexfq9183rszm
243 downloads
0 stars
3 versions
Updated 3h ago
v2.3.0
MIT-0

Sift

Sift is the system's general research engine, retrieving and synthesizing information from the web across a tiered source hierarchy — internal knowledge first, then free web search, then rate-limited semantic research providers for deep work. It evaluates source reliability through cross-source agreement scoring, extracts structured entities from retrieved content, and emits enrichment candidates to Chronicle so researched knowledge accumulates over time.

When to use

  • Web search and research synthesis on any topic
  • Fact verification across multiple sources with consensus scoring
  • Document summarization and structured entity extraction
  • Comparison research across products, technologies, or options
  • Deep research sessions with multi-source threading

When not to use

  • OSINT investigations on individuals — use Scout
  • Image-to-action processing — use Look
  • Pattern analysis on the knowledge graph — use Corvus
  • Communications and message drafting — use Dispatch

Sift never performs OSINT investigations on individuals. If the primary entity of a query is a person, Scout should be invoked.

Responsibility boundary

Sift owns web research, fact verification, and structured entity extraction.

Sift does not own: person-focused OSINT (Scout), image processing (Look), knowledge graph writes (Elephas), pattern analysis (Corvus), social graph (Weave).

Commands

  • sift.search — execute a search query with automatic tier selection and query rewriting
  • sift.research — run a multi-source research session producing a structured research journal
  • sift.verify — fact-check a specific claim across multiple sources with consensus scoring
  • sift.summarize — summarize a document or URL with structured entity extraction
  • sift.extract — extract entities, claims, statistics, and relationships from content
  • sift.thread.list — list active research threads with entity overlap detection
  • sift.status — return current state: active threads, quota usage, source reputation summary
  • sift.journal — write journal for the current run; called at end of every run
  • sift.update — pull latest from GitHub source; preserves journals and data

Response modes

Sift classifies query depth automatically:

  • quick_answer — simple factual lookups, single-source sufficient
  • comparison — multi-source comparison with structured output
  • research — deep multi-session investigation with threading
  • document_analysis — URL or document-focused extraction

Users may override with phrases like "quick answer", "deep dive", "compare", or "summarize".
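The override check above amounts to a phrase lookup before falling back to the automatic classification. A minimal sketch (the function and its name are illustrative, not from the skill source):

```python
# Maps the override phrases listed above to Sift's response modes.
OVERRIDES = {
    "quick answer": "quick_answer",
    "deep dive": "research",
    "compare": "comparison",
    "summarize": "document_analysis",
}

def classify_query(query: str, auto_mode: str = "quick_answer") -> str:
    """Return a user-overridden response mode, else the automatic one."""
    q = query.lower()
    for phrase, mode in OVERRIDES.items():
        if phrase in q:
            return mode
    return auto_mode
```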

Search tier selection

  • Tier 1 — Internal Knowledge: LLM knowledge, conversation context, Chronicle if available.
  • Tier 2 — Free Web Search: Brave Search API, SearXNG, DuckDuckGo. Default for all queries.
  • Tier 3 — Semantic Research: Exa, Tavily. Deep research with sparse sources only. Quota-limited.

Read references/search_tiers.md for provider details and escalation criteria.
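A minimal sketch of the tier choice described above, assuming a used-today counter and the tier3_daily_limit default of 50 from config.json (the function name and exact escalation rules are hypothetical; references/search_tiers.md is authoritative):

```python
def select_tier(query_depth: str, tier3_used_today: int,
                tier3_daily_limit: int = 50) -> int:
    """Pick a search tier per the hierarchy above.

    Tier 2 (free web search) is the default; Tier 3 (semantic research)
    is reserved for deep research and is quota-limited.
    """
    if query_depth == "quick_answer":
        return 1  # try internal knowledge first for simple lookups
    if query_depth == "research" and tier3_used_today < tier3_daily_limit:
        return 3  # deep research may escalate, within the daily quota
    return 2
```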

Source reputation model

Sift maintains per-domain trust scores based on: cross-source agreement, contradiction frequency, historical accuracy, structured data quality, citation frequency.
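The page names the input signals but not the scoring formula; one plausible sketch is an exponential moving average over agree/contradict events:

```python
def update_trust(score: float, agreed: bool, weight: float = 0.1) -> float:
    """Move a per-domain trust score toward 1.0 on cross-source agreement
    and toward 0.0 on contradiction. The EMA form and the 0.1 weight are
    assumptions, not Sift's documented formula."""
    target = 1.0 if agreed else 0.0
    return (1 - weight) * score + weight * target
```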

Structured extraction rules

When pages are retrieved, extract: entities (with type from shared ontology), claims, statistics, relationships, citations. Each extraction includes confidence level.

Extracted entities are emitted as enrichment candidates for Elephas.
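An illustrative shape for one extraction record and the med-or-better confidence filter (the authoritative Signal schema lives in spec-ocas-shared-schemas.md, so these field names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    kind: str         # "entity", "claim", "statistic", "relationship", "citation"
    value: str
    confidence: str   # "low", "med", "high"
    source_url: str   # every extraction keeps a source reference
    entity_type: str = ""  # from the shared ontology, for entities

def promotable(items: list) -> list:
    """Only med+ confidence extractions become Elephas enrichment candidates."""
    return [e for e in items if e.confidence in ("med", "high")]
```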

Run completion

After every Sift command that produces results:

  1. Persist session, entities, sources, and decisions to local JSONL files
  2. For each extracted entity or relationship with confidence >= med: write a Signal file to ~/openclaw/db/ocas-elephas/intake/{signal_id}.signal.json. Use Signal schema from spec-ocas-shared-schemas.md.
  3. Write journal via sift.journal
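Step 2 above can be sketched as a single file write; the real payload must follow the Signal schema in spec-ocas-shared-schemas.md, and intake_dir stands in for ~/openclaw/db/ocas-elephas/intake/:

```python
import json
import uuid
from pathlib import Path

def emit_signal(entity: dict, intake_dir: Path) -> Path:
    """Write one Signal file for Elephas intake.

    The {signal_id}.signal.json naming comes from this page; the payload
    fields here are placeholders for the real schema."""
    intake_dir.mkdir(parents=True, exist_ok=True)
    signal_id = uuid.uuid4().hex
    path = intake_dir / f"{signal_id}.signal.json"
    path.write_text(json.dumps({"signal_id": signal_id, "payload": entity}))
    return path
```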

Chronicle interaction

Sift never writes directly to Chronicle. It emits enrichment candidates via Signal files to ~/openclaw/db/ocas-elephas/intake/{signal_id}.signal.json. Elephas decides promotion.

Inter-skill interfaces

Sift writes Signal files to Elephas intake: ~/openclaw/db/ocas-elephas/intake/{signal_id}.signal.json

Sift may read from Thread (when present) for recent browsing context to improve query rewriting. This is a cooperative read, not a dependency.

See spec-ocas-interfaces.md for signal format.

Storage layout

~/openclaw/data/ocas-sift/
  config.json
  sessions.jsonl
  threads.jsonl
  entities.jsonl
  sources.jsonl
  decisions.jsonl
  reports/

~/openclaw/journals/ocas-sift/
  YYYY-MM-DD/
    {run_id}.json

Default config.json:

{
  "skill_id": "ocas-sift",
  "skill_version": "2.3.0",
  "config_version": "1",
  "created_at": "",
  "updated_at": "",
  "search": {
    "default_tier": 2,
    "tier3_daily_limit": 50
  },
  "retention": {
    "days": 30,
    "max_records": 10000
  }
}
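How Sift enforces the retention block is not specified on this page; one plausible reading, dropping records older than retention.days and then capping at max_records newest, looks like:

```python
import datetime as dt

def prune(records: list, days: int = 30, max_records: int = 10000,
          now=None) -> list:
    """Apply the retention policy from config.json to a list of records,
    each carrying an ISO-8601 `created_at` field (field name assumed)."""
    now = now or dt.datetime.now(dt.timezone.utc)
    cutoff = now - dt.timedelta(days=days)
    kept = [r for r in records
            if dt.datetime.fromisoformat(r["created_at"]) >= cutoff]
    kept.sort(key=lambda r: r["created_at"])  # oldest first
    return kept[-max_records:]                # keep the newest N
```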

OKRs

Universal OKRs from spec-ocas-journal.md apply to all runs.

skill_okrs:
  - name: source_accuracy
    metric: fraction of extracted facts confirmed by cross-source agreement
    direction: maximize
    target: 0.85
    evaluation_window: 30_runs
  - name: tier3_quota_compliance
    metric: fraction of days where Tier 3 usage stays within daily limit
    direction: maximize
    target: 1.0
    evaluation_window: 30_runs
  - name: entity_extraction_precision
    metric: fraction of extracted entities with valid source reference
    direction: maximize
    target: 0.90
    evaluation_window: 30_runs
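Each OKR metric reduces to a simple fraction over the evaluation window; for example, source_accuracy might be computed like this (the `confirmed` field name is an assumption):

```python
def source_accuracy(facts: list) -> float:
    """Fraction of extracted facts confirmed by cross-source agreement,
    measured against the 0.85 target over a 30-run window."""
    if not facts:
        return 0.0
    return sum(1 for f in facts if f.get("confirmed")) / len(facts)
```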

Optional skill cooperation

  • Elephas — emit Signal files for Chronicle promotion after every extraction
  • Thread — may read recent browsing context for query rewriting (cooperative, not required)
  • Weave — may use Weave for entity disambiguation
  • Chronicle — may read Chronicle (read-only) for entity context

Journal outputs

  • Observation Journal — search and extraction runs
  • Research Journal — structured multi-source research sessions

Initialization

On first invocation of any Sift command, run sift.init:

  1. Create ~/openclaw/data/ocas-sift/ and subdirectories (reports/)
  2. Write default config.json with ConfigBase fields if absent
  3. Create empty JSONL files: sessions.jsonl, threads.jsonl, entities.jsonl, sources.jsonl, decisions.jsonl
  4. Create ~/openclaw/journals/ocas-sift/
  5. Ensure ~/openclaw/db/ocas-elephas/intake/ exists (create if missing)
  6. Register cron job sift:update if not already present (check openclaw cron list first)
  7. Log initialization as a DecisionRecord in decisions.jsonl
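Steps 1 through 5 above amount to idempotent directory and file creation; a minimal sketch, with cron registration and the DecisionRecord step omitted and `home` standing in for ~/openclaw:

```python
import json
from pathlib import Path

JSONL_FILES = ["sessions.jsonl", "threads.jsonl", "entities.jsonl",
               "sources.jsonl", "decisions.jsonl"]

def sift_init(home: Path) -> None:
    """First-run setup; safe to call repeatedly."""
    data = home / "data" / "ocas-sift"
    (data / "reports").mkdir(parents=True, exist_ok=True)
    cfg = data / "config.json"
    if not cfg.exists():  # only seed defaults when absent (step 2)
        cfg.write_text(json.dumps({"skill_id": "ocas-sift",
                                   "config_version": "1"}))
    for name in JSONL_FILES:      # step 3: empty JSONL files
        (data / name).touch()
    (home / "journals" / "ocas-sift").mkdir(parents=True, exist_ok=True)
    (home / "db" / "ocas-elephas" / "intake").mkdir(parents=True, exist_ok=True)
```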

Background tasks

Job name     Mechanism  Schedule                    Command
sift:update  cron       0 0 * * * (midnight daily)  sift.update
openclaw cron add --name sift:update --schedule "0 0 * * *" --command "sift.update" --sessionTarget isolated --lightContext true --timezone America/Los_Angeles

Self-update

sift.update pulls the latest package from the source: URL in this file's frontmatter. Runs silently — no output unless the version changed or an error occurred.

  1. Read source: from frontmatter → extract {owner}/{repo} from URL
  2. Read local version from skill.json
  3. Fetch remote version: gh api "repos/{owner}/{repo}/contents/skill.json" --jq '.content' | base64 -d | python3 -c "import sys,json;print(json.load(sys.stdin)['version'])"
  4. If remote version equals local version → stop silently
  5. Download and install:
    TMPDIR=$(mktemp -d)
    gh api "repos/{owner}/{repo}/tarball/main" > "$TMPDIR/archive.tar.gz"
    mkdir "$TMPDIR/extracted"
    tar xzf "$TMPDIR/archive.tar.gz" -C "$TMPDIR/extracted" --strip-components=1
    cp -R "$TMPDIR/extracted/"* ./
    rm -rf "$TMPDIR"
    
  6. On failure → retry once. If second attempt fails, report the error and stop.
  7. Output exactly: I updated Sift from version {old} to {new}
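The retry-once rule in step 6 can be sketched as a small wrapper, with `action` standing in for the download-and-install steps above:

```python
def retry_once(action):
    """Run an update attempt; on failure, retry exactly once.

    A second failure propagates to the caller, which should report the
    error and stop, as step 6 requires."""
    try:
        return action()
    except Exception:
        return action()
```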

Visibility

public

Support file map

File                         When to read
references/schemas.md        Before creating sessions, threads, or extraction records
references/search_tiers.md   Before tier selection or escalation
references/query_rewrite.md  Before query rewriting
references/journal.md        Before sift.journal; at end of every run
