OnionClaw — Tor / Dark Web OSINT
v2.1.13 · by JacobJandon · MIT-0 License
github.com/JacobJandon/OnionClaw
OnionClaw routes all requests through the Tor network. It queries 12 verified
dark web search engines simultaneously, fetches .onion hidden-service pages,
rotates Tor circuits, schedules recurring watch/alert jobs, and produces
structured OSINT reports (Markdown, JSON, STIX, MISP, CSV) using the Robin
investigation pipeline.
Setup (run once after install)
# 1. Install Python dependencies
pip3 install "requests[socks]" beautifulsoup4 python-dotenv stem
# 2. Interactive first-run wizard (sets up .env, torrc, and Tor in one step)
python3 {baseDir}/setup.py
# — OR — manual setup:
cp {baseDir}/.env.example {baseDir}/.env
# Edit {baseDir}/.env — add your LLM key (search + fetch work without one)
Start Tor (required before any command):
# Linux:
sudo apt install tor && sudo systemctl start tor
# macOS:
brew install tor && brew services start tor
# Custom (no root needed — setup.py can do this automatically):
tor -f /tmp/sicry_tor.conf &
# torrc: SocksPort 9050 / ControlPort 9051 / CookieAuthentication 1 / DataDirectory /tmp/tor_data
Enable circuit rotation (required for renew.py and --daemon-poll):
Add to /etc/tor/torrc:
ControlPort 9051
CookieAuthentication 1
Then: sudo systemctl restart tor
setup.py does this automatically.
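For reference, the four torrc directives named in this section can also be generated from a script. This is a minimal sketch only: `render_torrc` is a hypothetical helper, not part of OnionClaw, and setup.py may write its configuration differently.

```python
def render_torrc(socks_port=9050, control_port=9051, data_dir="/tmp/tor_data"):
    """Return torrc text containing the directives listed in the setup notes."""
    lines = [
        f"SocksPort {socks_port}",
        f"ControlPort {control_port}",
        "CookieAuthentication 1",
        f"DataDirectory {data_dir}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the config so it can be redirected into a file of your choice.
    print(render_torrc(), end="")
```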
Commands
Check Tor is running
Always run this before any dark web operation.
python3 {baseDir}/check_tor.py
Returns your exit IP and tor_active: true/false. If false, tell the user to
start Tor before continuing.
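If check_tor.py reports tor_active: false, a quick first diagnostic is whether anything is listening on the SOCKS port at all. The sketch assumes the default SocksPort 9050 from the setup section. `port_is_open` is a hypothetical helper and does not replace check_tor.py, which also reports the exit IP.

```python
import socket


def port_is_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A False result here usually means Tor is not running or is bound to a different port.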
Rotate Tor identity
Get a fresh exit node and a new three-hop circuit. Use between sessions or
whenever a new IP is needed.
python3 {baseDir}/renew.py
Returns success: true/false. If false, ensure ControlPort 9051 is enabled
and TOR_DATA_DIR is set in .env (or use setup.py).
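The control port and the data directory both matter because a new identity is requested by sending Tor the NEWNYM signal after authenticating with the cookie stored in the data directory. The sketch shows that exchange at the protocol level. It assumes cookie authentication and Tor's default cookie file name, control_auth_cookie. It is not renew.py's implementation, which may use the stem library; `newnym_commands` and `request_new_identity` are hypothetical names.

```python
import socket
from pathlib import Path


def newnym_commands(cookie: bytes) -> bytes:
    """Build the two control-port commands: authenticate, then NEWNYM."""
    return (
        f"AUTHENTICATE {cookie.hex()}\r\n".encode("ascii")
        + b"SIGNAL NEWNYM\r\n"
    )


def request_new_identity(data_dir, host="127.0.0.1", port=9051):
    """Send both commands and report whether Tor answered 250 to each."""
    cookie = (Path(data_dir) / "control_auth_cookie").read_bytes()
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(newnym_commands(cookie))
        reply = b""
        # Read until two reply lines arrive or Tor closes the connection.
        while reply.count(b"\r\n") < 2:
            chunk = conn.recv(1024)
            if not chunk:
                break
            reply += chunk
    lines = reply.split(b"\r\n")[:2]
    return len(lines) == 2 and all(line.startswith(b"250") for line in lines)
```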
Check which search engines are alive
Ping all 12 engines via Tor and return latency + up/down for each.
python3 {baseDir}/check_engines.py
Run before a large search session; pass the alive engine names to --engines
to skip dead ones and save time.
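Turning the engine check into an --engines argument can be scripted. The sketch assumes the check results have been parsed into a list of dicts with "name" and "status" keys, where status is "up" or "down". That shape is an assumption for illustration; check the actual output of check_engines.py before relying on it. `alive_engine_args` is a hypothetical helper.

```python
def alive_engine_args(results):
    """Return ['--engines', name, ...] for engines reported up, else []."""
    alive = [r["name"] for r in results if r.get("status") == "up"]
    # An empty list means "no restriction", so search.py uses its default set.
    return ["--engines", *alive] if alive else []


if __name__ == "__main__":
    sample = [
        {"name": "Ahmia", "status": "up"},
        {"name": "Torgol", "status": "down"},
        {"name": "Tor66", "status": "up"},
    ]
    print(alive_engine_args(sample))
```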
Search the dark web
Query all 12 dark web engines simultaneously. Returns deduplicated
{title, url, engine} results.
# Basic:
python3 {baseDir}/search.py --query "SEARCH_TERM"
# Limit results:
python3 {baseDir}/search.py --query "SEARCH_TERM" --max 30
# Specific engines:
python3 {baseDir}/search.py --query "SEARCH_TERM" --engines Ahmia Tor66 Ahmia-clearnet
Available engines: Ahmia, OnionLand, Amnesia, Torland, Excavator, Onionway,
Tor66, OSS, Torgol, TheDeepSearches, DuckDuckGo-Tor, Ahmia-clearnet
Tip: Use short keyword queries (≤5 words). Dark web indexes respond far
better to focused keywords than natural-language questions.
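One rough way to apply this tip is to strip filler words from a natural-language question before passing it to search.py. The helper is a hypothetical sketch with a small hand-picked stopword list; it is not how pipeline.py refines queries.

```python
import re

# Small illustrative stopword list; extend it to suit your queries.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "for",
    "to", "and", "or", "what", "who", "how", "has", "have", "any", "about",
    "there", "been", "do", "does", "with", "from",
}


def to_keyword_query(text, max_words=5):
    """Keep the first max_words non-stopword tokens of text."""
    words = [w.rstrip(".-") for w in re.findall(r"[A-Za-z0-9][A-Za-z0-9.\-]*", text)]
    keywords = [w for w in words if w and w.lower() not in STOPWORDS]
    return " ".join(keywords[:max_words])
```

Domain-like tokens such as acme.com are kept whole because dots and hyphens inside a token are preserved.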
Fetch a .onion page
Read the full text of any .onion URL (or clearnet URL) through Tor.
python3 {baseDir}/fetch.py --url "http://SOME.onion/path"
Returns: {title, text (first 3000 chars), links, status, error}.
If status is 0 or error is set, the hidden service is probably offline.
Hidden services go down frequently, so try a different result from search.py.
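A caller can apply the same rule before passing page text on for analysis. The sketch assumes the fetch result has already been parsed into a dict with the fields listed here; `usable_text` is a hypothetical helper.

```python
def usable_text(result, limit=3000):
    """Return the page text, or None when the fetch looks like a failure."""
    if result.get("status", 0) == 0 or result.get("error"):
        return None
    return (result.get("text") or "")[:limit]
```

Returning None rather than an empty string lets the caller distinguish an unreachable service from a page that loaded but had no text.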
OSINT analysis
Analyse raw dark web text with an LLM and produce a structured sectioned report.
# From a string:
python3 {baseDir}/ask.py --query "QUERY" --mode MODE --content "RAW_TEXT"
# From a file:
python3 {baseDir}/ask.py --query "QUERY" --mode MODE --file /path/to/content.txt
# From stdin (pipe):
echo "CONTENT" | python3 {baseDir}/ask.py --query "QUERY" --mode MODE
Analysis modes:
| Mode | Use for |
|---|---|
| threat_intel | General OSINT (default): artifacts, insights, next steps |
| ransomware | Malware / C2 / MITRE ATT&CK TTPs, victim orgs, indicators |
| personal_identity | PII / breach exposure, severity, protective actions |
| corporate | Leaked credentials / code / internal docs, IR steps |
# With custom focus appended to the prompt:
python3 {baseDir}/ask.py --query "QUERY" --mode threat_intel \
--custom "Focus on cryptocurrency wallet addresses"
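When ask.py is driven from another script, building the argument list in one place catches mode typos before the command runs. The sketch uses only the flags and mode names documented in this section; `build_ask_command` is a hypothetical wrapper, and base_dir stands in for {baseDir}.

```python
MODES = ("threat_intel", "ransomware", "personal_identity", "corporate")


def build_ask_command(base_dir, query, mode="threat_intel", custom=None):
    """Return an argv list for ask.py, rejecting unknown modes early."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    cmd = ["python3", f"{base_dir}/ask.py", "--query", query, "--mode", mode]
    if custom:
        cmd += ["--custom", custom]
    return cmd
```

The content to analyse can then be supplied on stdin, for example with `subprocess.run(cmd, input=raw_text, text=True)`.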
Full OSINT pipeline (single command)
Runs the complete Robin pipeline:
refine query → check live engines → search → filter best results →
batch scrape → OSINT analysis → save report
python3 {baseDir}/pipeline.py --query "INVESTIGATION_QUERY" --mode MODE
Essential flags:
| Flag | Default | Description |
|---|---|---|
| --query TEXT | required | Investigation topic (natural language OK; refined automatically) |
| --mode MODE | threat_intel | threat_intel / ransomware / personal_identity / corporate |
| --max N | 30 | Max raw results from search |
| --scrape N | 8 | Pages to batch-fetch (use 0 to skip scraping and get a results-only report) |
| --custom TEXT | | Extra LLM instructions appended to the mode prompt |
| --out FILE | | Save report to file (exits 1 on permission error) |
| --format FMT | md | Output format: md / json / csv / stix / misp |
| --no-llm | | Skip all LLM steps: dump raw results / entity extraction only |
| --confidence | | Show BM25 confidence score per result |
| --engines NAME… | | Restrict to specific engines (skip dead ones) |
| --no-cache | | Bypass query/page cache for this run |
| --clear-cache | | Flush the result cache, then run |
| --resume JOB_ID | | Resume a checkpointed pipeline run by job ID |
| --interactive | | After the report, open a follow-up REPL for drill-down |
| --output-dir DIR | | Write <job_id>.<ext> into DIR (batch pipeline friendly) |
| --modes | | List all modes and their engine routing, then exit |
| --engine-stats | | Print per-engine reliability / latency table, then exit |
| --check-update | | Check for a newer OnionClaw release and exit |
| --version | | Print version and exit |
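The --confidence flag reports a BM25 score per result. As background, here is a minimal sketch of textbook Okapi BM25 scoring over whitespace-tokenised text. How OnionClaw tokenises results or scales the score is not documented here, so the function name and the k1 = 1.5, b = 0.75 defaults are assumptions (they are common defaults, not values taken from the tool).

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in docs against query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    terms = query.lower().split()
    # Number of documents containing each query term.
    doc_freq = {t: sum(1 for toks in tokenized if t in toks) for t in set(terms)}
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        score = 0.0
        for t in terms:
            f = counts[t]
            if f == 0:
                continue
            idf = math.log((n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1)
            # Length normalisation: longer-than-average documents are penalised.
            norm = f + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Higher scores mean the result text matches more of the query terms, with rarer terms weighted more heavily.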
MISP-specific flags:
| Flag | Default | Description |
|---|---|---|
| --misp-threat-level N | 2 | MISP threat level 1–4 (1=high, 2=medium, 3=low, 4=undefined) |
| --misp-distribution N | 0 | MISP distribution (0=your organisation only, 1=this community only, 2=connected communities, 3=all communities) |
Watch / alert flags:
| Flag | Description |
|---|---|
| --watch | Register this query as a recurring watch job and exit |
| --interval HOURS | Re-run interval in hours for --watch (default 6) |
| --watch-check | Run all due watch jobs now and print alerts |
| --watch-check --output-dir DIR | Same, but write each job's JSON to DIR (exits 1 on write error) |
| --watch-list | List all active watch jobs |
| --watch-disable JOB_ID | Disable a watch job by ID |
| --watch-clear-all | Disable ALL active watch jobs at once |
| --watch-daemon | (deprecated alias) Run as a blocking daemon loop |
| --daemon-poll SECONDS | Run --watch-check every N seconds in a daemon loop |
Daemon mode (continuous monitoring)
Keep OnionClaw running and poll watch jobs at a fixed interval:
python3 {baseDir}/pipeline.py --daemon-poll 3600 # check every hour
Scheduling watch jobs
Register a query as a recurring alert:
# Register (runs every 6 hours by default):
python3 {baseDir}/pipeline.py --query "ransomware hospital 2026" --watch --interval 6
# List all active jobs:
python3 {baseDir}/pipeline.py --watch-list
# Check due jobs now and write JSON files for each:
python3 {baseDir}/pipeline.py --watch-check --output-dir /tmp/alerts/
# Disable one job:
python3 {baseDir}/pipeline.py --watch-disable <JOB_ID>
# Clear all:
python3 {baseDir}/pipeline.py --watch-clear-all
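Conceptually, --watch-check runs only the jobs whose interval has elapsed. The sketch shows one plausible form of that check. How OnionClaw stores the last-run time is not described here, so `is_due` and its arguments are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_due(last_run, interval_hours=6, now=None):
    """Return True if a job last run at last_run should run again."""
    if last_run is None:
        # A job that has never run is always due.
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_run >= timedelta(hours=interval_hours)
```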
Typical investigation flows
"Search the dark web for X"
- python3 {baseDir}/check_tor.py (verify Tor is connected)
- python3 {baseDir}/search.py --query "X" (search all 12 engines)
- python3 {baseDir}/fetch.py --url "URL" (read the top 2–3 results)
- python3 {baseDir}/ask.py --mode threat_intel --query "X" --content "..." (generate the report)
"Has company.com appeared in dark web leaks?"
- python3 {baseDir}/check_tor.py
- python3 {baseDir}/pipeline.py --query "company.com credentials leak" --mode corporate
- Present the structured report
"Investigate ransomware group X"
- python3 {baseDir}/check_tor.py
- python3 {baseDir}/pipeline.py --query "GROUP_NAME ransomware" --mode ransomware
"Write a STIX bundle for this investigation"
python3 {baseDir}/pipeline.py \
--query "QUERY" --mode threat_intel \
--format stix --out bundle.json
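Before importing bundle.json into another platform, it can help to confirm that the file is a bundle and see what it contains. The sketch relies only on the STIX 2.1 convention that a bundle has type "bundle" and an objects list; `summarize_bundle` is a hypothetical helper, and which object types OnionClaw emits is not specified here.

```python
import json
from collections import Counter


def summarize_bundle(path):
    """Count STIX objects in a bundle file by their type field."""
    with open(path, encoding="utf-8") as fh:
        bundle = json.load(fh)
    if bundle.get("type") != "bundle":
        raise ValueError(f"{path} is not a STIX bundle")
    return Counter(obj.get("type", "unknown") for obj in bundle.get("objects", []))
```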
"Fetch this .onion URL"
- python3 {baseDir}/check_tor.py
- python3 {baseDir}/fetch.py --url "URL"
- Show the user the title + text content
"Monitor for new leaks mentioning acme.com, alert me daily"
python3 {baseDir}/pipeline.py \
--query "acme.com leak credentials" --watch --interval 24
# Later, in a cron job or daemon:
python3 {baseDir}/pipeline.py --watch-check --output-dir /tmp/acme-alerts/
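A downstream script can then collect whatever JSON files the watch check wrote. The sketch assumes each file in the output directory is a standalone JSON document and uses the file name (without extension) as the key. The structure inside each file is not described here, so it is passed through unchanged. `load_alert_files` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_alert_files(directory):
    """Load every *.json file in directory into a dict keyed by file stem."""
    alerts = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            alerts[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # Skip files that are corrupt or still being written.
            continue
    return alerts
```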
Output formats
| Format | Flag | Use for |
|---|---|---|
| Markdown | --format md (default) | Human-readable reports, --out report.md |
| JSON | --format json | Structured machine-readable, automation |
| CSV | --format csv | Spreadsheet import, result lists |
| STIX 2.1 | --format stix | Threat-intel platforms (MISP, OpenCTI, Splunk ES) |
| MISP | --format misp | Direct MISP event import |
Important notes
- All traffic routes through Tor — tell the user this when relevant.
- .onion hidden services go offline frequently. status: 0 means the site
is temporarily unreachable; try a different result from search.py.
- Dark web search indexes go down often. Run check_engines.py first and
pass only alive engine names with --engines.
- LLM tools (ask.py, pipeline steps 3/5/7) require an API key in
{baseDir}/.env. Set LLM_PROVIDER=ollama for fully local inference with
no key. search.py, fetch.py, check_tor.py, renew.py, and
check_engines.py work with no key at all.
- --scrape 0 skips page fetching. The pipeline still runs step 7 (LLM
analysis on search-result metadata only) and writes --out / --output-dir
normally. A WARN: --scrape 0 notice is printed to stderr.
- Use responsibly and lawfully — OSINT, security research, and threat
intelligence only.
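To check from a script whether the .env file selects local inference (the LLM_PROVIDER=ollama case noted in the list), a simplified parser is enough. This sketch handles only plain KEY=VALUE lines and is not a replacement for python-dotenv; `parse_env` and `uses_local_llm` are hypothetical helpers.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def uses_local_llm(env):
    """Return True when LLM_PROVIDER is set to ollama (case-insensitive)."""
    return env.get("LLM_PROVIDER", "").lower() == "ollama"
```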
Maintenance
Update the bundled sicry.py engine
OnionClaw bundles sicry.py from the upstream SICRY™ repo.
After a new SICRY™ release, sync the bundled copy:
# Pull latest:
python3 {baseDir}/sync_sicry.py
# Pull a specific release tag:
python3 {baseDir}/sync_sicry.py --tag v2.1.13
# Preview without writing:
python3 {baseDir}/sync_sicry.py --dry-run
Checking for OnionClaw updates
OnionClaw checks the GitHub Releases API (published releases only — not
plain git tags) for newer versions. A one-line notice is printed automatically
at pipeline startup when an update is available.
# On-demand update check:
python3 {baseDir}/pipeline.py --check-update
# Programmatic:
import sicry
r = sicry.check_update()
if not r["up_to_date"]:
    print(f"Update: {r['current']} → {r['latest']} {r['url']}")
# Upgrade:
git -C {baseDir} pull
python3 {baseDir}/sync_sicry.py
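When comparing release tags by hand, compare them numerically rather than as strings, since "2.1.9" sorts after "2.1.13" as text. The sketch assumes purely numeric dotted tags with an optional leading "v". It is not how sicry.check_update() is implemented, and `parse_version` and `is_newer` are hypothetical names.

```python
def parse_version(tag):
    """Turn 'v2.1.13' or '2.1.13' into the tuple (2, 1, 13)."""
    return tuple(int(part) for part in tag.lstrip("vV").split("."))


def is_newer(latest, current):
    """Return True if latest is a higher version than current."""
    return parse_version(latest) > parse_version(current)
```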