InfoSeek

v2.0.0

Deep web information search and archival skill for comprehensive research on persons, organizations, or products. Uses multiple search engines (Baidu, Tavily...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for expeditionhub/infoseek.

Prompt preview (Install & Setup):
Install the skill "InfoSeek" (expeditionhub/infoseek) from ClawHub.
Skill page: https://clawhub.ai/expeditionhub/infoseek
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install infoseek

ClawHub CLI


npx clawhub@latest install infoseek

Security Scan

VirusTotal: Benign (view report)
OpenClaw: Benign (high confidence)

Purpose & Capability
Name/description (deep web search + archival) align with the included helper script (URL normalization, SQLite deduplication, file storage) and the declared requirement of python3 and OPENCLAW_WORKSPACE. The script explicitly handles local file and DB operations and does not perform network searches itself, which fits the model where the agent or other 'search' skills perform crawling.
Instruction Scope
SKILL.md instructs the agent to use external search/browser skills to fetch pages and to run the local helper script for normalization, deduplication, and saving. It does not instruct the agent to read arbitrary unrelated files or extra environment variables beyond OPENCLAW_WORKSPACE. Minor issues: inconsistent naming for required skills (e.g., 'tavily' vs 'tavily-search', 'Multi-Search-Engine' vs 'multi-search-engine') and a reliance on other skills being present in workspace/skills; these appear to be sloppy bookkeeping rather than malicious scope creep. Also, the workflow encourages high-volume scraping (e.g., '100+ pages' on Baidu) — a functional concern (rate limits, TOS, IP blocking, legal/ethical risk), not a code/credential mismatch.
Install Mechanism
No install spec is provided (instruction-only skill with one helper script included). That is low-risk: nothing is downloaded from remote URLs and the script will only be written to the agent environment when this skill is installed. The helper script is plain Python, readable, and contains no obfuscated code or hidden remote endpoints.
Credentials
The only declared primary credential is OPENCLAW_WORKSPACE (a workspace path used to store archives and check for other skills). No API keys or unrelated secrets are requested. The workspace access is necessary and proportionate for saving archives and database files.
Persistence & Privilege
always is false (no forced always-on presence). The skill writes files and an SQLite DB under the workspace (expected for an archival tool) but does not request elevated system-wide configuration changes or access to other skills' configs.
Assessment
This skill appears to do what it says: it expects a workspace path with a readable/writable folder to store archives, plus an included Python helper script that handles deduplication and file storage. Before installing, consider:

  1. Trust/source — the package has no homepage and an unknown source; review the full helper script yourself (it is included) and confirm you trust the publisher.
  2. Dependencies — the skill expects other search/browser skills to exist in {workspace}/skills; ensure those are genuine and named exactly as SKILL.md expects (there are some naming mismatches in the instructions).
  3. Legal & operational risk — the workflow encourages high-volume crawling (e.g., 100+ pages); comply with target sites' terms of service and robots.txt, and avoid overloading sites.
  4. Workspace safety — the skill will create infoseek-archives/ and an SQLite DB under OPENCLAW_WORKSPACE; point OPENCLAW_WORKSPACE at an isolated location if you don't want data mixed with other agent state.
  5. Rate limiting & secrets — the helper script does not exfiltrate data or call remote endpoints, but other search/browser skills might; verify those dependent skills before use.

If you want higher assurance, ask the publisher for a homepage or repository, or run the skill in a sandboxed workspace first.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: python3
Primary env: OPENCLAW_WORKSPACE

Latest: vk978zzsxs5evanzq51fbznr67n84dmet
83 downloads
0 stars
1 version
Updated 3w ago
v2.0.0
MIT-0

InfoSeek - Deep Web Search & Archival

Overview

InfoSeek performs comprehensive web research on any subject (person, organization, product) across multiple search engines, deduplicates results, extracts clean content, and archives everything with full metadata in organized folders.

Prerequisites

Before executing a search task, verify these skills are installed:

import os
from pathlib import Path

# Skills are expected to live under {OPENCLAW_WORKSPACE}/skills
workspace = os.environ.get('OPENCLAW_WORKSPACE')
if not workspace:
    raise RuntimeError('OPENCLAW_WORKSPACE is not set')

skills_dir = Path(workspace) / 'skills'

required = ['baidu-search', 'tavily', 'Multi-Search-Engine', 'agent-browser-clawdbot-0.1.0']
missing = [s for s in required if not (skills_dir / s).exists()]
print('Missing skills:', ', '.join(missing) if missing else 'none')

If any are missing, instruct the user to install them:

openclaw skills install baidu-search
openclaw skills install tavily-search
openclaw skills install multi-search-engine

Workflow

Phase 0: Task Setup

  1. Confirm the search subject — name, organization, or product
  2. Collect optional context — background info, time range, output format (default: .md), special requirements
  3. Check dependencies — run the prerequisite check above
  4. Create archive folder — run:
    python scripts/infoseek_helper.py create-folder "<subject_name>"
    

Phase 1: Multi-Engine Deep Search

Execute searches across all available engines. Each engine runs independently.

1.1 Baidu Search (100+ pages)

Use the baidu-search skill:

  • Query: "<subject> <background_context>"
  • Depth: 100+ pages
  • Record: URL, title, website name, publish date for each result

1.2 Tavily Search

Use the tavily_search tool:

query: "<subject> <background_context>"
search_depth: advanced
max_results: 50
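
If the tavily skill wraps the standard Tavily Python client, the equivalent call might look roughly like the sketch below; the tavily-python package name and the API-key handling are assumptions, not part of this skill.

import os
from tavily import TavilyClient  # assumed: tavily-python client

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
response = client.search(
    query="<subject> <background_context>",
    search_depth="advanced",  # matches search_depth above
    max_results=50,           # matches max_results above
)
for result in response.get("results", []):
    print(result["url"], "|", result["title"])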

1.3 Multi-Search-Engine

Use the multi-search-engine skill to query multiple engines simultaneously.

1.4 Browser Deep-Crawl

For discovered URLs, use the browser tool to:

  1. Open each page
  2. Extract body content (filter ads, sidebars, comments)
  3. Extract metadata: title, author, editor, date, website name (a parsing sketch follows this list)
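
Outside the browser tool, the extraction and filtering could be approximated with BeautifulSoup on raw HTML. This is a sketch under that assumption; the browser tool's own extraction may differ.

from bs4 import BeautifulSoup

def extract_page(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    # Drop obvious non-content regions (ads, navigation, sidebars, footers).
    for tag in soup(["script", "style", "nav", "aside", "footer", "form"]):
        tag.decompose()
    meta = {"title": soup.title.get_text(strip=True) if soup.title else "",
            "author": "", "date": ""}
    # Common metadata hints; real pages vary widely.
    author_tag = soup.find("meta", attrs={"name": "author"})
    if author_tag:
        meta["author"] = author_tag.get("content", "")
    date_tag = soup.find("meta", attrs={"property": "article:published_time"})
    if date_tag:
        meta["date"] = date_tag.get("content", "")
    body = soup.body.get_text(separator="\n", strip=True) if soup.body else ""
    return {**meta, "body": body}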

Phase 2: Deduplication

Run URL deduplication on all collected results:

python scripts/infoseek_helper.py deduplicate "<temp_results_file>"

The script normalizes URLs (removing www, stripping tracking parameters, unifying http/https, and dropping trailing slashes) and checks each one against the SQLite database to skip duplicates.
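
The normalization could look roughly like the sketch below; the exact rules live in scripts/infoseek_helper.py, and the seen_urls table name is an assumption for illustration.

import sqlite3
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "fbclid"}

def normalize_url(url: str) -> str:
    scheme, netloc, path, query, _ = urlsplit(url)
    netloc = netloc.lower().removeprefix("www.")  # remove www
    path = path.rstrip("/") or "/"                # remove trailing slashes
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING]
    return urlunsplit(("https", netloc, path, urlencode(kept), ""))  # unify http/https

def is_duplicate(conn: sqlite3.Connection, url: str) -> bool:
    # seen_urls is a hypothetical table name; the real schema is in the helper script.
    row = conn.execute("SELECT 1 FROM seen_urls WHERE url = ?",
                       (normalize_url(url),)).fetchone()
    return row is not None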

Phase 3: Content Extraction & Storage

For each unique URL, perform the following steps (a combined sketch of steps 3-5 follows the list):

  1. Extract content using the browser tool — get title, body, metadata
  2. Filter content — remove ads, sidebars, navigation, comments, related articles, footers
  3. Generate filename:
    python scripts/infoseek_helper.py generate-filename \
      --date "<YYYYMMDD>" --title "<title>" --website "<site>" --format "<ext>"
    
    Format: YYYYMMDD-title-website.ext
  4. Save the file:
    python scripts/infoseek_helper.py save-content \
      --folder "<archive_path>" --filename "<name>" --url "<url>" \
      --website "<site>" --source "<source>" --date "<date>" \
      --title "<title>" --author "<author>" --editor "<editor>" \
      --content "<body>" --task "<subject>"
    
  5. Record in database:
    python scripts/infoseek_helper.py add-url \
      --url "<normalized_url>" --task "<subject>" --filename "<name>"
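
Taken together, steps 3-5 could be chained roughly as below. The subcommands and flags come from the list above; the assumption that generate-filename prints the resulting name on stdout is mine, so adjust if the helper behaves differently.

import subprocess

HELPER = "scripts/infoseek_helper.py"

def archive_page(archive_path: str, page: dict, subject: str) -> None:
    # `page` holds the fields extracted in step 1: url, title, body,
    # author, editor, date, website, source.
    filename = subprocess.run(
        ["python", HELPER, "generate-filename",
         "--date", page["date"], "--title", page["title"],
         "--website", page["website"], "--format", "md"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()  # assumed: the helper prints the filename

    subprocess.run(
        ["python", HELPER, "save-content",
         "--folder", archive_path, "--filename", filename,
         "--url", page["url"], "--website", page["website"],
         "--source", page["source"], "--date", page["date"],
         "--title", page["title"], "--author", page["author"],
         "--editor", page["editor"], "--content", page["body"],
         "--task", subject],
        check=True,
    )

    subprocess.run(
        ["python", HELPER, "add-url",
         "--url", page["url"], "--task", subject, "--filename", filename],
        check=True,
    )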
    

Phase 4: Task Report

Output a summary when complete:

InfoSeek Task Report
====================
Subject: {query}
Engines used: {engines}
Total found: {total} | Duplicates skipped: {dupes} | New archived: {new}
Files saved: {count}
Location: {path}
Database records: {db_total}

File Naming

Format: YYYYMMDD-title-website.ext

  • Date: 8 digits (YYYYMMDD) from page metadata
  • Title: page title (strip special chars <>:"/\|?*)
  • Website: domain or media name
  • Extension: md (default), json, txt, csv, xlsx, html, docx

If the filename already exists, an 8-character hash is appended to prevent overwrites.
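
A minimal sketch of the naming rule follows; the hash source and the separator before the hash are assumptions.

import hashlib
import re
from pathlib import Path

INVALID_CHARS = r'[<>:"/\\|?*]'  # stripped from title and website per the rules above

def build_filename(date: str, title: str, website: str, ext: str, folder: Path) -> str:
    title = re.sub(INVALID_CHARS, "", title).strip()
    website = re.sub(INVALID_CHARS, "", website).strip()
    name = f"{date}-{title}-{website}.{ext}"
    if (folder / name).exists():
        # Append an 8-char hash to avoid overwriting an existing archive file.
        suffix = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
        name = f"{date}-{title}-{website}-{suffix}.{ext}"
    return name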

Output Formats

All formats include full metadata (URL, website, source, date, title, author, editor) plus body content; a sketch of the default .md layout follows the list.

  • .md — Markdown with metadata table
  • .json — Structured JSON with metadata object and content field
  • .txt — Plain text with header metadata
  • .csv — One row per article, all metadata as columns
  • .xlsx — Excel spreadsheet with metadata columns
  • .html — Styled HTML page with metadata table
  • .docx — Word document with metadata paragraph
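
A minimal sketch of the default .md output, using the metadata fields listed above; the helper script's actual table layout may differ.

def render_markdown(meta: dict, body: str) -> str:
    # Build the metadata table from the fields listed above, then append the body.
    rows = ["| Field | Value |", "| --- | --- |"]
    for key in ("url", "website", "source", "date", "title", "author", "editor"):
        rows.append(f"| {key} | {meta.get(key, '')} |")
    return f"# {meta.get('title', '')}\n\n" + "\n".join(rows) + "\n\n" + body + "\n"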

Storage Structure

{workspace}/
├── infoseek-archives/
│   ├── <subject_1>/
│   │   ├── 20260404-title-website.md
│   │   └── ...
│   └── <subject_2>/
└── infoseek/
    ├── infoseek.db          # SQLite dedup database
    ├── infoseek.log         # Operation log
    └── backups/

Deletion Policy

Strict data retention — no permanent deletes without confirmation.

Operation            Confirmation   Method
Bulk folder delete   Required       Move to recycle bin
Single file delete   Required       Move to recycle bin
Dedup skip           Automatic      Skip only, no delete
Database cleanup     Required       Mark as deleted

Process:

  1. List files to delete (name, URL, date)
  2. Ask user: "Confirm deletion? Files go to recycle bin and can be recovered."
  3. On confirmation, move to recycle bin (Windows: PowerShell, Mac/Linux: system trash); see the sketch after this list
  4. Update database, log the deletion, confirm to user
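
A cross-platform sketch of the recycle-bin move using the send2trash package; this is an assumption, since as noted above the skill may instead call PowerShell on Windows and the system trash on Mac/Linux.

from pathlib import Path
from send2trash import send2trash

def soft_delete(paths: list[Path]) -> None:
    # Moves files to the recycle bin / trash so they stay recoverable;
    # never a permanent delete, per the policy above.
    for path in paths:
        send2trash(str(path))
        print(f"Moved to recycle bin: {path}")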

Never:

  • Delete without user consent
  • Permanently delete (bypass recycle bin)
  • Delete without logging
  • Delete without updating database

Configuration

Override defaults in task instructions:

  • Search depth: default 100 pages, specify e.g. "150 pages"
  • Time range: default unlimited, specify e.g. "2020-01-01 to 2026-04-07"
  • Output format: default md, specify e.g. "xlsx"
  • Storage path: default {workspace}/infoseek-archives/, specify custom path

Troubleshooting

Problem                  Solution
Missing search skill     openclaw skills install <name>
Date extraction fails    Check page metadata; use 00000000 for unknown
Encoding errors          Ensure UTF-8; on Windows enable Unicode UTF-8 in region settings
Database corruption      python scripts/infoseek_helper.py restore-backup

Security & Privacy

  • All searches use public channels only
  • No personal data stored — only search results
  • SQLite database is local, never uploaded
  • Deletions use system recycle bin (recoverable)
  • All operations logged and auditable
  • No telemetry, no external data transmission

Version History

Version   Date         Notes
2.0.0     2026-04-07   Full rewrite: SQLite dedup, URL normalization, HTML parsing, multi-engine integration
1.0.0     2026-04-06   Initial version (deprecated)
