InfoSeek

v2.0.0

Deep web information search and archival skill for comprehensive research on persons, organizations, or products. Uses multiple search engines (Baidu, Tavily...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for expeditionhub/infoseek.

Prompt preview (Install & Setup):
Install the skill "InfoSeek" (expeditionhub/infoseek) from ClawHub.
Skill page: https://clawhub.ai/expeditionhub/infoseek
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install infoseek

ClawHub CLI


npx clawhub@latest install infoseek

Security Scan

VirusTotal: Benign (view report)
OpenClaw: Benign (high confidence)

Purpose & Capability
Name/description (deep web search + archival) align with the included helper script (URL normalization, SQLite deduplication, file storage) and the declared requirement of python3 and OPENCLAW_WORKSPACE. The script explicitly handles local file and DB operations and does not perform network searches itself, which fits the model where the agent or other 'search' skills perform crawling.
Instruction Scope
SKILL.md instructs the agent to use external search/browser skills to fetch pages and to run the local helper script for normalization, deduplication, and saving. It does not instruct the agent to read arbitrary unrelated files or extra environment variables beyond OPENCLAW_WORKSPACE. Minor issues: inconsistent naming for required skills (e.g., 'tavily' vs 'tavily-search', 'Multi-Search-Engine' vs 'multi-search-engine') and a reliance on other skills being present in workspace/skills; these appear to be sloppy bookkeeping rather than malicious scope creep. Also, the workflow encourages high-volume scraping (e.g., '100+ pages' on Baidu) — a functional concern (rate limits, TOS, IP blocking, legal/ethical risk), not a code/credential mismatch.
Install Mechanism
No install spec is provided (instruction-only skill with one helper script included). That is low-risk: nothing is downloaded from remote URLs and the script will only be written to the agent environment when this skill is installed. The helper script is plain Python, readable, and contains no obfuscated code or hidden remote endpoints.
Credentials
The only declared primary credential is OPENCLAW_WORKSPACE (a workspace path used to store archives and check for other skills). No API keys or unrelated secrets are requested. The workspace access is necessary and proportionate for saving archives and database files.
Persistence & Privilege
always is false (no forced always-on presence). The skill writes files and an SQLite DB under the workspace (expected for an archival tool) but does not request elevated system-wide configuration changes or access to other skills' configs.
Assessment
This skill appears to do what it says: it expects a workspace path with a readable/writable folder to store archives, plus an included Python helper script that handles deduplication and file storage. Before installing, consider:

  1. Trust/source — the package has no homepage and an unknown source; review the full helper script yourself (it is included) and confirm you trust the publisher.
  2. Dependencies — the skill expects other search/browser skills to exist in {workspace}/skills; ensure those are genuine and named exactly as SKILL.md expects (there are some naming mismatches in the instructions).
  3. Legal & operational risk — the workflow encourages high-volume crawling (e.g., 100+ pages); comply with target sites' terms of service and robots.txt, and avoid overloading sites.
  4. Workspace safety — the skill will create infoseek-archives/ and an SQLite DB under OPENCLAW_WORKSPACE; point OPENCLAW_WORKSPACE at an isolated location if you don't want data mixed with other agent state.
  5. Rate limiting & secrets — the helper script does not exfiltrate data or call remote endpoints, but other search/browser skills might; verify those dependent skills before use.

If you want higher assurance, ask the publisher for a homepage or repository, or run the skill in a sandboxed workspace first.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: python3
Primary env: OPENCLAW_WORKSPACE

Latest: vk978zzsxs5evanzq51fbznr67n84dmet
83 downloads
0 stars
1 version
Updated 3w ago
v2.0.0
MIT-0

InfoSeek - Deep Web Search & Archival

Overview

InfoSeek performs comprehensive web research on any subject (person, organization, product) across multiple search engines, deduplicates results, extracts clean content, and archives everything with full metadata in organized folders.

Prerequisites

Before executing a search task, verify these skills are installed:

import os
from pathlib import Path

# Skills are expected to live under {OPENCLAW_WORKSPACE}/skills
workspace = os.environ.get('OPENCLAW_WORKSPACE')
if not workspace:
    raise RuntimeError('OPENCLAW_WORKSPACE is not set')

skills_dir = Path(workspace) / 'skills'

required = ['baidu-search', 'tavily', 'Multi-Search-Engine', 'agent-browser-clawdbot-0.1.0']
missing = [s for s in required if not (skills_dir / s).exists()]
print('Missing skills:', ', '.join(missing) if missing else 'none')

If any are missing, instruct the user to install them:

openclaw skills install baidu-search
openclaw skills install tavily-search
openclaw skills install multi-search-engine

Workflow

Phase 0: Task Setup

  1. Confirm the search subject — name, organization, or product
  2. Collect optional context — background info, time range, output format (default: .md), special requirements
  3. Check dependencies — run the prerequisite check above
  4. Create archive folder — run:
    python scripts/infoseek_helper.py create-folder "<subject_name>"
    

Phase 1: Multi-Engine Deep Search

Execute searches across all available engines. Each engine runs independently.

1.1 Baidu Search (100+ pages)

Use the baidu-search skill:

  • Query: "<subject> <background_context>"
  • Depth: 100+ pages
  • Record: URL, title, website name, publish date for each result

1.2 Tavily Search

Use the tavily_search tool:

query: "<subject> <background_context>"
search_depth: advanced
max_results: 50
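
If the tavily skill wraps the standard Tavily Python client, the equivalent call might look roughly like the sketch below; the tavily-python package name and the API-key handling are assumptions, not part of this skill.

import os
from tavily import TavilyClient  # assumed: tavily-python client

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
response = client.search(
    query="<subject> <background_context>",
    search_depth="advanced",  # matches search_depth above
    max_results=50,           # matches max_results above
)
for result in response.get("results", []):
    print(result["url"], "|", result["title"])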

1.3 Multi-Search-Engine

Use the multi-search-engine skill to query multiple engines simultaneously.

1.4 Browser Deep-Crawl

For discovered URLs, use the browser tool to:

  1. Open each page
  2. Extract body content (filter ads, sidebars, comments)
  3. Extract metadata: title, author, editor, date, website name (a parsing sketch follows this list)
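
Outside the browser tool, the extraction and filtering could be approximated with BeautifulSoup on raw HTML. This is a sketch under that assumption; the browser tool's own extraction may differ.

from bs4 import BeautifulSoup

def extract_page(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    # Drop obvious non-content regions (ads, navigation, sidebars, footers).
    for tag in soup(["script", "style", "nav", "aside", "footer", "form"]):
        tag.decompose()
    meta = {"title": soup.title.get_text(strip=True) if soup.title else "",
            "author": "", "date": ""}
    # Common metadata hints; real pages vary widely.
    author_tag = soup.find("meta", attrs={"name": "author"})
    if author_tag:
        meta["author"] = author_tag.get("content", "")
    date_tag = soup.find("meta", attrs={"property": "article:published_time"})
    if date_tag:
        meta["date"] = date_tag.get("content", "")
    body = soup.body.get_text(separator="\n", strip=True) if soup.body else ""
    return {**meta, "body": body}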

Phase 2: Deduplication

Run URL deduplication on all collected results:

python scripts/infoseek_helper.py deduplicate "<temp_results_file>"

The script normalizes URLs (removing www, stripping tracking parameters, unifying http/https, and dropping trailing slashes) and checks each one against the SQLite database to skip duplicates.
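
The normalization could look roughly like the sketch below; the exact rules live in scripts/infoseek_helper.py, and the seen_urls table name is an assumption for illustration.

import sqlite3
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "fbclid"}

def normalize_url(url: str) -> str:
    scheme, netloc, path, query, _ = urlsplit(url)
    netloc = netloc.lower().removeprefix("www.")  # remove www
    path = path.rstrip("/") or "/"                # remove trailing slashes
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING]
    return urlunsplit(("https", netloc, path, urlencode(kept), ""))  # unify http/https

def is_duplicate(conn: sqlite3.Connection, url: str) -> bool:
    # seen_urls is a hypothetical table name; the real schema is in the helper script.
    row = conn.execute("SELECT 1 FROM seen_urls WHERE url = ?",
                       (normalize_url(url),)).fetchone()
    return row is not None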

Phase 3: Content Extraction & Storage

For each unique URL, perform the following steps (a combined sketch of steps 3-5 follows the list):

  1. Extract content using the browser tool — get title, body, metadata
  2. Filter content — remove ads, sidebars, navigation, comments, related articles, footers
  3. Generate filename:
    python scripts/infoseek_helper.py generate-filename \
      --date "<YYYYMMDD>" --title "<title>" --website "<site>" --format "<ext>"
    
    Format: YYYYMMDD-title-website.ext
  4. Save the file:
    python scripts/infoseek_helper.py save-content \
      --folder "<archive_path>" --filename "<name>" --url "<url>" \
      --website "<site>" --source "<source>" --date "<date>" \
      --title "<title>" --author "<author>" --editor "<editor>" \
      --content "<body>" --task "<subject>"
    
  5. Record in database:
    python scripts/infoseek_helper.py add-url \
      --url "<normalized_url>" --task "<subject>" --filename "<name>"
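
Taken together, steps 3-5 could be chained roughly as below. The subcommands and flags come from the list above; the assumption that generate-filename prints the resulting name on stdout is mine, so adjust if the helper behaves differently.

import subprocess

HELPER = "scripts/infoseek_helper.py"

def archive_page(archive_path: str, page: dict, subject: str) -> None:
    # `page` holds the fields extracted in step 1: url, title, body,
    # author, editor, date, website, source.
    filename = subprocess.run(
        ["python", HELPER, "generate-filename",
         "--date", page["date"], "--title", page["title"],
         "--website", page["website"], "--format", "md"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()  # assumed: the helper prints the filename

    subprocess.run(
        ["python", HELPER, "save-content",
         "--folder", archive_path, "--filename", filename,
         "--url", page["url"], "--website", page["website"],
         "--source", page["source"], "--date", page["date"],
         "--title", page["title"], "--author", page["author"],
         "--editor", page["editor"], "--content", page["body"],
         "--task", subject],
        check=True,
    )

    subprocess.run(
        ["python", HELPER, "add-url",
         "--url", page["url"], "--task", subject, "--filename", filename],
        check=True,
    )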
    

Phase 4: Task Report

Output a summary when complete:

InfoSeek Task Report
====================
Subject: {query}
Engines used: {engines}
Total found: {total} | Duplicates skipped: {dupes} | New archived: {new}
Files saved: {count}
Location: {path}
Database records: {db_total}

File Naming

Format: YYYYMMDD-title-website.ext

  • Date: 8 digits (YYYYMMDD) from page metadata
  • Title: page title (strip special chars <>:"/\|?*)
  • Website: domain or media name
  • Extension: md (default), json, txt, csv, xlsx, html, docx

If the filename already exists, an 8-character hash is appended to prevent overwrites.
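
A minimal sketch of the naming rule follows; the hash source and the separator before the hash are assumptions.

import hashlib
import re
from pathlib import Path

INVALID_CHARS = r'[<>:"/\\|?*]'  # stripped from title and website per the rules above

def build_filename(date: str, title: str, website: str, ext: str, folder: Path) -> str:
    title = re.sub(INVALID_CHARS, "", title).strip()
    website = re.sub(INVALID_CHARS, "", website).strip()
    name = f"{date}-{title}-{website}.{ext}"
    if (folder / name).exists():
        # Append an 8-char hash to avoid overwriting an existing archive file.
        suffix = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
        name = f"{date}-{title}-{website}-{suffix}.{ext}"
    return name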

Output Formats

All formats include full metadata (URL, website, source, date, title, author, editor) plus body content; a sketch of the default .md layout follows the list.

  • .md — Markdown with metadata table
  • .json — Structured JSON with metadata object and content field
  • .txt — Plain text with header metadata
  • .csv — One row per article, all metadata as columns
  • .xlsx — Excel spreadsheet with metadata columns
  • .html — Styled HTML page with metadata table
  • .docx — Word document with metadata paragraph
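
A minimal sketch of the default .md output, using the metadata fields listed above; the helper script's actual table layout may differ.

def render_markdown(meta: dict, body: str) -> str:
    # Build the metadata table from the fields listed above, then append the body.
    rows = ["| Field | Value |", "| --- | --- |"]
    for key in ("url", "website", "source", "date", "title", "author", "editor"):
        rows.append(f"| {key} | {meta.get(key, '')} |")
    return f"# {meta.get('title', '')}\n\n" + "\n".join(rows) + "\n\n" + body + "\n"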

Storage Structure

{workspace}/
├── infoseek-archives/
│   ├── <subject_1>/
│   │   ├── 20260404-title-website.md
│   │   └── ...
│   └── <subject_2>/
└── infoseek/
    ├── infoseek.db          # SQLite dedup database
    ├── infoseek.log         # Operation log
    └── backups/

Deletion Policy

Strict data retention — no permanent deletes without confirmation.

Operation            Confirmation   Method
Bulk folder delete   Required       Move to recycle bin
Single file delete   Required       Move to recycle bin
Dedup skip           Automatic      Skip only, no delete
Database cleanup     Required       Mark as deleted

Process:

  1. List files to delete (name, URL, date)
  2. Ask user: "Confirm deletion? Files go to recycle bin and can be recovered."
  3. On confirmation, move to recycle bin (Windows: PowerShell, Mac/Linux: system trash); see the sketch after this list
  4. Update database, log the deletion, confirm to user
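
A cross-platform sketch of the recycle-bin move using the send2trash package; this is an assumption, since as noted above the skill may instead call PowerShell on Windows and the system trash on Mac/Linux.

from pathlib import Path
from send2trash import send2trash

def soft_delete(paths: list[Path]) -> None:
    # Moves files to the recycle bin / trash so they stay recoverable;
    # never a permanent delete, per the policy above.
    for path in paths:
        send2trash(str(path))
        print(f"Moved to recycle bin: {path}")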

Never:

  • Delete without user consent
  • Permanently delete (bypass recycle bin)
  • Delete without logging
  • Delete without updating database

Configuration

Override defaults in task instructions:

  • Search depth: default 100 pages, specify e.g. "150 pages"
  • Time range: default unlimited, specify e.g. "2020-01-01 to 2026-04-07"
  • Output format: default md, specify e.g. "xlsx"
  • Storage path: default {workspace}/infoseek-archives/, specify custom path

Troubleshooting

Problem                  Solution
Missing search skill     openclaw skills install <name>
Date extraction fails    Check page metadata; use 00000000 for unknown
Encoding errors          Ensure UTF-8; on Windows enable Unicode UTF-8 in region settings
Database corruption      python scripts/infoseek_helper.py restore-backup

Security & Privacy

  • All searches use public channels only
  • No personal data stored — only search results
  • SQLite database is local, never uploaded
  • Deletions use system recycle bin (recoverable)
  • All operations logged and auditable
  • No telemetry, no external data transmission

Version History

Version   Date         Notes
2.0.0     2026-04-07   Full rewrite: SQLite dedup, URL normalization, HTML parsing, multi-engine integration
1.0.0     2026-04-06   Initial version (deprecated)
