Research Library
v0.1.0
Local-first multimedia research library for hardware projects. Capture code, CAD, PDFs, images. Search with material-type weighting. Project isolation with cross-references. Async extraction. Backup + restore.
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description (local multimedia research library) align with the files and CLI: SQLite + FTS5, extractors for PDFs/images/code, async workers, backup/restore, and project scoping. The included modules (cli, extractor, search, worker, database) match the stated functionality.
Instruction Scope
The SKILL.md and other docs instruct the agent/user to import local files, run extraction, and store DB/backups under a home-path (e.g. ~/.openclaw/research). That is expected for a local-first tool, but the skill will read and copy arbitrary files the user points it at and will create a local DB and attachments directory. The docs reference environment variables (RESLIB_DATA_DIR / RESLIB_DB) and CLI options for overriding paths; SKILL.md does not request any unrelated files or cloud endpoints. Recommendation: review and set data-dir before bulk imports if you want to control where files are written.
Install Mechanism
The registry shows no automated install spec in the skill bundle, but the package includes full Python code, an entry_point, and a _meta.json with dependencies (pdfplumber, pytesseract, click). SKILL.md suggests pip install /path/to/research-library or clawhub install. No remote downloads or odd URLs were present in the provided files. Minor packaging inconsistency: the skill is described as 'instruction-only' but contains code and a package manifest.
Credentials
The skill does not request credentials or secrets and does not declare required environment variables in the registry. Docs/CLI mention optional env vars (RESLIB_DATA_DIR, RESLIB_DB) and dependencies (pytesseract plus the system tesseract binary). The only notable mismatch: system 'tesseract-ocr' is an optional runtime dependency referenced in docs but not declared as a required binary in the registry. No credentials (API keys, AWS, etc.) are requested.
Persistence & Privilege
The skill does not set "always: true" and does not request special platform privileges. It stores data in user-visible locations (default ~/.openclaw/research/) and creates backups there, which is consistent with a local-first CLI tool. Autonomous invocation is enabled by default (standard behavior), but because the skill requests no broad credentials or network endpoints, the risk is limited to local file operations.
Assessment
This skill appears to be what it says: a local CLI research library that indexes files you add and stores a SQLite DB and attachments locally. Before installing or running it:
1) Review the default data/DB path and, if desired, override it (RESLIB_DATA_DIR or --db) so imports and backups go to a directory you control.
2) Install the optional OCR prerequisite (system tesseract-ocr) only if you need OCR.
3) Because the skill bundle contains executable Python code from an unknown source (no homepage or repo owner in the metadata is authoritative), consider inspecting reslib/cli.py and reslib/extractor.py for unexpected network calls or behavior, and run the package in an isolated environment or container for initial testing.
4) Run the included test suite (pytest) or smoke_test.sh in a sandbox before pointing it at large or sensitive directories.
5) No credentials are requested, but verify there are no hardcoded endpoints in the code if you want to ensure data never leaves your machine.
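Item 3 suggests inspecting the code for unexpected network calls. One quick first pass is to list every import of a network-capable module using Python's standard ast module. This is a sketch under assumptions: find_network_imports and the NETWORK_MODULES set are hypothetical names chosen for illustration, and the check only sees import statements, so it will miss dynamic imports, subprocess calls to tools such as curl, and anything outside .py files.

```python
import ast
import tempfile
from pathlib import Path

# Hypothetical, non-exhaustive set of top-level modules that can open network connections
NETWORK_MODULES = {"socket", "ssl", "urllib", "http", "ftplib", "smtplib", "requests", "httpx", "aiohttp"}


def find_network_imports(root):
    """Yield (path, line, module) for each import of a network-capable module under root."""
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                names = [node.module]  # absolute "from X import Y" only
            else:
                continue
            for name in names:
                # "urllib.request" is matched by its top-level package "urllib"
                if name.split(".")[0] in NETWORK_MODULES:
                    yield path, node.lineno, name


# Demo on a throwaway directory; point root at the unpacked reslib/ package instead.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "cli.py").write_text("import json\nfrom urllib.request import urlopen\n")
    Path(tmp, "extractor.py").write_text("import os, socket\n")
    hits = [(p.name, line, mod) for p, line, mod in find_network_imports(tmp)]

for name, line, module in hits:
    print(f"{name}:{line}: imports {module}")
```

An empty result is not proof of safety; it only narrows down where to read more carefully.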
Research Library Skill
A local-first multimedia research library for capturing, organizing, and searching hardware project knowledge.
What It Does
- Store documents — Code, PDFs, CAD files, images, schematics
- Extract automatically — Text from PDFs, EXIF from images, functions from code
- Search intelligently — Full-text with material-type weighting (your work ranks higher than external research)
- Project isolation — Arduino separate from CNC; no contamination
- Cross-reference — Link knowledge: "this servo tuning applies to that project"
- Async extraction — Searches never block while OCR runs
- Backup daily — 30-day rolling snapshots
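The automatic-extraction bullet mentions pulling functions out of code. For Python files, that step can be approximated with the standard ast module. This is a hypothetical sketch, not reslib's extractor: extract_functions and the fields it returns are assumptions, and it covers Python source only (the README lists AST plus regex extraction without documenting the rules).

```python
import ast


def extract_functions(source: str) -> list[dict]:
    """Return name, line number and docstring for every function in Python source.

    Hypothetical helper: reslib's real extractor may record different fields.
    """
    tree = ast.parse(source)
    functions = []
    # ast.walk also visits nested nodes, so methods inside classes are included
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(
                {
                    "name": node.name,
                    "line": node.lineno,
                    "doc": ast.get_docstring(node),
                }
            )
    # ast.walk is breadth-first, so sort to report functions in source order
    return sorted(functions, key=lambda f: f["line"])


sample = '''
def set_angle(deg):
    """Move the servo to an absolute angle."""
    pass

class Servo:
    def tune(self, kp, ki):
        pass
'''

for fn in extract_functions(sample):
    print(fn["name"], fn["line"], fn["doc"])
```

Parsing with ast rather than a regex means decorators, multi-line signatures and nested definitions are handled for free, at the cost of failing on files that are not valid Python.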
Installation
```bash
clawhub install research-library
# OR
pip install /path/to/research-library
```
Quick Start
```bash
# Initialize database
reslib status
# Add a file to the arduino project
reslib add ~/projects/arduino/servo.py --project arduino --material-type reference
# Search
reslib search "servo tuning"
# Link document 5 to document 12
reslib link 5 12 --type applies_to
```
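A link such as reslib link 5 12 --type applies_to amounts to a typed edge between two document IDs. The minimal SQLite sketch below assumes a hypothetical schema: the documents and links tables, their columns, and the link and related helpers are illustrative names, not reslib's actual schema. The two documents sit in different projects, showing how a cross-reference can exist while each project's documents stay separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce that links point at real documents
conn.executescript(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        project TEXT NOT NULL
    );
    CREATE TABLE links (
        source_id INTEGER NOT NULL REFERENCES documents(id),
        target_id INTEGER NOT NULL REFERENCES documents(id),
        link_type TEXT NOT NULL,
        PRIMARY KEY (source_id, target_id, link_type)
    );
    """
)


def link(source_id: int, target_id: int, link_type: str) -> None:
    """Record a typed relationship; re-adding the same link is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO links VALUES (?, ?, ?)",
        (source_id, target_id, link_type),
    )


def related(doc_id: int) -> list[tuple[str, str]]:
    """Return (link_type, title) for every document that doc_id links to."""
    return conn.execute(
        """
        SELECT l.link_type, d.title
        FROM links AS l JOIN documents AS d ON d.id = l.target_id
        WHERE l.source_id = ?
        ORDER BY d.id
        """,
        (doc_id,),
    ).fetchall()


conn.executemany(
    "INSERT INTO documents (id, title, project) VALUES (?, ?, ?)",
    [(5, "Servo PID tuning notes", "arduino"), (12, "Gimbal stabilizer", "rc-quadcopter")],
)
link(5, 12, "applies_to")
link(5, 12, "applies_to")  # duplicate is ignored thanks to the primary key
print(related(5))
```

Links are directional here; a reverse lookup would query on target_id instead.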
Features
CLI Commands
- reslib add: Import documents (auto-detect + extract)
- reslib search: Full-text search with filters
- reslib get: View document details
- reslib archive / reslib unarchive: Manage documents
- reslib export: Export as JSON/Markdown
- reslib link: Create document relationships
- reslib projects: Manage projects
- reslib tags: Manage tags
- reslib status: System overview
- reslib backup / reslib restore: Snapshots
- bash reslib/smoke_test.sh: Quick validation (a shell script, not a reslib subcommand)
Technical
- Storage: SQLite 3.45+ with FTS5 virtual table
- Extraction: PDF (pdfplumber + OCR), images (EXIF + OCR), code (AST + regex)
- Confidence Scoring: 0.0-1.0 based on quality + source
- Material Weighting: Reference (1.0) vs Research (0.5)
- Project Isolation: Scoped searches, no contamination
- Async Workers: 2-4 configurable extraction workers
- Catalog Separation: real_world vs openclaw projects
- Backup: Daily snapshots, 30-day retention
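The Storage, Material Weighting and Project Isolation points can be combined in a small sketch: an FTS5 table whose project and material_type columns are stored but not indexed, a query scoped to one project, and bm25 relevance scaled by the material weight. The table layout, multiplying relevance by weight, and the search helper are assumptions for illustration; the README does not say how reslib combines these. The code needs an SQLite build with FTS5, which the README already assumes.

```python
import sqlite3

# Weights from the README; multiplying them into relevance is an assumption.
MATERIAL_WEIGHTS = {"reference": 1.0, "research": 0.5}

conn = sqlite3.connect(":memory:")
# project and material_type are stored with each row but excluded from the full-text index
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5("
    "title, body, project UNINDEXED, material_type UNINDEXED)"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [
        ("Servo tuning log", "PID gains for the arm servo", "arduino", "reference"),
        ("Servo tuning article", "forum thread on servo jitter", "arduino", "research"),
        ("Spindle servo tuning", "CNC spindle servo tuning", "cnc", "reference"),
        ("Stepper wiring", "wiring diagram for the stepper driver", "cnc", "reference"),
        ("Bed leveling", "mesh bed leveling procedure", "cnc", "research"),
        ("Battery notes", "LiPo charging and storage", "rc-quadcopter", "research"),
        ("Motor mounts", "printed motor mount dimensions", "rc-quadcopter", "reference"),
    ],
)


def search(query, project):
    """Return (title, score) pairs for one project, best first."""
    rows = conn.execute(
        # bm25() is more negative for better matches, hence the negation below
        "SELECT title, material_type, bm25(docs) FROM docs "
        "WHERE docs MATCH ? AND project = ?",
        (query, project),
    ).fetchall()
    scored = [
        (title, -rank * MATERIAL_WEIGHTS.get(material_type, 1.0))
        for title, material_type, rank in rows
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for title, score in search("servo tuning", project="arduino"):
    print(f"{score:.3f}  {title}")
```

The two arduino documents match the query about equally well, so the 0.5 weight on the research item decides the order, and the cnc document never appears in an arduino search. Note that bm25() computes its term statistics over the whole table, not only the filtered project.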
Configuration
Copy reslib/config.json and customize:
```json
{
  "db_path": "~/.openclaw/research/library.db",
  "num_workers": 2,
  "worker_timeout_sec": 300,
  "max_retries": 3,
  "backup_retention_days": 30,
  "backup_dir": "~/.openclaw/research/backups",
  "file_size_limit_mb": 200,
  "project_size_limit_gb": 2
}
```
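The num_workers and max_retries keys point to a pool of workers draining an extraction queue and requeuing failed jobs. The sketch below shows that pattern with threads and a queue. run_workers, the (job, retries) tuple and reading max_retries as "retries after the first attempt" are assumptions, and worker_timeout_sec is not enforced. It joins the threads so the results can be inspected; a service that keeps searches responsive would start the workers and return immediately instead.

```python
import queue
import threading

# Mirrors two keys from config.json; interpreting them this way is an assumption.
CONFIG = {"num_workers": 2, "max_retries": 3}


def run_workers(jobs, extract, config=CONFIG):
    """Run extract(job) for every job on background threads, retrying failures.

    Returns (results, failed): successful outputs and final errors, keyed by job.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put((job, 0))  # (job, retries used so far)

    results, failed = {}, {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job, retries = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do; this worker exits
            try:
                value = extract(job)
            except Exception as exc:
                if retries < config["max_retries"]:
                    pending.put((job, retries + 1))  # requeue for another attempt
                else:
                    with lock:
                        failed[job] = repr(exc)
            else:
                with lock:
                    results[job] = value

    threads = [threading.Thread(target=worker) for _ in range(config["num_workers"])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # blocks here only to make the demo deterministic
    return results, failed


attempts = {}


def flaky_extract(path):
    """Stand-in extractor: scan.pdf fails once, broken.pdf always fails."""
    attempts[path] = attempts.get(path, 0) + 1
    if path == "broken.pdf":
        raise ValueError("unreadable")
    if path == "scan.pdf" and attempts[path] == 1:
        raise TimeoutError("OCR timed out")
    return f"text of {path}"


results, failed = run_workers(["notes.pdf", "scan.pdf", "broken.pdf"], flaky_extract)
print(sorted(results))  # ['notes.pdf', 'scan.pdf']
print(failed)           # {'broken.pdf': "ValueError('unreadable')"}
```

A worker that requeues a failed job picks it up again on its next loop, so no job is lost even if the other workers have already exited.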
Integration with War Room
Use the RL1 protocol in the war room DNA:
```python
from reslib import ResearchDatabase, ResearchSearch

db = ResearchDatabase()
search = ResearchSearch(db)

# Before researching, check existing knowledge
prior = search.search("servo tuning", project="rc-quadcopter")
if prior:
    print(f"Found {len(prior)} prior items")
else:
    # New research needed...
    db.add_research(title="...", content="...")  # plus any further fields add_research accepts
```
Performance
All targets exceeded:
| Operation | Target | Actual |
|---|---|---|
| PDF extraction | <100ms | 20.6ms |
| Search (50 docs) | <100ms | 0.33ms |
| Worker throughput | >6/sec | 414.69/sec |
Testing
```bash
# Run all tests
pytest tests/
# Quick smoke test
bash reslib/smoke_test.sh
# Performance tests
pytest tests/test_integration.py -v -k stress
```
Known Limitations (Phase 2)
- OCR quality varies on hand-drawn sketches
- The FTS5 setup targets <10K documents; larger libraries would need the planned PostgreSQL path
- No automatic web research gathering (manual only)
- Vector embeddings ready but inactive
- CAD file parsing is metadata-only
Documentation
See /docs/:
- CLI-REFERENCE.md: All commands + examples
- EXTRACTION-GUIDE.md: How extraction works
- SEARCH-GUIDE.md: Ranking + weighting
- WORKER-GUIDE.md: Async queue details
- INTEGRATION.md: War room RL1 protocol
Phase 2 Roadmap
- Real-world PDF calibration
- FTS5 scaling tests (10K docs)
- Auto-detection (reference vs research)
- Web research enrichment
- Vector embeddings (semantic search)
- PostgreSQL upgrade path
Building From Source
```bash
cd research-library
pip install -e .
pytest tests/
python -m reslib status
```
Support
Issues? See TECHNICAL-NOTES.md for troubleshooting.
Production-ready MVP. 214 tests passing. 15K lines. Ready to use.
