Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

arxivkb

v1.0.1

Local arXiv paper manager with semantic search. Crawls arXiv categories, downloads PDFs, chunks content, and indexes with FAISS + Ollama embeddings. No cloud...

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for camopel/arxivkb.

Prompt Preview: Install & Setup
Install the skill "arxivkb" (camopel/arxivkb) from ClawHub.
Skill page: https://clawhub.ai/camopel/arxivkb
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python3, ollama
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

openclaw skills install arxivkb

ClawHub CLI

npx clawhub@latest install arxivkb
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill name/description align with the included scripts: it crawls arXiv, downloads PDFs, extracts and chunks text, embeds via Ollama (nomic-embed-text) and indexes with FAISS/SQLite. Required binaries (python3, ollama) match the design.
Instruction Scope
Runtime instructions and code operate within the declared purpose (arXiv API + local embedding). However, SKILL.md/README claim defaults and behaviors that do not fully match the code: SKILL.md says the default data dir is `~/workspace/arxivkb`, while install.py/cli/db default to `~/Downloads/ArXivKB`. SKILL.md and the README mention a `config.json` and an `akb` CLI wrapper, but the installer writes service/plist files that reference `--config {config.json}` without creating that config file or placing an `akb` executable on PATH. These mismatches can cause unexpected file placement and failing background jobs.
Install Mechanism
The registry entry has no formal install spec, but the bundled scripts/install.py runs pip installs and calls `ollama pull`. If executed, it will pip-install packages (possibly with --user), pull a model from Ollama (a network download), create data directories, and write systemd/launchd files. No unusual remote or obfuscated download URLs are used, but the install script performs network operations and writes persistent service files to the user's profile.
Credentials
No secrets or cloud API keys are requested. The only external endpoints contacted are arXiv (public) and a local Ollama server (http://localhost:11434). An optional env var ARXIVKB_DATA_DIR is supported for data directory override. No unrelated credentials or config paths are requested.
Persistence & Privilege
The installer writes user-level service definitions (systemd timer in ~/.config/systemd/user and launchd plist in ~/Library/LaunchAgents) to schedule daily crawls. This creates persistent background network activity (periodic arXiv downloads and embedding). While expected for a crawler, users should be aware this grants the skill ongoing presence on the host. always:false mitigates global forced inclusion, but the installer still modifies user startup/service configuration.
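
For readers auditing the persistence step, here is a minimal sketch of what writing a user-level systemd service plus daily timer typically looks like. The unit names, descriptions, and ExecStart path below are hypothetical illustrations, not the skill's actual installer output; inspect scripts/install.py for the real content.

from pathlib import Path

unit_dir = Path("~/.config/systemd/user").expanduser()
unit_dir.mkdir(parents=True, exist_ok=True)

# ExecStart path below is hypothetical; the real unit would point at the
# installed skill's cli.py
service = """[Unit]
Description=ArXivKB daily ingest (illustrative)

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 %h/skills/arxivkb/scripts/cli.py ingest
"""

timer = """[Unit]
Description=Run ArXivKB ingest once a day (illustrative)

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
"""

(unit_dir / "arxivkb-ingest.service").write_text(service)
(unit_dir / "arxivkb-ingest.timer").write_text(timer)
# Activation would then be: systemctl --user enable --now arxivkb-ingest.timer

On macOS the equivalent is a launchd plist in ~/Library/LaunchAgents; either way the job fires on its schedule, independent of any chat session.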
What to consider before installing
This package appears to be what it says: a local arXiv crawler with FAISS search. But it has a few sloppy/inconsistent implementation details and will install persistent background jobs. Before running the installer or giving it shell access:
1) Inspect scripts/install.py and the generated systemd/launchd files (it writes to ~/.config/systemd/user and ~/Library/LaunchAgents) and confirm you want a daily background ingest.
2) Note the data-directory mismatch: SKILL.md/README mention ~/workspace/arxivkb but the scripts use ~/Downloads/ArXivKB; set ARXIVKB_DATA_DIR or edit the defaults to control where PDFs/DB/index are stored.
3) The systemd/launchd service references a --config {config.json} that the installer does not create; background runs may fail unless you create and populate that config or adapt the service.
4) The installer will pip-install packages and run `ollama pull nomic-embed-text` (a model download); expect network activity and non-trivial disk usage.
5) Run the installer inside a virtual environment if you want to avoid global/user pip changes.
6) Ensure Ollama is installed and run intentionally, since it accepts local HTTP requests; embedding calls target localhost only.
If you want higher assurance, run the tool manually (invoke scripts/cli.py directly) instead of activating the installer's automatic timer, and verify paths and config behavior first.

Runtime requirements

Bins: python3, ollama
Latest: vk97bwy48av0qcaff398dj5mgd581m1wg
661 downloads · 0 stars · 2 versions
Updated 14h ago
v1.0.1 · MIT-0

ArXivKB — Science Knowledge Base

Why This Skill?

🏠 100% local — crawls arXiv's free API, embeds with Ollama (nomic-embed-text), indexes in FAISS + SQLite. No cloud cost.

🔍 Semantic search on paper content — FAISS indexes PDF chunks (not just abstracts), so you find papers by what they contain.

📂 arXiv category-based — tracks official arXiv categories (155 available, 8 groups). No free-text queries.

🧹 Auto-cleanup — configurable expiry deletes old papers, PDFs, and chunks.

Install

python3 scripts/install.py

Works on macOS and Linux. Installs Python deps (faiss-cpu, pdfplumber, tiktoken, arxiv, numpy), pulls nomic-embed-text via Ollama, creates data directories and DB.
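
As a rough illustration of those steps (this is not the actual scripts/install.py; the package list comes from the description above, while the --user flag and directory layout are assumptions):

import subprocess, sys
from pathlib import Path

deps = ["faiss-cpu", "pdfplumber", "tiktoken", "arxiv", "numpy"]
subprocess.run([sys.executable, "-m", "pip", "install", "--user", *deps], check=True)

# Pull the embedding model through the local Ollama daemon (a network download)
subprocess.run(["ollama", "pull", "nomic-embed-text"], check=True)

# Create the data layout (default directory per the Configuration table below)
data_dir = Path("~/workspace/arxivkb").expanduser()
for sub in ("pdfs", "faiss"):
    (data_dir / sub).mkdir(parents=True, exist_ok=True)

Running the real installer inside a virtual environment avoids the user-level pip changes noted in the security scan above.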

Prerequisites

  • Ollama — must be installed and running (ollama serve)
  • Python 3.10+

Quick Start

# 1. Add arXiv categories to track
akb categories add cs.AI cs.CV cs.LG

# 2. Browse all available categories
akb categories browse

# 3. Ingest recent papers (last 7 days)
akb ingest

# 4. Check stats
akb stats

Categories

akb categories list                    # Show enabled categories
akb categories browse                  # Browse all 155 arXiv categories
akb categories browse robotics         # Filter by keyword
akb categories add cs.AI cs.RO         # Enable categories
akb categories delete cs.AI            # Disable a category

Categories are official arXiv codes (e.g. cs.AI, eess.IV, q-fin.ST). The full taxonomy is built in.

Ingestion

akb ingest                    # Crawl, download PDFs, chunk, embed
akb ingest --days 14          # Look back 14 days
akb ingest --dry-run          # Preview only
akb ingest --no-pdf           # Index abstracts only (faster)

Pipeline: arXiv API → PDF download → text extraction (pdfplumber) → chunking (tiktoken, 500 tokens, 50 overlap) → embedding (Ollama nomic-embed-text) → FAISS + SQLite.
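
To make the stages concrete, here is an illustrative end-to-end sketch built only from the public APIs of the listed dependencies (arxiv, pdfplumber, tiktoken, requests, faiss). It is not the skill's own code: the function structure, the hard-coded cs.AI query, and the file names are assumptions.

import arxiv, pdfplumber, tiktoken, requests, faiss
import numpy as np
from pathlib import Path

DATA = Path("~/workspace/arxivkb").expanduser()
(DATA / "pdfs").mkdir(parents=True, exist_ok=True)
(DATA / "faiss").mkdir(parents=True, exist_ok=True)

# 1) arXiv API: newest papers in one enabled category
client = arxiv.Client()
search = arxiv.Search(query="cat:cs.AI", max_results=3,
                      sort_by=arxiv.SortCriterion.SubmittedDate)
papers = list(client.results(search))

# 2) PDF download and text extraction, then 3) token chunking
enc = tiktoken.get_encoding("cl100k_base")
chunks, owners = [], []                       # chunk texts and their paper ids
for paper in papers:
    pdf_name = f"{paper.get_short_id()}.pdf"
    paper.download_pdf(dirpath=str(DATA / "pdfs"), filename=pdf_name)
    with pdfplumber.open(DATA / "pdfs" / pdf_name) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    tokens = enc.encode(text)
    for start in range(0, len(tokens), 500 - 50):   # 500-token chunks, 50 overlap
        chunks.append(enc.decode(tokens[start:start + 500]))
        owners.append(paper.get_short_id())

# 4) Embeddings from the local Ollama daemon (nomic-embed-text, 768 dims)
def embed(text: str) -> list[float]:
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

vectors = np.array([embed(c) for c in chunks], dtype="float32")
faiss.normalize_L2(vectors)                   # inner product == cosine similarity

# 5) FAISS IndexFlatIP over the vectors; paper/chunk metadata belongs in SQLite
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)
faiss.write_index(index, str(DATA / "faiss" / "arxivkb.faiss"))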

Paper Details

akb paper 2401.12345    # Show title, abstract, categories, PDF status

Statistics

akb stats   # Papers, chunks, categories, DB size

Expiry & Cleanup

akb expire               # Delete papers older than 90 days (default)
akb expire --days 30     # Override: delete papers older than 30 days
akb expire --days 30 -y  # Skip confirmation

Configuration

No config file needed. Defaults:

Setting            Default                    Override
Data directory     ~/workspace/arxivkb        ARXIVKB_DATA_DIR env or --data-dir
Ollama endpoint    http://localhost:11434     — (hardcoded)
Embedding model    nomic-embed-text (768d)    — (hardcoded)
Chunk size         500 tokens, 50 overlap
Expiry             90 days                    --days flag
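
A small sketch of how such an override order is usually resolved; the precedence shown (flag first, then env var, then the documented default) is an assumption, so check scripts/cli.py for the actual behavior:

import os
from pathlib import Path

def resolve_data_dir(cli_value: str | None = None) -> Path:
    # Hypothetical precedence: --data-dir flag, then ARXIVKB_DATA_DIR, then default
    if cli_value:
        return Path(cli_value).expanduser()
    env_value = os.environ.get("ARXIVKB_DATA_DIR")
    if env_value:
        return Path(env_value).expanduser()
    return Path("~/workspace/arxivkb").expanduser()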

Data Layout

~/workspace/arxivkb/
├── arxivkb.db           # SQLite: papers, chunks, translations, categories
├── pdfs/                  # Downloaded PDF files ({arxiv_id}.pdf)
└── faiss/
    └── arxivkb.faiss    # FAISS IndexFlatIP (chunk embeddings)

DB Schema

  • papers: id, arxiv_id, title, abstract, categories, published, status, created_at
  • chunks: id, paper_id, section, chunk_index, text, faiss_id, created_at
  • translations: paper_id, language, abstract, created_at (PK: paper_id+language)
  • categories: code, description, group_name, enabled, added_at (155 entries)
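
The column lists above translate roughly into the following sqlite3 DDL; the types and constraints are guesses from the field names, not copied from the skill's db.py:

import sqlite3

conn = sqlite3.connect("arxivkb.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS papers (
    id INTEGER PRIMARY KEY, arxiv_id TEXT UNIQUE, title TEXT, abstract TEXT,
    categories TEXT, published TEXT, status TEXT, created_at TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY, paper_id INTEGER REFERENCES papers(id),
    section TEXT, chunk_index INTEGER, text TEXT, faiss_id INTEGER, created_at TEXT
);
CREATE TABLE IF NOT EXISTS translations (
    paper_id INTEGER, language TEXT, abstract TEXT, created_at TEXT,
    PRIMARY KEY (paper_id, language)
);
CREATE TABLE IF NOT EXISTS categories (
    code TEXT PRIMARY KEY, description TEXT, group_name TEXT,
    enabled INTEGER, added_at TEXT
);
""")
conn.commit()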

💬 Chat Commands (OpenClaw Agent)

When this skill is installed, the agent recognizes /akb as a shortcut:

Command                  Action
/akb list                Show enabled categories
/akb add cs.AI cs.RO     Enable categories for crawling
/akb remove cs.AI        Disable a category
/akb browse              Browse all 155 arXiv categories
/akb browse robotics     Filter categories by keyword
/akb stats               Show paper/chunk/category counts
/akb help                Show available commands

The agent runs these via the akb CLI internally.

📱 PrivateApp Dashboard

A companion PWA dashboard is available. It provides:

  • Semantic search across paper content
  • Paper detail with abstract translation (on-demand via LLM)
  • Inline PDF viewing
  • Category browser
  • Stats (papers, chunks, categories)

Architecture

scripts/
├── cli.py             # CLI — categories, ingest, paper, stats, expire
├── db.py              # SQLite schema + CRUD
├── arxiv_crawler.py   # arXiv API search + PDF download
├── arxiv_taxonomy.py  # Full arXiv category taxonomy (155 categories)
├── pdf_processor.py   # PDF text extraction + tiktoken chunking
├── embed.py           # Ollama nomic-embed-text (768d, normalized)
├── faiss_index.py     # FAISS IndexFlatIP manager
├── search.py          # Semantic search: query → FAISS → group by paper
└── install.py         # One-command installer
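
For orientation, here is a sketch of the search path described for search.py above (query → FAISS → group by paper). The embed() helper and storage locations mirror the ingestion sketch earlier on this page and are assumptions rather than the skill's code.

import faiss, requests
import numpy as np
from pathlib import Path

DATA = Path("~/workspace/arxivkb").expanduser()

def embed(text: str) -> list[float]:
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

index = faiss.read_index(str(DATA / "faiss" / "arxivkb.faiss"))
query = np.array([embed("diffusion models for robot manipulation")], dtype="float32")
faiss.normalize_L2(query)

scores, faiss_ids = index.search(query, 20)   # top-20 chunk hits
# To group by paper, map each faiss_id back to its chunk row in SQLite
# (chunks.faiss_id -> paper_id) and keep the best-scoring chunk per paper.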
