Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Siphonclaw Skill

v1.2.0

Hybrid document intelligence pipeline that ingests PDFs, images, and spreadsheets, with OCR, visual and text search, and field-fix capture for fast retrieval.

Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
[!] Purpose & Capability
The README/SKILL.md describe a full document-intelligence pipeline (local Ollama models, ChromaDB/BM25, visual embeddings, and optional cloud intelligence via OpenRouter/Minimax/etc.). That functionality plausibly requires local binaries, model downloads, and API keys. However, the skill metadata declares no required environment variables, no required binaries, and no install steps. This inconsistency (documented runtime needs vs declared requirements) is unexplained and therefore concerning.
[!] Instruction Scope
The SKILL.md lists tools that accept absolute file_path and image_path arguments (ingest/identify), and describes fallback to web search and cloud intelligence. That means the agent using this skill may read arbitrary local files for ingestion and may forward extracted text/images to external APIs. The instructions don’t explicitly limit which files/paths are used, nor do they declare the external endpoints or credentials in the registry. That broad scope (local file access + potential outbound transmission) without declared constraints is a risk.
Install Mechanism
There is no install spec in the registry (instruction-only skill), so nothing is automatically downloaded or written by the platform. This minimizes immediate install-time risk. However, the README shows manual install steps (git clone, pip install, ollama pull) that a user would run separately — those commands themselves fetch large models and code from third parties and should be verified before execution.
[!] Credentials
The registry lists no required env vars, but the README and documentation reference many credentials and endpoints (examples: OLLAMA_URL/OLLAMA_VISION_MODEL, OPENROUTER_API_KEY, MINIMAX_API_KEY, KIMI_API_KEY, TELEGRAM_BOT_TOKEN, AGENTMAIL_API_KEY, BRAVE_SEARCH_API_KEY, DAILY_BUDGET_CAP). Requiring multiple unrelated API keys (messaging, search, model routers) would be proportional to the pipeline but the skill did not declare them in metadata. The absence of declared primary credentials while docs require secrets is an incoherence that could lead to accidental data exposure if users supply keys without understanding what will be sent where.
Persistence & Privilege
The skill is not marked always:true and does not request system-level config paths in the registry. Autonomous invocation is allowed (platform default) but that alone is not a red flag. There is no evidence this skill modifies other skills or system-wide settings. Still, because it can instruct ingestion of arbitrary files and outbound calls, consider limiting its access and running it in a sandbox until provenance is confirmed.
What to consider before installing
Do not install or run this skill blindly. The files describe downloads of large local models (Ollama pulls), a local vector DB, and use of multiple external APIs, yet the registry declares no required credentials; this mismatch is suspicious. Before using:
  1) Verify the source repository (the README references https://github.com/curtisgc1/siphonclaw.git) and review the actual code there.
  2) If you plan to follow the README, inspect any scripts and requirements.txt for third-party packages and network calls.
  3) Prefer running ingestion and model pulls on an isolated machine or VM (they download large models and will process local files).
  4) Do not provide API keys (OpenRouter, Telegram, AgentMail, BraveSearch, etc.) until you confirm which endpoints will receive your data and why.
  5) Consider restricting which filesystem paths the agent can access (avoid giving blanket access to / or home) and test with non-sensitive documents first.
The lack of declared requirements and unknown provenance are the main reasons to proceed cautiously.

Like a lobster shell, security has layers — review code before you run it.

latest: vk971jdw1nme1dpqn94fas1kzjh816xhg
676 downloads
0 stars
3 versions
Updated 8h ago
v1.2.0
MIT-0

SiphonClaw

Domain-agnostic document intelligence pipeline. Ingest PDFs, images, and spreadsheets into a searchable knowledge base with dual-track retrieval (text + visual), OCR, confidence scoring, and field capture.

Built for field service engineers, researchers, mechanics, and anyone who needs fast answers from large document collections.

What SiphonClaw Does

  • Ingest documents (PDF, Excel, images, screenshots) into a local vector database with text and visual embeddings
  • Search using triple hybrid retrieval: BM25 keyword matching + semantic text vectors + visual page embeddings, fused with RRF and reranked with a cross-encoder
  • Identify equipment, parts, or components from photos using vision models, then search the local knowledge base
  • Capture field fixes and repair notes as first-class knowledge base entries for future retrieval
  • Score every response with composite confidence (retrieval + faithfulness + relevance + coverage) and footnote-style source citations
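
The fusion step named in the search bullet, reciprocal rank fusion (RRF), can be sketched as follows. This is a minimal illustration and not SiphonClaw's actual code: the function name `rrf_fuse`, the constant `k = 60` (a commonly used default), and the sample page IDs are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked results from the three retrievers.
bm25 = ["p42", "p15", "p67"]
text_vectors = ["p15", "p42", "p03"]
visual_pages = ["p42", "p03", "p15"]

fused = rrf_fuse([bm25, text_vectors, visual_pages])
print(fused)
```

A cross-encoder reranker would then rescore the top of this fused list; that step is omitted here.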

MCP Tools

SiphonClaw exposes five tools via MCP for integration with agents and other MCP-compatible clients.


siphonclaw_search

Search the knowledge base using triple hybrid retrieval (text + visual + keyword).

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| query | string | yes | Natural language search query or exact part number / error code |
| top_k | integer | no | Number of results to return (default: 5, max: 20) |
| filters | object | no | Metadata filters (e.g., {"source_type": "service_manual", "model": "ModelA"}) |
| mode | string | no | Search mode: "hybrid" (default), "text", "visual", "keyword" |

Returns:

{
  "results": [
    {
      "content": "Extracted text from the matching chunk or page",
      "source": "ServiceManual_ModelA.pdf",
      "page": 42,
      "section": "4.3 Transformer Replacement",
      "score": 0.92,
      "match_type": "hybrid"
    }
  ],
  "confidence": 0.87,
  "confidence_tier": "Confident - verify part number",
  "keywords_used": ["low voltage supply", "assembly mount", "ModelA"],
  "citations": ["[1] ServiceManual_ModelA, page 42", "[2] Parts Catalog PC-1102, page 15"]
}
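
A client may want to check arguments against the documented limits before calling siphonclaw_search. The helper below is a hedged sketch: `build_search_args` is a hypothetical name, and the lower bound of 1 for `top_k` is an assumption (the parameter list only states the default and the maximum).

```python
VALID_MODES = {"hybrid", "text", "visual", "keyword"}
MAX_TOP_K = 20


def build_search_args(query, top_k=5, filters=None, mode="hybrid"):
    """Build the arguments object for siphonclaw_search, checking documented limits."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= top_k <= MAX_TOP_K:  # lower bound of 1 is assumed
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}")
    args = {"query": query.strip(), "top_k": top_k, "mode": mode}
    if filters:
        args["filters"] = dict(filters)
    return args


print(build_search_args("PSU-200 replacement", filters={"model": "ModelA"}))
```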

siphonclaw_ingest

Add a document or photo to the knowledge base. Supports PDF, Excel, images (JPG/PNG), and screenshots.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| file_path | string | yes | Absolute path to the file to ingest |
| source_type | string | no | Document type hint: "manual", "parts_catalog", "field_note", "photo", "other" (default: auto-detect) |
| metadata | object | no | Additional metadata to attach (e.g., {"model": "ModelA", "domain": "industrial"}) |

Returns:

{
  "status": "ingested",
  "file": "ServiceManual_ModelA.pdf",
  "pages_processed": 127,
  "chunks_created": 843,
  "visual_pages_indexed": 127,
  "ocr_pages": 12,
  "duration_seconds": 45.2
}
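
Because siphonclaw_ingest accepts any absolute path, a caller can restrict ingestion to one directory before invoking the tool. The check below is a caller-side sketch, not part of SiphonClaw: `check_ingest_path`, `allowed_root`, and the extension set are assumptions based on the formats listed above.

```python
from pathlib import Path

# Extensions assumed from the formats listed above; the real set may differ.
ALLOWED_SUFFIXES = {".pdf", ".xlsx", ".xls", ".jpg", ".jpeg", ".png"}


def check_ingest_path(file_path, allowed_root):
    """Return the resolved path if it is absolute, inside allowed_root, and a supported type."""
    path = Path(file_path)
    if not path.is_absolute():
        raise ValueError("file_path must be absolute")
    resolved = path.resolve()  # collapses ".." segments and symlinks
    root = Path(allowed_root).resolve()
    if not resolved.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"{resolved} is outside {root}")
    if resolved.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {resolved.suffix}")
    return resolved
```

Resolving the path first means a value such as `/srv/docs/../secrets/key.pdf` is rejected even though it starts with the allowed prefix.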

siphonclaw_field_note

Save a field fix or repair note as a first-class knowledge base entry. These are indexed and retrievable in future searches, forming a learning loop.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| note | string | yes | Free-text description of the fix, procedure, or observation |
| model | string | no | Equipment model or identifier (e.g., "ModelA") |
| parts | array[string] | no | Part numbers used in the repair (e.g., ["12345", "67890"]) |
| procedure_ref | string | no | Reference to a manual procedure (e.g., "ServiceManual_ModelA section 4.3") |
| tags | array[string] | no | Free-form tags for categorization (e.g., ["hv_transformer", "calibration"]) |

Returns:

{
  "status": "saved",
  "field_note_id": "fn-2026-02-09-001",
  "indexed": true,
  "model": "ModelA",
  "parts_cross_referenced": ["12345"],
  "retrievable": true
}
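
One way to assemble the siphonclaw_field_note arguments is a small dataclass that mirrors the parameter list and drops optional fields left empty. The class name `FieldNote`, its `to_args` method, and the sample note text are hypothetical; only the field names come from the parameters above.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class FieldNote:
    note: str
    model: Optional[str] = None
    parts: List[str] = field(default_factory=list)
    procedure_ref: Optional[str] = None
    tags: List[str] = field(default_factory=list)

    def to_args(self):
        """Return tool arguments, omitting optional fields that were left empty."""
        if not self.note.strip():
            raise ValueError("note is required")
        return {k: v for k, v in asdict(self).items() if v not in (None, [], "")}


fix = FieldNote(
    note="Replaced HV transformer and recalibrated output.",
    model="ModelA",
    parts=["12345"],
    tags=["hv_transformer"],
)
print(fix.to_args())
```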

siphonclaw_identify

Send a photo of equipment, a part, a label, or an error screen. SiphonClaw uses vision models to identify what it sees, then searches the local knowledge base for relevant documentation. Falls back to web search if local confidence is low.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| image_path | string | yes | Absolute path to the image file (JPG, PNG, HEIC) |
| context | string | no | Additional context about the image (e.g., "circuit board inside equipment housing") |
| search_after | boolean | no | Automatically search the KB after identification (default: true) |

Returns:

{
  "identification": "Industrial power supply board, Model PSU-200",
  "visual_features": ["green PCB", "3 large capacitors", "manufacturer logo visible", "part label partially obscured"],
  "ocr_text": "PSU-200 REV C  SN: 4829103",
  "search_results": [
    {
      "content": "PSU-200 replacement procedure...",
      "source": "ServiceManual_ModelA.pdf",
      "page": 67,
      "score": 0.94
    }
  ],
  "confidence": 0.91,
  "web_search_used": false
}
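
A consumer of the siphonclaw_identify response might reduce it to a label, a citation, and a review flag. The sketch below assumes the response shape shown above; the function name, the 0.8 confidence threshold, and the choice to flag web-search results for review are illustrative assumptions, not documented behavior.

```python
def summarize_identification(response, min_confidence=0.8):
    """Pick the best local hit from a siphonclaw_identify response and flag weak results."""
    results = response.get("search_results", [])
    best = max(results, key=lambda r: r["score"], default=None)
    trusted = (
        best is not None
        and response.get("confidence", 0.0) >= min_confidence  # threshold is assumed
        and not response.get("web_search_used", False)
    )
    return {
        "label": response.get("identification"),
        "source": f'{best["source"]}, page {best["page"]}' if best else None,
        "needs_review": not trusted,
    }


sample = {
    "identification": "Industrial power supply board, Model PSU-200",
    "search_results": [
        {"content": "PSU-200 replacement procedure...",
         "source": "ServiceManual_ModelA.pdf", "page": 67, "score": 0.94}
    ],
    "confidence": 0.91,
    "web_search_used": False,
}
print(summarize_identification(sample))
```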

siphonclaw_status

Get pipeline health, ingestion statistics, model availability, and cost tracking.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| detail | string | no | Level of detail: "summary" (default), "full", "costs", "models" |

Returns:

{
  "status": "healthy",
  "knowledge_base": {
    "total_documents": 3164,
    "total_chunks": 656000,
    "visual_pages_indexed": 31200,
    "last_ingestion": "2026-02-09T14:30:00Z"
  },
  "models": {
    "ocr": {"model": "qwen3-vl:latest", "provider": "ollama", "available": true},
    "text_embedding": {"model": "bge-m3:latest", "provider": "ollama", "available": true},
    "visual_embedding": {"model": "qwen3-vl-embed:2b", "provider": "ollama", "available": true},
    "generation": {"model": "MiniMax-M2.5", "provider": "openrouter", "available": true},
    "reasoning": {"model": "kimi-k2.5", "provider": "openrouter", "available": true},
    "fallback": {"model": "glm-4.7-flash:latest", "provider": "ollama", "available": true}
  },
  "costs": {
    "today": "$0.12",
    "this_month": "$2.45",
    "daily_budget": "$5.00",
    "budget_remaining": "$4.88"
  },
  "dead_letter_queue": {
    "pending_retry": 2,
    "permanently_failed": 1
  }
}
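
The "costs" object reports dollar amounts as strings, so a client has to parse them before doing arithmetic. The sketch below recomputes the remaining daily budget from the sample values above; `parse_usd` and `budget_remaining` are hypothetical helper names.

```python
from decimal import Decimal


def parse_usd(text):
    """Convert a string such as "$4.88" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", ""))


def budget_remaining(costs):
    """Recompute the remaining daily budget from the "costs" object."""
    return parse_usd(costs["daily_budget"]) - parse_usd(costs["today"])


costs = {
    "today": "$0.12",
    "this_month": "$2.45",
    "daily_budget": "$5.00",
    "budget_remaining": "$4.88",
}
# The recomputed value matches the reported one: 5.00 - 0.12 = 4.88.
print(budget_remaining(costs) == parse_usd(costs["budget_remaining"]))
```

Decimal avoids the rounding surprises that binary floats introduce when summing currency values.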

MCP Server

SiphonClaw runs as an MCP server that any MCP-compatible client (OpenClaw agents, Claude Desktop, etc.) can connect to.

# Start the MCP server (stdio transport - default for OpenClaw)
python mcp_server.py

# Start with SSE transport (for HTTP-based clients)
python mcp_server.py --sse --port 8000

OpenClaw agent config (~/.openclaw/openclaw.json):

{
  "mcpServers": {
    "siphonclaw": {
      "command": "python",
      "args": ["mcp_server.py"],
      "cwd": "/path/to/siphonclaw"
    }
  }
}

Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "siphonclaw": {
      "command": "python",
      "args": ["/path/to/siphonclaw/mcp_server.py"]
    }
  }
}
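
If the client config file already lists other MCP servers, the siphonclaw entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, assuming the "mcpServers" layout shown above; `add_mcp_server` and the pre-existing "other" server are hypothetical.

```python
import json


def add_mcp_server(config, name, command, args, cwd=None):
    """Return a copy of config with one entry added under "mcpServers"."""
    entry = {"command": command, "args": list(args)}
    if cwd is not None:
        entry["cwd"] = cwd
    servers = dict(config.get("mcpServers", {}))
    servers[name] = entry
    return {**config, "mcpServers": servers}


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_mcp_server(
    existing, "siphonclaw", "python", ["mcp_server.py"], cwd="/path/to/siphonclaw"
)
print(json.dumps(updated, indent=2))
```

The function returns a new dictionary and leaves the input untouched, so the original can be kept as a backup before writing the file.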

Setup

Mode A: Hybrid Local + Cloud (Recommended)

Local models handle ingestion (OCR + embeddings) for free. Cloud APIs handle intelligence (generation + reasoning) for pennies per query.

Monthly cost: ~$0.50-5 for typical use.

# 1. Install SiphonClaw
git clone https://github.com/curtisgc1/siphonclaw.git && cd siphonclaw
pip install -r requirements.txt

# 2. Install Ollama and pull local models (~10 GB total)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-vl:latest          # 6.1 GB - OCR
ollama pull bge-m3:latest             # ~1.5 GB - text embeddings
ollama pull qwen3-vl-embed:2b        # ~2 GB - visual embeddings

# 3. Get OpenRouter API key (ONE key for all intelligence models)
#    Visit: https://openrouter.ai -> Sign up -> Copy API key
siphonclaw config set openrouter_key sk-or-v1-xxxxx

# 4. (Optional) Get Brave Search API key for web search fallback
#    Visit: https://brave.com/search/api -> Sign up -> Free tier: 2,000 queries/mo
siphonclaw config set brave_key BSA-xxxxx

# 5. Point to your documents and ingest
siphonclaw config set docs_path /path/to/my/docs
siphonclaw ingest

# 6. Search
siphonclaw search "part number for compressor valve"
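
Before running ingestion in Mode A, it can help to confirm that the three models from step 2 are actually installed. The sketch below queries Ollama's `/api/tags` endpoint, which lists local models; the helper names are hypothetical, and the check is not something the SiphonClaw CLI is documented to provide.

```python
import json
import urllib.request

REQUIRED_MODELS = ["qwen3-vl:latest", "bge-m3:latest", "qwen3-vl-embed:2b"]


def missing_models(tags_response, required=REQUIRED_MODELS):
    """Return the required model names absent from an Ollama /api/tags response."""
    installed = {m["name"] for m in tags_response.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url="http://127.0.0.1:11434"):
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With Ollama running, `missing_models(fetch_tags())` returns an empty list once all three pulls have completed.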

Mode B: Full Cloud

Everything runs via OpenRouter. Simpler setup (no Ollama needed), but ingestion of large document sets costs $50-100+ in API tokens.

First month: ~$50-105. After that: ~$0.50/mo.

# 1. Install SiphonClaw
pip install siphonclaw

# 2. Get OpenRouter API key
siphonclaw config set openrouter_key sk-or-v1-xxxxx

# 3. Set ingestion mode to cloud
siphonclaw config set ingestion_mode cloud

# 4. (Optional) Get Brave Search API key
siphonclaw config set brave_key BSA-xxxxx

# 5. Point to your documents and ingest
siphonclaw config set docs_path /path/to/my/docs
siphonclaw ingest

# 6. Search
siphonclaw search "part number for compressor valve"

Cost Comparison

| Operation | Mode A (Hybrid) | Mode B (Full Cloud) |
|---|---|---|
| Ingest 3,000 PDFs | $0 (local) | ~$50-100 (OCR + embeddings) |
| 100 searches/month | ~$0.50 (API generation) | ~$0.50 (same) |
| Monthly total | ~$0.50-5/mo | ~$50-105 first month, $0.50/mo after |
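
To compare the two modes over a longer period, the monthly figures from the table can be accumulated. This is plain arithmetic on the quoted ranges; `cumulative_cost` is a hypothetical helper, and the 12-month horizon is chosen only for illustration.

```python
def cumulative_cost(months, first_month, later_month):
    """Total spend after `months` months, as a (low, high) range in USD."""
    if months < 1:
        return (0.0, 0.0)
    low = first_month[0] + later_month[0] * (months - 1)
    high = first_month[1] + later_month[1] * (months - 1)
    return (low, high)


# Figures taken from the table above.
mode_a = cumulative_cost(12, first_month=(0.50, 5.00), later_month=(0.50, 5.00))
mode_b = cumulative_cost(12, first_month=(50.00, 105.00), later_month=(0.50, 0.50))
print(mode_a)  # (6.0, 60.0)
print(mode_b)  # (55.5, 110.5)
```

Under these figures, the one-time ingestion charge accounts for nearly all of Mode B's first-year cost.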

Configuration Reference

SiphonClaw reads configuration from config/models.yaml and environment variables.

Environment variables (via .env or shell):

| Variable | Required | Description |
|---|---|---|
| OPENROUTER_API_KEY | yes (Mode A and Mode B) | OpenRouter API key for intelligence models |
| BRAVE_SEARCH_API_KEY | no | Brave Search API key for web search fallback |
| OLLAMA_BASE_URL | no | Ollama server URL (default: http://127.0.0.1:11434) |
| SIPHONCLAW_BUDGET_DAILY | no | Daily API spend cap in USD (default: 5.00) |
| SIPHONCLAW_DOCS_PATH | no | Path to document directory for ingestion |
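
The variables and defaults listed above could be read into a typed settings object along these lines. This is a sketch of the lookup logic only: the `Settings` class and `load_settings` function are hypothetical, and SiphonClaw's own loader (which also reads config/models.yaml) may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    brave_search_api_key: Optional[str]
    ollama_base_url: str
    budget_daily: float
    docs_path: Optional[str]


def load_settings(env=os.environ):
    """Read the documented variables, applying the documented defaults."""
    key = env.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is required")
    return Settings(
        openrouter_api_key=key,
        brave_search_api_key=env.get("BRAVE_SEARCH_API_KEY"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434"),
        budget_daily=float(env.get("SIPHONCLAW_BUDGET_DAILY", "5.00")),
        docs_path=env.get("SIPHONCLAW_DOCS_PATH"),
    )
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.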

Agent config example (config.json):

{
  "skills": {
    "entries": {
      "siphonclaw": {
        "openrouter_key": "sk-or-v1-xxxxx",
        "brave_key": "BSA-xxxxx",
        "docs_path": "/path/to/docs",
        "ingestion_mode": "local",
        "ollama_url": "http://127.0.0.1:11434"
      }
    }
  }
}

Model configuration: See config/models.yaml for full model tier configuration with ingestion and intelligence settings.
