Doc Scraper
v1.0.0
Documentation extraction and indexing. Extracts information from markdown files and syncs to workspace-db. Works alongside workspace-db which handles synchro...
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
SKILL.md describes a CLI (doc-scraper) and JS API (docScraper.extract*/workspaceDb.sync*) and shows commands like doc-scraper watch --dir, but the skill bundle has no install spec, no binaries, and no code implementing those commands. That mismatch (declared purpose vs. provided artifacts) is incoherent: either the author assumed external tools exist on the host or omitted the implementation.
Instruction Scope
The instructions direct the agent to recursively scan and watch directories (skills/, docs/, workspace/) and to index arbitrary markdown files into docs.db. While consistent with a documentation indexer, this scope can capture sensitive files (README, config snippets, credentials if present in docs), and the SKILL.md gives the agent discretion to extract various headers and configs without constraints. No limits or filters are described.
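The missing limits could take the form of a path allowlist applied before any file is read. The following is a minimal sketch under stated assumptions: `makeScopeFilter`, the `/workspace` root, and the choice of allowed directories are all hypothetical and are not part of the skill.

```javascript
// Hypothetical scope filter; none of these names come from the skill itself.
const path = require("node:path");

// Build a predicate that accepts only files inside the allowed directories
// (resolved against `root`) whose extension is on the allowed list.
function makeScopeFilter(root, allowedDirs, allowedExts) {
  const roots = allowedDirs.map((dir) => path.resolve(root, dir));
  return function isInScope(file) {
    // Resolving first collapses "../" segments, so traversal is rejected.
    const abs = path.resolve(root, file);
    const inDir = roots.some(
      (dir) => abs === dir || abs.startsWith(dir + path.sep)
    );
    return inDir && allowedExts.includes(path.extname(abs));
  };
}

// Assumed example scope: only skills/ and docs/, only .md files.
const isInScope = makeScopeFilter("/workspace", ["skills", "docs"], [".md"]);
```

Comparing against `dir + path.sep` rather than the bare directory name keeps a sibling such as `skills-private/` from matching the `skills/` entry.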
Install Mechanism
No install spec is provided (instruction-only), which is lowest-risk from a supply-chain perspective—but it's unexpected given the CLI/API usage in the instructions. If the agent tries to run the named CLI and it does not exist, behavior will depend on the agent's error-handling; if the CLI does exist in the environment (from elsewhere), the skill will call it without presenting its code here.
Credentials
The skill declares no required environment variables or credentials (which is reasonable), but it explicitly integrates with workspace-db and writes to docs.db. That integration may implicitly require workspace-db credentials or network access (not declared). The absence of declared credentials makes it unclear how workspace sync is authenticated—this is an information gap worth clarifying.
Persistence & Privilege
always is false (good). However, the SKILL.md encourages long-running watch processes (doc-scraper watch) which would give the agent persistent access to file-system changes while running. Autonomous invocation is allowed by default; combined with watch-style behavior this increases blast radius if the agent is permitted to run watchers without restrictions.
What to consider before installing
This skill is instruction-only and does not include the doc-scraper implementation it describes. Before installing or enabling it:
1) Ask the author for the implementation or an install spec (how is doc-scraper provided?).
2) Verify workspace-db: where docs.db is stored and whether any credentials/network endpoints are used.
3) Limit scanning scope: do not allow the skill to watch your entire workspace; specify allowed directories only.
4) Prefer manual invocation until you can review the underlying code; avoid granting always:true or leaving autonomous long-running watchers enabled.
5) If you must run it, run in a sandboxed/least-privilege environment and audit the files that get indexed to ensure no secrets are captured.
Like a lobster shell, security has layers: review code before you run it.
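The audit of indexed files mentioned above can be partly automated with a pre-index check. This is a rough sketch only: the function name `findSuspectLines` and both patterns are assumptions chosen for illustration, and a pattern list this short will miss many real secrets.

```javascript
// Hypothetical pre-index check; the patterns are illustrative, not exhaustive.
const SECRET_PATTERNS = [
  // PEM-style private key header
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
  // key/value assignments such as "api_key: abc" or "password=abc"
  /\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*\S+/i,
];

// Return the 1-based line numbers that match any pattern.
function findSuspectLines(text) {
  return text
    .split(/\r?\n/)
    .flatMap((line, i) =>
      SECRET_PATTERNS.some((re) => re.test(line)) ? [i + 1] : []
    );
}

const sample = "# Setup\napi_key: abc123\nNothing sensitive here";
console.log(findSuspectLines(sample)); // [ 2 ]
```

A file with any suspect line could be skipped or flagged for manual review instead of being written to docs.db.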
documentation, extraction, indexing, latest
Doc Scraper
Documentation extraction and indexing; works together with workspace-db.
Interaction with workspace-db
| Skill | Task | Database |
|---|---|---|
| workspace-db | Synchronization & organization | docs.db |
| doc-scraper (this skill) | Information extraction | Uses docs.db |
Tasks
1. Markdown extraction
// Extracted from SKILL.md:
// - name, version, description
// - usage examples
// - configuration options
const docInfo = await docScraper.extractMarkdown({
  file: "skills/my-skill/SKILL.md",
  extract: ["title", "description", "usage", "config"]
});
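Since the skill bundle ships no implementation of `docScraper.extractMarkdown`, the following is a minimal sketch of what title and description extraction might look like. The function `extractBasics` is hypothetical, operates on a string rather than a file path, and assumes ATX-style (`#`) headings.

```javascript
// Hypothetical stand-in for part of docScraper.extractMarkdown.
// Takes markdown text and returns the first H1 and the first paragraph line.
function extractBasics(markdown) {
  const lines = markdown.split(/\r?\n/);

  // Title: first line that starts with a single "#" followed by whitespace.
  const titleIdx = lines.findIndex((l) => /^#\s+/.test(l));
  const title =
    titleIdx >= 0 ? lines[titleIdx].replace(/^#\s+/, "").trim() : null;

  // Description: first non-empty, non-heading line after the title
  // (or from the start of the text when no title was found).
  const description =
    lines
      .slice(titleIdx + 1)
      .find((l) => l.trim() !== "" && !l.startsWith("#")) ?? null;

  return { title, description: description ? description.trim() : null };
}

const info = extractBasics("# Doc Scraper\n\nExtracts docs.\n\n## Usage\n");
console.log(info); // { title: 'Doc Scraper', description: 'Extracts docs.' }
```

The sketch does not track fenced code blocks, so a shell comment inside a fence could be mistaken for a heading; a real extractor would likely use a markdown parser.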
2. Indexing into docs.db
// Stores the extracted data in docs.db
// (workspace-db manages the DB)
await docScraper.index({
  source: "skills/my-skill/SKILL.md",
  data: docInfo,
  tags: ["skill", "api"]
});
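The storage format of docs.db is not described, so the sketch below uses an in-memory `Map` as a stand-in to illustrate the semantics an `index` call of this shape might have: one record per source file, re-indexing replaces the old record, and tags are deduplicated. The `DocIndex` class and the `indexedAt` field are assumptions.

```javascript
// Hypothetical in-memory stand-in for docs.db, keyed by source file.
class DocIndex {
  constructor() {
    this.records = new Map();
  }

  // Insert or replace the record for `source`; duplicate tags are dropped.
  index({ source, data, tags = [] }) {
    const record = {
      source,
      data,
      tags: [...new Set(tags)],
      indexedAt: new Date().toISOString(),
    };
    this.records.set(source, record);
    return record;
  }

  // All records carrying the given tag.
  byTag(tag) {
    return [...this.records.values()].filter((r) => r.tags.includes(tag));
  }
}

const idx = new DocIndex();
idx.index({ source: "skills/my-skill/SKILL.md", data: { name: "my-skill" }, tags: ["skill", "api"] });
idx.index({ source: "skills/my-skill/SKILL.md", data: { name: "my-skill" }, tags: ["skill", "skill"] });
console.log(idx.records.size); // 1
```

Keying by source path makes re-indexing idempotent, which matters once a watcher triggers the same file repeatedly.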
3. Auto-update on changes
# Watches .md files
# Re-extracts on change
# Updates docs.db
doc-scraper watch --dir skills/ --ext .md
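How the `watch` command is implemented is not shown. As one possible approach, Node's built-in `fs.watch` can be combined with a per-file debounce so that a burst of editor writes triggers a single re-extraction. `debouncePerKey` and `watchMarkdown` are hypothetical names, and the 200 ms delay is an arbitrary assumption.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Collapse rapid repeated calls for the same key into one delayed call.
function debouncePerKey(fn, delayMs) {
  const timers = new Map();
  return (key) => {
    clearTimeout(timers.get(key));
    timers.set(
      key,
      setTimeout(() => {
        timers.delete(key);
        fn(key);
      }, delayMs)
    );
  };
}

// Watch `dir` and call `onChange(filePath)` for files with the given extension.
// Note: the `recursive` option is not supported on every platform and
// Node.js version; check the fs.watch documentation for your runtime.
function watchMarkdown(dir, onChange, { ext = ".md", delayMs = 200 } = {}) {
  const notify = debouncePerKey(onChange, delayMs);
  return fs.watch(dir, { recursive: true }, (_eventType, filename) => {
    // `filename` can be null on some platforms.
    if (filename && path.extname(filename) === ext) {
      notify(path.join(dir, filename));
    }
  });
}
```

A caller would keep the returned watcher and call its `close()` method to stop watching, which limits how long the process retains access to file-system changes.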
Extraction templates
Skill documentation
# Extracted from SKILL.md:
name: "skill-name"
description: "Description"
version: "1.0.0"
category: "database"
usage_examples:
  - command: "openclaw skill"
    result: "..."
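A record shaped like the skill template above could be checked before it is stored. The sketch below is an assumption-heavy illustration: the document does not say which fields are mandatory or that versions must follow MAJOR.MINOR.PATCH, so both rules, and the name `validateSkillRecord`, are hypothetical.

```javascript
// Assumed required string fields, taken from the template's keys.
const REQUIRED_FIELDS = ["name", "description", "version", "category"];

// Return a list of human-readable problems; an empty list means valid.
function validateSkillRecord(record) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof record[field] !== "string" || record[field].trim() === "") {
      errors.push(`missing or empty field: ${field}`);
    }
  }
  // Assumed rule: versions look like "1.0.0".
  if (
    typeof record.version === "string" &&
    !/^\d+\.\d+\.\d+$/.test(record.version)
  ) {
    errors.push(`version is not MAJOR.MINOR.PATCH: ${record.version}`);
  }
  if (!Array.isArray(record.usage_examples)) {
    errors.push("usage_examples must be a list");
  }
  return errors;
}

const ok = validateSkillRecord({
  name: "skill-name",
  description: "Description",
  version: "1.0.0",
  category: "database",
  usage_examples: [{ command: "openclaw skill", result: "..." }],
});
console.log(ok.length); // 0
```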
API documentation
# Extracted from API.md:
endpoints:
  - path: "/api/v1/search"
    method: "GET"
    params:
      - query: string
    response: json
System documentation
# Extracted from SYSTEM.md:
components:
  - databases:
      - docs.db
      - tree.db
cron_jobs:
  - db-maintainer: "*/30"
Workflow
SKILL.md changed
↓
doc-scraper detects the change
↓
Extracts: name, desc, usage, config
↓
Stores in docs.db
↓
workspace-db synchronizes
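The workflow above can be sketched as a single async function with its steps injected, which keeps the ordering explicit and makes each stage replaceable in tests. `handleChange` and the four callback names are hypothetical; the real doc-scraper and workspace-db interfaces are not specified here.

```javascript
// Hypothetical pipeline: read -> extract -> store -> sync, in that order.
// Each step is passed in, so no real file system or database is assumed.
async function handleChange(file, { read, extract, store, sync }) {
  const text = await read(file);          // load the changed markdown file
  const data = extract(text);             // pull out name, desc, usage, config
  await store({ source: file, data });    // write the record (docs.db stand-in)
  await sync(file);                       // let workspace-db pick up the change
  return data;
}
```

Because each step is awaited before the next starts, a failure in extraction or storage stops the run before the sync step is reached.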
Usage
One-off
doc-scraper index --dir skills/ --recursive
doc-scraper index --dir docs/ --ext .md
Watch mode
# Watch continuously
doc-scraper watch --dir workspace/
# Single file
doc-scraper watch --file README.md
Search
# Search directly in the extracted data
doc-scraper search --query "database"
doc-scraper search --tag "api" --format json
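The matching rules behind `--query` and `--tag` are not documented. A minimal sketch, assuming case-insensitive substring matching over the stored data and exact tag matching, could look like this; the `search` function and the record shape (`source`, `data`, `tags`) are hypothetical.

```javascript
// Hypothetical search over already-extracted records.
// A record passes when it has the tag (if given) and its data contains
// the query text (if given), compared case-insensitively.
function search(records, { query, tag } = {}) {
  const needle = query ? query.toLowerCase() : null;
  return records.filter((r) => {
    const tagOk = !tag || r.tags.includes(tag);
    const textOk =
      !needle || JSON.stringify(r.data).toLowerCase().includes(needle);
    return tagOk && textOk;
  });
}

const records = [
  { source: "skills/a/SKILL.md", data: { name: "a", category: "database" }, tags: ["skill"] },
  { source: "docs/API.md", data: { name: "api", category: "http" }, tags: ["api"] },
];
console.log(search(records, { query: "Database" }).length); // 1
console.log(JSON.stringify(search(records, { tag: "api" }).map((r) => r.source))); // ["docs/API.md"]
```

Serializing each record with `JSON.stringify` also matches key names such as `category`, which is acceptable for a sketch but too coarse for a real index.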
Integration with workspace-db
// doc-scraper extracts
// workspace-db stores/organizes
const extracted = await docScraper.extract('skills/my/SKILL.md');
// Hand over to workspace-db
await workspaceDb.syncDocument({
  id: extracted.name,
  category: extracted.category,
  data: extracted,
  source_file: 'skills/my/SKILL.md'
});
Configuration
{
  "doc-scraper": {
    "watch_dirs": ["skills/", "docs/"],
    "extensions": [".md", ".mdx"],
    "extract_headers": ["##", "###"],
    "auto_index": true,
    "workspace_db_integration": true
  }
}
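The effect of `extract_headers` is not spelled out. One plausible reading is that only headings at the listed levels start a new extracted section. The sketch below illustrates that reading; `splitSections` is a hypothetical name, and the fence handling is a simplification of what a markdown parser does.

```javascript
// Hypothetical interpretation of "extract_headers": split markdown into
// sections at headings whose marker ("##", "###", ...) is on the list.
function splitSections(markdown, headerMarkers = ["##", "###"]) {
  const sections = [];
  let current = null;
  let inFence = false;

  for (const line of markdown.split(/\r?\n/)) {
    // Toggle on fence lines so "#" lines inside code are not headings.
    if (/^(`{3}|~{3})/.test(line)) inFence = !inFence;

    const m = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (m && headerMarkers.includes(m[1])) {
      current = { level: m[1].length, heading: m[2].trim(), body: [] };
      sections.push(current);
    } else if (current) {
      current.body.push(line);
    }
  }
  // Join each section's lines and drop surrounding blank lines.
  return sections.map((s) => ({ ...s, body: s.body.join("\n").trim() }));
}
```

Text before the first matching heading, including a top-level `#` title, is ignored by this sketch; a fuller version might capture it as a preamble.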
