Skill flagged — suspicious patterns detected
ClawHub Security flagged this skill as suspicious. Review the scan results before using.
Sci Data Extractor
v0.1.0 · AI-powered tool for extracting structured data from scientific literature PDFs
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The name and description match the code and docs: the project extracts text from PDFs (via PyMuPDF or Mathpix) and sends that content to an LLM to produce structured output. That capability legitimately requires an LLM API key and, optionally, Mathpix credentials. However, the registry metadata declares no required environment variables or primary credential, while the SKILL.md and code clearly expect EXTRACTOR_API_KEY (or API_KEY), EXTRACTOR_BASE_URL, and optionally MATHPIX_APP_ID / MATHPIX_APP_KEY. The missing declaration in the registry is an inconsistency that reduces transparency.
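One way to see which of these variables a run will actually pick up is a preflight check that reports presence without ever printing values. This is a minimal sketch: the variable names come from the scan, while the helper credential_report and the rule that either EXTRACTOR_API_KEY or API_KEY suffices are assumptions about how extractor.py reads them. Values kept in a .env file only count here if they have already been loaded into the environment.

```python
import os
from typing import Mapping

# Names taken from the scan report; the lookup order is an assumption.
LLM_KEY_VARS = ("EXTRACTOR_API_KEY", "API_KEY")  # either one is assumed to suffice
REQUIRED_VARS = ("EXTRACTOR_BASE_URL",)
OPTIONAL_VARS = ("MATHPIX_APP_ID", "MATHPIX_APP_KEY")


def credential_report(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Say which expected variables are set, without ever echoing their values."""
    report: dict[str, str] = {}
    present = [name for name in LLM_KEY_VARS if env.get(name)]
    report["LLM key"] = f"set via {present[0]}" if present else "MISSING"
    for name in REQUIRED_VARS:
        report[name] = "set" if env.get(name) else "MISSING"
    for name in OPTIONAL_VARS:
        report[name] = "set" if env.get(name) else "not set (optional)"
    return report
```

Printing credential_report() before a sandbox run shows which key source would be used and whether Mathpix is configured, without writing any secret to the terminal or logs.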
Instruction Scope
Runtime instructions and code will read local PDFs and .env, upload PDFs to Mathpix if that backend is chosen, and send the extracted text to an external LLM endpoint. That is coherent with the stated purpose, but it means the entire content of each document (potentially sensitive or copyrighted material) is transmitted to third-party services. The SKILL.md also suggests running an external install script (see Install Mechanism below).
Install Mechanism
There is no formal install spec in the registry, but the SKILL.md recommends installing the 'uv' tool via curl -LsSf https://astral.sh/uv/install.sh | sh, which downloads and immediately runs a remote install script, a higher-risk pattern. The README also suggests adding the skill via npx or by cloning a GitHub repo. Piping an arbitrary remote script into sh should be treated with caution; apart from that, the project relies on pip packages listed in requirements.txt, which is reasonable.
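The safer alternative to piping the installer into sh is to download it to a file, read it, and record a checksum before deciding whether to run it. A minimal sketch of that workflow follows; fetch_script and save_for_review are hypothetical helpers, and nothing in this code executes the downloaded script.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_script(url: str, timeout: float = 30.0) -> bytes:
    """Download an install script as raw bytes; nothing is executed."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_for_review(script: bytes, dest: Path) -> str:
    """Write the script to disk for manual inspection and return its SHA-256 hex digest."""
    dest.write_bytes(script)
    return hashlib.sha256(script).hexdigest()
```

For example, save_for_review(fetch_script("https://astral.sh/uv/install.sh"), Path("uv-install.sh")) leaves the script on disk for reading and returns a digest you can compare across downloads before choosing to run it.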
Credentials
The code requires an LLM API key and optionally Mathpix credentials (EXTRACTOR_API_KEY or API_KEY, EXTRACTOR_BASE_URL, MATHPIX_APP_ID/KEY). Those are proportionate for an extractor. The problem is that the registry metadata lists no required env vars, creating a transparency gap. In addition, the README/SKILL.md default model is a Claude model name, while the code uses the openai.OpenAI client and accepts EXTRACTOR_BASE_URL. That combination only works if the base URL points at an OpenAI-compatible endpoint that serves the named model; this client/provider mismatch is suspicious and should be verified before supplying keys.
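Because the destination of every LLM request (and of the API key in its headers) is decided by the base URL, it helps to resolve that host before any key is used. The sketch below assumes the skill passes EXTRACTOR_BASE_URL straight to openai.OpenAI(base_url=...) and passes None when it is unset; in openai-python v1 the client then falls back to the OPENAI_BASE_URL environment variable and finally to https://api.openai.com/v1. The helpers resolve_llm_host and check_allowed are hypothetical.

```python
import os
from typing import Mapping
from urllib.parse import urlparse

# Assumption: EXTRACTOR_BASE_URL is forwarded as base_url, None when unset.
OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_llm_host(env: Mapping[str, str] = os.environ) -> str:
    """Return the hostname that LLM requests (and the API key) would be sent to."""
    base_url = (
        env.get("EXTRACTOR_BASE_URL")
        or env.get("OPENAI_BASE_URL")
        or OPENAI_DEFAULT_BASE_URL
    )
    host = urlparse(base_url).hostname
    if not host:
        raise ValueError(f"base URL has no hostname: {base_url!r}")
    return host


def check_allowed(env: Mapping[str, str], allowed_hosts: set[str]) -> str:
    """Fail before any request is made if the resolved host is not approved."""
    host = resolve_llm_host(env)
    if host not in allowed_hosts:
        raise PermissionError(f"LLM traffic would go to unapproved host {host!r}")
    return host
```

Calling check_allowed(os.environ, {...your provider's host...}) at the start of a test run turns a misdirected EXTRACTOR_BASE_URL into an immediate error instead of a leaked key.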
Persistence & Privilege
The skill does not request always:true and does not claim to modify other skills or persistent system settings. It's a user-invoked tool and its runtime behavior is limited to reading local PDFs, optional .env, and making network calls to configured LLM/Mathpix endpoints.
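To check these runtime claims during a test run, Python's audit hooks (sys.addaudithook, Python 3.8+) can log every Python-level file open and outbound socket connection. This is a sketch of a hypothetical monitor module, not part of the skill. Native code that opens files or sockets on its own (possibly including PyMuPDF's C core) will not appear, so it complements rather than replaces network monitoring in a VM or container.

```python
import sys

# Hypothetical monitor module: import it before the extractor so every later
# Python-level file open and outbound socket connection is recorded.
observed: list[tuple[str, object]] = []


def _record(event: str, args: tuple) -> None:
    """Audit hook: keep file opens and socket connects, ignore everything else."""
    if event == "open":
        # args are (path, mode, flags); path may also be a file descriptor.
        observed.append(("open", args[0]))
    elif event == "socket.connect":
        # args are (socket, address); address is usually an (ip, port) tuple.
        observed.append(("connect", args[1]))


def remote_endpoints() -> list[object]:
    """Distinct socket addresses contacted so far, in first-seen order."""
    seen: list[object] = []
    for kind, target in observed:
        if kind == "connect" and target not in seen:
            seen.append(target)
    return seen


# Audit hooks cannot be removed once added, so this lasts for the whole process.
sys.addaudithook(_record)
```

Import the module at the top of a wrapper script, run the extractor on a non-sensitive PDF, then print remote_endpoints(). The addresses are IPs after DNS resolution; anything that does not belong to the LLM and Mathpix hosts you configured deserves a closer look.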
What to consider before installing
What to check before installing or running this skill:
- Origin: The skill's Source/Homepage are unknown; prefer code from a trusted repository. If you got this from an external repo, inspect the repo and maintainer reputation.
- API keys: The code will send extracted text (potentially entire PDF contents) to external LLMs/Mathpix. Only use API keys with limited scope or billing controls, and avoid uploading sensitive or private documents.
- Registry mismatch: The registry lists no required env vars but the SKILL.md and code require EXTRACTOR_API_KEY (or API_KEY), EXTRACTOR_BASE_URL and optionally Mathpix keys. Do not provide secrets until you confirm how they are used and where traffic goes.
- LLM/provider inconsistency: The README defaults to a Claude model name but the code uses the openai Python client and a configurable base_url. Verify that the client and base_url will actually work with your provider; otherwise keys might be misdirected or fail.
- Avoid running curl | sh blindly: The SKILL.md suggests running an external install script (https://astral.sh/uv/install.sh). Do not run it unless you trust the source; prefer installing uv/venv tooling via a package manager, or download and inspect the script first.
- Sandbox test: Run the tool in a disposable environment (VM/container) first, with a throwaway API key and non-sensitive PDFs. Monitor network requests during a test run to confirm endpoints and data sent.
- Code review focus: The key network actions are in extractor.py (requests to Mathpix and the OpenAI client usage). Confirm there are no hidden endpoints or telemetry sending keys elsewhere. If you are not comfortable, do not provide production API keys.
latest · vk979bc8tew8m2pdrgex7h0nbqs81wc6x
