Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

rag-eval

v1.2.1

Evaluate your RAG pipeline quality using Ragas metrics (faithfulness, answer relevancy, context precision).

by Jonathan Jing (@jonathanjing)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Suspicious
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description (RAG evaluation with Ragas) aligns with code and instructions. Declared required binaries (python3, pip), optional env vars (OPENAI/ANTHROPIC/RAGAS_LLM), and the included scripts (run_eval.py, batch_eval.py, setup.sh) are all appropriate for performing LLM-judged RAG evals.
Instruction Scope
SKILL.md instructs the agent to accept question/answer/contexts, write a temp JSON file, and call the provided Python scripts; it explicitly warns against shell-injecting user content. The scripts only reference expected files/paths (memory/eval-results) and expected env vars. No instructions request unrelated system data or unrelated credentials.
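The hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the JSON field names mirror the question/answer/contexts inputs mentioned in the scan, but the exact schema and the `--input-file` flag are assumptions. The point is that user text goes into a file and the command is built as an argument list, so nothing user-supplied is ever parsed by a shell.

```python
import json
import tempfile
from pathlib import Path


def write_eval_input(question: str, answer: str, contexts: list[str]) -> Path:
    """Serialize one sample to a temp JSON file and return its path."""
    # Assumed schema: the real scripts may expect different keys.
    payload = {"question": question, "answer": answer, "contexts": contexts}
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as fh:
        json.dump(payload, fh, ensure_ascii=False)
        return Path(fh.name)


def build_command(script: str, input_path: Path) -> list[str]:
    """Build an argv list; user text never appears on the command line."""
    # "--input-file" is a hypothetical flag, not taken from the skill.
    return ["python3", script, "--input-file", str(input_path)]
```

A caller would then pass the list to `subprocess.run(cmd, check=True)` without `shell=True`, and delete the temp file afterwards.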
Install Mechanism
No registry install spec is provided; the included scripts/setup.sh installs dependencies via pip from public PyPI packages (ragas, datasets, langchain integrations). This is expected for a Python tool but may modify system Python if a virtualenv isn't used. No downloads from untrusted URLs or URL-shortened installers were found.
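One way to guard against the global-install risk noted above is to check for an active virtual environment before running setup.sh. The sketch below relies only on the standard library; the function name is illustrative and is not part of the skill.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        print("No virtual environment active; pip would install globally.")
```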
Credentials
Requested environment access is limited to LLM-related keys and optional RAGAS_* tuning variables. These are justified by the skill's need to call an LLM judge and (optionally) embeddings. No unrelated secrets or multiple unrelated service credentials are requested.
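A quick pre-flight check can confirm which judge credentials are visible to the process without printing their values. This is a sketch under assumptions: it checks only the two key names mentioned in this report, and how the skill itself selects a provider is not shown here.

```python
import os
from collections.abc import Mapping

# Key names taken from this report; other variables may also be read.
KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names (never the values) of keys that are set and non-empty."""
    return [name for name in KEY_NAMES if env.get(name, "").strip()]
```

Calling `configured_keys()` with no arguments inspects the current environment; passing a plain dict makes the function easy to test.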
Persistence & Privilege
The skill does not request always:true and does not modify other skills. It persists evaluation outputs under memory/eval-results (expected for a reporting tool). The setup script may install packages on the host environment but does not request elevated system privileges.
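To see what the skill has persisted, the output directory can be inventoried with a few lines of standard-library code. This is only a sketch: the directory name comes from the report, while the file names and formats inside it are unknown and not assumed.

```python
from pathlib import Path


def list_eval_results(base: str = "memory/eval-results") -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under base."""
    root = Path(base)
    if not root.is_dir():
        # Nothing has been written yet, or the working directory differs.
        return []
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )
```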
Assessment
This skill appears to do what it claims, but take these precautions before installing or running it:
1) Inspect the included scripts locally (scripts/run_eval.py, scripts/batch_eval.py, scripts/setup.sh); don't run arbitrary shell scripts without review.
2) Create and activate a Python virtual environment (python -m venv .venv; source .venv/bin/activate) before running setup.sh to avoid global pip installs.
3) Protect your LLM keys: the skill uses OPENAI_API_KEY/ANTHROPIC_API_KEY to call remote LLMs, so grant least-privilege keys where possible and monitor usage.
4) The tool writes evaluation files to memory/eval-results in the working directory; verify that this location suits your data-retention policies.
5) The run_eval.py excerpt provided for this scan was truncated and contains a possibly buggy section; make sure you have the complete, reviewed script before running explain/advanced features.
6) Expect runtime costs for LLM judge calls.
If you need higher assurance, ask for a line-by-line code review or a reproducible test run in an isolated environment.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97481nq3v3jmnv6rjhtwqfz0s82984e

Runtime requirements

🧪 Clawdis
Any bin: python3, pip
