Rag Evaluator

Python SDK for Agent AI Observability, Monitoring and Evaluation Framework. Tags: ragaai-catalyst, python, agentic-ai.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Benign
high confidence
Purpose & Capability
The name and description (Rag Evaluator) match the behavior: a small CLI that logs evaluation/benchmark/prompt data and provides exports, search, stats, and status. No unrelated env vars, binaries, or services are requested.
Instruction Scope
Runtime instructions and the included script only read/write files in ~/.local/share/rag-evaluator and run standard Unix utilities. However, the tool logs arbitrary user-provided inputs (prompts, configs, costs, etc.) so users could accidentally store sensitive secrets or API keys. The JSON export implementation is brittle (naive string formatting, no escaping, and uses a literal "\n]" which may produce invalid JSON), so exported data may be malformed.
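The escaping problem described above can be demonstrated in isolation. This sketch is illustrative, not the skill's actual code: it builds a JSON string the same naive way (interpolating a value without escaping) and shows that an embedded quote makes the result unparseable.

```shell
# Demonstrate how unescaped quotes break naive JSON string building,
# the same failure mode as the skill's export (illustrative, not its code).
value='recall said "good"'
naive="{\"entry\": \"$value\"}"   # no escaping applied to $value
if printf '%s' "$naive" | python3 -m json.tool >/dev/null 2>&1; then
  result=valid
else
  result=invalid
fi
echo "$result"   # prints: invalid
```

Any downstream consumer of the export should validate the file (e.g. with `python3 -m json.tool`) before parsing it.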
Install Mechanism
There is no install spec (instruction-only skill) which is low risk. A shell script is included; it is plain Bash and not obfuscated. Because there's no automatic installer, the script won't be placed on PATH by the platform unless the agent/user does so; inspect the script before running or install into an isolated environment if desired.
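One low-cost way to inspect before running is a parse-only check with `bash -n`, which reads the script without executing anything. The snippet below demonstrates the workflow on a throwaway stand-in file; substitute scripts/script.sh from the skill in practice.

```shell
# Parse-check a script before running it; demonstrated on a stand-in file
# (substitute scripts/script.sh from the skill).
tmp=$(mktemp)
printf '#!/usr/bin/env bash\necho ok\n' > "$tmp"
if bash -n "$tmp"; then check=pass; else check=fail; fi   # parse only, nothing runs
echo "$check"
rm -f "$tmp"
```

A clean parse is not a security guarantee, but it catches truncated or corrupted downloads before anything executes.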
Credentials
The skill declares no required environment variables or credentials and the script only uses HOME and standard utilities. Requested accesses are proportional to a local logging CLI.
Persistence & Privilege
The script persists logs and exports under the user's home (~/.local/share/rag-evaluator), which is expected for this tool. It does not request elevated privileges and does not modify other skills. Consider the privacy implications of storing inputs on disk.
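Since logged prompts and configs sit on disk, it is worth restricting the data directory to the owning user. A minimal sketch, assuming the default path from the docs:

```shell
# Tighten permissions so only the owner can read stored prompts and logs.
DATA_DIR="${DATA_DIR:-$HOME/.local/share/rag-evaluator}"
mkdir -p "$DATA_DIR"
chmod 700 "$DATA_DIR"
```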
Assessment
This skill appears coherent and local-only: it stores logs and exports under ~/.local/share/rag-evaluator and does not contact external endpoints or request credentials. Before installing or running it, review the included scripts (scripts/script.sh). Be aware that commands like `prompt`, `configure`, or `evaluate` log whatever you pass — avoid entering secrets or API keys into those entries. Exports and logs are stored on disk (they may grow over time), so consider disk space and file permissions. If you want extra safety, run the script in an isolated environment (container or dedicated user account) or modify DATA_DIR to point to a controlled location. Finally, note the JSON export is implemented naively and may produce invalid JSON for values containing quotes or newlines; treat exported files accordingly.
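A lightweight alternative to a full container is overriding HOME for a single invocation, so ~/.local/share/rag-evaluator resolves inside a throwaway directory. The snippet below demonstrates the mechanism with a stand-in command; the real invocation would be something like `HOME="$sandbox" bash scripts/script.sh status`.

```shell
# Run any command with a throwaway HOME so the data dir lands in a sandbox.
sandbox=$(mktemp -d)
HOME="$sandbox" bash -c 'mkdir -p "$HOME/.local/share/rag-evaluator"'
ls "$sandbox/.local/share"   # the data dir was created inside the sandbox
```

This keeps experiments out of your real home directory and makes cleanup a single `rm -rf "$sandbox"`.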

Like a lobster shell, security has layers — review code before you run it.

Current version: v2.0.0
latest: vk977gc82nfapnbervbp2nb9enx837zt9


SKILL.md

Rag Evaluator

AI-powered RAG (Retrieval-Augmented Generation) evaluation toolkit. Configure, benchmark, compare, and optimize your RAG pipelines from the command line. Track prompts, evaluations, fine-tuning experiments, costs, and usage — all with persistent local logging and full export capabilities.

Commands

Run rag-evaluator <command> [args] to use.

Command | Description
configure | Configure RAG evaluation settings and parameters
benchmark | Run benchmarks against your RAG pipeline
compare | Compare results across different RAG configurations
prompt | Log and manage prompt templates and variations
evaluate | Evaluate RAG output quality and relevance
fine-tune | Track fine-tuning experiments and parameters
analyze | Analyze evaluation results and identify patterns
cost | Track and log API/inference costs
usage | Monitor token usage and API call volumes
optimize | Log optimization strategies and results
test | Run test cases against RAG configurations
report | Generate evaluation reports
stats | Show summary statistics across all categories
export <fmt> | Export data in json, csv, or txt format
search <term> | Search across all logged entries
recent | Show recent activity from history log
status | Health check: version, data dir, disk usage
help | Show help and available commands
version | Show version (v2.0.0)

Each domain command (configure, benchmark, compare, etc.) works in two modes:

  • Without arguments: displays the most recent 20 entries from that category
  • With arguments: logs the input with a timestamp and saves to the category log file
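The two modes above can be sketched as a single dispatch function. This is an assumed implementation for illustration, not the skill's actual source; it uses the `timestamp|value` format documented under Data Storage.

```shell
# Sketch of the documented two-mode behavior (assumed implementation):
# no args shows recent entries, args append a timestamp|value line.
DATA_DIR=$(mktemp -d)   # stand-in for ~/.local/share/rag-evaluator
log_or_show() {
  local category=$1; shift
  local file="$DATA_DIR/$category.log"
  if [ "$#" -eq 0 ]; then
    [ -f "$file" ] && tail -n 20 "$file"   # display mode
  else
    printf '%s|%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$file"   # log mode
  fi
}
log_or_show evaluate "recall=0.85"   # logs one entry
log_or_show evaluate                 # prints it back
```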

Data Storage

All data is stored locally in ~/.local/share/rag-evaluator/:

  • Each command creates its own log file (e.g., configure.log, benchmark.log)
  • A unified history.log tracks all activity across commands
  • Entries are stored in timestamp|value pipe-delimited format
  • Export supports JSON, CSV, and plain text formats

Requirements

  • Bash 4+ with set -euo pipefail strict mode
  • Standard Unix utilities: date, wc, du, tail, grep, sed, cat
  • No external dependencies or API keys required

When to Use

  1. Evaluating RAG pipeline quality — log evaluation scores, compare retrieval strategies, and track improvements over time
  2. Benchmarking different configurations — run benchmarks across embedding models, chunk sizes, or retrieval methods and compare results side by side
  3. Tracking costs and usage — monitor API costs and token usage across experiments to stay within budget
  4. Managing prompt engineering — log prompt variations, test them against your pipeline, and analyze which templates perform best
  5. Generating reports for stakeholders — export evaluation data as JSON/CSV for dashboards, or generate text reports summarizing RAG performance

Examples

# Configure a new evaluation run
rag-evaluator configure "model=gpt-4 chunks=512 overlap=50 top_k=5"

# Run a benchmark and log results
rag-evaluator benchmark "latency=230ms recall@5=0.82 precision@5=0.71"

# Compare two retrieval strategies
rag-evaluator compare "bm25 vs dense: bm25 recall=0.78, dense recall=0.85"

# Track evaluation scores
rag-evaluator evaluate "faithfulness=0.91 relevance=0.87 coherence=0.93"

# Log API cost for a run (single quotes keep "$0.23" literal; in double
# quotes the shell would expand $0 to the script name)
rag-evaluator cost 'run-042: $0.23 (1.2k tokens input, 800 tokens output)'

# View summary statistics
rag-evaluator stats

# Export all data as CSV
rag-evaluator export csv

# Search for specific entries
rag-evaluator search "gpt-4"

# Check recent activity
rag-evaluator recent

# Health check
rag-evaluator status

Output

All commands output to stdout. Redirect to a file if needed:

rag-evaluator report "weekly summary" > report.txt
rag-evaluator export json  # saves to ~/.local/share/rag-evaluator/export.json

Configuration

Set DATA_DIR by modifying the script, or use the default: ~/.local/share/rag-evaluator/
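If you want to retarget DATA_DIR without editing the script by hand, a `sed` one-liner over the assignment works. The snippet demonstrates the edit on a throwaway copy; run the same `sed` against scripts/script.sh, assuming the script assigns DATA_DIR on a line starting with `DATA_DIR=`.

```shell
# Retarget DATA_DIR by rewriting its assignment line, shown on a throwaway
# copy (the sandbox path is an example; substitute scripts/script.sh).
copy=$(mktemp)
printf 'DATA_DIR="$HOME/.local/share/rag-evaluator"\n' > "$copy"
sed -i 's|^DATA_DIR=.*|DATA_DIR="$HOME/sandbox/rag-evaluator"|' "$copy"
cat "$copy"   # DATA_DIR="$HOME/sandbox/rag-evaluator"
```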


Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

