dingo data quality

Evaluate AI training and RAG data quality using rule-based or LLM-based metrics with Dingo's flexible, multi-format assessment framework and CLI/SDK support.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by chupei@e06084
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description (data quality / RAG evaluation) match the SKILL.md, _meta.json, and the included fact_check.py script. The package asks for an LLM API key only for LLM-based evaluators or the ArticleFactChecker flow, which is appropriate for the stated functionality.
Instruction Scope
SKILL.md instructs using local dataset files and CLI/SDK calls and shows configs that reference only the expected inputs (data files, evaluator configs, optional API keys and endpoints). The script reads input articles and writes temporary JSONL and output artifacts; it explicitly blocks special system paths and symlinks. No instructions attempt to read unrelated system secrets or send data to unexpected endpoints beyond the LLM/search APIs you configure.
Install Mechanism
This is instruction-only (no install spec). SKILL.md recommends installing the published dingo-python package via pip (a standard registry install). No downloads from arbitrary URLs or archives are present in the skill bundle itself.
Credentials
LLM-based evaluation requires an OpenAI-compatible API key (OPENAI_API_KEY) and optionally OPENAI_BASE_URL, OPENAI_MODEL, and TAVILY_API_KEY for web search — all proportional to an LLM-driven fact-checker. Note: the registry's 'required env vars' field is empty, but the code and docs clearly treat OPENAI_API_KEY as required for LLM flows; this is expected, but worth confirming before enabling LLM mode.
Persistence & Privilege
The skill does not request always: true, does not modify other skills or global settings, and does not install persistent background agents. Autonomous invocation is allowed (the default) but is not combined with other red flags.
Assessment
This skill appears to do what it says: rule-based checks run offline without keys, while LLM-based evaluation and fact-checking require an OpenAI-compatible API key (OPENAI_API_KEY) and optionally a search API key (TAVILY_API_KEY). Before using LLM mode: (1) confirm you trust the dingo-python package source on PyPI or the linked GitHub repo, (2) provide API keys only for providers you intend to bill, (3) be aware the tool will send article text and extracted claims to whatever API base URL you configure, so verify that endpoint, and (4) note that you can use rule-based evaluators without providing any credentials. If you want extra assurance, inspect the upstream dingo-python package code and confirm its network behavior and telemetry before supplying secrets.


Current version: v1.0.4


SKILL.md

Data Quality Evaluation with Dingo

Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool.

Installation

pip install dingo-python

Optional extras

pip install "dingo-python[agent]"    # Agent-based evaluation (fact-checking)
pip install "dingo-python[hhem]"     # HHEM hallucination detection
pip install "dingo-python[all]"      # Everything

Verify installation

python -c "from dingo.config import InputArgs; print('Dingo OK')"

Two evaluation modes

  • Rule-based: no API key required; fast; zero cost; 50+ deterministic rules; best for format checks, PII, and completeness
  • LLM-based: API key required (any OpenAI-compatible API); slower (API calls); per-token cost; text quality, RAG, 3H, and security metrics; best for semantic quality and faithfulness

Core workflow

  1. Prepare data: JSONL, JSON, CSV, plaintext, or Parquet file
  2. Choose evaluators: Rule-based (free, fast) or LLM-based (semantic understanding)
  3. Run evaluation: CLI with config file or Python SDK
  4. Review results: summary.json + per-item JSONL reports in output directory

CLI Usage

Dingo CLI takes a JSON config file as input:

dingo eval --input config.json

Minimal rule-based config

{
  "input_path": "data.jsonl",
  "dataset": {"source": "local", "format": "jsonl"},
  "evaluator": [
    {
      "fields": {"content": "content"},
      "evals": [
        {"name": "RuleColonEnd"},
        {"name": "RuleSpecialCharacter"},
        {"name": "RuleContentNull"}
      ]
    }
  ]
}

LLM-based config

{
  "input_path": "data.jsonl",
  "dataset": {"source": "local", "format": "jsonl"},
  "evaluator": [
    {
      "fields": {"content": "content"},
      "evals": [
        {
          "name": "LLMTextRepeat",
          "config": {
            "model": "deepseek-chat",
            "key": "${OPENAI_API_KEY}",
            "api_url": "https://api.deepseek.com/v1"
          }
        }
      ]
    }
  ]
}

RAG evaluation config

RAG evaluation requires specific fields mapped from the dataset:

{
  "input_path": "rag_output.jsonl",
  "dataset": {"source": "local", "format": "jsonl"},
  "evaluator": [
    {
      "fields": {
        "user_input": "user_input",
        "response": "response",
        "retrieved_contexts": "retrieved_contexts",
        "reference": "reference"
      },
      "evals": [
        {"name": "Faithfulness", "config": {"model": "deepseek-chat", "key": "${OPENAI_API_KEY}", "api_url": "https://api.deepseek.com/v1"}},
        {"name": "ContextPrecision", "config": {"model": "deepseek-chat", "key": "${OPENAI_API_KEY}", "api_url": "https://api.deepseek.com/v1"}}
      ]
    }
  ]
}

Multi-field evaluation config

Evaluate different columns with different rules:

{
  "input_path": "qa_data.jsonl",
  "dataset": {"source": "local", "format": "jsonl"},
  "evaluator": [
    {
      "fields": {"content": "answer"},
      "evals": [{"name": "RuleColonEnd"}, {"name": "RuleSpecialCharacter"}]
    },
    {
      "fields": {"content": "question"},
      "evals": [{"name": "RuleContentNull"}]
    }
  ]
}

SDK Usage

For programmatic use inside Python scripts:

from dingo.config import InputArgs
from dingo.exec import Executor

if __name__ == '__main__':
    input_data = {
        "input_path": "data.jsonl",
        "dataset": {"source": "local", "format": "jsonl"},
        "evaluator": [
            {
                "fields": {"content": "content"},
                "evals": [
                    {"name": "RuleColonEnd"},
                    {"name": "RuleSpecialCharacter"}
                ]
            }
        ]
    }
    input_args = InputArgs(**input_data)
    executor = Executor.exec_map["local"](input_args)
    result = executor.execute()
    print(result)

Config reference

Dataset configuration

  • source (data source type): local, huggingface, s3, sql
  • format (file format): jsonl, json, csv, plaintext, parquet

Executor configuration

  • max_workers (default 1): parallel evaluation workers
  • batch_size (default 10): items per batch
  • result_save.bad (default true): save items that fail evaluation
  • result_save.good (default false): save items that pass evaluation
  • result_save.merge (default false): merge all results into a single file
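
A minimal SDK-style sketch that sets these options (the specific values are illustrative; the executor block has the same shape as the one in the ArticleFactChecker example later in this document):

from dingo.config import InputArgs
from dingo.exec import Executor

# Illustrative settings: more parallel workers, and keep passing items as well as failing ones.
input_data = {
    "input_path": "data.jsonl",
    "dataset": {"source": "local", "format": "jsonl"},
    "executor": {
        "max_workers": 4,
        "batch_size": 20,
        "result_save": {"bad": True, "good": True}
    },
    "evaluator": [
        {"fields": {"content": "content"}, "evals": [{"name": "RuleContentNull"}]}
    ]
}

if __name__ == "__main__":
    result = Executor.exec_map["local"](InputArgs(**input_data)).execute()
    print(result)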

Evaluator configuration

Each evaluator group has:

  • fields (required): maps Dingo fields to dataset columns
  • evals (required): list of evaluators to apply
  • evals[].name (required): evaluator class name
  • evals[].config (required for LLM evaluators): LLM config with model, key, api_url

Field mapping

The fields object maps Dingo's internal field names to your dataset's column names:

  • content: main text content to evaluate (used by most rule/LLM evaluators)
  • prompt: instruction/question field (instruction quality evaluators)
  • image: image path or URL (VLM evaluators)
  • user_input: user query (RAG evaluators)
  • response: model response (RAG evaluators)
  • retrieved_contexts: retrieved context list (RAG evaluators)
  • reference: ground truth reference (RAG evaluators)
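
As a concrete illustration (the column names here are hypothetical), a dataset whose text lives in an answer column would be written and mapped like this:

import json

# Hypothetical dataset row: the text to evaluate is stored in the "answer" column.
row = {"id": 1, "question": "What is RAG?", "answer": "Retrieval-augmented generation combines retrieval with generation."}

# Write it as one JSONL line, the recommended input format.
with open("qa_data.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(row, ensure_ascii=False) + "\n")

# In the evaluator config, map Dingo's internal "content" field to that column.
evaluator_group = {
    "fields": {"content": "answer"},
    "evals": [{"name": "RuleContentNull"}]
}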

Available evaluators

Rule-based (no API key needed)

  • Content checks: RuleContentNull, RuleContentShort, RuleDocRepeat
  • Format checks: RuleColonEnd, RuleSpecialCharacter, RuleAbnormalChar
  • Quality checks: RuleLongWord, RuleHighPPL, RulePunctuation
  • PII detection: RulePII, RuleUrl, RuleEmail
  • Language: RuleChineseChaos, RuleChineseTraditional

LLM-based (requires API key)

  • Text quality: LLMTextRepeat, LLMTextQualityV5
  • RAG metrics: Faithfulness, ContextPrecision, ContextRecall, AnswerRelevancy, ContextRelevancy
  • Safety: LLMSecurityProhibition
  • 3H evaluation: LLMText3HHelpful, LLMText3HHarmless, LLMText3HHonest

Agent-based (requires pip install "dingo-python[agent]")

  • ArticleFactChecker: autonomous fact-checking with ArXiv/web search tools

Output structure

Dingo writes results to an output directory:

outputs/<timestamp>/
├── summary.json                    # Overall statistics
└── <field_group>/
    ├── QUALITY_BAD/
    │   ├── RULE_COLON_END.jsonl    # Failed items by metric
    │   └── ...
    └── QUALITY_GOOD/
        └── ...                     # Passed items (if result_save.good=true)

summary.json format

{
  "task_name": "...",
  "total_count": 100,
  "good_count": 85,
  "bad_count": 15,
  "good_ratio": 0.85,
  "metric_detail": {
    "RuleColonEnd": {"count": 5, "ratio": 0.05},
    "RuleSpecialCharacter": {"count": 10, "ratio": 0.1}
  }
}

Environment variables

  • OPENAI_API_KEY: API key for LLM-based evaluation
  • OPENAI_BASE_URL: custom API endpoint (default: https://api.openai.com/v1)
  • OPENAI_MODEL: model name (default: gpt-4)
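
A small pre-flight sketch (variable names from the list above; the defaults shown are the ones this document states) for checking the environment before enabling LLM mode:

import os
import sys

# Rule-based evaluation needs no key; only stop when LLM mode is requested without one.
if not os.getenv("OPENAI_API_KEY"):
    sys.exit("OPENAI_API_KEY is not set; run rule-based evaluators or export a key first.")

print("Endpoint:", os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"))
print("Model:", os.getenv("OPENAI_MODEL", "gpt-4"))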

Supported input formats

  • JSONL (.jsonl): one JSON object per line (recommended)
  • JSON (.json): array of objects or a single object
  • CSV (.csv): comma-separated values
  • Plaintext (.txt): one item per line
  • Parquet (.parquet): Apache Parquet columnar format

General rules

When using this skill on behalf of the user:

  • Always write a config file before running CLI evaluation. Don't try to pass complex JSON inline.
  • Quote file paths with spaces in commands: dingo eval --input "my config.json"
  • Wrap main code in if __name__ == '__main__': when writing Python scripts — Dingo uses multiprocessing internally, which fails on macOS without this guard.
  • Infer format from extension: .jsonl → jsonl, .json → json, .csv → csv, .txt → plaintext.
  • Default to rule-based when the user doesn't specify evaluation type — it's free, fast, and needs no API key.
  • Ask for API key before using LLM-based evaluators. Never hardcode keys in config files; use ${OPENAI_API_KEY} placeholder or environment variables.
  • Check field names in the user's data before writing config. The fields mapping must match actual column names in the dataset.

Choosing evaluators

  1. User wants basic quality checks → Use rule-based evaluators (e.g., RuleColonEnd, RuleContentNull, RuleSpecialCharacter)
  2. User wants semantic quality assessment → Use LLM-based evaluators (e.g., LLMTextQualityV5, LLMTextRepeat)
  3. User wants RAG pipeline evaluation → Use RAG metrics (Faithfulness, ContextPrecision, ContextRecall, AnswerRelevancy). Requires user_input, response, retrieved_contexts, reference fields.
  4. User wants fact-checking → Use ArticleFactChecker (requires dingo-python[agent] extra)
  5. User wants safety/content moderation → Use LLMSecurityProhibition
  6. User doesn't know what to check → Start with common rule checks, show the summary, then suggest LLM-based evaluators if needed.

Post-evaluation guidance

After evaluation completes, the agent should:

  1. Read summary.json and report the key metrics: total items, good/bad counts, good ratio
  2. If there are failures, briefly explain what each failing metric means
  3. Suggest next steps (e.g., "15% of items have colon-ending issues — you may want to clean those")
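
A minimal sketch of step 1, reading summary.json with Python (field names follow the summary.json format shown above; it assumes the default outputs/ directory):

import json
from pathlib import Path

# Pick the most recently written run directory under the default outputs/ location.
run_dir = max(Path("outputs").iterdir(), key=lambda p: p.stat().st_mtime)
summary = json.loads((run_dir / "summary.json").read_text(encoding="utf-8"))

print(f"Total items: {summary['total_count']}")
print(f"Good: {summary['good_count']}  Bad: {summary['bad_count']}  Good ratio: {summary['good_ratio']:.2%}")
for metric, detail in summary.get("metric_detail", {}).items():
    print(f"  {metric}: {detail['count']} items ({detail['ratio']:.1%})")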

MCP Server (AI Agent Integration)

Dingo includes a built-in MCP (Model Context Protocol) server, allowing AI agents (Cursor, Claude Desktop, etc.) to invoke Dingo's evaluation tools directly.

Start the server

# SSE transport (default, for Cursor / remote agents)
dingo serve

# Custom port
dingo serve --port 9000

# stdio transport (for Claude Desktop / local agent spawn)
dingo serve --transport stdio

Configure your AI agent

Cursor (~/.cursor/mcp.json):

{
  "mcpServers": {
    "dingo": {
      "url": "http://localhost:8000/sse"
    }
  }
}

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "dingo": {
      "command": "dingo",
      "args": ["serve", "--transport", "stdio"],
      "env": {
        "OPENAI_API_KEY": "your-key",
        "OPENAI_MODEL": "gpt-4o"
      }
    }
  }
}

Available MCP tools

  • run_dingo_evaluation: run rule or LLM evaluation on a file
  • list_dingo_components: list rule groups, LLM models, and prompts
  • get_rule_details: get details about a specific rule
  • get_llm_details: get details about a specific LLM evaluator
  • get_prompt_details: get the embedded prompt for an LLM evaluator
  • run_quick_evaluation: goal-based evaluation (auto-infers settings)

For detailed MCP documentation, see: https://github.com/MigoXLab/dingo/blob/main/README_mcp.md

Troubleshooting

  • ModuleNotFoundError: No module named 'dingo': Run pip install dingo-python (note: the package name is dingo-python, not dingo)
  • RuntimeError: An attempt has been made to start a new process...: Wrap your code in if __name__ == '__main__': — required on macOS due to multiprocessing
  • LLM evaluation returns errors: Check that OPENAI_API_KEY is set and api_url is correct
  • Empty results: Verify fields mapping matches your dataset's actual column names
  • RAG metrics all fail: Ensure your data has all required fields: user_input, response, retrieved_contexts, reference
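
For the "Empty results" case, a quick sketch (assuming JSONL input) that lists the first row's column names so the fields mapping can be checked against them:

import json

# Read only the first line of the dataset and show which columns exist.
with open("data.jsonl", encoding="utf-8") as f:
    first_row = json.loads(f.readline())

print("Columns available for the 'fields' mapping:", sorted(first_row.keys()))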

Notes

  • Dingo supports any OpenAI-compatible API (OpenAI, DeepSeek, Anthropic via proxy, local vLLM, etc.)
  • Rule-based evaluators run locally with zero API cost
  • Results are written to the outputs/ directory by default (timestamped subdirectories)
  • The content field is the most commonly mapped field — it's the main text that most evaluators check

Resources


Fact-Checking Articles with ArticleFactChecker

ArticleFactChecker extracts all verifiable claims from an article and verifies each one using ArXiv academic search and web search. It runs as an autonomous agent and produces a structured verification report.

Prerequisites

pip install "dingo-python[agent]"
python3 -c "from dingo.config import InputArgs; print('Dingo OK')"

Required: OPENAI_API_KEY
Optional (recommended for web search): TAVILY_API_KEY

Quick start — use the bundled script

The skill includes scripts/fact_check.py which handles all input preparation and configuration automatically:

python3 {baseDir}/scripts/fact_check.py path/to/article.md

Supported input formats: .md, .txt (auto-wrapped), .jsonl, .json

Optional arguments:

  • --model MODEL — LLM model (default: env OPENAI_MODEL or gpt-5.4-mini)
  • --max-claims N — claims to extract, 1–200 (default: 50)
  • --max-concurrent N — parallel verification slots, 1–20 (default: 5)

The script outputs structured JSON to stdout. Parse and present:

  • accuracy_score (0.0–1.0): fraction of claims verified true
  • false_claims: list of contradicted claims with evidence
  • all_claims: full breakdown with TRUE/FALSE/UNVERIFIABLE verdicts
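
A sketch of consuming that output (it assumes the script prints a single JSON object containing exactly the fields listed above; the script path is relative to the skill's directory):

import json
import subprocess
import sys

# Run the bundled script on an article passed as the first command-line argument.
proc = subprocess.run(
    [sys.executable, "scripts/fact_check.py", sys.argv[1], "--max-claims", "30"],
    capture_output=True, text=True, check=True,
)
report = json.loads(proc.stdout)

print(f"Accuracy score: {report['accuracy_score']:.2f}")
for claim in report.get("false_claims", []):
    print("FALSE:", claim)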

Manual SDK usage

For direct SDK integration without the script:

import json, os, tempfile
from dingo.config import InputArgs
from dingo.exec import Executor

# IMPORTANT: wrap article into JSONL — plaintext is read line-by-line otherwise
article_text = open("article.md", encoding="utf-8").read()
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".jsonl", delete=False, encoding="utf-8")
tmp.write(json.dumps({"content": article_text}, ensure_ascii=False) + "\n")
tmp.close()

config = {
    "input_path": tmp.name,
    "dataset": {"source": "local", "format": "jsonl"},
    "executor": {"max_workers": 1},
    "evaluator": [{
        "fields": {"content": "content"},
        "evals": [{
            "name": "ArticleFactChecker",
            "config": {
                "key": os.environ["OPENAI_API_KEY"],
                "model": os.getenv("OPENAI_MODEL", "gpt-5.4-mini"),
                "api_url": os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
                "parameters": {
                    "temperature": 0,
                    "agent_config": {
                        "max_concurrent_claims": 5,
                        "max_iterations": 50,
                        "tools": {
                            "claims_extractor": {
                                "api_key": os.environ["OPENAI_API_KEY"],
                                "model": os.getenv("OPENAI_MODEL", "gpt-5.4-mini"),
                                "base_url": os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
                                "max_claims": 50
                            },
                            "arxiv_search": {"max_results": 5},
                            **({"tavily_search": {"api_key": os.environ["TAVILY_API_KEY"]}}
                               if os.getenv("TAVILY_API_KEY") else {})
                        }
                    }
                }
            }
        }]
    }]
}

if __name__ == "__main__":
    result = Executor.exec_map["local"](InputArgs(**config)).execute()
    print(f"Score: {result.score:.1f}%  |  Output: {result.output_path}")
    os.unlink(tmp.name)

Key requirement: Always use if __name__ == "__main__": when running Dingo with multiprocessing — required on macOS, recommended everywhere.

Interpreting the output

The summary.json in the output directory contains overall stats. Detailed per-claim results are in content/QUALITY_BAD_*.jsonl (for articles with false claims).

Each result item's eval_details.content[0] has:

  • score: accuracy_score (0.0–1.0, ratio of verified-true claims)
  • reason[0]: human-readable text summary
  • reason[1]: full structured report dict with detailed_findings and false_claims_comparison
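
A sketch of pulling those fields out of the per-item JSONL reports (the path layout follows the descriptions above and may differ between Dingo versions):

import json
from pathlib import Path

# Scan any QUALITY_BAD report files under the default outputs/ directory.
for path in Path("outputs").rglob("*.jsonl"):
    if "QUALITY_BAD" not in str(path):
        continue
    for line in path.read_text(encoding="utf-8").splitlines():
        item = json.loads(line)
        detail = item["eval_details"]["content"][0]
        print("accuracy_score:", detail["score"])
        print("summary:", detail["reason"][0])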

For advanced configuration (model selection, claim types, tuning), see references/advanced-config.md.

