{"skill":{"slug":"dataset-evaluation","displayName":"Dataset Evaluation","summary":"Evaluate a submission by scoring content consistency of texts and quality of structured data based on completeness, accuracy, type correctness, and informati...","description":"# SKILL.md --- dataset_evaluation\n\n## Skill Name\n\n`dataset_evaluation`\n\n## Description\n\nEvaluate a miner submission by performing two evaluation steps:\n\n1.  **Content Consistency Evaluation**\n2.  **Structured Data Quality Evaluation**\n\nThe evaluator receives **5 cleaned data samples**, the **structured\nJSON**, and the **dataset schema**, then computes a final score for the\nminer.\n\n------------------------------------------------------------------------\n\n# Input\n\n``` json\n{\n  \"cleaned_data_list\": [\n    \"cleaned_text_1\",\n    \"cleaned_text_2\",\n    \"cleaned_text_3\",\n    \"cleaned_text_4\",\n    \"cleaned_text_5\"\n  ],\n  \"structured_data\": {\n    \"field1\": \"value\",\n    \"field2\": \"value\"\n  },\n  \"dataset_schema\": {\n    \"fields\": [\n      {\"name\": \"title\", \"type\": \"string\", \"required\": true},\n      {\"name\": \"author\", \"type\": \"string\", \"required\": false},\n      {\"name\": \"date\", \"type\": \"string\", \"required\": false},\n      {\"name\": \"url\", \"type\": \"string\", \"required\": true}\n    ]\n  }\n}\n```\n\n------------------------------------------------------------------------\n\n# Evaluation Procedure\n\n## Step 1 --- Content Consistency Evaluation (Weight 40%)\n\nGoal: determine whether the **5 cleaned texts represent the same\nunderlying content**.\n\n### Method\n\n1.  Normalize text\n\n-   remove HTML\n-   lowercase\n-   remove excessive whitespace\n\n2.  Compute pairwise similarity across the 5 texts\n\nRecommended metrics:\n\n-   cosine similarity (embedding based)\n-   OR Jaccard similarity\n\n3.  
Compute the **average similarity score**.\n\n### Output\n\n    content_consistency_score (0-100)\n\nSuggested mapping:\n\n    avg_similarity >= 0.9 → 100\n    0.8 – 0.9 → 80 – 100\n    0.6 – 0.8 → 60 – 80\n    0.4 – 0.6 → 40 – 60\n    < 0.4 → < 40\n\n------------------------------------------------------------------------\n\n# Step 2 --- Structured Data Quality Evaluation (Weight 60%)\n\nUsing the **verified cleaned content**, evaluate the **structured\nJSON**.\n\nCompute four sub-scores.\n\n------------------------------------------------------------------------\n\n## 2.1 Field Completeness (30%)\n\nEvaluate whether all **required fields** exist.\n\nFormula:\n\n    completeness_score =\n        (# required fields present / total required fields) * 100\n\n------------------------------------------------------------------------\n\n## 2.2 Value Accuracy (40%)\n\nEvaluate whether each field value is **consistent with the cleaned\ndata**.\n\nExamples:\n\n-   title appears in cleaned text\n-   author name appears in text\n-   url matches source\n\nScoring guideline:\n\n    exact match → 100\n    partially correct → 60-80\n    inconsistent → <50\n\n------------------------------------------------------------------------\n\n## 2.3 Type Correctness (15%)\n\nEvaluate whether values match schema types.\n\nExamples:\n\n    string\n    number\n    boolean\n    array\n\nFormula:\n\n    type_score =\n        (# correct types / total fields) * 100\n\n------------------------------------------------------------------------\n\n## 2.4 Information Sufficiency (15%)\n\nEvaluate whether the structured data **misses obvious information**\npresent in the cleaned text.\n\nExample:\n\nCleaned text contains:\n\n    title\n    author\n    date\n\nBut structured JSON only includes:\n\n    title\n\nThen deduct score.\n\nGuideline:\n\n    complete extraction → 100\n    minor missing info → 70–90\n    major missing info → 
<60\n\n------------------------------------------------------------------------\n\n# Structuring Quality Score\n\n    structuring_quality_score =\n        completeness_score * 0.30\n      + value_accuracy_score * 0.40\n      + type_score * 0.15\n      + information_sufficiency_score * 0.15\n\nRange:\n\n    0 – 100\n\n------------------------------------------------------------------------\n\n# Step 3 --- Final Miner Score\n\n    miner_score =\n        content_consistency_score * 0.4\n      + structuring_quality_score * 0.6\n\nRange:\n\n    0 – 100\n\n------------------------------------------------------------------------\n\n# Output Format\n\nThe evaluator must return:\n\n``` json\n{\n  \"content_consistency_score\": 92,\n  \"structuring_quality_score\": 89.2,\n  \"miner_score\": 90.32,\n  \"details\": {\n    \"completeness_score\": 90,\n    \"value_accuracy_score\": 88,\n    \"type_score\": 100,\n    \"information_sufficiency_score\": 80\n  }\n}\n```\n\n------------------------------------------------------------------------\n\n# Evaluator Rules\n\nThe evaluator **must follow these principles**:\n\n1.  Be deterministic and reproducible\n2.  Base judgments only on provided inputs\n3.  Avoid hallucination\n4.  Penalize missing or inconsistent data\n5.  
Return scores strictly in the 0--100 range\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":522,"installsAllTime":1,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1773311735737,"updatedAt":1778491857969},"latestVersion":{"version":"1.0.0","createdAt":1773311735737,"changelog":"Initial release of dataset evaluation skill.\n\n- Implements a two-step evaluation: Content Consistency and Structured Data Quality.\n- Calculates a weighted final miner score based on both content and structuring assessments.\n- Evaluates JSON structure for field completeness, value accuracy, type correctness, and information sufficiency.\n- Provides a standardized output with detailed sub-scores.","license":"MIT-0"},"metadata":null,"owner":{"handle":"levey","userId":"s1716b2str5h8awnegs974rm8983j3j7","displayName":"levey","image":"https://avatars.githubusercontent.com/u/629136?v=4"},"moderation":null}