data-quality-checker

v1.0.0

Validate CSV, JSON, and JSONL data files for quality issues. Detects missing values, duplicates, type inconsistencies, statistical outliers, format violation...

Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description match the included script: the code implements CSV/JSON/JSONL loading and the listed quality checks (missing, duplicates, types, outliers, formats, whitespace, empty, drift). No unrelated binaries, env vars, or services are requested.
Instruction Scope
SKILL.md instructs running the included Python script against local data files and generating reports; the instructions do not ask the agent to read unrelated system files, credentials, or transmit data externally.
Install Mechanism
No install spec (instruction-only + bundled script). This is low risk: nothing is downloaded or installed automatically by the skill.
Credentials
The skill declares no required environment variables or credentials and the visible code does not access environment secrets or configuration. No excessive permissions are requested.
Persistence & Privilege
The skill is not marked always:true and does not attempt to modify system or other skills' configurations in the visible code. Autonomous invocation is allowed (platform default) but not combined with other red flags.
Assessment
This skill appears to do what it says: run the included Python script on local CSV/JSON/JSONL files to produce a data-quality report. Before installing or running on sensitive data:

  1. Review the entire scripts/check_data_quality.py file. The listing provided for this scan was truncated, so the tail of the file, where networking or other behavior could appear, was not inspected.
  2. Run it first on non-sensitive sample data in an isolated environment.
  3. Check memory/CPU behavior on large files (the tool appears to load data in memory and may not stream very large datasets).
  4. Prefer installing skills from a known/published source (owner and homepage are unknown, and STATUS.md notes a price).
  5. If you need to use it in CI on sensitive datasets, consider adding monitoring or sandboxing and/or reimplementing core checks within your vetted tooling.


License: MIT-0

Data Quality Checker

Validate CSV/JSON/JSONL data for quality issues. Pure Python, zero dependencies.

Quick Start

# Full quality check
python3 scripts/check_data_quality.py data.csv

# JSON/JSONL support
python3 scripts/check_data_quality.py data.json
python3 scripts/check_data_quality.py data.jsonl

# Markdown report
python3 scripts/check_data_quality.py data.csv --format markdown

# JSON report (for CI/CD)
python3 scripts/check_data_quality.py data.csv --format json

# Only specific checks
python3 scripts/check_data_quality.py data.csv --checks missing,duplicates,types

# Only warnings and critical
python3 scripts/check_data_quality.py data.csv --severity warning

# Save report
python3 scripts/check_data_quality.py data.csv --format markdown --output report.md

Schema Validation

# Generate schema from existing data
python3 scripts/check_data_quality.py data.csv --generate-schema schema.json

# Validate against schema
python3 scripts/check_data_quality.py data.csv --schema schema.json

Checks Performed

| Check | Description | Severity |
| --- | --- | --- |
| missing | Missing/null/empty values per column | info → critical |
| duplicates | Duplicate rows and potential ID conflicts | warning |
| types | Mixed data types within columns | info → warning |
| outliers | Statistical outliers via IQR method | info → warning |
| formats | Email/phone/URL/date format violations | warning |
| whitespace | Leading/trailing whitespace | info |
| empty | Entirely empty columns | warning |
| drift | Extra/missing keys across rows (schema drift) | warning |
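
To illustrate what the outliers check refers to, here is a minimal sketch of the IQR method using only the standard library. The function name `iqr_outliers`, the multiplier `k=1.5`, and the "inclusive" quartile method are assumptions for illustration; the bundled script may compute quartiles or thresholds differently.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: k=1.5 is the conventional multiplier, not
    necessarily the one used by check_data_quality.py.
    """
    if len(values) < 4:
        # Too few points for meaningful quartiles.
        return []
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


ages = [23, 25, 27, 29, 31, 33, 35, 240]
print(iqr_outliers(ages))  # [240]
```

With these sample values, Q1 is 26.5 and Q3 is 33.5, so the fences are 16 and 44 and only 240 falls outside them.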

Quality Score

0-100 score based on weighted severity:

  • 90-100: Clean data, minor issues
  • 70-89: Usable but needs attention
  • 50-69: Significant issues
  • 0-49: Critical problems
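
The exact weights behind the score are not documented here. As a rough sketch of how a severity-weighted score could map onto the bands above, assuming hypothetical penalty weights (`PENALTY`) and a simple subtract-from-100 rule:

```python
# Hypothetical penalty per issue; the real script's weights are not
# documented in this README and may differ.
PENALTY = {"info": 1, "warning": 5, "critical": 20}


def quality_score(issue_severities):
    """Subtract a weighted penalty per issue from 100, floored at 0."""
    penalty = sum(PENALTY[s] for s in issue_severities)
    return max(0, 100 - penalty)


def band(score):
    """Map a 0-100 score to the bands listed in this README."""
    if score >= 90:
        return "Clean data, minor issues"
    if score >= 70:
        return "Usable but needs attention"
    if score >= 50:
        return "Significant issues"
    return "Critical problems"


score = quality_score(["info", "warning"])
print(score, band(score))  # 94 Clean data, minor issues
```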

Exit Codes

  • 0 — No warnings or critical issues
  • 1 — Warnings found
  • 2 — Critical issues found

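Use in CI: python3 scripts/check_data_quality.py data.csv || echo "Quality check failed"

The fallback runs on any non-zero exit code, so it fires for warnings as well as critical issues. Because echo itself exits 0, the CI step still passes; drop the fallback or follow it with exit 1 if the pipeline should fail.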
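
When a pipeline needs to treat warnings and critical issues differently, the exit codes above can be inspected from a small wrapper. This is a sketch only: `describe`, `should_fail`, and `run_check` are hypothetical helper names, and the only facts taken from this README are the script path and the meaning of codes 0, 1, and 2.

```python
import subprocess
import sys

# Exit code meanings as documented in this README.
MEANING = {
    0: "no warnings or critical issues",
    1: "warnings found",
    2: "critical issues found",
}


def describe(code):
    """Human-readable meaning of an exit code."""
    return MEANING.get(code, f"unexpected exit code {code}")


def should_fail(code, fail_on_warnings=False):
    """Decide whether the CI step should fail for this exit code."""
    if code == 0:
        return False
    if code == 1:
        return fail_on_warnings
    # 2 (critical) and anything unexpected fail the step.
    return True


def run_check(path):
    """Run the checker on one file and return its exit code."""
    cmd = [sys.executable, "scripts/check_data_quality.py", path]
    return subprocess.run(cmd).returncode
```

A CI job could then call `code = run_check("data.csv")`, print `describe(code)`, and end with `sys.exit(1 if should_fail(code) else 0)` so that warnings are reported without blocking the build.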

Schema Format

JSON schema with validation rules:

{
  "required": ["id", "email", "name"],
  "properties": {
    "id": {"type": "integer", "minimum": 1},
    "email": {"type": "string", "pattern": "^[^@]+@[^@]+\\.[^@]+$"},
    "age": {"type": "number", "minimum": 0, "maximum": 150},
    "status": {"type": "string", "enum": ["active", "inactive", "pending"]}
  }
}
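
To show how rules like these can be applied to a single record, here is a minimal sketch of a row validator. It is not the script's implementation: `validate_row` and `TYPE_CHECKS` are hypothetical names, and it handles only the keywords that appear in the example above (required, type, minimum, maximum, pattern, enum).

```python
import re

# Type predicates for the three types used in the example schema.
# bool is excluded because it is a subclass of int in Python.
TYPE_CHECKS = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def validate_row(row, schema):
    """Return a list of error strings for one record (empty if valid)."""
    errors = []
    for field in schema.get("required", []):
        if row.get(field) in (None, ""):
            errors.append(f"{field}: required field is missing")
    for field, rules in schema.get("properties", {}).items():
        value = row.get(field)
        if value in (None, ""):
            continue  # absence is handled by the required check
        check = TYPE_CHECKS.get(rules.get("type"))
        if check and not check(value):
            errors.append(f"{field}: expected {rules['type']}")
            continue
        is_number = TYPE_CHECKS["number"](value)
        if is_number and "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{field}: below minimum {rules['minimum']}")
        if is_number and "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{field}: above maximum {rules['maximum']}")
        if isinstance(value, str) and "pattern" in rules:
            if not re.search(rules["pattern"], value):
                errors.append(f"{field}: does not match pattern")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: not one of {rules['enum']}")
    return errors


schema = {
    "required": ["id", "email", "name"],
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "email": {"type": "string", "pattern": r"^[^@]+@[^@]+\.[^@]+$"},
        "age": {"type": "number", "minimum": 0, "maximum": 150},
        "status": {"type": "string", "enum": ["active", "inactive", "pending"]},
    },
}

bad = {"id": 0, "email": "not-an-email", "name": "Ann", "age": 200, "status": "archived"}
for problem in validate_row(bad, schema):
    print(problem)
```

Values read from CSV arrive as strings, so a real validator would need to coerce them (for example "42" to 42) before type checks like these are meaningful; this sketch assumes already-typed values such as those loaded from JSON.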
