data-quality-checker
v1.0.0: Validate CSV, JSON, and JSONL data files for quality issues. Detects missing values, duplicates, type inconsistencies, statistical outliers, format violation...
Security Scan
OpenClaw
Benign (medium confidence)

Purpose & Capability
Name/description match the included script: the code implements CSV/JSON/JSONL loading and the listed quality checks (missing, duplicates, types, outliers, formats, whitespace, empty, drift). No unrelated binaries, env vars, or services are requested.
Instruction Scope
SKILL.md instructs running the included Python script against local data files and generating reports; the instructions do not ask the agent to read unrelated system files, credentials, or transmit data externally.
Install Mechanism
No install spec (instruction-only + bundled script). This is low risk: nothing is downloaded or installed automatically by the skill.
Credentials
The skill declares no required environment variables or credentials and the visible code does not access environment secrets or configuration. No excessive permissions are requested.
Persistence & Privilege
The skill is not marked always:true and does not attempt to modify system or other skills' configurations in the visible code. Autonomous invocation is allowed (platform default) but not combined with other red flags.
Assessment
This skill appears to do what it says: run the included Python script on local CSV/JSON files to produce a data-quality report. Before installing or running it on sensitive data:

1. Review the entire scripts/check_data_quality.py file. The listing provided is truncated, and I couldn't inspect the file tail, where networking or other behavior could appear.
2. Run it first on non-sensitive sample data in an isolated environment.
3. Check memory/CPU behavior on large files; the tool appears to load data in memory and may not stream very large datasets.
4. Prefer installing skills from a known, published source (the owner and homepage are unknown, and STATUS.md notes a price).
5. If you need to use it in CI on sensitive datasets, consider adding monitoring or sandboxing, or reimplement the core checks within your vetted tooling.

Like a lobster shell, security has layers: review code before you run it.
Data Quality Checker
Validate CSV/JSON/JSONL data for quality issues. Pure Python, zero dependencies.
Quick Start
```bash
# Full quality check
python3 scripts/check_data_quality.py data.csv

# JSON/JSONL support
python3 scripts/check_data_quality.py data.json
python3 scripts/check_data_quality.py data.jsonl

# Markdown report
python3 scripts/check_data_quality.py data.csv --format markdown

# JSON report (for CI/CD)
python3 scripts/check_data_quality.py data.csv --format json

# Only specific checks
python3 scripts/check_data_quality.py data.csv --checks missing,duplicates,types

# Only warnings and critical
python3 scripts/check_data_quality.py data.csv --severity warning

# Save report
python3 scripts/check_data_quality.py data.csv --format markdown --output report.md
```
Schema Validation
```bash
# Generate schema from existing data
python3 scripts/check_data_quality.py data.csv --generate-schema schema.json

# Validate against schema
python3 scripts/check_data_quality.py data.csv --schema schema.json
```
Checks Performed
| Check | Description | Severity |
|---|---|---|
| missing | Missing/null/empty values per column | info → critical |
| duplicates | Duplicate rows and potential ID conflicts | warning |
| types | Mixed data types within columns | info → warning |
| outliers | Statistical outliers via the IQR method | info → warning |
| formats | Email/phone/URL/date format violations | warning |
| whitespace | Leading/trailing whitespace | info |
| empty | Entirely empty columns | warning |
| drift | Extra/missing keys across rows (schema drift) | warning |
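The IQR method named in the `outliers` row is a standard technique: values outside `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]` are flagged. A minimal sketch of the idea, not the bundled implementation (which may compute quantiles differently):

```python
def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    xs = sorted(values)
    n = len(xs)

    def quantile(q):
        # Linear interpolation between the closest ranks.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lo_bound, hi_bound = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo_bound or v > hi_bound]

# iqr_outliers([1, 2, 3, 4, 5, 100]) → [100]
```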
Quality Score
0-100 score based on weighted severity:
- 90-100: Clean data, minor issues
- 70-89: Usable but needs attention
- 50-69: Significant issues
- 0-49: Critical problems
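The exact weighting is not documented here; a plausible sketch of a weighted-severity score, with hypothetical weights and row-count normalization (the real formula in check_data_quality.py may differ):

```python
# Hypothetical severity weights, not taken from the script.
WEIGHTS = {"info": 1, "warning": 5, "critical": 20}

def quality_score(issues, total_rows):
    """Map a list of (severity, count) pairs to a 0-100 score."""
    penalty = sum(WEIGHTS[sev] * count for sev, count in issues)
    # Normalize by dataset size so large files aren't over-penalized.
    return max(0, round(100 - 100 * penalty / max(total_rows, 1)))
```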
Exit Codes
- 0: No warnings or critical issues
- 1: Warnings found
- 2: Critical issues found
Use in CI: `python3 scripts/check_data_quality.py data.csv || echo "Quality check failed"`
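The exit-code contract also lets a Python CI wrapper branch on the return code. A minimal sketch; the `fail_on_warnings` policy is an assumption for illustration, not a flag of the script:

```python
def ci_gate(returncode, fail_on_warnings=False):
    """Map the checker's exit code to a CI decision."""
    if returncode == 2:
        return "fail"  # critical issues always fail the build
    if returncode == 1:
        return "fail" if fail_on_warnings else "warn"
    return "pass"

# Example wiring:
# import subprocess, sys
# result = subprocess.run([sys.executable, "scripts/check_data_quality.py", "data.csv"])
# decision = ci_gate(result.returncode)
```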
Schema Format
JSON schema with validation rules:
```json
{
  "required": ["id", "email", "name"],
  "properties": {
    "id": {"type": "integer", "minimum": 1},
    "email": {"type": "string", "pattern": "^[^@]+@[^@]+\\.[^@]+$"},
    "age": {"type": "number", "minimum": 0, "maximum": 150},
    "status": {"type": "string", "enum": ["active", "inactive", "pending"]}
  }
}
```
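A simplified sketch of how one row might be checked against this schema shape; this is illustrative, not the bundled validator, and covers only the rule kinds shown above:

```python
import re

def validate_row(row, schema):
    """Return a list of violation messages for one row (a dict)."""
    errors = []
    for field in schema.get("required", []):
        if row.get(field) in (None, ""):
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        value = row.get(field)
        if value is None:
            continue
        if rules.get("type") == "integer" and not isinstance(value, int):
            errors.append(f"{field}: expected integer")
        elif rules.get("type") == "number" and not isinstance(value, (int, float)):
            errors.append(f"{field}: expected number")
        if isinstance(value, (int, float)):
            if "minimum" in rules and value < rules["minimum"]:
                errors.append(f"{field}: below minimum {rules['minimum']}")
            if "maximum" in rules and value > rules["maximum"]:
                errors.append(f"{field}: above maximum {rules['maximum']}")
        if "pattern" in rules and isinstance(value, str) and not re.match(rules["pattern"], value):
            errors.append(f"{field}: does not match pattern")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: not in {rules['enum']}")
    return errors
```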