Data Toolkit

Complete data conversion, validation, and cleaning toolkit. Convert between JSON/CSV/YAML/XML, validate schemas, clean duplicates and nulls. Essential utilities for data processing workflows.

Install

openclaw skills install data-toolkit

Complete data processing utilities for OpenClaw agents.

Features

Converters

  • JSON ↔ CSV - Bidirectional conversion with schema inference
  • JSON ↔ YAML - Clean formatting, comment preservation
  • JSON ↔ XML - Configurable root elements and attributes
  • CSV ↔ YAML - Direct conversion without intermediate steps
  • Multi-format batch conversion - Process entire directories
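
As a rough illustration of what "schema inference" can mean for JSON ↔ CSV, the sketch below builds the CSV header from the union of keys seen across all records. It uses only the Python standard library and is not the toolkit's own code. The function names `infer_fieldnames`, `json_records_to_csv`, and `csv_to_json_records` are hypothetical, and the input is assumed to be a JSON array of flat objects.

```python
import csv
import io
import json


def infer_fieldnames(records):
    """Collect column names in first-seen order across all records."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    return fieldnames


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects to CSV text."""
    records = json.loads(json_text)
    fieldnames = infer_fieldnames(records)
    buffer = io.StringIO()
    # restval fills in columns that a given record does not have.
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def csv_to_json_records(csv_text):
    """Convert CSV text back to a list of dicts (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

Note that the round trip is lossy in this sketch: CSV carries no type information, so numbers come back as strings unless a type inference step is added.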

Validators

  • JSON Schema validation - Validate against JSON Schema specs
  • CSV structure validation - Check headers, columns, data types
  • Data type inference - Automatic type detection and validation
  • Custom rules - Define business logic validations
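
To make the CSV structure check and type inference more concrete, here is a minimal standard-library sketch. It assumes that "structure" means matching headers and a consistent column count, and that types are inferred as the narrowest of int, float, bool, or str. `infer_type` and `check_csv` are hypothetical names, not part of the toolkit's documented API.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    for name, caster in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    if all(v.strip().lower() in ("true", "false") for v in non_empty):
        return "bool"
    return "str"


def check_csv(csv_text, expected_headers):
    """Return (errors, inferred column types) for CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"], {}
    errors = []
    headers = rows[0]
    if headers != list(expected_headers):
        errors.append(
            f"headers {headers} do not match expected {list(expected_headers)}"
        )
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(headers):
            errors.append(
                f"line {line_no}: expected {len(headers)} columns, found {len(row)}"
            )
    # Only rows with the right column count take part in type inference.
    complete = [r for r in rows[1:] if len(r) == len(headers)]
    types = {h: infer_type([r[i] for r in complete]) for i, h in enumerate(headers)}
    return errors, types
```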

Cleaners

  • Duplicate removal - Smart deduplication with configurable keys
  • Null/empty handling - Remove or replace null values
  • Data normalization - Standardize formats (dates, numbers, strings)
  • Whitespace cleanup - Trim, collapse multiple spaces
  • Column operations - Remove, rename, reorder columns
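
The cleaners listed above can be combined in a single pass over the records. The sketch below is one possible reading, not the toolkit's implementation: it assumes "null" means `None` or an empty string, that removing nulls drops the whole record, and that deduplication keeps the first record per key. `clean_string`, `is_null`, and `clean_records` are hypothetical names.

```python
import re


def clean_string(value):
    """Trim and collapse runs of whitespace; leave non-strings untouched."""
    if isinstance(value, str):
        return re.sub(r"\s+", " ", value).strip()
    return value


def is_null(value):
    """Treat None and the empty string as null (an assumption of this sketch)."""
    return value is None or value == ""


def clean_records(records, key_fields, null_replacement=None):
    """Clean whitespace, handle nulls, and keep the first record per key."""
    seen = set()
    cleaned = []
    for record in records:
        row = {k: clean_string(v) for k, v in record.items()}
        if null_replacement is None:
            # No replacement given: drop records that contain any null.
            if any(is_null(v) for v in row.values()):
                continue
        else:
            row = {k: null_replacement if is_null(v) else v for k, v in row.items()}
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

Whitespace is cleaned before the key is computed, so records that differ only in spacing in a key field are treated as duplicates.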

Get Data Toolkit

🛒 Gumroad (€10): https://nexusatlas.gumroad.com/l/bsyacx
📦 ClawHub: https://clawhub.ai/skills/data-toolkit

MIT License. Requires Python 3.8+; dependencies are listed under Installation.

Usage

Convert Data

# JSON to CSV
./src/convert.py --input data.json --output data.csv --format csv

# CSV to JSON
./src/convert.py --input data.csv --output data.json --format json

# JSON to YAML
./src/convert.py --input data.json --output data.yaml --format yaml

# XML to JSON
./src/convert.py --input data.xml --output data.json --format json

# Batch conversion
./src/convert.py --input-dir ./raw --output-dir ./processed --format json
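
For readers who want to see what a batch run amounts to, the following standard-library sketch converts every CSV file in one directory to JSON in another. It is an assumption about the general approach, not a description of how `convert.py` works internally, and `batch_csv_to_json` is a hypothetical name.

```python
import csv
import json
from pathlib import Path


def batch_csv_to_json(input_dir, output_dir):
    """Convert every *.csv file in input_dir to a JSON file in output_dir."""
    in_path, out_path = Path(input_dir), Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    # Sorting keeps the processing order stable across runs.
    for source in sorted(in_path.glob("*.csv")):
        with source.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.DictReader(handle))
        target = out_path / (source.stem + ".json")
        target.write_text(json.dumps(rows, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

Files with other extensions are skipped, and the output directory is created if it does not exist.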

Validate Data

# Validate against JSON schema
./src/validate.py --input data.json --schema schema.json

# Validate CSV structure
./src/validate.py --input data.csv --check-headers --check-types

# Custom validation rules
./src/validate.py --input data.json --rules validation-rules.yaml
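
The format of `validation-rules.yaml` is not documented here, so the sketch below does not attempt to reproduce it. It only illustrates the general idea of declarative business rules, using a plain Python dict with hypothetical rule keys (`required`, `min`, `max`, `allowed`) and a hypothetical `validate_record` function.

```python
def validate_record(record, rules):
    """Return a list of human-readable rule violations for one record."""
    errors = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            if rule.get("required", False):
                errors.append(f"{field}: missing required field")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: {value} is below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field}: {value} is above maximum {rule['max']}")
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: {value!r} is not one of {rule['allowed']}")
    return errors


# Hypothetical rule set; the keys are illustrative, not the toolkit's schema.
RULES = {
    "age": {"required": True, "min": 0, "max": 130},
    "status": {"allowed": ["active", "inactive"]},
}
```

Returning a list of messages rather than a single boolean makes it possible to report every violation in a record at once.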

Clean Data

# Remove duplicates
./src/clean.py --input data.json --dedupe --key id

# Handle nulls
./src/clean.py --input data.csv --remove-nulls
./src/clean.py --input data.csv --replace-nulls "N/A"

# Normalize data
./src/clean.py --input data.json --normalize dates,numbers,strings

# Full cleanup pipeline
./src/clean.py --input messy.csv --dedupe --remove-nulls --normalize all --output clean.csv

API Usage (Python)

from data_toolkit import convert, validate, clean

# Convert
convert.json_to_csv('input.json', 'output.csv')
convert.csv_to_yaml('input.csv', 'output.yaml')

# Validate
is_valid = validate.json_schema('data.json', 'schema.json')
errors = validate.csv_structure('data.csv')

# Clean
clean.remove_duplicates('data.json', key='id')
clean.normalize_dates('data.csv', format='ISO8601')
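
How `clean.normalize_dates` treats ambiguous input is not specified above. As a sketch of what normalizing to ISO 8601 can involve, the helper below tries a fixed list of input formats in order and returns `YYYY-MM-DD`. The name `to_iso_date` and the format list are assumptions; in particular, slash-separated dates are assumed to be day-first.

```python
from datetime import datetime

# Candidate input formats, tried in order. Day-first is an assumption here.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y%m%d")


def to_iso_date(text, formats=KNOWN_FORMATS):
    """Return text as YYYY-MM-DD, or None if no known format matches."""
    candidate = text.strip()
    for fmt in formats:
        try:
            return datetime.strptime(candidate, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for unparseable values lets the caller decide whether to drop the record, keep the original string, or report an error.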

Examples

See examples/ directory for complete workflows:

  • examples/etl-pipeline.sh - Full ETL workflow
  • examples/api-data-processing.py - API response processing
  • examples/batch-conversion.sh - Bulk file conversion

Installation

Dependencies are minimal and common:

  • Python 3.8+
  • PyYAML
  • pandas (optional, for advanced CSV operations)

pip install pyyaml pandas

Requirements

  • Node.js (for JSON/YAML parsing)
  • Python 3.8+
  • 10MB disk space

License

MIT

Support

Issues: https://github.com/forge-agent/data-toolkit
Docs: See docs/ directory