Skylv Data Pipeline Builder

Build ETL/data pipelines with natural language. Extract from databases/APIs, transform with code, load to destinations. No pipeline framework expertise needed.

Audits: Warn

Install

openclaw skills install skylv-data-pipeline-builder

data-pipeline-builder

Build data pipelines without framework expertise. Extract from any source, transform with code, load to any destination — all with natural language commands.

What It Does

  • Extract data — From databases, APIs, files, S3, GCS, Kafka
  • Transform — Filters, mappings, aggregations, joins, custom code
  • Load — To databases, data warehouses, files, APIs
  • Schedule — Cron-based or event-triggered execution
  • Monitor — Pipeline status, throughput, error rates
  • Validate — Schema checks, data quality rules

Quick Start

# 1. Create a simple pipeline
create pipeline from mysql users to postgres users_backup

# 2. Add transformation
add transform to users-backup: filter where active = true

# 3. Schedule it
schedule users-backup daily at 2:00 AM

# 4. Run and monitor
run pipeline users-backup
check pipeline status
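
To make the data flow concrete, here is a rough hand-written Python equivalent of the quick-start pipeline (pandas + SQLAlchemy). The connection strings, drivers, and credentials are placeholders, and this is only a sketch of the extract/filter/load steps, not the code Skylv itself generates:

# Hypothetical equivalent of the quick-start pipeline, written by hand.
# Connection strings and credentials are placeholders.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("mysql+pymysql://user:pass@prod-db/app")
dest = create_engine("postgresql+psycopg2://user:pass@backup-db/app")

# Extract: read the users table, keeping only active rows (the added transform)
users = pd.read_sql("SELECT * FROM users WHERE active = TRUE", source)

# Load: replace the backup table with the filtered snapshot
users.to_sql("users_backup", dest, if_exists="replace", index=False)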

Common Use Cases

🔄 Database Synchronization

# Sync production to analytics warehouse
create pipeline from mysql production.orders \
  to bigquery analytics.orders

# Run incremental sync every hour
schedule orders-sync hourly
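
For intuition, an incremental sync of this kind usually boils down to a watermark query. The sketch below assumes the orders table has an updated_at column and, for brevity, uses a Postgres warehouse instead of BigQuery; Skylv's own incremental strategy may differ:

# Hand-rolled incremental sync sketch. Assumes orders has an updated_at
# column and that the destination table already exists from a full load.
import pandas as pd
from sqlalchemy import create_engine, text

source = create_engine("mysql+pymysql://user:pass@prod-db/production")
dest = create_engine("postgresql+psycopg2://user:pass@warehouse/analytics")

# Find the latest row already loaded (the watermark)
with dest.connect() as conn:
    watermark = conn.execute(
        text("SELECT COALESCE(MAX(updated_at), '1970-01-01') FROM orders")
    ).scalar()

# Pull only rows changed since the last load and append them
changed = pd.read_sql(
    text("SELECT * FROM orders WHERE updated_at > :wm"),
    source,
    params={"wm": watermark},
)
changed.to_sql("orders", dest, if_exists="append", index=False)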

📊 API Data Extraction

# Pull data from REST API
create pipeline from api https://api.shop.com/orders \
  to postgres analytics.orders

# Add authentication
set source auth: bearer token xxx
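
Under the hood, this kind of extraction is just authenticated, paginated HTTP requests followed by a bulk insert. The sketch below assumes a page query parameter and a plain JSON array response, which are guesses about the endpoint, not facts from the original:

# Sketch of an API-to-Postgres extraction. The pagination scheme and
# response shape are assumptions about the endpoint.
import pandas as pd
import requests
from sqlalchemy import create_engine

API_URL = "https://api.shop.com/orders"
headers = {"Authorization": "Bearer xxx"}  # token from `set source auth`
dest = create_engine("postgresql+psycopg2://user:pass@warehouse/analytics")

rows, page = [], 1
while True:
    resp = requests.get(API_URL, headers=headers, params={"page": page}, timeout=30)
    resp.raise_for_status()
    batch = resp.json()
    if not batch:          # empty page signals the end
        break
    rows.extend(batch)
    page += 1

pd.DataFrame(rows).to_sql("orders", dest, if_exists="append", index=False)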

🧹 Data Cleaning

# Clean and transform data
create pipeline from csv raw_data.csv to postgres clean_data

add transform: \
  remove duplicates on email \
  fill nulls in age with 0 \
  validate email format
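
The three transforms above map directly onto ordinary dataframe operations. A rough pandas equivalent is shown below; the email regex is a simplification and the column names come from the example:

# Pandas sketch of the cleaning transforms listed above.
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv("raw_data.csv")

# remove duplicates on email
df = df.drop_duplicates(subset="email")

# fill nulls in age with 0
df["age"] = df["age"].fillna(0)

# validate email format: keep only rows matching a basic pattern
email_ok = df["email"].astype(str).str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
df = df[email_ok]

dest = create_engine("postgresql+psycopg2://user:pass@warehouse/analytics")
df.to_sql("clean_data", dest, if_exists="replace", index=False)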

📈 Analytics Preparation

# Aggregate for dashboards
create pipeline from postgres transactions \
  to postgres daily_summary

add transform: \
  group by date, product \
  aggregate sum(revenue), count(*) \
  where date >= yesterday
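
Because source and destination are the same Postgres instance here, the aggregation can be expressed as a single pushed-down query. This is a sketch under that assumption, with table and column names taken from the example:

# Sketch of the daily aggregation, pushed down to Postgres as SQL.
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@warehouse/analytics")

summary = pd.read_sql(
    text("""
        SELECT date, product,
               SUM(revenue) AS revenue,
               COUNT(*)     AS order_count
        FROM transactions
        WHERE date >= CURRENT_DATE - INTERVAL '1 day'
        GROUP BY date, product
    """),
    engine,
)
summary.to_sql("daily_summary", engine, if_exists="replace", index=False)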

All Commands

Command                               Purpose
create pipeline from <src> to <dst>   Define new pipeline
add transform <pipeline>              Add transformation step
schedule <pipeline> <when>            Set run schedule
run pipeline <name>                   Execute immediately
check pipeline status                 View running pipelines
pause pipeline <name>                 Stop scheduled runs
view logs <pipeline>                  See execution history
validate <pipeline>                   Test without executing

Supported Sources & Destinations

Databases: MySQL, PostgreSQL, MongoDB, Redis, SQLite

Cloud Storage: S3, GCS, Azure Blob

Data Warehouses: BigQuery, Snowflake, Redshift

Streaming: Kafka, Kinesis, Pub/Sub

Files: CSV, JSON, Parquet, Excel
Requirements

  • Node.js 18+ or Python 3.8+
  • Source/destination connectors (auto-installed)
  • Optional: Airflow, Dagster for orchestration