ETL Design

v1.0.0

Deep ETL/ELT design workflow: extract patterns, transforms, loading strategies, idempotency, validation, and reconciliation. Use when designing batch data flows.

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description match the SKILL.md content: a six-stage ETL/ELT design workflow. Nothing in the skill requests unrelated resources or capabilities.
Instruction Scope
SKILL.md provides high-level design guidance (source contract, extract, transforms, load/dedupe, validation, ops/backfill). It does not instruct the agent to read system files, access environment variables, or transmit data to external endpoints.
Install Mechanism
No install spec and no code files; this is instruction-only so nothing is written to disk or downloaded during install.
Credentials
No required environment variables, credentials, or config paths are declared; the guidance is purely conceptual and does not demand secrets or external service access.
Persistence & Privilege
Skill is not always-enabled and uses default invocation behavior; it does not request persistent or elevated platform privileges.
Assessment
This skill is high-level design advice and appears safe to install. It won't by itself access your data or systems because it has no installs, code, or credential requirements. Before using it in an agent that also has connectors or other skills, confirm those skills do not grant the agent access to production data sources or secrets; the ETL guidance may prompt actions that require those connectors, and credential access should be controlled separately.


MIT-0

ETL Design

ETL is correctness under change: schema drift, partial loads, retries, and reconciliation with upstream systems.

When to Offer This Workflow

Trigger conditions:

  • Batch loads into warehouse or data lake
  • Choosing between CDC, snapshots, and incremental watermarks
  • Missing rows, duplicates, or inconsistent aggregates downstream

Initial offer:

Use six stages: (1) source contract, (2) extract strategy, (3) transform rules, (4) load & dedupe, (5) validation, (6) operations & backfill. Confirm batch window and SLA.


Stage 1: Source Contract

Goal: Document schema, primary keys, change indicators (updated_at, CDC log position), and access constraints (rate limits, read replicas).


Stage 2: Extract Strategy

Goal: Choose between full dump, incremental watermark, and CDC; trade off freshness, source load, and complexity.

Practices

  • Prefer CDC for large, high-churn sources; full snapshots for small or infrequently changing tables
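
An incremental watermark extract can be sketched in plain Python. The `updated_at` field and the overlap window are illustrative assumptions: the overlap deliberately re-reads a little history so late-committing transactions are not missed, and the idempotent load (Stage 4) absorbs the resulting repeats.

```python
from datetime import datetime, timedelta

def incremental_slice(rows, watermark, overlap=timedelta(minutes=5)):
    """Select rows changed since the last stored watermark.

    Re-reads a small overlap window before the watermark to tolerate
    late-committing transactions; duplicates are deduped at load time.
    """
    cutoff = watermark - overlap
    picked = [r for r in rows if r["updated_at"] > cutoff]
    # Advance the watermark only as far as data we actually saw.
    new_watermark = max((r["updated_at"] for r in picked), default=watermark)
    return picked, new_watermark

rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 10, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 12, 0)},
]
picked, wm = incremental_slice(rows, datetime(2024, 1, 1, 11, 0))
```

In a real extract the filter would be pushed into the source query (`WHERE updated_at > :cutoff`); the watermark bookkeeping is the part that must be persisted per run.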

Stage 3: Transform Rules

Goal: Deterministic transforms; surrogate keys; business rules versioned; handling of deletes (tombstones vs hard deletes).
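
Determinism is easiest to see with surrogate keys: hashing the natural key, rather than drawing from a sequence or generating a UUID, means a rerun of the same batch regenerates identical keys. A minimal sketch (the key layout is an assumption for illustration):

```python
import hashlib

def surrogate_key(*natural_key):
    """Deterministic surrogate key from the natural key parts."""
    # Join with a unit separator so ("ab", "c") and ("a", "bc") differ.
    raw = "\x1f".join(str(part) for part in natural_key)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

The same principle applies to the other transform rules: versioned business logic and explicit tombstone handling keep a replayed batch byte-identical to the original run.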


Stage 4: Load & Dedupe

Goal: Upsert keys; partitions; rerunnable jobs with same batch id producing the same outcome (idempotent load).
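
The idempotency property is easiest to state as code: loading the same batch twice leaves the target unchanged. The sketch below uses a dict as a stand-in for a warehouse table keyed by the upsert key; a real load would use `MERGE` or `INSERT ... ON CONFLICT` with the same semantics, and the field names are illustrative.

```python
def load_batch(target, batch_id, rows, key="order_id"):
    """Idempotent upsert: replaying the same batch is a no-op.

    `target` stands in for a table keyed on the upsert key; tagging rows
    with the batch id makes reruns auditable.
    """
    for row in rows:
        target[row[key]] = {**row, "_batch_id": batch_id}
    return target

table = {}
batch = [{"order_id": 1, "amount": 10}, {"order_id": 2, "amount": 7}]
load_batch(table, "2024-01-01", batch)
load_batch(table, "2024-01-01", batch)  # replay: same outcome
```

Append-only loads get the same guarantee by deleting (or partition-overwriting) the batch's slice before re-inserting it.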


Stage 5: Validation

Goal: Row counts, checksums, key uniqueness, referential checks; alert on threshold breaches.
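
Two of those checks, row-count reconciliation against the source and key uniqueness, fit in a few lines. A sketch, assuming a configurable count tolerance and illustrative field names; breach messages would feed the alerting channel:

```python
def validate_batch(rows, expected_count, key="order_id", count_tolerance=0.01):
    """Post-load checks; returns a list of breach messages for alerting."""
    breaches = []
    allowed = count_tolerance * max(expected_count, 1)
    if abs(len(rows) - expected_count) > allowed:
        breaches.append(f"row_count: expected ~{expected_count}, got {len(rows)}")
    keys = [r[key] for r in rows]
    if len(keys) != len(set(keys)):
        breaches.append("duplicate upsert keys")
    return breaches
```

Checksums and referential checks follow the same shape: compute on both sides, compare, and alert only past a threshold rather than on every one-row discrepancy.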


Stage 6: Operations & Backfill

Goal: Replay by date range; monitor lag; dead-letter or quarantine bad rows with reason codes.
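
Quarantining with reason codes lets a batch complete even when some rows are bad, and makes the bad rows replayable once fixed. A minimal sketch, where `rules` maps a reason code to a predicate that must hold for a row (the specific rules are invented for illustration):

```python
def route_rows(rows, rules):
    """Split a batch into loadable rows and a dead-letter list.

    A failing row is quarantined with every reason code it tripped,
    so it can be inspected and replayed after the data is fixed.
    """
    good, dead_letter = [], []
    for row in rows:
        reasons = [code for code, ok in rules.items() if not ok(row)]
        if reasons:
            dead_letter.append({"row": row, "reasons": reasons})
        else:
            good.append(row)
    return good, dead_letter

rules = {
    "missing_amount": lambda r: r.get("amount") is not None,
    "negative_qty": lambda r: r.get("qty", 0) >= 0,
}
good, dead = route_rows(
    [{"amount": 5, "qty": 1}, {"amount": None, "qty": -2}], rules
)
```

Date-range replay then becomes: re-extract the range, rerun transforms, and rely on the idempotent load to overwrite the affected slice.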


Final Review Checklist

  • Source contract and keys documented
  • Extract mode matches SLA and source constraints
  • Transforms deterministic and versioned
  • Idempotent load strategy
  • Validation and reconciliation defined

Tips for Effective Guidance

  • Plan for late-arriving facts and slowly changing dimensions in analytics paths.
  • Pair with data-pipelines for orchestration and monitoring.

Handling Deviations

  • Near-real-time: document micro-batch or streaming semantics separately.
