Install
openclaw skills install @jose-compu/self-improving-analytics
Captures data quality issues, metric drift, pipeline failures, misleading visualizations, metric definition mismatches, and data freshness problems to enable continuous analytics improvement. Use when: (1) An ETL/ELT pipeline fails, (2) A metric value shows anomalous behavior, (3) Two teams define the same metric differently, (4) A dashboard shows wrong or misleading data, (5) A data freshness SLA is missed, (6) A schema change breaks downstream consumers.
Log analytics-specific learnings, data issues, and feature requests to markdown files for continuous improvement. Captures data quality problems, metric drift, pipeline failures, misleading visualizations, metric definition mismatches, and data freshness breaches. Important learnings get promoted to data dictionaries, metric definitions, pipeline runbooks, dashboard standards, or data quality SLAs.
Before logging anything, ensure the .learnings/ directory and files exist in the project or workspace root. If any are missing, create them:
mkdir -p .learnings
[ -f .learnings/LEARNINGS.md ] || printf "# Analytics Learnings\n\nData quality patterns, metric drift insights, pipeline reliability findings, visualization best practices, and governance lessons.\n\n**Categories**: data_quality | metric_drift | pipeline_failure | visualization_mislead | definition_mismatch | freshness_issue\n**Areas**: ingestion | transformation | modeling | reporting | visualization | governance | data_catalog\n\n---\n" > .learnings/LEARNINGS.md
[ -f .learnings/DATA_ISSUES.md ] || printf "# Data Issues Log\n\nPipeline failures, data quality problems, metric anomalies, visualization errors, and schema drift.\n\n---\n" > .learnings/DATA_ISSUES.md
[ -f .learnings/FEATURE_REQUESTS.md ] || printf "# Feature Requests\n\nAnalytics tools, BI capabilities, data quality automation, and governance improvements.\n\n---\n" > .learnings/FEATURE_REQUESTS.md
Never overwrite existing files. This is a no-op if .learnings/ is already initialised.
Do not log connection strings, database credentials, API keys, or PII. Prefer short summaries or redacted excerpts over raw query results or full table dumps.
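One way to apply the rule above is to pass any excerpt through a filter before appending it to a log file. The sketch below is a minimal illustration, not part of the skill: `redact` is a hypothetical helper, and its three patterns (URL credentials, `key=value` secrets, email addresses) are assumptions that will miss other secret shapes, so a manual check is still needed.

```shell
# Hypothetical helper: mask common secret shapes before text is written to .learnings/.
# The patterns are illustrative assumptions, not an exhaustive or official list.
redact() {
  sed -E \
    -e 's#([a-zA-Z][a-zA-Z0-9+.-]*://)[^:/@[:space:]]+:[^@[:space:]]+@#\1***:***@#g' \
    -e 's/(([Pp]assword|PASSWORD|[Aa]pi[_-]?[Kk]ey|API[_-]?KEY|[Tt]oken|TOKEN|[Ss]ecret|SECRET)[[:space:]]*[=:][[:space:]]*)[^[:space:];&]+/\1***/g' \
    -e 's/[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}/<email>/g'
}

# Example: redact a pipeline error line before logging it.
# printf '%s\n' "$error_line" | redact >> .learnings/DATA_ISSUES.md
```

The filter reads stdin and writes stdout, so it can sit in front of any append to `.learnings/`.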
If you want automatic reminders, use the opt-in hook workflow described in Hook Integration.
| Situation | Action |
|---|---|
| ETL/ELT pipeline fails | Log to .learnings/DATA_ISSUES.md with pipeline name and error |
| Metric value anomaly (spike/drop) | Log to .learnings/DATA_ISSUES.md with statistical context |
| Two teams define metric differently | Log to .learnings/LEARNINGS.md with category definition_mismatch |
| Dashboard shows wrong or misleading data | Log to .learnings/LEARNINGS.md with category visualization_mislead |
| Data freshness SLA missed | Log to .learnings/DATA_ISSUES.md with SLA threshold and actual delay |
| Schema change breaks downstream | Log to .learnings/DATA_ISSUES.md with schema diff details |
| NULL rate spike in key column | Log to .learnings/DATA_ISSUES.md with column and threshold |
| Metric silently drifts (calculation change) | Log to .learnings/LEARNINGS.md with category metric_drift |
| Recurring data quality pattern | Link with **See Also**, consider priority bump |
| Broadly applicable pattern | Promote to data dictionary, pipeline runbook, or dashboard standard |
| Reusable data quality check | Promote to data quality SLA or dbt test |
OpenClaw is the primary platform for this skill. It uses workspace-based prompt injection with automatic skill loading.
Via ClawdHub (recommended):
clawdhub install self-improving-analytics
Manual:
git clone https://github.com/jose-compu/self-improving-analytics.git ~/.openclaw/skills/self-improving-analytics
OpenClaw injects these files into every session:
~/.openclaw/workspace/
├── AGENTS.md # Multi-agent workflows, delegation patterns
├── SOUL.md # Behavioral guidelines, personality, principles
├── TOOLS.md # Tool capabilities, integration gotchas
├── MEMORY.md # Long-term memory (main session only)
├── memory/ # Daily memory files
│ └── YYYY-MM-DD.md
└── .learnings/ # This skill's log files
    ├── LEARNINGS.md
    ├── DATA_ISSUES.md
    └── FEATURE_REQUESTS.md
mkdir -p ~/.openclaw/workspace/.learnings
Then create the log files (or copy from assets/):
- LEARNINGS.md: metric drift, definition mismatches, visualization issues, data quality patterns
- DATA_ISSUES.md: pipeline failures, freshness breaches, schema drift, metric anomalies
- FEATURE_REQUESTS.md: analytics tools, BI capabilities, automation requests

When analytics learnings prove broadly applicable, promote them:
| Learning Type | Promote To | Example |
|---|---|---|
| Metric definitions | Data dictionary | "Active user = login within 7 days with feature interaction" |
| Pipeline failure patterns | Pipeline runbooks | "DST partition handling: always use UTC-based keys" |
| Visualization standards | Dashboard style guide | "Absolute value charts must start Y-axis at zero" |
| Data quality rules | Data quality SLAs | "NULL rate in PK columns must be <0.01%" |
| Governance patterns | AGENTS.md | "New metrics require data dictionary entry before dashboard" |
| Tool configuration | TOOLS.md | "dbt source freshness checks required on all external sources" |
For automatic reminders at session start:
cp -r hooks/openclaw ~/.openclaw/hooks/self-improving-analytics
openclaw hooks enable self-improving-analytics
See references/openclaw-integration.md for complete details.
For Claude Code, Codex, Copilot, or other agents, create .learnings/ in the project or workspace root:
mkdir -p .learnings
Create the files inline using the headers shown above.
Add to AGENTS.md, CLAUDE.md, or .github/copilot-instructions.md:
When data issues or analytics patterns are discovered:
- Log to .learnings/DATA_ISSUES.md, LEARNINGS.md, or FEATURE_REQUESTS.md

Append to .learnings/LEARNINGS.md:
## [LRN-YYYYMMDD-XXX] category
**Logged**: ISO-8601 timestamp
**Priority**: low | medium | high | critical
**Status**: pending
**Area**: ingestion | transformation | modeling | reporting | visualization | governance | data_catalog
### Summary
One-line description of the analytics insight
### Details
Full context: what data pattern was found, why it is problematic,
what the correct approach is. Include root cause analysis.
### SQL Example
**Before (problematic):**
\`\`\`sql
-- problematic query, pipeline config, or metric definition
\`\`\`
**After (correct):**
\`\`\`sql
-- corrected query, config, or definition
\`\`\`
### Suggested Action
Specific data dictionary update, pipeline fix, dashboard change, or governance rule to adopt
### Metadata
- Source: etl_failure | freshness_breach | metric_anomaly | definition_conflict | dashboard_review | reconciliation_failure | schema_drift
- Pipeline: Airflow DAG name, dbt model, Fivetran connector (if applicable)
- Warehouse: snowflake | bigquery | redshift | postgres | databricks
- Related Tables: schema.table_name
- Tags: tag1, tag2
- See Also: LRN-20250110-001 (if related to existing entry)
- Pattern-Key: metric_drift.revenue_source | data_quality.null_spike (optional)
- Recurrence-Count: 1 (optional)
- First-Seen: 2025-01-15 (optional)
- Last-Seen: 2025-01-15 (optional)
---
Categories for learnings:
| Category | Use When |
|---|---|
| data_quality | NULL spikes, duplicate records, invalid values, completeness issues |
| metric_drift | Metric calculation silently changed due to new data source, schema change, or logic update |
| pipeline_failure | ETL/ELT job failure, timeout, resource exhaustion, dependency issue |
| visualization_mislead | Chart axis, scale, aggregation, or color choice that misrepresents data |
| definition_mismatch | Same metric name with different definitions across teams or dashboards |
| freshness_issue | Data arriving later than SLA, stale dashboards, partition delays |
Append to .learnings/DATA_ISSUES.md:
## [DAT-YYYYMMDD-XXX] issue_type_or_name
**Logged**: ISO-8601 timestamp
**Priority**: high
**Status**: pending
**Area**: ingestion | transformation | modeling | reporting | visualization | governance | data_catalog
### Summary
Brief description of the data issue
### Error Output
\`\`\`
Actual error message, pipeline log, query error, or anomaly description (redacted/summarized)
\`\`\`
### Root Cause
What in the pipeline, data model, or source system caused this issue.
Include the problematic query or configuration.
### Fix
\`\`\`sql
-- corrected query, pipeline config, or data quality check
\`\`\`
### Prevention
How to avoid this issue in the future (data quality test, pipeline alert, schema validation, SLA monitor)
### Context
- Trigger: etl_failure | freshness_breach | metric_anomaly | null_spike | schema_drift | rendering_error
- Pipeline: Airflow DAG name, dbt model, Fivetran connector
- Warehouse: snowflake | bigquery | redshift | postgres | databricks
- Affected Tables: schema.table_name
- Downstream Impact: dashboards, reports, or teams affected
### Metadata
- Reproducible: yes | no | unknown
- Related Tables: schema.table_name
- See Also: DAT-20250110-001 (if recurring)
---
Append to .learnings/FEATURE_REQUESTS.md:
## [FEAT-YYYYMMDD-XXX] capability_name
**Logged**: ISO-8601 timestamp
**Priority**: medium
**Status**: pending
**Area**: ingestion | transformation | modeling | reporting | visualization | governance | data_catalog
### Requested Capability
What analytics tool, automation, or capability is needed
### User Context
Why it's needed, what workflow it improves, what data problem it solves
### Complexity Estimate
simple | medium | complex
### Suggested Implementation
How this could be built: dbt macro, Airflow operator, data quality check, Looker feature, governance workflow
### Metadata
- Frequency: first_time | recurring
- Related Features: existing_tool_or_capability
---
Format: TYPE-YYYYMMDD-XXX
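- TYPE: LRN (learning), DAT (data issue), FEAT (feature request)
- YYYYMMDD: date the entry is logged
- XXX: sequential number or short random suffix (e.g. 001, A7B)

Examples: LRN-20250415-001, DAT-20250415-A3F, FEAT-20250415-002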
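A sequential ID can be derived by counting the entries already logged for the same type and day. The sketch below is one possible approach under that assumption; `next_id` is a hypothetical helper, not something the skill ships, and it will produce collisions if entries were deleted or logged with random suffixes.

```shell
# Hypothetical helper: print the next sequential ID for a type (LRN, DAT, FEAT)
# by counting that day's entries of the same type in the given log file.
# Usage: next_id TYPE FILE [YYYYMMDD]
next_id() (
  kind=$1 file=$2
  day=${3:-$(date -u +%Y%m%d)}   # optional third argument pins the date
  count=0
  if [ -f "$file" ]; then
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    count=$(grep -c "^## \[$kind-$day-" "$file" || true)
  fi
  printf '%s-%s-%03d\n' "$kind" "$day" $((count + 1))
)

# Example: id=$(next_id DAT .learnings/DATA_ISSUES.md)
```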
When an issue is fixed, update the entry:
**Status**: pending → **Status**: resolved
### Resolution
- **Resolved**: 2025-01-16T09:00:00Z
- **Commit/PR**: abc123 or #42
- **Notes**: Added data quality test / updated pipeline runbook / fixed metric definition
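The status update can also be scripted. The following is a minimal sketch, assuming entries follow the templates in this document (a `## [ID] ...` header, a `**Status**:` line, and a closing `---`); `resolve_entry` is a hypothetical helper and not part of the skill.

```shell
# Hypothetical helper: mark one entry as resolved and add a Resolution section.
# Usage: resolve_entry FILE ENTRY_ID COMMIT_OR_PR NOTES
resolve_entry() (
  file=$1 id=$2 ref=$3 notes=$4
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  tmp=$(mktemp)
  awk -v id="$id" -v now="$now" -v ref="$ref" -v notes="$notes" '
    function resolution() {
      print "### Resolution"
      print "- **Resolved**: " now
      print "- **Commit/PR**: " ref
      print "- **Notes**: " notes
      print ""
    }
    /^## \[/ {
      if (inside) { resolution(); inside = 0 }      # previous entry had no closing ---
      if (index($0, "## [" id "]") == 1) inside = 1 # this header is the target entry
    }
    inside && /^\*\*Status\*\*: / { $0 = "**Status**: resolved" }
    inside && /^---$/ { resolution(); inside = 0 }  # insert just before the separator
    { print }
    END { if (inside) resolution() }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)

# Example: resolve_entry .learnings/DATA_ISSUES.md DAT-20250115-001 '#42' 'Added data quality test'
```

Only the entry whose header matches the given ID is touched; other entries keep their status.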
Other status values:
- in_progress: Actively being investigated or fixed
- wont_fix: Decided not to address (add reason in Resolution notes)
- promoted: Elevated to data dictionary, pipeline runbook, or dashboard standard
- promoted_to_skill: Extracted as a reusable skill

Automatically log when you encounter:
ETL/ELT Pipeline Failures (→ data issue with etl_failure trigger):
Data Freshness Breaches (→ data issue with freshness_breach trigger):
- dbt source freshness threshold breaches (warn_after, error_after)

Metric Value Anomalies (→ data issue with metric_anomaly trigger):
NULL Rate Spikes (→ data issue with null_spike trigger):
Schema Changes (→ data issue with schema_drift trigger):
Conflicting Definitions (→ learning with definition_mismatch category):
Visualization Issues (→ learning with visualization_mislead category):
| Priority | When to Use | Analytics Examples |
|---|---|---|
| critical | Wrong data in executive dashboard or regulatory report | Revenue under-reported to board, compliance data incorrect, PII exposure in dashboard |
| high | Pipeline down, metric definition conflict, SLA breach | Airflow DAG failed for >4h, Marketing vs Product metric mismatch, daily report stale |
| medium | Data quality degradation, visualization improvement | NULL rate trending up, dashboard axis misleading, catalog entry outdated |
| low | Catalog update, documentation, minor improvement | Column description missing, unused dashboard cleanup, tag standardization |
Use to filter learnings by analytics domain:
| Area | Scope |
|---|---|
| ingestion | Data extraction, loading, CDC replication, API pulls, file imports |
| transformation | SQL transforms, dbt models, Spark jobs, data cleaning, deduplication |
| modeling | Dimensional modeling, entity relationships, slowly changing dimensions, grain |
| reporting | Scheduled reports, email digests, PDF generation, data exports |
| visualization | Dashboards, charts, Looker explores, Tableau workbooks, Metabase questions |
| governance | Metric definitions, data ownership, access control, PII classification |
| data_catalog | Column descriptions, table documentation, lineage, tagging, search |
When a learning is broadly applicable (not a one-off data fix), promote it to permanent standards.
| Target | What Belongs There |
|---|---|
| Data dictionary | Canonical metric definitions with owner, grain, and refresh cadence |
| Pipeline runbooks | Step-by-step recovery for known failure patterns |
| Dashboard standards | Visualization conventions (axis, colors, aggregation rules) |
| Data quality SLAs | Monitoring thresholds and alert configurations |
| CLAUDE.md | Project-specific analytics conventions for AI agents |
| AGENTS.md | Automated analytics workflows, data validation steps |
**Status**: pending → **Status**: promoted
**Promoted**: data dictionary (or pipeline runbook, dashboard standard, data quality SLA)

Learning → Data dictionary entry:
Marketing "active user" = 30-day login; Product = 7-day feature interaction → 420K vs 185K discrepancy.
Promoted as: active_users_30d (Marketing, login-based) and active_users_7d (Product, interaction-based) with governance note specifying which to use for board reports.
Learning → Pipeline runbook:
Pipeline fails every DST transition — partition key uses local time, hour 2 doesn't exist.
Promoted as: "DST Partition Recovery" runbook — rerun with UTC key, verify no duplicates, migrate all partitions to UTC.
If logging something similar to an existing entry:
- Search first: grep -r "keyword" .learnings/
- Link entries: add **See Also**: DAT-20250110-001 in Metadata

Review .learnings/ at natural breakpoints:
# Count pending analytics issues
grep -h "Status\*\*: pending" .learnings/*.md | wc -l
# List pending high-priority data issues
grep -B5 "Priority\*\*: high" .learnings/DATA_ISSUES.md | grep "^## \["
# Find learnings for a specific area
grep -l "Area\*\*: governance" .learnings/*.md
# Find all definition mismatches
grep -B2 "definition_mismatch" .learnings/LEARNINGS.md | grep "^## \["
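For a quick overview during review, the pending entries can be grouped by priority. This is a sketch under the assumption that each entry has its `**Priority**:` line before its `**Status**:` line, as in the templates; `pending_by_priority` is a hypothetical helper.

```shell
# Hypothetical helper: count pending entries per priority across the given log files.
pending_by_priority() {
  awk '
    /^## \[/ { prio = ""; next }                  # new entry: forget the previous priority
    /^\*\*Priority\*\*: / { prio = $2 }           # e.g. "**Priority**: high" -> "high"
    /^\*\*Status\*\*: pending/ && prio != "" { n[prio]++ }
    END {
      split("critical high medium low", order, " ")
      for (i = 1; i <= 4; i++) printf "%s: %d\n", order[i], n[order[i]]
    }
  ' "$@"
}

# Example: pending_by_priority .learnings/*.md
```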
Ingest recurring analytics patterns from simplify-and-harden into data quality rules or governance standards.
1. Use pattern_key as the dedupe key.
2. Search .learnings/LEARNINGS.md for an existing entry: grep -n "Pattern-Key: <key>" .learnings/LEARNINGS.md
3. If found: increment Recurrence-Count, update Last-Seen, add See Also links.
4. If not found: create a new LRN-... entry with Source: simplify-and-harden.

Promotion threshold: Recurrence-Count >= 3, seen in 2+ pipelines/dashboards, within 30-day window.
Targets: data dictionary entries, pipeline runbooks, dashboard standards, CLAUDE.md / AGENTS.md.
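The "entry already exists" branch of this workflow could be automated along the following lines. This is a sketch, assuming the Metadata lines are written exactly as in the learning template (`- Pattern-Key: ...` followed later by `- Recurrence-Count: N` and `- Last-Seen: ...`); `bump_recurrence` is a hypothetical helper. It only updates the counters, and the "2+ pipelines/dashboards" part of the threshold still needs a human check.

```shell
# Hypothetical helper: increment Recurrence-Count and update Last-Seen for a Pattern-Key.
# Exits non-zero when no entry carries the key, so the caller can create a new LRN entry.
# Usage: bump_recurrence FILE PATTERN_KEY [YYYY-MM-DD]
bump_recurrence() (
  file=$1 key=$2
  today=${3:-$(date -u +%Y-%m-%d)}
  grep -qxF -e "- Pattern-Key: $key" "$file" || exit 1
  tmp=$(mktemp)
  awk -v key="$key" -v today="$today" '
    /^## \[/ { hit = 0 }                              # new entry: reset the match flag
    $0 == "- Pattern-Key: " key { hit = 1 }           # exact key match only
    hit && /^- Recurrence-Count: / { $0 = "- Recurrence-Count: " ($3 + 1) }
    hit && /^- Last-Seen: /        { $0 = "- Last-Seen: " today }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)

# Example: bump_recurrence .learnings/LEARNINGS.md metric_drift.revenue_source
```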
Enable automatic reminders through agent hooks. This is opt-in.
Create .claude/settings.json in your project:
{
  "hooks": {
    "UserPromptSubmit": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "./skills/self-improving-analytics/scripts/activator.sh"
      }]
    }]
  }
}
This injects an analytics-focused learning evaluation reminder after each prompt (~50-100 tokens overhead).
{
  "hooks": {
    "UserPromptSubmit": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "./skills/self-improving-analytics/scripts/activator.sh"
      }]
    }],
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "./skills/self-improving-analytics/scripts/error-detector.sh"
      }]
    }]
  }
}
Enable PostToolUse only if you want the hook to inspect command output for pipeline errors, query failures, and data quality issues.
| Script | Hook Type | Purpose |
|---|---|---|
| scripts/activator.sh | UserPromptSubmit | Reminds to evaluate analytics learnings after tasks |
| scripts/error-detector.sh | PostToolUse (Bash) | Triggers on pipeline errors, query failures, data quality issues |
See references/hooks-setup.md for detailed configuration and troubleshooting.
When an analytics learning is valuable enough to become a reusable skill, extract it.
| Criterion | Description |
|---|---|
| Recurring | Same data issue in 2+ pipelines or warehouses |
| Verified | Status is resolved with working fix and data quality test |
| Non-obvious | Required actual investigation or cross-team coordination |
| Broadly applicable | Not project-specific; useful across data stacks |
| User-flagged | User says "save this as a skill" or similar |
./skills/self-improving-analytics/scripts/extract-skill.sh skill-name --dry-run
./skills/self-improving-analytics/scripts/extract-skill.sh skill-name
After extraction, set the entry's Status to promoted_to_skill and add Skill-Path.

In conversation: "This pipeline keeps failing the same way", "Save this data quality check as a skill", "Every warehouse has this DST issue", "This metric definition problem happens everywhere".
In entries: Multiple See Also links, high priority + resolved, definition_mismatch or pipeline_failure with broad applicability, same Pattern-Key across projects.
| Agent | Activation | Detection |
|---|---|---|
| Claude Code | Hooks (UserPromptSubmit, PostToolUse) | Automatic via error-detector.sh |
| Codex CLI | Hooks (same pattern) | Automatic via hook scripts |
| GitHub Copilot | Manual (.github/copilot-instructions.md) | Manual review |
| OpenClaw | Workspace injection + inter-agent messaging | Via session tools |
- Check .learnings/ for past issues with the same tables or metrics

Keep learnings local (per-analyst): add .learnings/ to .gitignore.
Track learnings in repo (team-wide): don't add to .gitignore — learnings become shared knowledge.
Hybrid (track templates, ignore entries): ignore .learnings/*.md, keep .learnings/.gitkeep.
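The hybrid option can be set up with a few commands. The sketch below is one way to do it, assuming a standard Git repository; `setup_hybrid_learnings` is a hypothetical helper. Because `.gitkeep` does not match `.learnings/*.md`, it stays tracked without a negation rule.

```shell
# Hypothetical helper: configure the hybrid layout in the given repository root.
# Usage: setup_hybrid_learnings /path/to/repo
setup_hybrid_learnings() (
  cd "$1" || exit 1
  mkdir -p .learnings
  touch .learnings/.gitkeep .gitignore
  # Add the ignore rule only once so that reruns are a no-op.
  grep -qxF -e '.learnings/*.md' .gitignore || printf '%s\n' '.learnings/*.md' >> .gitignore
)
```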
This skill is standalone-compatible and stackable with other self-improving skills.
- .learnings/analytics/
- .learnings/INDEX.md

Every new entry must include:
**Skill**: analytics
event + matcher + file + 5m_window; max 1 reminder per skill every 5 minutes.

Only trigger this skill automatically for analytics signals such as:
- pipeline|etl|schema drift|metric mismatch|dashboard
- lineage|warehouse|bi|attribution|anomaly

When guidance conflicts, apply:
1. security
2. engineering
3. coding
4. ai
5. meta as tie-breaker

Log to .learnings/analytics/ in stackable mode.
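The analytics signal terms listed earlier in this section can be checked with a single `grep`. This is a minimal sketch of such a gate, not the skill's actual hook logic; `is_analytics_signal` is a hypothetical name, and whole-word matching is an assumption chosen so that the short term `bi` does not fire on words like "mobile" (at the cost of missing plurals such as "pipelines").

```shell
# Hypothetical filter: succeed only when the text mentions one of the analytics signals.
# -i ignores case, -w requires whole-word matches, -q suppresses output.
is_analytics_signal() {
  printf '%s\n' "$1" |
    grep -Eiqw 'pipeline|etl|schema drift|metric mismatch|dashboard|lineage|warehouse|bi|attribution|anomaly'
}

# Example: is_analytics_signal "$prompt_text" && echo "analytics reminder applies"
```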