Install
openclaw skills install zcx-data-analyzerLoad structured CSV, Excel, or JSON data to compute stats, detect anomalies, analyze trends and correlations, and generate summary reports with chart suggest...
openclaw skills install zcx-data-analyzerLoad, analyze, and report on structured data from CSV, Excel, and JSON files. Compute statistics, detect anomalies, identify trends, and generate reports with visualization recommendations.
1. Load data → Read the file, inspect structure
2. Profile → Column types, missing values, basic stats
3. Analyze → Statistics, trends, anomalies, correlations
4. Report → Summary with visual recommendations
| Format | How to Read | Notes |
|---|---|---|
| CSV | Read the file directly, parse header row + data rows | Check delimiter (comma, tab, semicolon). Handle quoted fields. |
| Excel (.xlsx) | Read via openpyxl or pandas. If unavailable, convert to CSV first. | Handle multiple sheets. Note which sheet was used. |
| JSON | Parse as structured objects. Detect if array-of-objects or object-of-arrays. | Flatten nested structures where possible. |
| TSV | Same as CSV with tab delimiter. |
If Python is available (recommended for large datasets):
pip install pandas openpyxl # if missing
python3 -c "
import pandas as pd
df = pd.read_csv('data.csv')
print(df.info())
print(df.describe())
print(df.head())
"
If Python is not available, parse manually:
After loading, always answer these questions:
Compute and report:
| Statistic | What It Tells You |
|---|---|
| Count | Number of non-null values |
| Mean | Average value |
| Median | Midpoint (50th percentile) — more robust than mean for skewed data |
| Std Dev | Spread around the mean |
| Min / Max | Full range |
| 25th / 75th Percentile | Interquartile range bounds |
| Skewness | Symmetry of the distribution. Positive = right tail, negative = left tail. |
Formula reference (manual calculation):
Mean = sum(x) / n
Median = middle value when sorted
Std Dev = sqrt(sum((x - mean)^2) / (n-1))
Percentile = sort values, take value at position (p/100 * n)
| Statistic | What It Tells You |
|---|---|
| Count | Total non-null values |
| Unique | Number of distinct categories |
| Top | Most frequent category |
| Frequency | How often the top category appears |
| Distribution | Share of each category (as percentages) |
Method: IQR (Interquartile Range)
Q1 = 25th percentile
Q3 = 75th percentile
IQR = Q3 - Q1
Lower fence = Q1 - 1.5 * IQR
Upper fence = Q3 + 1.5 * IQR
Anomaly = any value outside [Lower fence, Upper fence]
Method: Z-Score (for approximately normal distributions)
z = (x - mean) / std_dev
Anomaly = |z| > 3 (values more than 3 std devs from mean)
Output anomalies: For each detected anomaly, report:
For time-series data (data with a date/time column):
Output format:
📈 Trend: [Metric Name]
Period: [Date Range]
Direction: [Up/Down/Flat] (slope: ±X%)
Key Points:
- [Date]: Value = X (↗/↘/→)
- Highest point: [Date] = X
- Lowest point: [Date] = X
For non-time-series data, analyze rank order and distribution shape:
Top 5 by [metric]:
1. [Category] = X
2. [Category] = Y
...
Bottom 5 by [metric]:
Pearson correlation coefficient (for linear relationships between two numeric variables):
r = sum((x - mean_x) * (y - mean_y)) / (n * std_x * std_y)
Interpretation:
| r value | Strength | Direction |
|---|---|---|
| 0.7 to 1.0 | Strong | Positive (both rise together) |
| 0.3 to 0.7 | Moderate | Positive |
| 0 to 0.3 | Weak | Positive |
| -0.3 to 0 | Weak | Negative (one rises, other falls) |
| -0.7 to -0.3 | Moderate | Negative |
| -1.0 to -0.7 | Strong | Negative |
Caveats:
For each finding, recommend the best chart type:
| Analysis Type | Recommended Chart | Why |
|---|---|---|
| Distribution of one variable | Histogram | Shows shape, skew, peaks |
| Comparison across categories | Bar chart | Easy to compare magnitudes |
| Trend over time | Line chart | Emphasizes direction and continuity |
| Relationship between 2 variables | Scatter plot | Shows correlation, clusters, outliers |
| Part of a whole | Pie / Donut chart | Use only for 2-5 categories |
| Composition over time | Stacked area chart | Shows both total and parts |
| Rank order | Horizontal bar chart | Easy to read sorted values |
| Comparing multiple distributions | Box plot | Shows median, IQR, outliers |
| Heatmap (correlation matrix) | Heatmap | Quick visual of many correlations |
# Data Analysis Report: [Dataset Name]
Date: [YYYY-MM-DD]
## 1. Overview
- Rows: X | Columns: Y
- Missing data: X cells (X%)
- Key columns: [list with types]
## 2. Descriptive Statistics
### Numeric Columns
[Table: col_name, count, mean, median, std, min, 25%, 75%, max]
### Categorical Columns
[Table: col_name, unique_count, top_value, frequency%]
## 3. Key Findings
### Finding 1: [Title]
[Description of finding]
📊 Recommended chart: [Chart type]
Supporting data: [stats/view]
### Finding 2: [Title]
...
## 4. Anomalies Detected
[Table: row, column, value, severity]
## 5. Correlations
[Notable correlations >|0.3| or < -|0.3|]
## 6. Recommendations
[Data-driven suggestions based on analysis]
For quick results, use this compact format:
📊 [Dataset]: [N] rows × [M] cols
📈 Key metrics:
- [metric1]: mean=X, median=Y, range=[min, max]
- [metric2]: ...
🔍 Top findings:
1. [Finding] — [chart recommendation]
2. [Finding] — [chart recommendation]
⚠️ Anomalies: X detected
For complex analysis, create and run a Python script:
import csv, json, statistics
from collections import Counter
# Load data
with open('data.csv') as f:
reader = csv.DictReader(f)
rows = list(reader)
# Get numeric columns
# (column name → list of float values, filtering out blanks)
# Compute mean, median, stdev, percentiles
# Detect outliers via IQR
# Compute correlations between pairs
# Print formatted results
Run with:
python3 analysis.py