Install
openclaw skills install data-cleaning-annotation-workflow
Complete workflow for taking time series datasets (Energy, Manufacturing, Climate) from Kaggle to the Data Annotation platform (data.smlcrm.com). Includes downloading, cleaning with pandas, uploading RAW with metadata, configuring columns (Time/Target/Covariate/Group), setting units (kWh, kVarh, tCO2, ratio, seconds), and assigning groups by selecting all variables and applying all group tags. Use when finding Kaggle datasets, cleaning for ML, uploading with metadata, configuring types/units, assigning groups to all variables, or running the complete pipeline to CLEAN status.
Complete end-to-end workflow for time series dataset preparation and annotation on the Data Annotation platform (data.smlcrm.com).
This skill captures the precise workflow for processing time series datasets (Energy, Manufacturing, Climate) from discovery to CLEAN status.
For the full pipeline from Kaggle to annotated dataset:
1. Find dataset on Kaggle
2. Download (browser or kaggle CLI)
3. Clean with scripts/clean_dataset.py
4. Upload RAW dataset to data.smlcrm.com (with metadata)
5. Click "Clean" and upload cleaned file
6. Configure column metadata (types, units)
7. Assign groups to variables
8. Upload cleaned dataset → CLEAN status
From Kaggle (Browser Method):
Alternative: Kaggle CLI
# Install if needed: pip install kaggle
# Configure: place your API token at ~/.kaggle/kaggle.json, then verify with: kaggle competitions list
scripts/download_kaggle.sh <dataset-name> [output-dir]
# Example: scripts/download_kaggle.sh csafrit2/steel-industry-energy-consumption
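The contents of scripts/download_kaggle.sh are not shown here. As a rough equivalent, the download step can be sketched in Python by wrapping the Kaggle CLI's `kaggle datasets download` command. The function names and the default output directory `data/raw` are assumptions made for illustration, not part of the skill.

```python
import subprocess
from pathlib import Path


def kaggle_download_command(dataset: str, output_dir: str = "data/raw") -> list[str]:
    """Build the argv that downloads and unzips a Kaggle dataset.

    `dataset` must look like '<owner>/<dataset-name>'.
    The default output directory is a hypothetical choice.
    """
    owner, _, name = dataset.partition("/")
    if not owner or not name or "/" in name:
        raise ValueError("expected '<owner>/<dataset-name>'")
    return ["kaggle", "datasets", "download", "-d", dataset, "-p", output_dir, "--unzip"]


def download(dataset: str, output_dir: str = "data/raw") -> None:
    """Run the download. Needs the kaggle CLI installed and an API token configured."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(kaggle_download_command(dataset, output_dir), check=True)
```

Calling `download("csafrit2/steel-industry-energy-consumption")` would then fetch the example dataset used later in this guide, provided the CLI is installed and authenticated.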
Always run the cleaning script before upload:
python3 scripts/clean_dataset.py <input.csv> [-o <output.csv>]
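The source of scripts/clean_dataset.py is not reproduced here, so the following is only an illustrative sketch of a typical pandas cleaning pass for a time series CSV. The function name `clean_timeseries` and the particular steps (trimming header whitespace, parsing timestamps, dropping unparseable and duplicate rows, sorting by time) are assumptions and may differ from what the real script does.

```python
import pandas as pd


def clean_timeseries(df: pd.DataFrame, time_col: str) -> pd.DataFrame:
    """Hypothetical cleaning pass; not the actual clean_dataset.py logic."""
    out = df.copy()
    # Trim stray whitespace from column names.
    out.columns = [str(c).strip() for c in out.columns]
    # Parse timestamps; values that cannot be parsed become NaT.
    out[time_col] = pd.to_datetime(out[time_col], errors="coerce")
    # Drop rows without a usable timestamp, then exact duplicate rows.
    out = out.dropna(subset=[time_col]).drop_duplicates()
    # Order chronologically and rebuild the index.
    return out.sort_values(time_col).reset_index(drop=True)
```

A typical use would be `clean_timeseries(pd.read_csv("input.csv"), "Timestamps").to_csv("output.csv", index=False)`, where the file names are placeholders.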
What the script does:
Output:
Result: Dataset appears in list with RAW status
| Setting | Description |
|---|---|
| Name | Column name (editable) |
| Units | Measurement units (kWh, °C, %, ratio, tCO2, etc.) |
| Type | Time / Target / Covariate / Group |
Column Type Guide:
Bulk Configuration:
Common Unit Patterns:
Purpose: Group variables define how data is segmented for analysis.
Exact Workflow:
Select ALL variables by checking their checkboxes:
Apply ALL group tags to selected variables:
Result: All variables have all groups assigned (e.g., "WeekStatus × Day_of_week × Load_Type")
Important: Assign groups to BOTH target variables AND all covariates.
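The "all groups on all variables" rule can be expressed as a small sketch. The platform applies this through its UI; the helper names below (`assign_all_groups`, `group_label`) are hypothetical and only model the intended end state, in which every Target and Covariate column carries every Group column.

```python
def assign_all_groups(column_types: dict[str, str]) -> dict[str, list[str]]:
    """Map each Target/Covariate column to the full list of Group columns."""
    groups = [name for name, kind in column_types.items() if kind == "Group"]
    variables = [
        name for name, kind in column_types.items() if kind in ("Target", "Covariate")
    ]
    # Every variable gets its own copy of the complete group list.
    return {var: list(groups) for var in variables}


def group_label(groups: list[str]) -> str:
    """Join group names the way the result is displayed, e.g. 'A × B × C'."""
    return " × ".join(groups)


column_types = {
    "Timestamps": "Time",
    "Usage_kWh": "Target",
    "NSM": "Covariate",
    "WeekStatus": "Group",
    "Day_of_week": "Group",
    "Load_Type": "Group",
}
assignments = assign_all_groups(column_types)
```

Here the Time column and the Group columns themselves receive no assignment; only the target and the covariates do.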
Source: https://www.kaggle.com/datasets/csafrit2/steel-industry-energy-consumption
Metadata:
Column Configuration:
| Column | Type | Units |
|---|---|---|
| Timestamps | Time | - |
| Usage_kWh | Target | kWh |
| Lagging_Current_Reactive.Power_kVarh | Covariate | kVarh |
| Leading_Current_Reactive_Power_kVarh | Covariate | kVarh |
| CO2(tCO2) | Covariate | tCO2 |
| Lagging_Current_Power_Factor | Covariate | ratio |
| Leading_Current_Power_Factor | Covariate | ratio |
| NSM | Covariate | seconds |
| WeekStatus | Group | - |
| Day_of_week | Group | - |
| Load_Type | Group | - |
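Before configuring columns in the UI, it can help to compare the cleaned file's header against the table above. The sketch below encodes the table as a Python mapping and reports mismatches. The names `COLUMN_CONFIG` and `check_columns` are hypothetical, and the check itself is an optional local step, not something the platform requires.

```python
VALID_TYPES = {"Time", "Target", "Covariate", "Group"}

# (type, units) per column, copied from the configuration table; None means no units.
COLUMN_CONFIG = {
    "Timestamps": ("Time", None),
    "Usage_kWh": ("Target", "kWh"),
    "Lagging_Current_Reactive.Power_kVarh": ("Covariate", "kVarh"),
    "Leading_Current_Reactive_Power_kVarh": ("Covariate", "kVarh"),
    "CO2(tCO2)": ("Covariate", "tCO2"),
    "Lagging_Current_Power_Factor": ("Covariate", "ratio"),
    "Leading_Current_Power_Factor": ("Covariate", "ratio"),
    "NSM": ("Covariate", "seconds"),
    "WeekStatus": ("Group", None),
    "Day_of_week": ("Group", None),
    "Load_Type": ("Group", None),
}


def check_columns(header: list[str], config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means a match."""
    problems = []
    for name, (kind, _units) in config.items():
        if kind not in VALID_TYPES:
            problems.append(f"invalid type for {name}: {kind}")
        if name not in header:
            problems.append(f"missing column: {name}")
    for name in header:
        if name not in config:
            problems.append(f"unconfigured column: {name}")
    return problems
```

For a real file, the header could be read with `pandas.read_csv(path, nrows=0).columns.tolist()` and passed as `header`.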
Group Assignment:
For detailed platform configuration guidance, see references/platform_guide.md.
"Next" button disabled:
Groups not appearing:
Upload fails:
| Script | Purpose |
|---|---|
| scripts/clean_dataset.py | Clean and prepare CSV for upload |
| scripts/download_kaggle.sh | Download datasets via Kaggle CLI |
Data Annotation Platform: https://data.smlcrm.com