Data Cleaning & Annotation Workflow
Pass. Audited by ClawScan on May 1, 2026.
Overview
This appears to be a transparent dataset cleaning and upload workflow. It relies on local helper scripts, Kaggle and annotation-platform accounts, and an external upload destination, each of which users should review before running it.
Before installing, make sure you trust the annotation platform, are authorized to upload the dataset, install Kaggle/Python dependencies from trusted sources, and run the downloader in an empty dedicated directory.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If the downloader is run in a directory that already contains ZIP files, its cleanup step (rm *.zip) could delete those files along with the downloaded archive.
The user-directed downloader fetches, extracts, and deletes ZIP archives in the chosen output directory; this is expected for Kaggle dataset handling but can modify local files.
kaggle datasets download -d "$DATASET_NAME" -p "$OUTPUT_DIR" ... unzip -q *.zip ... rm *.zip
Run the downloader in a new, dedicated output directory and review extracted files before using or uploading them.
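The recommendation above can be sketched as a small guard that refuses to proceed unless the output directory is empty. This is an illustrative sketch under stated assumptions, not part of the skill's own scripts: the function name require_empty_dir and the example directory are hypothetical, and the commented kaggle command only mirrors the excerpt quoted in this finding.

```shell
#!/bin/sh
# Hypothetical guard (not from the skill): succeed only if the directory
# is empty, creating it first if it does not exist. This keeps a later
# `rm *.zip` from touching unrelated archives.
require_empty_dir() {
    dir=$1
    mkdir -p "$dir" || return 1
    # `ls -A` lists every entry except . and ..; any output means "not empty".
    if [ -n "$(ls -A "$dir")" ]; then
        echo "refusing to use non-empty directory: $dir" >&2
        return 1
    fi
    return 0
}

# Example use (assumes the kaggle CLI is installed and configured):
# OUTPUT_DIR=./kaggle_download
# require_empty_dir "$OUTPUT_DIR" &&
#     kaggle datasets download -d "$DATASET_NAME" -p "$OUTPUT_DIR"
```

Calling the guard before the download means the wildcard cleanup can only ever match archives that the download itself created.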
The workflow may act through your Kaggle or annotation-platform account when downloading or uploading datasets.
These steps can rely on Kaggle credentials and a data.smlcrm.com account/session, although the artifacts do not show hardcoded credentials or credential collection.
# Configure: kaggle competitions list ... Upload RAW dataset to data.smlcrm.com
Use accounts and API tokens with appropriate scope, and avoid running the workflow under accounts with unnecessary privileges.
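One related precaution, offered as a sketch rather than something the reviewed artifacts do, is to keep the API token file readable only by its owner. The helper name restrict_token_file is hypothetical. The path in the commented example assumes the Kaggle CLI's default credentials location, ~/.kaggle/kaggle.json.

```shell
#!/bin/sh
# Hypothetical helper (not from the skill): restrict a credentials file
# to owner read/write only, and fail if the file does not exist.
restrict_token_file() {
    token_file=$1
    if [ ! -f "$token_file" ]; then
        echo "no token file at $token_file" >&2
        return 1
    fi
    chmod 600 "$token_file"
}

# Example use, assuming the default Kaggle credentials location:
# restrict_token_file "$HOME/.kaggle/kaggle.json"
```

This limits who can read the token on a shared machine. It does not reduce what the token itself is allowed to do, so account scope still needs separate review.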
Package installation depends on the user's local Python environment and package-source trust.
The setup uses a manual unpinned package install; this is common and purpose-aligned for Kaggle access, but it is not captured in an install spec.
# Install if needed: pip install kaggle
Install dependencies from trusted package indexes, preferably in a virtual environment, and review package versions if reproducibility matters.
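A minimal sketch of the virtual-environment approach follows. The function name in_virtualenv, the .venv directory, and the requirements.txt file name are illustrative assumptions, not part of the reviewed skill. The install commands are left as comments because they need network access.

```shell
#!/bin/sh
# Hypothetical check (not from the skill): true when a Python virtual
# environment is active. The standard venv activate script exports
# VIRTUAL_ENV, so a non-empty value is used as the signal here.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Example setup:
# python3 -m venv .venv
# . .venv/bin/activate
# in_virtualenv && pip install kaggle
# pip freeze > requirements.txt   # record the resolved versions for review
```

Recording the output of pip freeze gives a concrete list of installed versions to review or reinstall later, which the unpinned install on its own does not provide.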
Dataset contents and metadata may leave your local environment and be stored or processed by the annotation platform.
The workflow intentionally sends user-selected CSV data and metadata to an external platform; this is disclosed and central to the skill, but retention/privacy terms are not described in the artifacts.
Upload RAW dataset to data.smlcrm.com (with metadata)
Only upload public or approved datasets, and verify the platform's access controls and data-handling policy before uploading sensitive data.
