Data Cleaning & Annotation Workflow

Pass. Audited by ClawScan on May 1, 2026.

Overview

This appears to be a transparent dataset cleaning and upload workflow, but it uses local helper scripts, Kaggle/platform accounts, and an external upload destination that users should handle deliberately.

Before installing, make sure that you trust the annotation platform and are authorized to upload the dataset. Install Kaggle/Python dependencies from trusted sources, and run the downloader in an empty, dedicated directory.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If run in a directory with other ZIP files, the cleanup step could remove those ZIPs as part of the download workflow.

Why it was flagged

The user-directed downloader fetches, extracts, and deletes ZIP archives in the chosen output directory; this is expected for Kaggle dataset handling but can modify local files.

Skill content

kaggle datasets download -d "$DATASET_NAME" -p "$OUTPUT_DIR" ... unzip -q *.zip ... rm *.zip

Recommendation

Run the downloader in a new, dedicated output directory and review extracted files before using or uploading them.
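The recommendation above can be sketched as a small guard around the download step. This is a minimal sketch, not the skill's actual script: `require_empty_dir` and `download_dataset` are hypothetical helper names, and only the `kaggle datasets download -d ... -p ...` call, `unzip -q`, and `rm` come from the skill content. It assumes the Kaggle CLI and `unzip` are installed.

```shell
# Hypothetical helper: succeed only if the directory is new or empty.
require_empty_dir() {
  dir="$1"
  mkdir -p "$dir" || return 1
  # ls -A lists every entry except . and .., so any output means "not empty".
  if [ -n "$(ls -A "$dir")" ]; then
    echo "refusing to use non-empty directory: $dir" >&2
    return 1
  fi
}

# Hypothetical wrapper around the skill's download, extract, and cleanup steps.
download_dataset() {
  dataset="$1"
  out="$2"
  require_empty_dir "$out" || return 1
  kaggle datasets download -d "$dataset" -p "$out" || return 1
  # The directory was empty before the download, so every ZIP here came from it.
  for archive in "$out"/*.zip; do
    [ -e "$archive" ] || continue
    unzip -q "$archive" -d "$out" && rm -- "$archive"
  done
}
```

Calling `download_dataset "$DATASET_NAME" "$OUTPUT_DIR"` then stops before any download if the target directory already holds files, so unrelated ZIP archives are never in reach of the cleanup step.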

What this means

The workflow may act through your Kaggle or annotation-platform account when downloading or uploading datasets.

Why it was flagged

These steps can rely on Kaggle credentials and a data.smlcrm.com account/session, although the artifacts do not show hardcoded credentials or credential collection.

Skill content

# Configure: kaggle competitions list ... Upload RAW dataset to data.smlcrm.com

Recommendation

Use accounts and API tokens with appropriate scope, and avoid running the workflow under accounts with unnecessary privileges.
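One concrete way to limit exposure is to keep the Kaggle API token readable only by your own user. The sketch below assumes the token sits at the Kaggle CLI's default location, `~/.kaggle/kaggle.json`; `lock_down_token` is a hypothetical helper name, and nothing here is taken from the skill itself.

```shell
# Hypothetical helper: restrict a credential file to owner read/write only.
# Defaults to the Kaggle CLI's usual token path (an assumption about your setup).
lock_down_token() {
  token_file="${1:-$HOME/.kaggle/kaggle.json}"
  if [ ! -f "$token_file" ]; then
    echo "no token file at $token_file" >&2
    return 1
  fi
  chmod 600 "$token_file"
}
```

Running `lock_down_token` with no argument applies the change to the default path. It does not narrow what the token can do on Kaggle, only who on the local machine can read it.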

What this means

Package installation depends on the user's local Python environment and package-source trust.

Why it was flagged

The setup uses a manual unpinned package install; this is common and purpose-aligned for Kaggle access, but it is not captured in an install spec.

Skill content

# Install if needed: pip install kaggle

Recommendation

Install dependencies from trusted package indexes, preferably in a virtual environment, and review package versions if reproducibility matters.
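The recommendation above might look like the following in practice. This is a sketch under assumptions: the `.venv` directory, the `requirements.lock` file name, and the helpers `setup_kaggle_env` and `all_pinned` are illustrative and not part of the skill; only `pip install kaggle` comes from the skill content. It assumes a Unix-like system where `python3` ships with the `venv` module.

```shell
# Hypothetical helper: install kaggle into an isolated virtual environment
# and record the exact versions that were resolved.
setup_kaggle_env() {
  env_dir="${1:-.venv}"
  python3 -m venv "$env_dir" || return 1
  "$env_dir/bin/python" -m pip install kaggle || return 1
  "$env_dir/bin/python" -m pip freeze > requirements.lock
}

# Hypothetical helper: succeed only if every requirement line is pinned with '=='.
# Blank lines and comment lines are ignored.
all_pinned() {
  awk '!/^[ \t]*(#|$)/ && !/==/ { bad = 1 } END { exit bad }' "$1"
}
```

After `setup_kaggle_env`, running `all_pinned requirements.lock` gives a quick signal on whether the recorded environment can be reproduced from exact versions. Lines that `pip freeze` writes in other forms, such as direct file or URL references, are reported as unpinned by this simple check.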

What this means

Dataset contents and metadata may leave your local environment and be stored or processed by the annotation platform.

Why it was flagged

The workflow intentionally sends user-selected CSV data and metadata to an external platform; this is disclosed and central to the skill, but retention/privacy terms are not described in the artifacts.

Skill content

Upload RAW dataset to data.smlcrm.com (with metadata)

Recommendation

Only upload public or approved datasets, and verify the platform's access controls and data-handling policy before uploading sensitive data.
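As a rough pre-upload check, the header row of a CSV can be scanned for column names that often indicate personal data. This is only a heuristic sketch: the helper name `flag_sensitive_columns` and the keyword list are assumptions made for illustration, not part of the skill or the platform, and a clean result does not show that a dataset is safe to upload.

```shell
# Hypothetical helper: print header columns whose names suggest personal data.
# Exits 0 when at least one suspicious column is found, non-zero otherwise.
# Naive split on commas: quoted header fields that contain commas are not handled.
flag_sensitive_columns() {
  head -n 1 "$1" | tr ',' '\n' | tr -d '"\r' |
    grep -Ei 'e-?mail|phone|ssn|passport|address|birth'
}
```

A possible use is `flag_sensitive_columns data.csv && echo "review these columns before uploading"`, where `data.csv` stands in for the file you intend to upload.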