Data Cleaning & Annotation Workflow
Analysis
This appears to be a transparent dataset cleaning and upload workflow, but it relies on local helper scripts, Kaggle/platform accounts, and an external upload destination, all of which users should handle deliberately.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Checks for instructions or behavior that redirect the agent, misuse tools, execute unexpected code, cascade across systems, exploit user trust, or continue outside the intended task.
kaggle datasets download -d "$DATASET_NAME" -p "$OUTPUT_DIR" ... unzip -q *.zip ... rm *.zip
The user-directed downloader fetches, extracts, and deletes ZIP archives in the chosen output directory; this is expected for Kaggle dataset handling, but it modifies local files (extraction writes into the directory and the archives are removed afterward).
# Install if needed: pip install kaggle
The setup uses a manual, unpinned package install; this is common and purpose-aligned for Kaggle access, but the dependency is not captured in an install spec.
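One way to capture the dependency that the comment leaves ad hoc is a pinned requirements file; the version below is illustrative only (check PyPI for the current release), and the skill's artifacts name no version at all.

```
# requirements.txt -- pin the Kaggle CLI so the install is reproducible
# (version shown is illustrative, not one named by the skill)
kaggle==1.6.17
```

Installing via `pip install -r requirements.txt` then replaces the unpinned `pip install kaggle` and gives the reviewer a concrete install spec to scan.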
Checks whether tool use, credentials, dependencies, identity, account access, or inter-agent boundaries are broader than the stated purpose.
# Configure: kaggle competitions list ... Upload RAW dataset to data.smlcrm.com
These steps rely on Kaggle credentials and a data.smlcrm.com account or session; the artifacts show no hardcoded credentials and no credential collection.
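For context on what "rely on Kaggle credentials" means in practice: the Kaggle CLI reads an API token from `~/.kaggle/kaggle.json` (or the directory named by `$KAGGLE_CONFIG_DIR`). The sketch below shows that standard provisioning step; the username/key values are placeholders, and nothing like this appears in the skill's artifacts.

```shell
#!/usr/bin/env bash
# Sketch: provisioning Kaggle CLI credentials in the standard location.
# Placeholder values only -- never commit real API keys.
set -euo pipefail

write_kaggle_credentials() {
  local config_dir="$1" username="$2" key="$3"
  mkdir -p "$config_dir"
  printf '{"username": "%s", "key": "%s"}\n' "$username" "$key" \
    > "$config_dir/kaggle.json"
  chmod 600 "$config_dir/kaggle.json"  # the CLI warns on looser permissions
}

# Typical call (the CLI also honors $KAGGLE_CONFIG_DIR):
# write_kaggle_credentials "$HOME/.kaggle" "YOUR_USERNAME" "YOUR_API_KEY"
```

Because the token lives in a plain file, any local helper script the skill runs can read it, which is why broad credential reach is worth flagging even when nothing is hardcoded.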
Checks for exposed credentials, poisoned memory or context, unclear communication boundaries, or sensitive data that could leave the user's control.
Upload RAW dataset to data.smlcrm.com (with metadata)
The workflow intentionally sends user-selected CSV data and metadata to an external platform; this is disclosed and central to the skill, but the artifacts describe no retention or privacy terms for the destination.
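The data.smlcrm.com API is not described in the artifacts, so the endpoint path and field names below are hypothetical; only the overall shape (a CSV plus a metadata document leaving the machine) comes from the finding. The sketch separates building the metadata locally from the actual transfer, which is left commented out so it is run deliberately.

```shell
#!/usr/bin/env bash
# Hedged sketch of the "upload RAW dataset with metadata" step.
# Metadata field names are illustrative, not taken from the skill.
set -euo pipefail

build_upload_payload() {
  local csv_path="$1" meta_path="$2"
  printf '{"filename": "%s", "rows": %s, "sha256": "%s"}\n' \
    "$(basename "$csv_path")" \
    "$(($(wc -l < "$csv_path") - 1))" \
    "$(sha256sum "$csv_path" | cut -d' ' -f1)" \
    > "$meta_path"
}

# Actual transfer (hypothetical endpoint and form fields; this is the point
# where user data leaves local control):
# curl -F "data=@raw.csv" -F "meta=@raw.meta.json" \
#   https://data.smlcrm.com/upload
```

Keeping payload construction local and auditable before the network call is one way users can handle the disclosed-but-undocumented destination deliberately.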
