v1.0.0

Data Cleaning & Annotation Workflow

ClawScan verdict for this skill: Benign. Analyzed May 1, 2026, 5:50 AM.

Analysis

This appears to be a transparent dataset cleaning and upload workflow. However, it relies on local helper scripts, Kaggle and annotation-platform accounts, and an external upload destination (data.smlcrm.com), and users should handle all of these deliberately.

Guidance: Before installing, make sure you trust the annotation platform, are authorized to upload the dataset, install Kaggle and Python dependencies from trusted sources, and run the downloader in an empty, dedicated directory.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Abnormal behavior control

Checks for instructions or behavior that redirect the agent, misuse tools, execute unexpected code, cascade across systems, exploit user trust, or continue outside the intended task.

Tool Misuse and Exploitation
Severity: Low | Confidence: High | Status: Note
scripts/download_kaggle.sh
kaggle datasets download -d "$DATASET_NAME" -p "$OUTPUT_DIR" ... unzip -q *.zip ... rm *.zip

The user-directed downloader fetches, extracts, and deletes ZIP archives in the chosen output directory; this is expected for Kaggle dataset handling but can modify local files.

User impact: If run in a directory with other ZIP files, the cleanup step could remove those ZIPs as part of the download workflow.
Recommendation: Run the downloader in a new, dedicated output directory and review extracted files before using or uploading them.
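
The recommendation can also be enforced in code rather than left to habit. Below is a minimal sketch of a more defensive download step. It is not the skill's own scripts/download_kaggle.sh. It refuses any output directory that is not empty, then extracts and deletes only the archives found in that directory after the download. It assumes a POSIX-style shell, the kaggle datasets download -d DATASET -p DIR form quoted above, and unzip on the PATH. The function name safe_kaggle_download and the KAGGLE_CMD override, which exists only so the download step can be stubbed, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: a defensive variant of the download step, not the skill's
# scripts/download_kaggle.sh. safe_kaggle_download and KAGGLE_CMD are
# hypothetical names; KAGGLE_CMD defaults to the real kaggle CLI and exists
# only so the download can be stubbed out.

safe_kaggle_download() {
  local dataset="$1" out_dir="$2"
  local kaggle_cmd="${KAGGLE_CMD:-kaggle}"
  local zip

  mkdir -p "$out_dir" || return 1

  # Refuse any directory that already has contents, so the cleanup below
  # can only ever see files produced by this download.
  if [ -n "$(ls -A "$out_dir")" ]; then
    echo "refusing non-empty output directory: $out_dir" >&2
    return 1
  fi

  "$kaggle_cmd" datasets download -d "$dataset" -p "$out_dir" || return 1

  # Extract each archive on its own, then delete that archive.
  for zip in "$out_dir"/*.zip; do
    [ -e "$zip" ] || continue   # glob matched nothing: no archives to handle
    unzip -q -o "$zip" -d "$out_dir" || return 1
    rm -f "$zip"
  done
  return 0
}
```

Example: safe_kaggle_download owner/dataset-slug "$(mktemp -d)", where owner/dataset-slug is a placeholder. The sketch loops over archives because a single unzip call given several ZIP names treats every name after the first as a member to extract from the first archive.
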
Agentic Supply Chain Vulnerabilities
Severity: Info | Confidence: High | Status: Note
SKILL.md
# Install if needed: pip install kaggle

The setup relies on a manual, unpinned package install. This is common and fits the skill's purpose of Kaggle access, but the dependency is not declared in an install spec.

User impact: Package installation depends on the user's local Python environment and package-source trust.
Recommendation: Install dependencies from trusted package indexes, preferably in a virtual environment, and review package versions if reproducibility matters.
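
A minimal sketch of that recommendation follows. It assumes python3 with the standard venv module on a POSIX system, where the environment's interpreter lives at bin/python. The helper creates an isolated environment, installs the requested packages into it, and writes the versions reported by pip freeze to a lock file. The function name make_pinned_venv and the lock-file path are illustrative and are not part of the skill.

```shell
#!/usr/bin/env bash
# Sketch only: isolate the workflow's Python dependencies and record versions.
# make_pinned_venv is a hypothetical helper, not part of the skill.
# Usage: make_pinned_venv VENV_DIR LOCK_FILE [PACKAGE ...]
make_pinned_venv() {
  local venv_dir="$1" lock_file="$2"
  shift 2 || return 1

  # Create the virtual environment (pip is bootstrapped via ensurepip).
  python3 -m venv "$venv_dir" || return 1

  # Install requested packages (for example, kaggle) into the venv only.
  if [ "$#" -gt 0 ]; then
    "$venv_dir/bin/python" -m pip install "$@" || return 1
  fi

  # Record the exact resolved versions for later, reproducible reinstalls.
  "$venv_dir/bin/python" -m pip freeze > "$lock_file"
}
```

Example: make_pinned_venv .venv requirements.lock.txt kaggle. Later, .venv/bin/python -m pip install -r requirements.lock.txt reinstalls the same versions.
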
Permission boundary

Checks whether tool use, credentials, dependencies, identity, account access, or inter-agent boundaries are broader than the stated purpose.

Identity and Privilege Abuse
Severity: Low | Confidence: Medium | Status: Note
SKILL.md
# Configure: kaggle competitions list ... Upload RAW dataset to data.smlcrm.com

These steps may rely on Kaggle credentials and a data.smlcrm.com account or session, although the artifacts show no hardcoded credentials and no credential collection.

User impact: The workflow may act through your Kaggle or annotation-platform account when downloading or uploading datasets.
Recommendation: Use accounts and API tokens with appropriate scope, and avoid running the workflow under accounts with unnecessary privileges.
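
Because a Kaggle API token file acts on behalf of the account it belongs to, keeping that file private is a cheap complement to choosing the right account. The sketch below uses a hypothetical helper, secure_kaggle_token. It checks the token file's permissions and restricts the file to owner read/write (mode 600) when needed. It assumes the token is at $KAGGLE_CONFIG_DIR/kaggle.json, falling back to ~/.kaggle/kaggle.json, which is the location the Kaggle CLI has documented. Verify the path for your installed version.

```shell
#!/usr/bin/env bash
# Sketch only: keep the Kaggle API token readable by its owner alone.
# secure_kaggle_token is a hypothetical helper; the token path is an
# assumption based on the Kaggle CLI's documented default location.
secure_kaggle_token() {
  local token="${KAGGLE_CONFIG_DIR:-$HOME/.kaggle}/kaggle.json"
  local mode

  if [ ! -f "$token" ]; then
    echo "no Kaggle token found at $token" >&2
    return 1
  fi

  # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'.
  mode=$(stat -c '%a' "$token" 2>/dev/null || stat -f '%Lp' "$token") || return 1

  if [ "$mode" != "600" ]; then
    chmod 600 "$token" || return 1
    echo "tightened permissions on $token (was $mode)" >&2
  fi
  return 0
}
```
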
Sensitive data protection

Checks for exposed credentials, poisoned memory or context, unclear communication boundaries, or sensitive data that could leave the user's control.

Insecure Inter-Agent Communication
Severity: Low | Confidence: High | Status: Note
SKILL.md
Upload RAW dataset to data.smlcrm.com (with metadata)

The workflow intentionally sends user-selected CSV data and metadata to an external platform. This is disclosed and central to the skill, but the artifacts do not describe the platform's retention or privacy terms.

User impact: Dataset contents and metadata may leave your local environment and be stored or processed by the annotation platform.
Recommendation: Only upload public or approved datasets, and verify the platform's access controls and data-handling policy before uploading sensitive data.
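
A quick automated look at the CSV header before upload can catch obvious personal-data columns. The sketch below is a rough heuristic and not a data-classification tool. The hypothetical flag_sensitive_columns helper prints header names that match an illustrative pattern (email, phone, address, and similar) and returns non-zero if any match. It inspects column names only and does not handle quoted headers that contain commas. A clean result does not, by itself, make a dataset approved for upload.

```shell
#!/usr/bin/env bash
# Sketch only: flag CSV header columns whose names suggest personal data.
# flag_sensitive_columns and the pattern below are illustrative assumptions.
flag_sensitive_columns() {
  local csv="$1"
  local pattern='email|e_mail|phone|ssn|address|birth|passport'
  local hits

  # Take the header row, drop a Windows line ending, put one column per line,
  # and keep names that match the pattern (case-insensitive).
  hits=$(head -n 1 "$csv" | tr -d '\r' | tr ',' '\n' | grep -Ei "$pattern" || true)

  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1   # non-zero: review these columns before uploading
  fi
  return 0
}
```
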