Dataset Splitter

Security checks across static analysis, malware telemetry, and agentic risk

Overview

This appears to be a straightforward local dataset-splitting tool. It moves files by default, however, so users should review its annotation handling and install behavior before running it.

Before installing or running it, confirm you are pointing it at the intended dataset folder. Use --copy if you want a non-destructive split, use --yolo if you expect labels to be split by the script, and install Pillow in an isolated environment if you use the stats feature.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.


Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Running the default split can reorganize or remove images from the original folder, which may surprise users expecting a non-destructive copy.

Why it was flagged

The default behavior without --copy moves user-selected image files into split directories, which is purpose-aligned but mutates the source dataset.

Skill content

if args.copy:
    shutil.copy2(img_path, dest_path)
else:
    shutil.move(img_path, dest_path)

Recommendation

Use --copy or back up the dataset when you want to preserve the original directory unchanged.
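The difference between the two branches in the excerpt can be illustrated with a minimal sketch. The helper name place_file, the demo function, and the file names are hypothetical and are not taken from the skill. Only shutil.copy2 and shutil.move come from the excerpt. The sketch runs in a temporary directory and reports whether the source file survives each mode.

```python
import shutil
import tempfile
from pathlib import Path


def place_file(src: Path, dest: Path, copy: bool) -> None:
    """Hypothetical helper mirroring the copy/move branch in the excerpt."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    if copy:
        # copy2 duplicates the file (and its metadata); the source stays in place.
        shutil.copy2(str(src), str(dest))
    else:
        # move relocates the file; the source path no longer exists afterwards.
        shutil.move(str(src), str(dest))


def demo() -> dict:
    """Run both modes on throwaway files and record what exists afterwards."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for mode, copy in (("copy", True), ("move", False)):
            src = root / mode / "images" / "sample.jpg"  # assumed layout
            src.parent.mkdir(parents=True)
            src.write_bytes(b"placeholder image bytes")
            dest = root / mode / "train" / "sample.jpg"
            place_file(src, dest, copy)
            results[mode] = {
                "source_exists": src.exists(),
                "dest_exists": dest.exists(),
            }
    return results


if __name__ == "__main__":
    print(demo())
```

In the copy case both paths exist afterwards. In the move case only the destination does, which is the behavior this finding describes.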

What this means

If users run the documented non-YOLO annotation example, images may be moved while labels remain in the original annotation folder.

Why it was flagged

Annotations are processed only when a destination annotation directory exists, and those directories are created only for --yolo output. SKILL.md, however, presents --annotations as splitting annotations together with the images in general.

Skill content

train_ann = os.path.join(output_dir, "train", "labels") if args.yolo and ann_dir else None
...
if args.annotations and src_ann_dir and dest_ann_dir:

Recommendation

Use --yolo when you need labels split by this script, or verify the output carefully before deleting or relying on the original dataset layout.
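One way to verify the output is to check that every image in a split directory has a label file with the same stem. The sketch below is not part of the skill. The function name images_missing_labels, the set of image extensions, and the .txt label extension are all assumptions, and both directories are passed in explicitly because the skill's image layout is not shown in the excerpt.

```python
import tempfile
from pathlib import Path

# Assumed image extensions; adjust to match the dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def images_missing_labels(images_dir, labels_dir, label_ext=".txt"):
    """Return sorted names of images that have no same-stem label file.

    Hypothetical verification helper; label_ext=".txt" is an assumption.
    """
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    if labels_dir.is_dir():
        label_stems = {p.stem for p in labels_dir.glob(f"*{label_ext}")}
    else:
        # A missing labels directory means every image is unlabeled.
        label_stems = set()
    return sorted(
        p.name
        for p in images_dir.iterdir()
        if p.suffix.lower() in IMAGE_EXTS and p.stem not in label_stems
    )


def demo():
    """Build a tiny split with one labeled and one unlabeled image."""
    with tempfile.TemporaryDirectory() as tmp:
        images = Path(tmp) / "train" / "images"  # assumed layout
        labels = Path(tmp) / "train" / "labels"
        images.mkdir(parents=True)
        labels.mkdir(parents=True)
        (images / "a.jpg").write_bytes(b"")
        (images / "b.jpg").write_bytes(b"")
        (labels / "a.txt").write_text("0 0.5 0.5 0.1 0.1\n")
        return images_missing_labels(images, labels)


if __name__ == "__main__":
    print(demo())
```

An empty result for each split suggests the labels moved with the images. A non-empty result names the images whose labels were probably left in the original annotation folder.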

What this means

Installing an unpinned package can pull in different versions over time and carries the usual package supply-chain risk.

Why it was flagged

The skill asks users to install an external Python package without a pinned version; this is expected for image statistics but is not captured by an install spec.

Skill content

pip install pillow

Recommendation

Install Pillow in a virtual environment and consider pinning a trusted version if reproducibility matters.
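If a version is pinned, a short check can confirm that the active environment matches it before the stats feature is used. This is a sketch, not something the skill provides. The helper names installed_version and matches_pin are hypothetical, and the version string passed to matches_pin is whatever version you have chosen to trust. It relies only on the standard-library importlib.metadata module.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package: str, pinned: str) -> bool:
    """Hypothetical check: True only if the installed version equals the pin."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    # Prints the Pillow version in the active environment, or None.
    print(installed_version("pillow"))
```

Running this inside the virtual environment also makes it easy to confirm that Pillow is installed there and not only in the system interpreter.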