Install
openclaw skills install file-splitter

Split large files into smaller, manageable chunks while preserving semantic structure. Supports JSON, Markdown, and TXT formats. Preserves data integrity by splitting at natural boundaries (JSON array elements, MD headings, TXT paragraphs).

Use when: the user needs to split large files, chunk datasets, segment corpora, or break files down into manageable pieces for processing or analysis.

Triggers: split file, chunk, segment, file splitter, JSON split, MD split, TXT split, corpus segmentation, data chunking.
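To illustrate what splitting at a natural boundary can look like for TXT input, here is a minimal sketch that greedily packs blank-line-separated paragraphs into chunks under a byte limit. It is not the skill's actual implementation: the function name `split_paragraphs` and the assumption that paragraphs are separated by a blank line are hypothetical.

```python
def split_paragraphs(text: str, max_bytes: int = 512000) -> list[str]:
    """Pack paragraphs into chunks of at most max_bytes (UTF-8), never cutting mid-paragraph."""
    sep = "\n\n"  # assumption: paragraphs are separated by one blank line
    chunks, current, current_size = [], [], 0
    for para in text.split(sep):
        size = len(para.encode("utf-8"))
        # The separator only costs bytes when the chunk already holds a paragraph.
        extra = size + (len(sep) if current else 0)
        if current and current_size + extra > max_bytes:
            # Adding this paragraph would exceed the limit: close the current chunk.
            chunks.append(sep.join(current))
            current, current_size = [], 0
            extra = size
        current.append(para)
        current_size += extra
    if current:
        chunks.append(sep.join(current))
    return chunks
```

In this sketch a single paragraph larger than `max_bytes` is emitted as its own oversized chunk rather than being cut, and joining the chunks with a blank line reproduces the original text.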
python <skill_dir>/scripts/split_files.py --input <input_folder> --output <output_folder> [options]
| Parameter | Required | Default | Description |
|---|---|---|---|
| --input | Yes | - | Source folder containing files to split |
| --output | Yes | - | Output folder for split chunks |
| --max-size | No | 512000 (500 KB) | Maximum bytes per chunk |
| --min-size | No | 409600 (400 KB) | Minimum bytes per chunk |
| --seq-digits | No | 9 | Number of digits in sequence numbers |
| --formats | No | json,md,txt | File formats to process (comma-separated) |
| --dry-run | No | false | Preview mode: show what would be split without executing |
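As a rough illustration of how a byte limit such as --max-size could be applied to a JSON array without breaking any element, the sketch below groups elements so that each serialized group stays within the limit. The function name `split_json_array` is hypothetical, and the sketch ignores --min-size, whose exact semantics are not described here.

```python
import json

def split_json_array(items: list, max_bytes: int = 512000) -> list[list]:
    """Group array elements so each group serializes to at most max_bytes (UTF-8)."""
    groups, current = [], []
    current_size = 2  # bytes for the enclosing "[" and "]"
    for item in items:
        size = len(json.dumps(item, ensure_ascii=False).encode("utf-8"))
        # json.dumps separates list elements with ", " (2 bytes) by default.
        extra = size + (2 if current else 0)
        if current and current_size + extra > max_bytes:
            groups.append(current)
            current, current_size = [], 2
            extra = size
        current.append(item)
        current_size += extra
    if current:
        groups.append(current)
    return groups
```

Each group can then be written with `json.dumps(group, ensure_ascii=False)`, so every output file is itself a valid JSON array. An element larger than the limit ends up alone in its own group.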
# Default 500KB split
python split_files.py --input "./corpus" --output "./corpus/chunks"
# Custom 200KB chunks
python split_files.py --input "./notes" --output "./notes/chunks" --max-size 204800 --min-size 153600
# JSON files only
python split_files.py --input "./data" --output "./data/out" --formats json
# Preview mode
python split_files.py --input "./data" --output "./data/out" --dry-run
[...]# through ######)
Format: {source_filename_without_extension}{9-digit_sequence_number}{extension}
Examples:
dataset000000001.json
dataset000000002.json
notes000000001.md
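The naming format above can be reproduced with a short helper. This is a sketch only: `chunk_name` is a hypothetical name, and its `seq_digits` argument mirrors the --seq-digits option on the assumption that the option controls zero-padding width.

```python
from pathlib import Path

def chunk_name(source: str, seq: int, seq_digits: int = 9) -> str:
    """Build {stem}{zero-padded sequence}{extension} from a source filename."""
    p = Path(source)
    # p.stem drops the extension; p.suffix keeps the leading dot.
    return f"{p.stem}{seq:0{seq_digits}d}{p.suffix}"

print(chunk_name("dataset.json", 1))  # dataset000000001.json
print(chunk_name("notes.md", 1))      # notes000000001.md
```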