Finetune
Manage model fine-tuning datasets using CLI tools. Use when preparing, validating, or transforming training data for LLM fine-tuning.
Like a lobster shell, security has layers — review code before you run it.
License
SKILL.md
Finetune — Model Fine-tuning Data Manager
A thorough CLI tool for managing fine-tuning datasets. Handles data preparation, validation, formatting, splitting, augmentation, and export for LLM training workflows.
Prerequisites
- Python 3.8+
bashshell- Write access to
~/.finetune/
Data Storage
All dataset records are stored in JSONL format at ~/.finetune/data.jsonl. Each record contains metadata about a dataset entry including system prompt, user message, assistant response, and associated tags.
Configuration is stored at ~/.finetune/config.json.
Commands
Run commands via: bash scripts/script.sh <command> [arguments...]
prepare
Prepare a new fine-tuning data entry by providing system prompt, user message, and assistant response.
bash scripts/script.sh prepare --system "You are helpful" --user "Hello" --assistant "Hi there!"
Arguments:
--system— System prompt text (required)--user— User message text (required)--assistant— Assistant response text (required)--tags— Comma-separated tags (optional)
validate
Validate the dataset for common issues like missing fields, empty responses, duplicate entries, and format errors.
bash scripts/script.sh validate
bash scripts/script.sh validate --strict
Arguments:
--strict— Enable strict validation mode (optional)
format
Convert the dataset to a specific output format: OpenAI chat, Alpaca, ShareGPT, or raw JSONL.
bash scripts/script.sh format --type openai
bash scripts/script.sh format --type alpaca
Arguments:
--type— Output format:openai,alpaca,sharegpt,raw(required)
split
Split the dataset into training and validation sets with a configurable ratio.
bash scripts/script.sh split --ratio 0.8
bash scripts/script.sh split --ratio 0.9 --seed 42
Arguments:
--ratio— Train/total ratio, e.g. 0.8 means 80% train (required)--seed— Random seed for reproducibility (optional)
augment
Augment existing data entries by generating paraphrased or varied versions.
bash scripts/script.sh augment --id <entry_id>
bash scripts/script.sh augment --id <entry_id> --method synonym
Arguments:
--id— Entry ID to augment (required)--method— Augmentation method:synonym,rephrase,expand(optional, default:synonym)
stats
Display dataset statistics including total entries, average lengths, tag distribution, and quality metrics.
bash scripts/script.sh stats
bash scripts/script.sh stats --detailed
Arguments:
--detailed— Show detailed per-field statistics (optional)
preview
Preview dataset entries, optionally filtered by tag or ID.
bash scripts/script.sh preview
bash scripts/script.sh preview --id <entry_id>
bash scripts/script.sh preview --tag coding --limit 5
Arguments:
--id— Preview a specific entry (optional)--tag— Filter by tag (optional)--limit— Max entries to show (optional, default: 10)
export
Export the dataset to a file in the specified format.
bash scripts/script.sh export --output dataset.jsonl
bash scripts/script.sh export --output dataset.json --format openai
Arguments:
--output— Output file path (required)--format— Export format:jsonl,openai,alpaca,csv(optional, default:jsonl)
config
View or update tool configuration such as default format, validation strictness, and export paths.
bash scripts/script.sh config
bash scripts/script.sh config --set default_format openai
bash scripts/script.sh config --set strict_validation true
Arguments:
--set— Key-value pair to set (optional)
upload
Simulate uploading the dataset to a fine-tuning endpoint. Validates before upload.
bash scripts/script.sh upload --target openai
bash scripts/script.sh upload --target local --path /models/data/
Arguments:
--target— Upload target:openai,local,huggingface(required)--path— Local path forlocaltarget (optional)
help
Display help information and list all available commands.
bash scripts/script.sh help
version
Display the current tool version.
bash scripts/script.sh version
Examples
# Prepare a coding instruction pair
bash scripts/script.sh prepare --system "You are a Python expert" \
--user "How do I reverse a list?" \
--assistant "Use list[::-1] or list.reverse()" \
--tags "python,basics"
# Validate the full dataset
bash scripts/script.sh validate --strict
# Check stats
bash scripts/script.sh stats --detailed
# Export in OpenAI format
bash scripts/script.sh export --output training.jsonl --format openai
# Split into train/val
bash scripts/script.sh split --ratio 0.8 --seed 42
Notes
- All data is stored locally in
~/.finetune/data.jsonl - Use
validatebeforeuploadorexportto catch issues early - The
splitcommand creates~/.finetune/train.jsonland~/.finetune/val.jsonl - Tags help organize and filter entries for domain-specific fine-tuning
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com
Files
2 totalComments
Loading comments…
