Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Finetune

Manage model fine-tuning datasets using CLI tools. Use when preparing, validating, or transforming training data for LLM fine-tuning.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by BytesAgain2@ckchzh
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
Name/description match the provided scripts and commands. The tool operates on local JSONL dataset files under ~/.finetune and offers prepare/validate/format/split/augment/stats/export functionality — all coherent with dataset management for fine-tuning.
Instruction Scope
SKILL.md instructs the agent to run scripts/script.sh which creates and writes files in ~/.finetune (data.jsonl, train.jsonl, val.jsonl, config.json) and can export to user-specified output paths. This is expected, but the script can write arbitrary files (export path) and modifies files under the user's home directory — users should be aware of local file writes and potential overwrites.
Install Mechanism
No install spec is present (instruction-only plus embedded script). No remote downloads or package installs are performed by the skill itself, which minimizes install-time risk.
Credentials
The skill advertises an 'upload' target that includes external services (openai, huggingface) but declares no required environment variables or primary credential. If the script implements real uploads, API keys or tokens would normally be required, so the absence of declared credentials is an inconsistency. Export also allows arbitrary output paths (possible overwrite of filesystem locations) without path-safety checks.
Persistence & Privilege
always:false and the skill does not request elevated privileges. It persists data only under ~/.finetune (creates and writes its own files), which is expected for a data manager.
Scan Findings in Context
[NO_FINDINGS] expected: Static pre-scan found no flagged patterns. This is consistent with an instruction-only skill and simple Python snippets that perform local I/O. However, absence of findings does not confirm that networking/upload is safe; the script was truncated in the supplied content so upload behavior couldn't be fully verified.
What to consider before installing
This skill appears to do what it says for local dataset management (creating, validating, formatting, splitting, augmenting, and exporting files under ~/.finetune). Before installing or running it:

  1. Inspect the full scripts/script.sh upload implementation to confirm whether it performs any network calls and, if so, which endpoints it contacts and how it expects API keys to be supplied.
  2. Expect the tool to create and modify files under ~/.finetune; back up anything important there, and do not run it as root.
  3. Be cautious when using export --output with paths you do not control, since it can overwrite existing files.
  4. If you plan to upload to OpenAI/Hugging Face, supply API credentials securely, and verify that the script does not hard-code or exfiltrate them.

Providing the truncated remainder of scripts/script.sh would allow the upload implementation to be re-evaluated and the confidence level raised.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
Download zip
latest: vk979amfzz2rghkhw4n711kc5zn835q7f

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

Finetune — Model Fine-tuning Data Manager

A comprehensive CLI tool for managing fine-tuning datasets. Handles data preparation, validation, formatting, splitting, augmentation, and export for LLM training workflows.

Prerequisites

  • Python 3.8+
  • bash shell
  • Write access to ~/.finetune/

Data Storage

All dataset records are stored in JSONL format at ~/.finetune/data.jsonl. Each record contains metadata about a dataset entry including system prompt, user message, assistant response, and associated tags.

Configuration is stored at ~/.finetune/config.json.
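The exact record schema is not published; based on the fields described above, a record might plausibly look like the following (the `id` and field names here are assumptions, not the tool's confirmed format):

```python
import json

# Hypothetical record shape, inferred from the description above;
# field names are assumptions, not the tool's published schema.
record = {
    "id": "a1b2c3",
    "system": "You are helpful",
    "user": "Hello",
    "assistant": "Hi there!",
    "tags": ["greeting"],
}

# Each record occupies one line of ~/.finetune/data.jsonl.
line = json.dumps(record)
print(line)
```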

Commands

Run commands via: bash scripts/script.sh <command> [arguments...]

prepare

Prepare a new fine-tuning data entry by providing system prompt, user message, and assistant response.

bash scripts/script.sh prepare --system "You are helpful" --user "Hello" --assistant "Hi there!"

Arguments:

  • --system — System prompt text (required)
  • --user — User message text (required)
  • --assistant — Assistant response text (required)
  • --tags — Comma-separated tags (optional)

validate

Validate the dataset for common issues like missing fields, empty responses, duplicate entries, and format errors.

bash scripts/script.sh validate
bash scripts/script.sh validate --strict

Arguments:

  • --strict — Enable strict validation mode (optional)
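The script's validator isn't shown, but the checks it describes (missing fields, empty responses, duplicates, format errors) can be sketched like this (a minimal illustration, not the tool's actual implementation):

```python
import json

REQUIRED = ("system", "user", "assistant")

def validate_lines(lines):
    """Return (line_no, issue) pairs for common problems:
    invalid JSON, missing fields, empty responses, duplicates."""
    issues, seen = [], set()
    for i, raw in enumerate(lines, 1):
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            issues.append((i, "invalid JSON"))
            continue
        for field in REQUIRED:
            if field not in rec:
                issues.append((i, f"missing field: {field}"))
        if not rec.get("assistant", "").strip():
            issues.append((i, "empty assistant response"))
        key = (rec.get("system"), rec.get("user"), rec.get("assistant"))
        if key in seen:
            issues.append((i, "duplicate entry"))
        seen.add(key)
    return issues
```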

format

Convert the dataset to a specific output format: OpenAI chat, Alpaca, ShareGPT, or raw JSONL.

bash scripts/script.sh format --type openai
bash scripts/script.sh format --type alpaca

Arguments:

  • --type — Output format: openai, alpaca, sharegpt, raw (required)
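The script's exact mapping isn't shown; typical conversions from a system/user/assistant record to the OpenAI chat and Alpaca shapes look roughly like this (a sketch under the assumed record fields, not the tool's verified output):

```python
def to_openai(rec):
    # OpenAI chat fine-tuning expects a {"messages": [...]} object per line.
    return {"messages": [
        {"role": "system", "content": rec["system"]},
        {"role": "user", "content": rec["user"]},
        {"role": "assistant", "content": rec["assistant"]},
    ]}

def to_alpaca(rec):
    # Alpaca-style instruction tuning uses instruction/input/output keys.
    return {"instruction": rec["user"],
            "input": "",
            "output": rec["assistant"]}
```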

split

Split the dataset into training and validation sets with a configurable ratio.

bash scripts/script.sh split --ratio 0.8
bash scripts/script.sh split --ratio 0.9 --seed 42

Arguments:

  • --ratio — Train/total ratio, e.g. 0.8 means 80% train (required)
  • --seed — Random seed for reproducibility (optional)
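A seeded ratio split of this kind can be sketched in a few lines (an illustration of the behavior described, not the script's actual code):

```python
import random

def split_dataset(records, ratio=0.8, seed=None):
    """Shuffle (reproducibly when seed is given) and split into train/val."""
    records = list(records)
    rng = random.Random(seed)   # isolated RNG so the seed is reproducible
    rng.shuffle(records)
    cut = int(len(records) * ratio)
    return records[:cut], records[cut:]
```

With the same seed, repeated runs yield identical train/val partitions, which is why the README recommends passing --seed for reproducibility.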

augment

Augment existing data entries by generating paraphrased or varied versions.

bash scripts/script.sh augment --id <entry_id>
bash scripts/script.sh augment --id <entry_id> --method synonym

Arguments:

  • --id — Entry ID to augment (required)
  • --method — Augmentation method: synonym, rephrase, expand (optional, default: synonym)

stats

Display dataset statistics including total entries, average lengths, tag distribution, and quality metrics.

bash scripts/script.sh stats
bash scripts/script.sh stats --detailed

Arguments:

  • --detailed — Show detailed per-field statistics (optional)
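The statistics listed above (entry count, average lengths, tag distribution) can be computed roughly as follows (a sketch over the assumed record fields, not the tool's implementation):

```python
from collections import Counter

def dataset_stats(records):
    """Total entries, average assistant-response length in characters,
    and tag frequency distribution."""
    n = len(records)
    avg_len = sum(len(r.get("assistant", "")) for r in records) / n if n else 0
    tags = Counter(t for r in records for t in r.get("tags", []))
    return {"total": n, "avg_assistant_chars": avg_len, "tags": tags}
```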

preview

Preview dataset entries, optionally filtered by tag or ID.

bash scripts/script.sh preview
bash scripts/script.sh preview --id <entry_id>
bash scripts/script.sh preview --tag coding --limit 5

Arguments:

  • --id — Preview a specific entry (optional)
  • --tag — Filter by tag (optional)
  • --limit — Max entries to show (optional, default: 10)

export

Export the dataset to a file in the specified format.

bash scripts/script.sh export --output dataset.jsonl
bash scripts/script.sh export --output dataset.json --format openai

Arguments:

  • --output — Output file path (required)
  • --format — Export format: jsonl, openai, alpaca, csv (optional, default: jsonl)
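The script's CSV writer isn't shown; flattening records to CSV might look like this minimal sketch (the field set is an assumption, and extra keys such as tags are simply dropped):

```python
import csv
import io

def export_csv(records, fields=("system", "user", "assistant")):
    """Render records as CSV text; extra keys are ignored and
    missing keys are left blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields),
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```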

config

View or update tool configuration such as default format, validation strictness, and export paths.

bash scripts/script.sh config
bash scripts/script.sh config --set default_format openai
bash scripts/script.sh config --set strict_validation true

Arguments:

  • --set — Key-value pair to set (optional)

upload

Simulate uploading the dataset to a fine-tuning endpoint. Validates before upload.

bash scripts/script.sh upload --target openai
bash scripts/script.sh upload --target local --path /models/data/

Arguments:

  • --target — Upload target: openai, local, huggingface (required)
  • --path — Local path for local target (optional)

help

Display help information and list all available commands.

bash scripts/script.sh help

version

Display the current tool version.

bash scripts/script.sh version

Examples

# Prepare a coding instruction pair
bash scripts/script.sh prepare --system "You are a Python expert" \
  --user "How do I reverse a list?" \
  --assistant "Use list[::-1] or list.reverse()" \
  --tags "python,basics"

# Validate the full dataset
bash scripts/script.sh validate --strict

# Check stats
bash scripts/script.sh stats --detailed

# Export in OpenAI format
bash scripts/script.sh export --output training.jsonl --format openai

# Split into train/val
bash scripts/script.sh split --ratio 0.8 --seed 42

Notes

  • All data is stored locally in ~/.finetune/data.jsonl
  • Use validate before upload or export to catch issues early
  • The split command creates ~/.finetune/train.jsonl and ~/.finetune/val.jsonl
  • Tags help organize and filter entries for domain-specific fine-tuning

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

Files

2 total