yolo-vision-tools

Use Ultralytics YOLO to perform computer vision tasks, such as detecting people or objects in images and videos, classifying images, estimating human poses, and tracking cars, people, or animals in videos.

Ruoyu@ruoyu05

Install

openclaw skills install @ruoyu05/yolo-vision-tools

Ultralytics YOLO Vision Tools

Ultralytics YOLO is a state-of-the-art computer vision framework supporting multiple tasks including object detection, instance segmentation, image classification, pose estimation, and oriented bounding box detection. This skill covers installation, model selection, parameter configuration, result handling, and training on custom data.

Latest Model: YOLO26 (released January 2026) features end-to-end NMS-free inference and optimized edge deployment. For stable production workloads, both YOLO26 and YOLO11 are recommended.

Quick Start

1. Installation & Environment Check

bash

# Install/update Ultralytics
pip install -U ultralytics

# Verify installation and check environment
yolo checks

The yolo checks command validates Python version, PyTorch, CUDA, GPU availability, and all dependencies. For detailed environment troubleshooting, see Environment Check or use the provided environment check script: python scripts/check_environment.py.
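
As a rough complement to yolo checks, the sketch below reports the Python version and whether the ultralytics and torch packages are importable. It uses only the standard library and is not the contents of scripts/check_environment.py; the function name describe_environment is made up for illustration.

```python
import importlib.metadata
import importlib.util
import platform


def describe_environment(packages=("ultralytics", "torch")):
    """Return the Python version and the installed version (or None) of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # package is not importable in this environment
            continue
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no distribution metadata found
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value if value is not None else 'not installed'}")
```

A None entry for ultralytics or torch suggests the active interpreter is not the one pip installed into.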

2. Basic Usage Examples

Python Interface

python

from ultralytics import YOLO

# Load a model (YOLO automatically infers task from model)
model = YOLO("yolo26n.pt")  # or your custom model path

# Predict on various sources
# Skill convention: outputs go to the workspace's yolo-vision folder.
# Called from Python, Ultralytics writes annotated files only when save=True is passed.
results = model("image.jpg", save=True)                     # image file → yolo-vision/outputs/images/
results = model("video.mp4", stream=True, save=True)        # video; stream=True returns a generator that must be iterated → yolo-vision/outputs/videos/
results = model("https://example.com/image.jpg", save=True) # URL → yolo-vision/outputs/images/
results = model(0, show=True, save=True)                    # webcam with display → yolo-vision/outputs/videos/

# Custom output directory (optional)
results = model("image.jpg", save=True, project="/custom/path")  # save under a custom directory

CLI Interface

bash

# Basic syntax: yolo TASK MODE ARGS
# By default, outputs are saved to workspace/yolo-vision folder
yolo predict model=yolo26n.pt source="image.jpg"  # → saved to yolo-vision/runs/detect/predict/

# Task-specific examples
yolo detect predict model=yolo26n.pt source="video.mp4"  # → saved to yolo-vision/runs/detect/predict/
yolo segment predict model=yolo26n-seg.pt source="image.jpg"  # → saved to yolo-vision/runs/segment/predict/
yolo pose predict model=yolo26n-pose.pt source="image.jpg"  # → saved to yolo-vision/runs/pose/predict/

# Custom output directory (optional)
yolo predict model=yolo26n.pt source="image.jpg" project="/custom/path"  # save to custom directory
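
The yolo TASK MODE ARGS syntax can also be assembled from Python, for example to launch the CLI from a script. This is a minimal sketch; build_yolo_command is a hypothetical helper, not part of Ultralytics.

```python
import shlex


def build_yolo_command(mode, task=None, **args):
    """Build a `yolo TASK MODE key=value ...` argument list; TASK is optional."""
    command = ["yolo"]
    if task:
        command.append(task)
    command.append(mode)
    # Every remaining setting is passed in key=value form.
    command.extend(f"{key}={value}" for key, value in args.items())
    return command


cmd = build_yolo_command("predict", task="segment", model="yolo26n-seg.pt", source="image.jpg")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when a source path contains spaces.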

3. Model Selection

For quick start, use these default models:

Detection: yolo26n.pt (nano), yolo26s.pt (small), yolo26m.pt (medium)
Segmentation: yolo26n-seg.pt, yolo26s-seg.pt, yolo26m-seg.pt
Classification: yolo26n-cls.pt, yolo26s-cls.pt, yolo26m-cls.pt
Pose Estimation: yolo26n-pose.pt, yolo26s-pose.pt, yolo26m-pose.pt
Oriented Detection: yolo26n-obb.pt, yolo26s-obb.pt, yolo26m-obb.pt

For complete model list and selection guidance: Model Names | Model Selection
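
The weight names listed above follow a regular pattern: family, size letter, an optional task suffix, then .pt. The sketch below encodes that pattern; weights_name is a hypothetical helper, and it only accepts the three sizes listed in this section.

```python
# Task suffixes taken from the model names listed above.
TASK_SUFFIXES = {
    "detect": "",
    "segment": "-seg",
    "classify": "-cls",
    "pose": "-pose",
    "obb": "-obb",
}
SIZES = ("n", "s", "m")  # nano, small, medium


def weights_name(task, size="n", family="yolo26"):
    """Return a weights filename such as 'yolo26s-seg.pt'."""
    if task not in TASK_SUFFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_SUFFIXES)}")
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"{family}{size}{TASK_SUFFIXES[task]}.pt"


print(weights_name("segment", "s"))
```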

Core Workflow

Step 1: Understand YOLO Tasks

YOLO supports five main computer vision tasks. Choose the right task for your application:

Detection: Identify and localize objects with bounding boxes
Segmentation: Generate pixel-level masks for objects
Classification: Categorize entire images
Pose Estimation: Detect keypoints for pose analysis
Oriented Detection: Detect rotated objects with angle parameter

Detailed comparison: Task Types

Step 2: Select Appropriate Model

Consider these factors when selecting a model:

Speed vs. Accuracy: Nano (fastest) → X (most accurate)
Hardware Constraints: GPU memory, CPU performance
Application Requirements: Real-time vs. batch processing

Guidance: Model Selection

Step 3: Configure Parameters

Common configuration parameters:

conf: Confidence threshold (default: 0.25)
iou: IoU threshold for NMS (default: 0.7)
imgsz: Input image size (default: 640)
device: Device ID (0 for first GPU, cpu for CPU)
save: Save results to disk
show: Display results in real-time

Complete examples: Configuration Samples
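
One way to keep these parameters consistent across calls is to validate them once and pass them as keyword arguments. The sketch below is illustrative only: build_predict_args and predict are hypothetical helpers, and leaving device unset is assumed to let Ultralytics pick a device itself.

```python
def build_predict_args(conf=0.25, iou=0.7, imgsz=640, device=None, save=False, show=False):
    """Validate common prediction settings and return them as keyword arguments."""
    if not 0.0 <= conf <= 1.0:
        raise ValueError(f"conf must be between 0 and 1, got {conf}")
    if not 0.0 <= iou <= 1.0:
        raise ValueError(f"iou must be between 0 and 1, got {iou}")
    if not isinstance(imgsz, int) or imgsz <= 0:
        raise ValueError(f"imgsz must be a positive integer, got {imgsz}")
    args = {"conf": conf, "iou": iou, "imgsz": imgsz, "save": save, "show": show}
    if device is not None:
        args["device"] = device  # e.g. 0 for the first GPU, "cpu" for CPU
    return args


def predict(source, weights="yolo26n.pt", **overrides):
    """Run prediction with validated settings (requires ultralytics)."""
    from ultralytics import YOLO  # imported lazily so validation works without it

    return YOLO(weights).predict(source, **build_predict_args(**overrides))
```

For example, predict("image.jpg", conf=0.5, device="cpu") would raise the confidence threshold and force CPU inference.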

Step 4: Process Results

YOLO returns Results objects containing:

boxes: Bounding boxes, confidence scores, class labels
masks: Segmentation masks (for segmentation tasks)
keypoints: Pose keypoints (for pose estimation)
probs: Classification probabilities (for classification)
obb: Oriented bounding boxes (for OBB tasks)
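
As an illustration of reading the boxes attribute, the sketch below turns detections into plain dictionaries. The helper names are hypothetical; the attributes used (boxes.xyxy, boxes.conf, boxes.cls, and result.names) are the ones Ultralytics exposes on detection results.

```python
def boxes_to_records(xyxy, conf, cls, names):
    """Convert parallel lists of boxes, scores, and class ids into dictionaries."""
    return [
        {
            "label": names[int(class_id)],
            "confidence": round(float(score), 3),
            "xyxy": [float(value) for value in box],  # x1, y1, x2, y2 in pixels
        }
        for box, score, class_id in zip(xyxy, conf, cls)
    ]


def detect(source, weights="yolo26n.pt"):
    """Run detection and return one record per detected object (requires ultralytics)."""
    from ultralytics import YOLO  # imported lazily so the helper above works without it

    records = []
    for result in YOLO(weights)(source):
        boxes = result.boxes
        if boxes is None:  # e.g. classification models fill probs instead
            continue
        records.extend(
            boxes_to_records(boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist(), result.names)
        )
    return records
```

Converting to plain Python types early makes the records easy to dump as JSON or CSV.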

Advanced Topics

Training Custom Models

python

from ultralytics import YOLO

# Load a model
model = YOLO("yolo26n.pt")

# Train on custom dataset
results = model.train(data="dataset.yaml", epochs=100, imgsz=640)

Training guide: Training Basics | Dataset Preparation
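
The data argument points to a dataset YAML file. As a sketch, assuming the usual Ultralytics layout with path, train, val, and names keys, the file can be generated like this; the directory names and class labels are placeholders, and both helpers are hypothetical.

```python
from pathlib import Path


def dataset_yaml_text(root, class_names, train="images/train", val="images/val"):
    """Return the text of a minimal detection dataset YAML file."""
    lines = [f"path: {root}", f"train: {train}", f"val: {val}", "names:"]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


def write_dataset_yaml(target, root, class_names):
    """Write the YAML text to disk and return the path."""
    target = Path(target)
    target.write_text(dataset_yaml_text(root, class_names), encoding="utf-8")
    return target


print(dataset_yaml_text("datasets/my-data", ["cat", "dog"]))
```

The train and val entries are resolved relative to path, so only the root needs to change when the dataset moves.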

Installation Options

Multiple installation methods are available:

pip: pip install -U ultralytics
Conda: conda install -c conda-forge ultralytics
Docker: Pre-built images for GPU/CPU environments
From Source: For development and customization

Detailed instructions: Installation Guide

Performance Optimization

Streaming Mode: Use stream=True for videos/long sequences to reduce memory
Batch Processing: Process multiple images together for efficiency
Hardware Acceleration: Configure CUDA, TensorRT, or OpenVINO for optimal performance
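
The first two points can be sketched as follows. With stream=True the model returns a generator, so only one Results object is held at a time; for image collections, paths can be passed in chunks. The helper names are hypothetical, and how Ultralytics groups a list of paths internally may depend on the source type and version.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_detections_streaming(source, weights="yolo26n.pt"):
    """Count boxes in a video without keeping every frame's results in memory."""
    from ultralytics import YOLO  # requires ultralytics

    total = 0
    for result in YOLO(weights)(source, stream=True):  # one Results object per frame
        total += 0 if result.boxes is None else len(result.boxes)
    return total


def predict_in_chunks(image_paths, weights="yolo26n.pt", chunk_size=8):
    """Pass image paths to the model a few at a time and collect all results."""
    from ultralytics import YOLO  # requires ultralytics

    model = YOLO(weights)
    results = []
    for chunk in chunked(list(image_paths), chunk_size):
        results.extend(model(chunk))  # several sources in a single call
    return results
```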

Reference Documentation

Document	Description
Environment Check	Comprehensive environment validation and troubleshooting
Installation Guide	All installation methods (pip, Conda, Docker, source)
Task Types	Detailed comparison of YOLO tasks and use cases
Model Names	Complete YOLO26 model list with specifications
Model Selection	Strategy for choosing models based on requirements
Configuration Samples	Parameter configuration examples for various scenarios
Dataset Preparation	Guide for preparing custom datasets for training
Training Basics	Fundamentals of training YOLO models on custom data
Parameter Reference	Complete reference for all YOLO configuration parameters

Utility Scripts

To save token usage and provide ready-to-use tools, the following Python scripts are available in the scripts/ directory:

Script	Description	Usage Example
check_environment.py	Comprehensive environment diagnostics	`python scripts/check_environment.py`
config_templates.py	Ready-to-use configuration templates	`from scripts.config_templates import get_production_config`
dataset_tools.py	Dataset preparation and conversion tools	`from scripts.dataset_tools import coco_to_yolo`
training_helpers.py	Training, evaluation, and model management	`from scripts.training_helpers import evaluate_model`
quick_tests.py	Quick functionality tests	`python scripts/quick_tests.py --test environment`
model_utils.py	Model selection and validation utilities	`from scripts.model_utils import select_model`

Benefits of using scripts:

Save tokens: Large code blocks are extracted from documentation
Ready-to-use: No need to copy-paste code from documentation
Modular: Import only what you need
Maintainable: Scripts can be updated independently

Troubleshooting

Common Issues

Q: The yolo command is not found after installation? A: Try python -m ultralytics yolo, or check that the Python environment's scripts directory is on PATH.

Q: How do I use a specific GPU? A: Set device=0 for the first GPU, or device=cpu for CPU-only mode.

Q: Models download slowly? A: Set the ULTRALYTICS_HOME environment variable to control the cache location.

Q: How do I filter specific classes? A: Pass class indices with the classes parameter, for example classes=[0, 2, 5].

Q: Memory issues with long videos? A: Use stream=True so results are returned as a generator and processed one frame at a time.

Q: Is real-time webcam input supported? A: Yes. Use source=0 (the default camera) with show=True for a live display.
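
For the class filter, indices can be looked up by label instead of being hard-coded. This is a small sketch; class_ids is a hypothetical helper that expects an index-to-label dictionary such as model.names.

```python
def class_ids(names, wanted):
    """Map class labels to sorted indices using an {index: label} dictionary."""
    lookup = {label: index for index, label in names.items()}
    missing = [label for label in wanted if label not in lookup]
    if missing:
        raise KeyError(f"unknown class labels: {missing}")
    return sorted(lookup[label] for label in wanted)


# Illustrative subset of a names dictionary.
example_names = {0: "person", 1: "bicycle", 2: "car"}
print(class_ids(example_names, ["car", "person"]))
```

With a loaded model this could be used as model("video.mp4", classes=class_ids(model.names, ["person", "car"]), stream=True).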

Getting Help

Run yolo checks to diagnose environment issues
Check official documentation: https://docs.ultralytics.com
Review configuration reference: https://docs.ultralytics.com/usage/cfg/

Output Directory Convention

Default Output Location

When processing images or videos with YOLO, if the user does not specify an output directory, all generated files will be saved to the workspace's yolo-vision folder.

File Organization

The yolo-vision folder will be organized as follows:

text

yolo-vision/
├── inputs/            # Original input files (copied for reference)
├── outputs/           # Processed files with detection results
│   ├── images/        # Detected images
│   ├── videos/        # Detected videos  
│   └── previews/      # Preview images
├── reports/           # Analysis reports and statistics
│   ├── json/          # JSON format reports
│   ├── markdown/      # Markdown format reports
│   └── csv/           # CSV format data
├── models/            # Downloaded YOLO models
│   ├── yolo26/        # YOLO26 models
│   ├── yolo11/        # YOLO11 models
│   └── custom/        # Custom trained models
└── logs/              # Processing logs and debug information

Automatic Folder Creation

The skill will automatically:

Create the yolo-vision folder if it doesn't exist
Create all subdirectories as needed
Organize files by date and task type
Generate timestamp-based filenames for easy tracking
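
A minimal sketch of this behavior, using only the standard library, is shown below. The subdirectory list mirrors the tree above; the filename pattern (timestamp, task, original stem) is an assumption, since the exact naming scheme is not specified here, and all function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path

SUBDIRECTORIES = (
    "inputs",
    "outputs/images", "outputs/videos", "outputs/previews",
    "reports/json", "reports/markdown", "reports/csv",
    "models/yolo26", "models/yolo11", "models/custom",
    "logs",
)


def planned_directories(root="yolo-vision"):
    """Return every directory of the layout as a Path under `root`."""
    return [Path(root) / sub for sub in SUBDIRECTORIES]


def create_layout(root="yolo-vision"):
    """Create the folder tree; existing directories are left untouched."""
    for directory in planned_directories(root):
        directory.mkdir(parents=True, exist_ok=True)
    return Path(root)


def timestamped_name(stem, suffix, task="detect", now=None):
    """Build a filename like 20260102_030405_detect_street.jpg (assumed pattern)."""
    now = now or datetime.now()
    return f"{now:%Y%m%d_%H%M%S}_{task}_{stem}{suffix}"
```

Using exist_ok=True makes create_layout safe to call before every run.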

Example Usage

python

# Without specifying output directory - uses default yolo-vision folder
results = model("image.jpg")  # Output saved to yolo-vision/outputs/images/

# With custom output directory
results = model("image.jpg", save=True, project="/custom/path")  # Uses specified path

Benefits

Consistency: All YOLO outputs in one predictable location
Organization: Files automatically categorized by type
Backup: Input files are preserved for reference
Reproducibility: Easy to find and compare previous analyses
Clean Workspace: Prevents clutter in the main workspace directory

User Override

Users can still specify custom output directories when needed:

By passing the project argument (optionally with name) in Python code
By passing project=/custom/path in CLI commands, which use key=value syntax
By setting the ULTRALYTICS_PROJECT environment variable
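
How these overrides combine can be sketched as a small resolver. The precedence used here (explicit argument first, then the environment variable, then the default folder) is an assumption rather than documented behavior, and resolve_output_dir is a hypothetical helper.

```python
import os


def resolve_output_dir(explicit=None, env=None, default="yolo-vision"):
    """Pick the output directory: explicit argument, then environment, then default."""
    env = os.environ if env is None else env
    if explicit:
        return str(explicit)
    from_env = env.get("ULTRALYTICS_PROJECT")
    if from_env:
        return from_env
    return default


print(resolve_output_dir(env={}))
```

The env parameter exists only so the function can be exercised without touching the real environment.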

License Note: Ultralytics YOLO is available under AGPL-3.0 for open source use and Enterprise License for commercial applications. Review licensing at https://ultralytics.com/license.