Kami Package Detection

A free skill by Kami SmartHome. Continuously monitors RTSP camera streams for packages, parcels, and bags using YOLO-World ONNX. Smart deduplication only triggers alerts when a genuinely new or moved package appears.

Install

openclaw skills install kami-package-detection

Continuously monitors your camera and sends instant notifications when a new package arrives.

The skill watches RTSP camera streams for packages, parcels, backpacks, and suitcases. People and handbags are recognized by the model but suppressed at the alert layer to cut down on false alarms. When a new package is detected (its position or size differs significantly from the last alert), the skill sends a push notification. Static frames are skipped automatically to save compute.
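The static-frame skip can be pictured as a cheap frame-difference check that runs before inference. The sketch below is an assumption about how such a filter might look, not the skill's actual code: `mean_abs_diff`, `is_static`, the threshold of 2.0, and the sampling stride are all hypothetical, and frames are modeled as flat grayscale byte buffers of equal length.

```python
from typing import Optional


def mean_abs_diff(prev: bytes, cur: bytes, stride: int = 16) -> float:
    """Mean absolute difference over every `stride`-th pixel of two grayscale frames."""
    if len(prev) != len(cur):
        raise ValueError("frames must have the same size")
    sampled = range(0, len(cur), stride)
    if len(sampled) == 0:
        return 0.0
    return sum(abs(prev[i] - cur[i]) for i in sampled) / len(sampled)


def is_static(prev: Optional[bytes], cur: bytes, threshold: float = 2.0) -> bool:
    """True when the scene barely changed, so inference could be skipped."""
    if prev is None:  # first frame: always run inference
        return False
    return mean_abs_diff(prev, cur) < threshold
```

Sampling only a fraction of the pixels keeps the check far cheaper than a model forward pass, which is the point of filtering in the first place.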

Features

  • 📦 Continuous package & parcel monitoring (not one-shot)
  • 🔔 Push notifications via Feishu / Telegram / Discord
  • 🧠 Smart deduplication — only alerts for new or moved packages (IoU + area change), with a 24-hour tracking window to silence repeated alerts on the same parcel
  • ⚡ Static frame filtering — skips inference when camera scene is unchanged
  • 📷 Multi-camera support with independent background processes
  • 🧳 Suitcase / backpack recognition
  • 🏠 Doorstep & reception monitoring
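The deduplication rule (IoU plus area change, with a 24-hour tracking window) might look roughly like the following. This is a minimal sketch under stated assumptions: the class name `Deduplicator`, the IoU threshold of 0.5, and the 30% area tolerance are hypothetical, since the source does not give the actual values.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = area(inter)
    union = area(a) + area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


@dataclass
class Tracked:
    box: Box
    alerted_at: float  # seconds, e.g. from time.time()


class Deduplicator:
    def __init__(self, iou_min: float = 0.5, area_change_max: float = 0.3,
                 window_s: float = 24 * 3600) -> None:
        self.iou_min = iou_min                  # assumed threshold
        self.area_change_max = area_change_max  # assumed tolerance
        self.window_s = window_s
        self.tracked: List[Tracked] = []

    def is_new(self, box: Box, now: float) -> bool:
        # Forget parcels whose alert is older than the tracking window.
        self.tracked = [t for t in self.tracked if now - t.alerted_at <= self.window_s]
        for t in self.tracked:
            same_place = iou(box, t.box) >= self.iou_min
            same_size = abs(area(box) - area(t.box)) <= self.area_change_max * area(t.box)
            if same_place and same_size:
                return False  # same parcel, still sitting there
        self.tracked.append(Tracked(box, now))
        return True
```

A box that overlaps a tracked parcel closely and has a similar area is treated as the same parcel; anything else counts as new or moved and would trigger an alert.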

Scenarios

  • Doorstep delivery waiting
  • Office reception package management
  • Warehouse cargo monitoring
  • Temporary item watch

Installation

bash setup.sh

setup.sh creates .venv/ and installs onnxruntime, opencv-python-headless, numpy, and requests. It is idempotent, so re-running it is safe.

Prerequisites

  • python3 and python3-venv installed
  • RTSP camera(s) online and reachable
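  • Internet access on first run (setup.sh downloads the pre-built yolov8s-worldv2.onnx archive; if that fails, the export fallback downloads ultralytics and yolov8s-worldv2.pt instead)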

Model

The yolov8s-worldv2.onnx model file is auto-prepared by setup.sh using a download-first, export-fallback strategy:

  1. If yolov8s-worldv2.onnx is already present → reused as-is.
  2. Otherwise, setup.sh downloads the pre-built archive kami-package-detection.zip from https://publicfiles.xiaoyi.com/kami-package-detection.zip and extracts yolov8s-worldv2.onnx out of it (fast path, no extra dependencies).
  3. If the download or extraction fails (offline / mirror unreachable), setup.sh falls back to installing ultralytics into the venv (one-time, ~500 MB with torch) and runs export_model.py, which loads yolov8s-worldv2.pt (auto-downloaded by Ultralytics if absent), injects the custom vocabulary via set_classes(), and exports to ONNX with imgsz=320.
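The three steps above amount to a reuse, download, export chain. The real logic lives in setup.sh; the Python sketch below only illustrates the control flow. `prepare_model` and its `download` and `export` callables are hypothetical stand-ins for the archive download and the export_model.py run.

```python
from pathlib import Path
from typing import Callable


def prepare_model(onnx_path: Path,
                  download: Callable[[Path], None],
                  export: Callable[[Path], None]) -> str:
    """Return which step produced the model: 'reused', 'downloaded' or 'exported'."""
    if onnx_path.exists():
        return "reused"  # step 1: file already present
    try:
        download(onnx_path)  # step 2: fast path, pre-built archive
        if onnx_path.exists():
            return "downloaded"
    except OSError:
        pass  # offline or mirror unreachable: fall through to export
    export(onnx_path)  # step 3: slow path, export from the .pt weights
    if not onnx_path.exists():
        raise RuntimeError(f"could not prepare {onnx_path}")
    return "exported"
```

Ordering the cheap checks first means the heavy ultralytics and torch install is only paid for when the download genuinely fails.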

Manual export / re-export:

# Re-export even if the ONNX already exists
.venv/bin/python export_model.py --force

# Custom image size
.venv/bin/python export_model.py --imgsz 320

If you change the class list, edit both CLASS_NAMES in export_model.py and DEFAULT_CLASS_NAMES in yolo_world_onnx.py so that they stay in sync (same order, same length), then re-export with --force.
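Because the two lists must match exactly, a small check before re-exporting can catch drift. The sketch below is not part of the skill. It assumes both names are assigned as plain module-level list literals, and the helpers `read_list` and `in_sync` are hypothetical.

```python
import ast
from typing import List


def read_list(source: str, name: str) -> List[str]:
    """Extract a module-level list literal assigned to `name` without importing the file."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if name in targets:
                return list(ast.literal_eval(node.value))
    raise KeyError(name)


def in_sync(export_src: str, runtime_src: str) -> bool:
    """True when both lists have the same entries in the same order."""
    return (read_list(export_src, "CLASS_NAMES")
            == read_list(runtime_src, "DEFAULT_CLASS_NAMES"))
```

In practice the two arguments would come from `Path("export_model.py").read_text()` and `Path("yolo_world_onnx.py").read_text()`. Parsing rather than importing avoids pulling in ultralytics or onnxruntime just to compare two lists.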

Parameter Confirmation

Parameters can be supplied via either config.json (recommended for repeated use) or command-line flags. Command-line flags override config.json, which overrides built-in defaults.

| Parameter | config.json field | Default | Description |
|---|---|---|---|
| --device | (selects from cameras array) | first camera | Target camera DEVICE_ID |
| --rtsp_url | cameras[].rtsp_url | | RTSP camera URL (overrides camera selection) |
| --conf_threshold | conf_threshold | 0.25 | Confidence threshold (0.0–1.0) |
| --class_names | (not in config.json) | parcel package "delivery box" person "Cardboard box" "Packaging Box" backpack handbag suitcase | Classes to detect (CLI only) |
| --run_time | run_time | 0 | Max seconds; 0 = unlimited (continuous monitoring) |
| --start-detect | | | Start background detection (all cameras or --device) |
| --stop-detect | | | Stop background detection (all cameras or --device) |
| --status | | | Check detection process status |
| --list-devices | | | List all configured cameras and exit |
| | alarm_cooldown | 60 | Min seconds between notifications for different packages |
| | feishu_webhook_url | | Feishu Webhook URL for push notifications |
| | telegram_bot_token | | Telegram Bot token |
| | telegram_chat_id | | Telegram chat ID |
| | discord_webhook_url | | Discord Webhook URL |
| | discord_bot_token | | Discord Bot token |
| | discord_channel_id | | Discord channel ID |
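The precedence rule stated above (command-line flags over config.json over built-in defaults) can be sketched as a layered merge. `resolve` is a hypothetical helper, and treating `None` as "flag not given" is an assumption about how the argument parser reports unset flags.

```python
from typing import Any, Dict, Optional

# Defaults taken from the parameter table.
DEFAULTS: Dict[str, Any] = {"conf_threshold": 0.25, "run_time": 0, "alarm_cooldown": 60}


def resolve(config: Dict[str, Any], cli: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Merge settings: built-in defaults, then config.json, then CLI flags."""
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in config.items() if k in DEFAULTS})
    # A CLI value of None means the flag was not passed, so it must not override.
    merged.update({k: v for k, v in cli.items() if k in DEFAULTS and v is not None})
    return merged
```

For example, `resolve({"conf_threshold": 0.3}, {"run_time": 120})` keeps the config file's threshold, takes the run time from the command line, and falls back to the default cooldown.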

Multi-Camera Configuration

config.json supports a cameras array for multiple cameras:

{
  "cameras": [
    {
      "rtsp_url": "rtsp://192.168.1.100/stream",
      "device_id": "CAM-FRONT"
    },
    {
      "rtsp_url": "rtsp://192.168.1.101/stream",
      "device_id": "CAM-BACK",
      "conf_threshold": 0.3
    }
  ],
  "conf_threshold": 0.25,
  "run_time": 0,
  "alarm_cooldown": 60,
  "feishu_webhook_url": "",
  "telegram_bot_token": "",
  "telegram_chat_id": "",
  "discord_webhook_url": ""
}
  • device_id must be unique across all cameras
  • Per-camera conf_threshold and run_time override global values
  • Without --device, all cameras are started/stopped together
  • Each camera runs as an independent background process
  • Legacy single-camera config (flat rtsp_url at top level) is still supported
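The rules in this list (unique device_id, per-camera overrides, legacy flat layout) could be applied when loading config.json along these lines. This is a sketch, not the skill's loader: `expand_cameras` is hypothetical, and so is the fallback id "default" used for a legacy config.

```python
from typing import Any, Dict, List

PER_CAMERA_KEYS = ("conf_threshold", "run_time")


def expand_cameras(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one fully resolved settings dict per camera."""
    cameras = config.get("cameras")
    if not cameras and config.get("rtsp_url"):
        # Legacy layout: a flat rtsp_url at the top level.
        cameras = [{"rtsp_url": config["rtsp_url"],
                    "device_id": config.get("device_id", "default")}]
    seen = set()
    result = []
    for cam in cameras or []:
        dev = cam["device_id"]
        if dev in seen:
            raise ValueError(f"duplicate device_id: {dev}")
        seen.add(dev)
        merged = {"device_id": dev, "rtsp_url": cam["rtsp_url"]}
        for key in PER_CAMERA_KEYS:
            if key in cam:        # per-camera value wins
                merged[key] = cam[key]
            elif key in config:   # otherwise inherit the global value
                merged[key] = config[key]
        result.append(merged)
    return result
```

With the sample config above, CAM-FRONT would inherit the global conf_threshold of 0.25 while CAM-BACK would keep its own 0.3.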

Common Brand RTSP Templates

MUST show this table to the user when configuring cameras, so they can pick a URL pattern based on their brand:

| Brand key | Brand | URL pattern |
|---|---|---|
| hikvision | Hikvision | rtsp://{user}:{pwd}@{ip}:554/Streaming/Channels/101 (101=ch1 main, 102=ch1 sub) |
| dahua | Dahua | rtsp://{user}:{pwd}@{ip}:554/cam/realmonitor?channel=1&subtype=0 (subtype=0 main, 1 sub) |
| tplink | TP-Link | rtsp://{user}:{pwd}@{ip}:554/stream1 (stream1 main, stream2 sub) |
| ezviz | EZVIZ | rtsp://admin:{verify_code}@{ip}:554/H264/ch1/main/av_stream |
| uniview | Uniview | rtsp://{user}:{pwd}@{ip}:554/media/video1 |
| reolink | Reolink | rtsp://{user}:{pwd}@{ip}:554/h264Preview_01_main |
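Filling in one of these patterns is a plain string substitution, with one catch: credentials containing characters such as "@" or ":" would otherwise break the URL. The sketch below percent-encodes them. `build_rtsp_url` is a hypothetical helper, and whether a given camera or video backend accepts percent-encoded credentials is not confirmed by the source.

```python
from urllib.parse import quote

# Main-stream patterns copied from the table above.
TEMPLATES = {
    "hikvision": "rtsp://{user}:{pwd}@{ip}:554/Streaming/Channels/101",
    "dahua": "rtsp://{user}:{pwd}@{ip}:554/cam/realmonitor?channel=1&subtype=0",
    "tplink": "rtsp://{user}:{pwd}@{ip}:554/stream1",
    "ezviz": "rtsp://admin:{verify_code}@{ip}:554/H264/ch1/main/av_stream",
    "uniview": "rtsp://{user}:{pwd}@{ip}:554/media/video1",
    "reolink": "rtsp://{user}:{pwd}@{ip}:554/h264Preview_01_main",
}


def build_rtsp_url(brand: str, ip: str, **credentials: str) -> str:
    """Fill a brand template, percent-encoding every credential value."""
    encoded = {k: quote(v, safe="") for k, v in credentials.items()}
    return TEMPLATES[brand].format(ip=ip, **encoded)
```

For instance, `build_rtsp_url("hikvision", "192.168.1.100", user="admin", pwd="p@ss:1")` produces a URL in which the password appears as `p%40ss%3A1`.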

Ask the user: do any parameters need to be changed?

Usage

Start Detection (Background)

# Start all cameras
.venv/bin/python yolo_world_onnx.py --start-detect

# Start a specific camera
.venv/bin/python yolo_world_onnx.py --start-detect --device CAM-FRONT

Stop Detection

# Stop all cameras
.venv/bin/python yolo_world_onnx.py --stop-detect

# Stop a specific camera
.venv/bin/python yolo_world_onnx.py --stop-detect --device CAM-FRONT

Check Status

# Status of all cameras
.venv/bin/python yolo_world_onnx.py --status

# Status of a specific camera
.venv/bin/python yolo_world_onnx.py --status --device CAM-FRONT

Single-Run Mode (Foreground)

# Run continuous monitoring on a specific camera (foreground)
.venv/bin/python yolo_world_onnx.py --device CAM-FRONT

# Override via CLI (runs for 120 seconds then stops)
.venv/bin/python yolo_world_onnx.py \
  --rtsp_url rtsp://your-camera-address \
  --run_time 120

# List configured cameras
.venv/bin/python yolo_world_onnx.py --list-devices

Output (stdout JSON)

When a new package is detected, outputs an alarm JSON to stdout:

{
  "alarm": true,
  "type": "package",
  "class_name": "parcel",
  "confidence": 0.87,
  "camera_name": "CAM-FRONT",
  "frame": 1523,
  "snapshot": "/path/to/snapshots/CAM-FRONT/20260604_153012_482.jpg",
  "detections": [
    {
      "class_name": "parcel",
      "bbox": {"x1": 100, "y1": 200, "x2": 300, "y2": 400}
    }
  ]
}
| Field | Type | Description |
|---|---|---|
| alarm | bool | Always true for alarm output |
| type | string | Always "package" |
| class_name | string | Detected object class |
| confidence | float | Detection confidence (0.0–1.0) |
| camera_name | string | Source camera device_id |
| frame | int | Frame number when detected |
| snapshot | string | Absolute path to the annotated JPG (with bounding box drawn) |
| bbox.x1, y1, x2, y2 | int | Bounding box coordinates |
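A caller that consumes this output needs to separate alarm objects from any other text on stdout. The sketch below assumes each alarm arrives as a single-line JSON object, which the source does not confirm (the sample above is pretty-printed). `parse_alarm` and `box_size` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_alarm(line: str) -> Optional[Dict[str, Any]]:
    """Return the alarm dict if `line` is an alarm JSON object, else None."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return None  # ordinary log output, not JSON
    if isinstance(obj, dict) and obj.get("alarm") is True and obj.get("type") == "package":
        return obj
    return None


def box_size(bbox: Dict[str, int]) -> Tuple[int, int]:
    """Width and height of a bbox given as x1/y1/x2/y2."""
    return bbox["x2"] - bbox["x1"], bbox["y2"] - bbox["y1"]
```

Filtering on both `alarm` and `type` keeps the caller robust if other JSON messages ever share the same stream.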

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Normal exit (run_time reached or manual stop via signal) |
| 1 | Error (model missing, RTSP failure, runtime exception) |

Troubleshooting

  • bash: .venv/bin/python: No such file or directory → Run bash setup.sh
  • Model file not found → Re-run bash setup.sh, or place yolov8s-worldv2.onnx in the skill directory
  • Cannot open video → Check camera is online and --rtsp_url is correct

Privacy Notice

This skill processes camera video stream frames for object detection. Please review the following privacy information before use:

Pure Local Inference

  • Detection runs entirely on-device via the YOLOv8-World ONNX model — no cloud API calls for inference
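  • At runtime, the only network traffic is the RTSP pull from your camera (LAN) and the notification push to your configured channels (Feishu / Telegram / Discord); setup.sh additionally downloads the model and Python packages during installation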

Local Data Storage

  • Frames are held in memory only and discarded after each inference; ordinary frames are never written to disk
  • When an alarm fires, the annotated frame is saved as a JPEG under snapshots/<camera_device_id>/ for evidence; nothing else is persisted
  • The skill emits alarm JSON objects to stdout; if you need history, the caller is responsible for storing it

Notification Channels

  • Push notifications are sent only when configured (all channels are optional)
  • Notification content includes: detected class name, confidence, camera name, and timestamp
  • No images or video frames are sent in notifications
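A text-only notification carrying the fields listed above might be assembled as follows. The request shapes follow the public Feishu custom-bot webhook, Discord webhook, and Telegram Bot API `sendMessage` formats. The message wording and the helper names are hypothetical, since the source does not show the skill's actual payloads.

```python
from typing import Any, Dict, Tuple


def alarm_text(alarm: Dict[str, Any], timestamp: str) -> str:
    """Class name, confidence, camera name and timestamp; no image data."""
    return (f"Package detected: {alarm['class_name']} "
            f"({alarm['confidence']:.0%}) on {alarm['camera_name']} at {timestamp}")


def feishu_payload(text: str) -> Dict[str, Any]:
    return {"msg_type": "text", "content": {"text": text}}


def discord_payload(text: str) -> Dict[str, Any]:
    return {"content": text}


def telegram_request(token: str, chat_id: str, text: str) -> Tuple[str, Dict[str, str]]:
    """Return the sendMessage URL and JSON body for the Telegram Bot API."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    return url, {"chat_id": chat_id, "text": text}
```

Each payload would then be sent with something like `requests.post(url, json=payload, timeout=10)`, using the webhook URL or bot token from config.json.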

User Control

  • Camera URL is supplied by the user; this skill will not auto-discover or connect to cameras
  • You can stop the skill at any time via --stop-detect or SIGTERM
  • Removing the skill directory wipes everything (model file + venv); nothing else is touched on the host

For more details on our privacy policy, visit: https://kamiclaw-skill.kamihome.com/privacy