Pilot Data Labeling Pipeline Setup

v1.0.0

Deploy a data labeling pipeline with 4 agents for ingestion, auto-labeling, quality review, and dataset export. Use this skill when: 1. User wants to set up...

by Calin Teodor (@teoslayer)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for teoslayer/pilot-data-labeling-pipeline-setup.

Prompt preview: Install & Setup
Install the skill "Pilot Data Labeling Pipeline Setup" (teoslayer/pilot-data-labeling-pipeline-setup) from ClawHub.
Skill page: https://clawhub.ai/teoslayer/pilot-data-labeling-pipeline-setup
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: pilotctl, clawhub
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install pilot-data-labeling-pipeline-setup

ClawHub CLI

Via npx

npx clawhub@latest install pilot-data-labeling-pipeline-setup
Security Scan
Capability signals: Crypto
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description match the instructions: the skill installs pipeline-related companion skills via clawhub, sets hostnames with pilotctl, writes a pipeline manifest, and instructs mutual handshakes between agents. The required binaries (pilotctl, clawhub) are appropriate for this task.
Instruction Scope
Runtime instructions stay within pipeline setup: install listed companion skills, set hostname, write a JSON manifest to ~/.pilot/setups, and perform pilotctl handshakes. Note: the skill writes configuration under ~/.pilot and assumes the installed companion skills (e.g., pilot-s3-bridge) will be configured later — those companion skills may require additional credentials or network configuration that this skill does not request or configure.
Install Mechanism
This is instruction-only (no install spec, no code written by the skill). That is lowest-risk in terms of what the skill itself will drop on disk. The actual installs are delegated to clawhub/pilotctl, so review the provenance of those binaries before running.
Credentials
The skill declares no required environment variables or credentials, which is consistent with a generic setup template. Caveat: some of the companion skills it tells you to install (for example pilot-s3-bridge) will likely need cloud credentials (AWS keys) or endpoint configuration; those are not requested by this skill and must be supplied separately when you configure those components.
Persistence & Privilege
always is false and the skill is user-invocable. It writes a manifest under the user's home (~/.pilot/setups) which is reasonable for a setup helper and does not attempt to modify other skills or system-wide agent settings.
Assessment
This skill is a coherent setup guide, but before running any of its commands:
  1. Verify the origin and integrity of the pilotctl and clawhub binaries you will run (they will perform installs and network actions).
  2. Be prepared to provide credentials for connectors (e.g., S3) when configuring installed companion skills; the guide does not collect them itself.
  3. Review the companion skills you will install (pilot-s3-bridge, pilot-webhook-bridge, etc.), because they may open network ports, require cloud credentials, or reach external endpoints.
  4. Limit network exposure (firewall ports like 1002) and perform handshakes only with trusted hostnames.
If you need a higher-assurance review, provide the exact sources (URLs or package registry entries) for pilotctl, clawhub, and the companion skills to inspect them further.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: pilotctl, clawhub
Latest: vk97bdc7a1amf3cpwjsphn0z5gh85a5yp
Downloads: 84
Stars: 0
Versions: 1
Updated 5d ago
v1.0.0
License: MIT-0

Data Labeling Pipeline Setup

Deploy 4 agents that ingest raw data, apply ML labels, review quality, and export training-ready datasets.

Roles

Role     | Hostname          | Skills                                                  | Purpose
---------|-------------------|---------------------------------------------------------|------------------------------------------------------------
ingester | <prefix>-ingester | pilot-s3-bridge, pilot-stream-data, pilot-task-parallel | Accepts raw data batches, splits into work items
labeler  | <prefix>-labeler  | pilot-task-router, pilot-dataset, pilot-metrics         | Applies ML-based labels to work items
reviewer | <prefix>-reviewer | pilot-review, pilot-event-filter, pilot-alert           | Samples labeled items, checks accuracy, flags disagreements
exporter | <prefix>-exporter | pilot-dataset, pilot-share, pilot-webhook-bridge        | Packages approved labels into training-ready datasets

Setup Procedure

Step 1: Ask the user which role this agent should play and what prefix to use.

Step 2: Install the skills for the chosen role:

# ingester:
clawhub install pilot-s3-bridge pilot-stream-data pilot-task-parallel
# labeler:
clawhub install pilot-task-router pilot-dataset pilot-metrics
# reviewer:
clawhub install pilot-review pilot-event-filter pilot-alert
# exporter:
clawhub install pilot-dataset pilot-share pilot-webhook-bridge
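The role-to-skills mapping above can be sketched as a small dry-run helper that echoes the install command for review instead of executing it. The `role` value is an example choice, not something the skill sets for you:

```shell
# Dry-run sketch: map the chosen role to its skill list, then print the
# clawhub command instead of running it so the user can review it first.
role="labeler"   # example; replace with the user's chosen role
case "$role" in
  ingester) skills="pilot-s3-bridge pilot-stream-data pilot-task-parallel" ;;
  labeler)  skills="pilot-task-router pilot-dataset pilot-metrics" ;;
  reviewer) skills="pilot-review pilot-event-filter pilot-alert" ;;
  exporter) skills="pilot-dataset pilot-share pilot-webhook-bridge" ;;
  *) echo "unknown role: $role" >&2; exit 1 ;;
esac
echo "clawhub install $skills"
```

Echoing first keeps the actual install step explicit and auditable, in line with the security assessment's advice to review commands before running them.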

Step 3: Set the hostname:

pilotctl --json set-hostname <prefix>-<role>
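Before calling set-hostname, it can help to sanity-check the composed name. A minimal sketch, assuming example values `dlp`/`labeler` and a lowercase-alphanumeric-plus-hyphen pattern (an assumption, not a documented pilotctl constraint):

```shell
# Sketch: compose and validate <prefix>-<role> before handing it to pilotctl.
# The allowed-character pattern is an assumption for illustration.
prefix="dlp"; role="labeler"          # example values
hostname="${prefix}-${role}"
case "$hostname" in
  *[!a-z0-9-]*|-*|*-)
    echo "invalid hostname: $hostname" >&2; exit 1 ;;
  *)
    echo "would run: pilotctl --json set-hostname $hostname" ;;
esac
```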

Step 4: Write the setup manifest:

mkdir -p ~/.pilot/setups
cat > ~/.pilot/setups/data-labeling-pipeline.json << 'MANIFEST'
{
  "setup": "data-labeling-pipeline",
  "setup_name": "Data Labeling Pipeline",
  "role": "<ROLE_ID>",
  "role_name": "<ROLE_NAME>",
  "hostname": "<prefix>-<role>",
  "description": "<ROLE_DESCRIPTION>",
  "skills": { "<skill>": "<contextual description>" },
  "peers": [ { "role": "...", "hostname": "...", "description": "..." } ],
  "data_flows": [ { "direction": "send|receive", "peer": "...", "port": 1002, "topic": "...", "description": "..." } ],
  "handshakes_needed": [ "<peer-hostname>" ]
}
MANIFEST
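After writing the template, the placeholders must be substituted and the result should still parse as JSON. A sketch with example values (`labeler`, `dlp-labeler`); it writes to a temp file standing in for ~/.pilot/setups/data-labeling-pipeline.json, and uses GNU sed's -i flag:

```shell
# Sketch: fill template placeholders, then confirm the file is valid JSON.
# A temp file stands in for ~/.pilot/setups/data-labeling-pipeline.json here.
manifest="$(mktemp)"
cat > "$manifest" << 'MANIFEST'
{
  "setup": "data-labeling-pipeline",
  "role": "<ROLE_ID>",
  "hostname": "<prefix>-<role>"
}
MANIFEST
# Substitute example values for the placeholders (GNU sed in-place edit).
sed -i -e 's/<ROLE_ID>/labeler/' -e 's/<prefix>-<role>/dlp-labeler/' "$manifest"
# Parse check: json.tool exits non-zero on malformed JSON.
python3 -m json.tool "$manifest" > /dev/null && echo "manifest is valid JSON"
```

Validating with a JSON parser catches leftover placeholders that would break consumers expecting well-formed values, before any handshakes are attempted.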

Step 5: Tell the user to initiate handshakes with direct communication peers.

Manifest Templates Per Role

ingester

{
  "setup": "data-labeling-pipeline",
  "setup_name": "Data Labeling Pipeline",
  "role": "ingester",
  "role_name": "Data Ingester",
  "hostname": "<prefix>-ingester",
  "description": "Accepts raw data batches from S3 or webhooks. Splits into work items and distributes.",
  "skills": {
    "pilot-s3-bridge": "Pull raw data batches from S3 buckets on schedule or webhook trigger.",
    "pilot-stream-data": "Stream work items to labeler as they are split from batches.",
    "pilot-task-parallel": "Parallelize batch splitting across available workers."
  },
  "peers": [
    { "role": "labeler", "hostname": "<prefix>-labeler", "description": "Receives work items for labeling" }
  ],
  "data_flows": [
    { "direction": "send", "peer": "<prefix>-labeler", "port": 1002, "topic": "work-item", "description": "Work items with raw data references" }
  ],
  "handshakes_needed": ["<prefix>-labeler"]
}

labeler

{
  "setup": "data-labeling-pipeline",
  "setup_name": "Data Labeling Pipeline",
  "role": "labeler",
  "role_name": "Auto Labeler",
  "hostname": "<prefix>-labeler",
  "description": "Applies ML-based labels, classifications, bounding boxes, or entity tags to work items.",
  "skills": {
    "pilot-task-router": "Route work items to appropriate ML models by data type.",
    "pilot-dataset": "Store and retrieve labeled data records.",
    "pilot-metrics": "Track labeling throughput, model confidence distributions."
  },
  "peers": [
    { "role": "ingester", "hostname": "<prefix>-ingester", "description": "Sends work items for labeling" },
    { "role": "reviewer", "hostname": "<prefix>-reviewer", "description": "Receives labeled items for quality review" }
  ],
  "data_flows": [
    { "direction": "receive", "peer": "<prefix>-ingester", "port": 1002, "topic": "work-item", "description": "Work items with raw data references" },
    { "direction": "send", "peer": "<prefix>-reviewer", "port": 1002, "topic": "labeled-item", "description": "Labeled items for quality review" },
    { "direction": "receive", "peer": "<prefix>-reviewer", "port": 1002, "topic": "review-feedback", "description": "Feedback on rejected labels for re-labeling" }
  ],
  "handshakes_needed": ["<prefix>-ingester", "<prefix>-reviewer"]
}

reviewer

{
  "setup": "data-labeling-pipeline",
  "setup_name": "Data Labeling Pipeline",
  "role": "reviewer",
  "role_name": "Quality Reviewer",
  "hostname": "<prefix>-reviewer",
  "description": "Samples labeled items, checks accuracy, flags disagreements, computes inter-annotator agreement.",
  "skills": {
    "pilot-review": "Score labeled items against quality criteria and flag disagreements.",
    "pilot-event-filter": "Filter low-confidence labels for priority review.",
    "pilot-alert": "Alert on quality drops or inter-annotator agreement below threshold."
  },
  "peers": [
    { "role": "labeler", "hostname": "<prefix>-labeler", "description": "Sends labeled items for review" },
    { "role": "exporter", "hostname": "<prefix>-exporter", "description": "Receives approved labels for export" }
  ],
  "data_flows": [
    { "direction": "receive", "peer": "<prefix>-labeler", "port": 1002, "topic": "labeled-item", "description": "Labeled items for quality review" },
    { "direction": "send", "peer": "<prefix>-labeler", "port": 1002, "topic": "review-feedback", "description": "Feedback for re-labeling rejected items" },
    { "direction": "send", "peer": "<prefix>-exporter", "port": 1002, "topic": "approved-label", "description": "Approved labels ready for packaging" }
  ],
  "handshakes_needed": ["<prefix>-labeler", "<prefix>-exporter"]
}

exporter

{
  "setup": "data-labeling-pipeline",
  "setup_name": "Data Labeling Pipeline",
  "role": "exporter",
  "role_name": "Dataset Exporter",
  "hostname": "<prefix>-exporter",
  "description": "Packages reviewed labels into training-ready datasets (COCO, VOC, JSONL). Publishes to storage.",
  "skills": {
    "pilot-dataset": "Assemble labeled items into structured dataset formats.",
    "pilot-share": "Upload packaged datasets to S3 or shared storage.",
    "pilot-webhook-bridge": "Notify downstream consumers when datasets are published."
  },
  "peers": [
    { "role": "reviewer", "hostname": "<prefix>-reviewer", "description": "Sends approved labels for packaging" }
  ],
  "data_flows": [
    { "direction": "receive", "peer": "<prefix>-reviewer", "port": 1002, "topic": "approved-label", "description": "Approved labels ready for packaging" },
    { "direction": "send", "peer": "external", "port": 443, "topic": "dataset-published", "description": "Notification that a new dataset is available" }
  ],
  "handshakes_needed": ["<prefix>-reviewer"]
}

Data Flows

  • ingester -> labeler : work-item events (port 1002)
  • labeler -> reviewer : labeled-item events (port 1002)
  • reviewer -> labeler : review-feedback events (port 1002)
  • reviewer -> exporter : approved-label events (port 1002)
  • exporter -> external : dataset-published notifications (port 443)
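The flow list above can be checked mechanically: every internal (port 1002) edge should connect two of the four known roles. A minimal sketch; the flow encoding (`src:dst:topic:port`) is an illustrative format, not part of the skill:

```shell
# Sketch: verify each internal data flow connects two known pipeline roles.
roles="ingester labeler reviewer exporter"
flows="ingester:labeler:work-item:1002
labeler:reviewer:labeled-item:1002
reviewer:labeler:review-feedback:1002
reviewer:exporter:approved-label:1002
exporter:external:dataset-published:443"

bad=0
while IFS=: read -r src dst topic port; do
  # Only the internal pipeline edges are constrained to known roles;
  # the port-443 notification goes to an external consumer by design.
  if [ "$port" = "1002" ]; then
    case " $roles " in *" $src "*) ;; *) echo "unknown source: $src"; bad=1 ;; esac
    case " $roles " in *" $dst "*) ;; *) echo "unknown destination: $dst"; bad=1 ;; esac
  fi
done <<EOF
$flows
EOF
[ "$bad" -eq 0 ] && echo "all internal flows connect known roles"
```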

Handshakes

# ingester <-> labeler:
pilotctl --json handshake <prefix>-labeler "setup: data-labeling-pipeline"
pilotctl --json handshake <prefix>-ingester "setup: data-labeling-pipeline"

# labeler <-> reviewer:
pilotctl --json handshake <prefix>-reviewer "setup: data-labeling-pipeline"
pilotctl --json handshake <prefix>-labeler "setup: data-labeling-pipeline"

# reviewer <-> exporter:
pilotctl --json handshake <prefix>-exporter "setup: data-labeling-pipeline"
pilotctl --json handshake <prefix>-reviewer "setup: data-labeling-pipeline"
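Since each agent only needs the handshakes listed in its own manifest's handshakes_needed, the per-role commands can be derived rather than copied by hand. A dry-run sketch (prefix `dlp` and role `reviewer` are example values; commands are printed, not executed):

```shell
# Sketch: print the handshake commands a given role needs, taken from the
# handshakes_needed lists in the manifest templates above.
prefix="dlp"; role="reviewer"   # example values
case "$role" in
  ingester) peers="labeler" ;;
  labeler)  peers="ingester reviewer" ;;
  reviewer) peers="labeler exporter" ;;
  exporter) peers="reviewer" ;;
  *) echo "unknown role: $role" >&2; exit 1 ;;
esac
for p in $peers; do
  echo "pilotctl --json handshake ${prefix}-${p} \"setup: data-labeling-pipeline\""
done
```

Remember the security assessment's caveat: perform handshakes only with hostnames you trust.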

Workflow Example

# On labeler — subscribe to work items:
pilotctl --json subscribe <prefix>-ingester work-item

# On ingester — publish a work item:
pilotctl --json publish <prefix>-labeler work-item '{"batch_id":"batch-042","item_id":"img-0017","type":"image","s3_uri":"s3://raw-data/batch-042/img-0017.jpg"}'

# On reviewer — subscribe to labeled items:
pilotctl --json subscribe <prefix>-labeler labeled-item

# On exporter — subscribe to approved labels:
pilotctl --json subscribe <prefix>-reviewer approved-label

Dependencies

Requires the pilot-protocol skill, the pilotctl and clawhub binaries, and a running daemon.
