iaworker

v1.0.0

Intelligent Automation Worker — analyzes video/image streams and generates structured, real-time operating steps for physical tasks (debug, repair, assembly,...

Security Scan
Capability signals
Crypto: Can make purchases
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal
Benign
View report →
OpenClaw
Benign
medium confidence
Purpose & Capability
The name and description (visual analysis → step generation → TTS) match the provided scripts (video_analyzer.py, step_engine.py, speaker.py). The code implements object/anomaly detection, step generation, and TTS/display delivery, all coherent with the stated purpose. Minor note: heavy ML libraries (torch/transformers, cv2, PIL) are used even though the skill has no install spec; this is plausible for image analysis but should be declared to users.
Instruction Scope
SKILL.md and the scripts confine behavior to analyzing provided images/videos or a camera feed, producing steps, writing markdown output, and TTS playback. That is within scope. Important caveats: the analyzer will access local files and camera devices, write output files (markdown, temp audio), and run subprocesses to play audio. The "LLM" analysis is implemented locally as prompt templates (no external LLM call in the code shown), but the classifier pipeline may fetch models from the network (see install notes).
Install Mechanism
No install spec is provided. The code depends on sizable native libraries (opencv-python, pillow), optional torch/transformers (which will download models like 'microsoft/resnet-50' from the model hub at runtime if installed), and gTTS/pyttsx3 for audio. Those model downloads and gTTS network calls are implicit and not documented in SKILL.md; absence of an install block means a user may be surprised by large downloads, network traffic, or missing runtime dependencies.
Credentials
The skill requests no environment variables or credentials, which is proportionate. However, it uses gTTS (an online TTS client) by default in config.yaml which will make network calls to Google’s TTS service; transformers will pull models from Hugging Face if used. These network interactions are reasonable for the feature set but are not declared in the metadata and may be privacy-sensitive (image data uploaded to remote services via those libraries).
Persistence & Privilege
The skill is not always-included and uses normal agent invocation. It does not modify other skills or system-wide configs. It reads/writes files within its own directory and uses system devices (camera, audio) — this is expected given the functionality.
Assessment
This skill appears to do what it claims, but review these practical concerns before installing:

  1. Dependencies: it relies on OpenCV, Pillow, and optionally torch/transformers; the latter will download large models from the internet unless pre-cached. Add a proper install step and pinned package versions.
  2. Network usage: gTTS sends text to an online Google TTS endpoint, and transformers may fetch models from Hugging Face. If you need offline privacy, configure pyttsx3 and avoid the classifier pipeline, or pre-download models.
  3. Device access: the skill can read camera devices and write files and temporary audio; run it in a sandbox or a controlled environment if you have sensitive cameras or images.
  4. Safety: generated repair instructions can be safety-critical; validate steps and do not rely solely on automated guidance for high-risk tasks.
  5. Recommended actions: review the code or run tests in an isolated virtualenv/container, set tts.enabled=false if you want no external TTS by default, and require the author to include an install spec that documents network calls, model sources, and exact dependencies.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97c8764sznbh13vjmgx2tzaws85artq
42 downloads
1 star
1 version
Updated 22h ago
v1.0.0
MIT-0

iaworker — Intelligent Automation Worker

Analyze video/image streams, diagnose physical problems, and generate structured step-by-step operating guidance. Deliver instructions both visually (displayed markdown) and audibly (TTS spoken aloud).


Core Workflow

┌──────────────────────────────────────────────────────────────────────┐
│                           iaworker PROCESS                           │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  [1] RECEIVE INPUT                                                   │
│      Video file path, image path, or live camera frame               │
│           ↓                                                          │
│  [2] ANALYZE (video_analyzer.py)                                     │
│      - Extract key frames                                            │
│      - Identify objects, damage, components                          │
│      - Detect anomaly patterns (cracks, loose parts, fluid leaks)    │
│      - Classify task type (repair / assembly / inspection / debug)   │
│           ↓                                                          │
│  [3] GENERATE STEPS (step_engine.py)                                 │
│      - Build ordered, numbered action steps                          │
│      - Include tool requirements, safety warnings                    │
│      - Flag prerequisite steps (disconnect power, etc.)              │
│      - Estimate difficulty/time for each step                        │
│           ↓                                                          │
│  [4] DELIVER (speaker.py + display)                                  │
│      - Display formatted markdown step guide                         │
│      - Speak each step aloud via TTS                                 │
│      - Step-by-step progression (not all at once)                    │
│      - Wait for user confirmation before advancing (configurable)    │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
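The four stages above can be sketched as a single orchestrator. This is a minimal illustration only: `analyze`, `generate_steps`, and `deliver` are simplified stand-ins, not the actual internals of video_analyzer.py, step_engine.py, or speaker.py.

```python
# Minimal sketch of the receive → analyze → generate → deliver pipeline.
# All three stage functions are simplified stand-ins for illustration.

def analyze(source: str) -> dict:
    # Stand-in for video_analyzer.py: would extract frames and detect
    # objects/anomalies; here it returns a fixed result.
    return {
        "task_type": "repair",
        "objects": ["chain"],
        "anomalies": ["chain loose"],
    }

def generate_steps(analysis: dict) -> list[dict]:
    # Stand-in for step_engine.py: one ordered step per detected anomaly.
    return [
        {"number": i + 1, "title": f"Address: {a}"}
        for i, a in enumerate(analysis["anomalies"])
    ]

def deliver(steps: list[dict], speak: bool = False) -> None:
    # Stand-in for speaker.py: display (and optionally speak) each step.
    for step in steps:
        print(f"[{step['number']}] {step['title']}")

def run(source: str, speak: bool = False) -> list[dict]:
    steps = generate_steps(analyze(source))
    deliver(steps, speak=speak)
    return steps
```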

Quick Start

Analyze an image and get spoken steps

python scripts/video_analyzer.py \
  --input /path/to/image.jpg \
  --task repair \
  --lang en \
  --speak

Analyze a video and get per-segment steps

python scripts/video_analyzer.py \
  --input /path/to/video.mp4 \
  --task debug \
  --lang en \
  --speak \
  --step-by-step

Analyze from camera feed (live)

python scripts/video_analyzer.py \
  --input camera \
  --task inspection \
  --lang en \
  --speak \
  --live

Scripts

video_analyzer.py

Entry point. Analyzes visual input and triggers step generation.

python scripts/video_analyzer.py [options]

Options:

Flag              Description                                                   Default
--input PATH      Image path, video path, or camera for live                    Required
--task TYPE       repair, debug, assembly, inspection, auto                     auto
--lang CODE       en or zh                                                      en
--speak           Enable TTS for step output                                    Disabled
--step-by-step    Speak and display one step at a time, wait for confirmation   Sequential mode
--live            Live camera mode with continuous analysis                     Off
--output PATH     Write steps to markdown file                                  None (console only)
--frame-skip N    Skip every N frames in video (speed up analysis)              10
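The flag set above maps naturally onto argparse. This is a hedged reconstruction of how the CLI might be declared, not the actual video_analyzer.py source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented flags; defaults follow the option table.
    p = argparse.ArgumentParser(prog="video_analyzer.py")
    p.add_argument("--input", required=True,
                   help="Image path, video path, or 'camera' for live")
    p.add_argument("--task", default="auto",
                   choices=["repair", "debug", "assembly", "inspection", "auto"])
    p.add_argument("--lang", default="en", choices=["en", "zh"])
    p.add_argument("--speak", action="store_true")
    p.add_argument("--step-by-step", action="store_true", dest="step_by_step")
    p.add_argument("--live", action="store_true")
    p.add_argument("--output", default=None,
                   help="Write steps to a markdown file")
    p.add_argument("--frame-skip", type=int, default=10, dest="frame_skip")
    return p
```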

Task auto-detection:

  • repair — Something is broken; find damage, suggest fixes
  • debug — Something isn't working; trace fault to cause
  • assembly — Something needs to be built/put together
  • inspection — Check condition, report findings
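One way auto-detection could work is a keyword heuristic over the scene description. This is purely a sketch; the real classifier in video_analyzer.py may rely on the vision model instead, and the keyword lists here are invented:

```python
# Hypothetical keyword heuristic for --task auto (not the actual classifier).
TASK_KEYWORDS = {
    "repair": ["broken", "crack", "damage", "worn", "leak"],
    "debug": ["not working", "no power", "error", "fault"],
    "assembly": ["parts", "manual", "unassembled", "kit"],
    "inspection": ["check", "condition", "verify"],
}

def detect_task(description: str) -> str:
    """Score each task type by keyword hits; fall back to inspection."""
    text = description.lower()
    scores = {
        task: sum(kw in text for kw in kws)
        for task, kws in TASK_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "inspection"  # safe default
```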

step_engine.py

Generates structured steps from analysis results.

from step_engine import StepEngine

engine = StepEngine(lang="en")
steps = engine.generate(
    task_type="repair",
    objects=["wheel", "chain", "brake caliper"],
    anomalies=["chain loose", "brake pad worn"],
    context={"bike_type": "mountain"}
)

for step in steps:
    print(step["number"], step["title"])
    print(step["description"])
    print(f"[Tools: {step['tools']}] [Time: {step['time_estimate']}]")
    if step["safety_warning"]:
        print(f"⚠️  {step['safety_warning']}")

Step object schema:

{
    "number": int,              # 1-based step number
    "title": str,               # Short action title
    "description": str,         # Detailed description
    "tools": list[str],         # Required tools
    "time_estimate": str,       # e.g. "5-10 min"
    "difficulty": str,          # "easy" | "medium" | "hard" | "expert"
    "safety_warning": str|null,# Warning text if any
    "prerequisite": bool,       # Must be done before others proceed
    "common_mistakes": list[str],# What to avoid
}
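A quick validator for the schema above can catch malformed steps before delivery. This is a sketch (the helper is hypothetical, not part of step_engine.py); field names follow the documented schema:

```python
# Validate a step dict against the documented step schema (sketch).
REQUIRED_FIELDS = {
    "number": int,
    "title": str,
    "description": str,
    "tools": list,
    "time_estimate": str,
    "difficulty": str,
    "prerequisite": bool,
    "common_mistakes": list,
}
DIFFICULTY_LEVELS = {"easy", "medium", "hard", "expert"}

def validate_step(step: dict) -> list[str]:
    """Return a list of problems; an empty list means the step is valid."""
    problems = []
    for field, typ in REQUIRED_FIELDS.items():
        if field not in step:
            problems.append(f"missing field: {field}")
        elif not isinstance(step[field], typ):
            problems.append(f"{field}: expected {typ.__name__}")
    # safety_warning may be a string or null (None)
    if not isinstance(step.get("safety_warning"), (str, type(None))):
        problems.append("safety_warning: expected str or null")
    if step.get("difficulty") not in DIFFICULTY_LEVELS:
        problems.append("difficulty: must be easy/medium/hard/expert")
    return problems
```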

Difficulty classification:

Level     Indicator
easy      No special tools, minimal risk
medium    Basic tools, some disassembly
hard      Specialty tools, significant disassembly
expert    Professional tools, structural risk

speaker.py

Handles TTS output and markdown display.

from speaker import Speaker

speaker = Speaker(lang="en", tts_enabled=True)

speaker.display_and_speak("Step 1: Inspect the chain tensioner")
speaker.display_steps(steps)  # full step list from StepEngine
speaker.speak_only("Make sure to wear safety glasses.")
speaker.wait_for_user("Press Enter when ready to continue")

Features:

  • gtts (Google TTS) — default, works out of the box
  • pyttsx3 — offline fallback
  • Markdown rendering in terminal with rich library
  • Per-step speak with configurable pacing
  • Confirmation gating between steps (for --step-by-step mode)
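The gtts/pyttsx3 split above amounts to a simple backend-selection rule. A minimal sketch, assuming the real speaker.py logic may differ; the actual library calls are shown in comments to keep the snippet offline-safe:

```python
def choose_tts_engine(preferred: str = "gtts", online: bool = True) -> str:
    """Pick a TTS backend: gTTS needs network access; pyttsx3 works offline."""
    if preferred == "gtts" and online:
        # from gtts import gTTS
        # gTTS(text=text, lang=lang).save("step.mp3")  # then play the file
        return "gtts"
    # import pyttsx3
    # engine = pyttsx3.init(); engine.say(text); engine.runAndWait()
    return "pyttsx3"
```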

Step Generation Guidelines

Steps must follow this structure:

  1. Prerequisites — Things that must be done first (disconnect power, secure object, etc.)
  2. Assessment — Inspect and confirm the problem
  3. Preparation — Gather tools, clear workspace
  4. Main actions — Numbered, one clear action per step
  5. Verification — Test that the fix/assembly worked
  6. Cleanup — Put back together, tidy tools

Rules:

  • Each step = one action. If it has "and", it's two steps.
  • Always include a safety check step after anything involving power, hot parts, or fluids.
  • Difficulty and time estimate must be realistic.
  • Flag the most common mistakes for each step.
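The "each step = one action" rule can be enforced mechanically. A sketch with a hypothetical helper (not part of step_engine.py):

```python
def split_compound_step(title: str) -> list[str]:
    """Split a step title containing ' and ' into single-action titles."""
    parts = [p.strip() for p in title.split(" and ") if p.strip()]
    return parts if len(parts) > 1 else [title]
```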

Configuration

Config file: scripts/config.yaml

tts:
  engine: "gtts"          # "gtts" or "pyttsx3"
  lang: "en"
  speed: 1.0              # 0.5 = slow, 2.0 = fast
  volume: 1.0             # 0.0 to 1.0

display:
  use_rich: true          # Pretty terminal output
  color: "cyan"           # Step highlight color
  show_icons: true        # Show ✅ ⚠️ 🔧 icons

analysis:
  default_task: "auto"
  frame_skip: 10
  confidence_threshold: 0.6

step_delivery:
  auto_speak: true
  wait_confirmation: false
  speak_difficulty: true
  speak_time_estimate: true
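Missing keys in config.yaml should fall back to the defaults above. A minimal deep-merge sketch (the `merge_config` helper is hypothetical; PyYAML loading is shown in a comment to keep the snippet dependency-free):

```python
# Defaults mirror the documented scripts/config.yaml values.
DEFAULTS = {
    "tts": {"engine": "gtts", "lang": "en", "speed": 1.0, "volume": 1.0},
    "display": {"use_rich": True, "color": "cyan", "show_icons": True},
    "analysis": {"default_task": "auto", "frame_skip": 10,
                 "confidence_threshold": 0.6},
    "step_delivery": {"auto_speak": True, "wait_confirmation": False,
                      "speak_difficulty": True, "speak_time_estimate": True},
}

# Loading would look like:
#   import yaml
#   user = yaml.safe_load(open("scripts/config.yaml")) or {}

def merge_config(user: dict, defaults: dict = DEFAULTS) -> dict:
    """Overlay user settings on defaults, recursing into nested sections."""
    merged = dict(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(value, merged[key])
        else:
            merged[key] = value
    return merged
```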

Task Reference

Bike Repair — Chain Adjustment

🔧 Tools: Hex keys (4mm, 5mm), chain tool, lubricant
⏱ Time: 15-25 min
⚠️ Safety: Flip bike first — chain tension releases can snap
  1. Flip bike, rest on seat and handlebars
  2. Inspect chain for stiff links, rust, kinks
  3. Loosen rear axle bolts (5mm hex)
  4. Adjust chain tension via horizontal dropouts
  5. Check tension: 10-15mm deflection at midpoint
  6. Re-tighten axle bolts
  7. Lubricate if needed, wipe excess
  8. Test ride

Car Debug — Engine Won't Start

🔧 Tools: OBD2 scanner, multimeter, basic socket set
⏱ Time: 20-40 min (diagnosis first)
⚠️ Safety: Disable ignition, disconnect battery negative first
  1. Check if fuel pump primes (turn key to ON, listen)
  2. Test battery voltage (>12.4V idle, >13.5V running)
  3. Connect OBD2 scanner, read fault codes
  4. Inspect spark plugs for gap/damage
  5. Check for crank/cam position sensor signals
  6. Verify immobilizer status
  7. Narrow to most likely cause, then address

Generic Assembly — IKEA-style

🔧 Tools: Hex key (included), Phillips screwdriver, hammer
⏱ Time: varies
⚠️ Safety: Enlist a second person for large panels
  1. Unpack and sort all hardware (count screws, dowels)
  2. Lay out all panels, identify front/back
  3. Pre-assemble sub-groups before final join
  4. Hand-tighten all screws first
  5. Use cardboard to protect floors
  6. Final torque pass after 24h

Troubleshooting

"No audio output"

  • Check if gtts is installed: pip install gtts
  • Fallback: engine: pyttsx3 in config (offline)
  • On headless servers: set DISPLAY env var or use pyttsx3

"Analysis is slow on video"

  • Increase --frame-skip (e.g., --frame-skip 30)
  • Use --input camera --live for real-time with throttled analysis

"Steps are too generic"

  • Provide more context in the initial prompt
  • Use --task repair explicitly if auto-detect fails
  • For specialized equipment, the LLM analysis quality depends on prompt specificity

"OpenCV camera not found"

  • Check camera index: python scripts/video_analyzer.py --input camera --list-devices
  • Try --input camera --camera-index 1 if default is wrong

Extending for Specific Domains

iaworker ships with general-purpose analysis. To add domain-specific knowledge:

  1. Create references/domains/MYDOMAIN.md with known failure modes and tool lists
  2. In step_engine.py, add a DOMAIN_HANDLERS map that loads these
  3. The step engine will then reference domain files when generating steps

Example domain file:

# Domain: electric_bike

## Common Failures
- Motor controller overheating → reduce load, check ventilation
- Battery BMS cutout → reset via unplugging 30s
- Torque sensor miscalibration → re-zero via display menu

## Safety
- Never open motor housing — high voltage capacitors retain charge
- Battery must be removed before any repair
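Domain files in this format can be parsed with a few lines. A sketch assuming the section layout shown above (the helper is hypothetical, not the actual DOMAIN_HANDLERS implementation):

```python
def parse_domain_file(text: str) -> dict[str, list[str]]:
    """Collect bullet items under each '## ' section of a domain file."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current:
            sections[current].append(line[2:].strip())
    return sections
```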
