Perceptron

v1.0.1

Image and video analysis powered by Isaac vision models. Capabilities include visual Q&A, object detection, OCR, captioning, counting, and grounded spatial r...

by Subraiz Ahmed (@subraiz)
Security Scan
VirusTotal
Benign
OpenClaw
Benign
high confidence
Purpose & Capability
The name, description, included CLI, SKILL.md, and the single required env var (PERCEPTRON_API_KEY) all align with a vision-analysis SDK that calls an external Perceptron service.
Instruction Scope
Runtime instructions and the CLI operate only on image/video files, use the Perceptron SDK, and optionally read/write local files for batch processing or output. The documentation does not instruct reading unrelated system files or credentials. Note: using the skill will upload images/frames to the Perceptron service, so any sensitive image data will be transmitted externally.
Install Mechanism
There is no registry install spec; SKILL.md instructs a standard pip install perceptron. The skill includes a Python CLI that depends on that SDK and, optionally, Pillow. No downloads from untrusted URLs and no archive extraction are present.
Credentials
Only PERCEPTRON_API_KEY (primary credential) is required, which is appropriate for an external API client. No unrelated credentials or config paths are requested.
Persistence & Privilege
The always flag is false and the skill is user-invocable; it does not request persistent system-wide privileges or modify other skills. Note that autonomous invocation (the platform default) would let the agent call the Perceptron API with the provided key; this is expected behavior, but something to be aware of.
Assessment
This package appears coherent for image/video analysis, but before installing:

1) Understand that images/frames you analyze will be sent to the Perceptron API (api.perceptron.inc is referenced in docs); do not upload sensitive images unless you trust the provider.
2) The skill requires an API key (PERCEPTRON_API_KEY) — grant only a key with the minimum needed scope and rotate it if exposed.
3) The README instructs pip install perceptron — verify the perceptron package on PyPI and the official docs (https://docs.perceptron.inc/) to ensure you're installing the intended SDK.
4) Batch/CLI operations read/write local files and can process directories; be careful when running with paths that contain sensitive files.
5) If you allow the agent to invoke skills autonomously, remember it can use the provided API key to call the service; restrict autonomous use if that is a concern.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Env: PERCEPTRON_API_KEY
Primary env: PERCEPTRON_API_KEY
Latest: vk9721h1tg1y4vqbdxrncss32js82dkx2
281 downloads
2 stars
2 versions
Updated 1mo ago
v1.0.1
MIT-0

Perceptron — Vision SDK

Docs: https://docs.perceptron.inc/

Image and video analysis via the Perceptron Python SDK. Pass file paths or URLs directly — the SDK handles base64 conversion automatically.

Setup

pip install perceptron
export PERCEPTRON_API_KEY=ak_...
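As a convenience (not part of the SDK), a small helper can fail fast when the key is missing, rather than surfacing an opaque error on the first API call; a minimal sketch:

```python
import os

def require_api_key(env_var="PERCEPTRON_API_KEY"):
    """Return the Perceptron API key from the environment, or raise early."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; run `export {env_var}=ak_...` before using the SDK"
        )
    return key
```

Call require_api_key() once at startup so batch jobs fail before any files are read.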

Quick Reference

Task | Function | Example
Describe / Q&A | question() | question("photo.jpg", "What's in this image?")
Grounded Q&A | question() | question("photo.jpg", "Where is the cat?", expects="box")
Object detection | detect() | detect("photo.jpg", classes=["person", "car"])
OCR | ocr() | ocr("document.png")
OCR (markdown) | ocr_markdown() | ocr_markdown("document.png")
Caption | caption() | caption("photo.jpg", style="detailed")
Counting | question() | question("photo.jpg", "How many dogs?", expects="point")
Custom workflow | @perceive | See DSL composition below

Python SDK

from perceptron import configure, detect, caption, ocr, ocr_markdown, question

# Configuration (or set PERCEPTRON_API_KEY env var)
configure(provider="perceptron", api_key="ak_...")

# Visual Q&A — the most common operation
result = question("photo.jpg", "What's happening in this image?")
print(result.text)

# Grounded Q&A — get bounding boxes with answers
result = question("photo.jpg", "Where is the damage?", expects="box")
for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x},{box.top_left.y}) → ({box.bottom_right.x},{box.bottom_right.y})")

# Object detection
result = detect("warehouse.jpg", classes=["forklift", "person"])
for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x},{box.top_left.y}) → ({box.bottom_right.x},{box.bottom_right.y})")

# OCR
result = ocr("receipt.jpg", prompt="Extract the total amount")
print(result.text)

result = ocr_markdown("document.png")  # structured markdown output
print(result.text)

# Captioning
result = caption("scene.png", style="detailed")
print(result.text)

DSL Composition (Advanced)

Build custom multimodal workflows:

from perceptron import perceive, image, text, system

@perceive(expects="box", model="isaac-0.2-2b-preview")
def find_hazards(img_path):
    return [system("<hint>BOX</hint>"), image(img_path), text("Locate all safety hazards")]

result = find_hazards("factory.jpg")

Structured Outputs

Constrain responses to Pydantic models, JSON schemas, or regex:

from perceptron import perceive, image, text, pydantic_format
from pydantic import BaseModel

class Scene(BaseModel):
    objects: list[str]
    count: int

@perceive(response_format=pydantic_format(Scene))
def count_objects(path):
    return image(path) + text("List all objects and count them. Return JSON.")

result = count_objects("photo.jpg")
scene = Scene.model_validate_json(result.text)

Pixel Coordinate Conversion

All spatial outputs use normalized coordinates (0–1000). Convert to pixels:

pixel_boxes = result.points_to_pixels(width=1920, height=1080)

# Or standalone:
from perceptron import scale_points_to_pixels
pixel_pts = scale_points_to_pixels(result.points, width=1920, height=1080)
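The arithmetic behind the helpers is simple enough to apply by hand when working outside the SDK; a minimal sketch, assuming only the 0–1000 normalized grid described above (the helper name is illustrative, not an SDK function):

```python
def to_pixels(x_norm, y_norm, width, height):
    """Map a point on the 0-1000 normalized grid to integer pixel coordinates."""
    return (round(x_norm / 1000 * width), round(y_norm / 1000 * height))

# The grid center lands at the image center:
# to_pixels(500, 500, 1920, 1080) → (960, 540)
```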

CLI Script

Located at: <skill-dir>/scripts/perceptron_cli.py

Requires PERCEPTRON_API_KEY environment variable. The provider is always perceptron.

P=<skill-dir>/scripts/perceptron_cli.py

# Visual Q&A
python3 $P question photo.jpg "What do you see?"
python3 $P question photo.jpg "Where is the car?" --expects box

# Object detection
python3 $P detect photo.jpg --classes person,car
python3 $P detect photo.jpg --classes forklift --format json --pixels
python3 $P detect ./frames/ --classes defect  # batch directory

# OCR
python3 $P ocr document.png
python3 $P ocr receipt.jpg --output markdown

# Captioning
python3 $P caption scene.png --style detailed

# Custom perceive
python3 $P perceive frame.png --prompt "Describe this scene" --expects box

# Batch processing
python3 $P batch --images img1.jpg img2.jpg --prompt "Describe" --output results.json

# Parse raw model output
python3 $P parse "<point_box ...>" --mode points

# List models
python3 $P models

Models

Model | Best for | Speed | Temp
isaac-0.2-2b-preview (default) | General use, detection, OCR | Fast | 0.0
isaac-0.2-1b | Quick/simple tasks | Fastest | 0.0

Override with model="..." in any SDK call or --model ... in CLI.

Grounding (expects parameter)

Value | Returns | Use case
text (default) | Plain text | Q&A, descriptions, OCR
box | Bounding boxes | Detection, localization
point | Point coordinates | Counting, pointing
polygon | Polygon vertices | Segmentation

Video Analysis

Extract frames with ffmpeg, then analyze:

# Single frame at 5 seconds
ffmpeg -ss 5 -i video.mp4 -frames:v 1 -q:v 2 /tmp/frame.jpg

# Then analyze
python3 $P question /tmp/frame.jpg "What's happening?"

For continuous monitoring, extract multiple frames and batch process.
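For the multi-frame case, the sample timestamps can be computed up front and each frame pulled with the same ffmpeg invocation shown above; a sketch (the sampling interval and frame naming are illustrative, not part of the skill):

```python
import subprocess

def frame_timestamps(duration_s, interval_s):
    """Timestamps (seconds) at which to sample: 0, interval, 2*interval, ..."""
    return list(range(0, duration_s, interval_s))

def extract_frames(video, duration_s, interval_s, out_dir="/tmp"):
    """Run the ffmpeg command from above once per timestamp; return frame paths."""
    paths = []
    for t in frame_timestamps(duration_s, interval_s):
        out = f"{out_dir}/frame_{t:04d}.jpg"
        subprocess.run(
            ["ffmpeg", "-ss", str(t), "-i", video,
             "-frames:v", "1", "-q:v", "2", "-y", out],
            check=True,
        )
        paths.append(out)
    return paths
```

The resulting paths can then be fed to the CLI's batch command or looped through question()/detect().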

Reference Files

For deeper SDK usage, consult these when needed:
