OpenClaw VLN Planner

v1.0.0

Plan the next high-level navigation step for a robot from a user navigation instruction, one current image, and a sequence of historical images. Use when the...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for tiktokdad/openclaw-vln-planner.

Prompt preview: Install & Setup
Install the skill "OpenClaw VLN Planner" (tiktokdad/openclaw-vln-planner) from ClawHub.
Skill page: https://clawhub.ai/tiktokdad/openclaw-vln-planner
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install openclaw-vln-planner

ClawHub CLI


npx clawhub@latest install openclaw-vln-planner
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description, SKILL.md, config, and vln_bridge.py all align: the planner builds a prompt from historical+current images and an instruction, queries a multimodal model, parses JSON, validates bounds, and forwards a mid-level action. Network access to a model and reading image files are expected for this purpose.
Instruction Scope
Runtime instructions and the Python bridge explicitly read image files, load a YAML config containing model base_url/api_key/model_id, and send base64-encoded images to the configured OpenAI-compatible gateway. This behavior is necessary for a multimodal planner but means camera frames (potentially sensitive) are transmitted to an external service. SKILL.md and code limit outputs to pure JSON and define safety fallbacks.
Install Mechanism
This is an instruction-only skill with a small example Python bridge; there is no install spec, no external downloads, and only a minimal requirements.txt (requests, PyYAML). No extraction from arbitrary URLs or package installation is present.
Credentials
The package does not declare required env vars or primary credentials, but the bridge requires a model base_url and api_key in a YAML config file (config/vln-config.yaml). Expect to provide credentials for the model gateway; that is proportional to the task, but the manifest's omission of a required config/credential declaration is a small inconsistency to be aware of.
Persistence & Privilege
The skill does not request persistent or system privileges, does not set always:true, and has no install actions that modify other skills or system-wide settings. The bridge runs as a standalone script and defaults to dry-run mode, printing planned actions instead of executing them.
Assessment
This skill appears to do what it says: it reads local camera frame files, base64-encodes them, and POSTs them (with the model API key from a YAML config) to whatever OpenAI-compatible gateway you configure. Before installing:

  1. Confirm you trust the gateway endpoint and operator, because camera images and any scene data will be transmitted.
  2. Store the API key securely (the example uses a config file rather than env vars) and update the manifest if you need policy or audit visibility.
  3. Keep the executor in dry_run while testing, and review or replace the placeholder execute_* functions so the planner cannot command hardware until you have integrated a vetted execution bridge.
  4. If you require stricter telemetry controls, inspect or modify image_to_data_url and build_messages to avoid sending raw images or to anonymize them.

The small manifest omission (no declared required config path or credential) is not malicious, but it is worth correcting for clarity and safety.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97e0e2ba2s5jh60cpk3b42ksx83j9as
123 downloads · 0 stars · 1 version
Updated 1 mo ago · v1.0.0 · MIT-0

OpenClaw VLN Planner

Use this skill when the user wants a robot to follow a natural-language navigation instruction from visual observations.

This skill is a high-level navigation planner. It does not produce motor, joint, torque, or trajectory control. It only produces one structured mid-level navigation action at a time.

When this skill triggers

Trigger this skill when the task includes one or more of the following:

  • Vision-language navigation (VLN)
  • Robot next-step planning from camera images
  • Closed-loop navigation with replanning after each observation
  • Converting a current frame plus historical frames into a single next navigation action
  • Sending current + history images to an OpenAI-compatible multimodal gateway for action prediction

Required inputs

The planner expects:

  • user_instruction: natural-language navigation instruction
  • current_frame: exactly one current image
  • history_frames: zero or more previous images in temporal order

Optional inputs:

  • robot_state: heading, speed, pose estimate, execution feedback, etc.
  • safety_flags: blocked, collision_risk, lost, target_reached, low_visibility, etc.
  • config_path: path to the runtime config file
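
For illustration only, a minimal sketch of how these inputs might be bundled in Python; the field names mirror the lists above, and the bundled vln_bridge.py may structure them differently.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PlannerInput:
    user_instruction: str
    current_frame: str                                   # path to exactly one current image
    history_frames: list = field(default_factory=list)   # previous image paths, oldest first
    robot_state: Optional[dict] = None                   # heading, speed, pose estimate, feedback
    safety_flags: Optional[dict] = None                  # blocked, collision_risk, lost, ...
    config_path: str = "config/vln-config.yaml"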

Output contract

Output must be pure JSON only. Do not prepend or append prose.

Allowed action types only:

  • MOVE_FORWARD
  • TURN_LEFT
  • TURN_RIGHT
  • STOP

Expected JSON shape:

{
  "next_action": {
    "type": "MOVE_FORWARD",
    "value": 75,
    "unit": "cm"
  },
  "task_status": "in_progress",
  "confidence": 0.87,
  "notes": "continue along the hallway"
}

Completion example:

{
  "next_action": {
    "type": "STOP"
  },
  "task_status": "completed",
  "confidence": 0.93,
  "notes": "goal reached"
}

Core rules

  1. Plan only the next action.
  2. Never output a full route.
  3. Replan after each execution step.
  4. If uncertain, unsafe, blocked, unable to parse, or visually ambiguous, output STOP.
  5. Enforce action bounds:
    • MOVE_FORWARD: 10-150 cm
    • TURN_LEFT: 5-90 deg
    • TURN_RIGHT: 5-90 deg
    • STOP: no value/unit required
  6. If safety_flags.target_reached == true, output STOP with task_status = completed.
  7. If blocked, collision_risk, lost, or severe uncertainty is present, prefer STOP.

Runtime configuration

Before running, load a YAML config file such as config/vln-config.yaml.

The config should define:

  • subscribed or logical input topics / channels for current frame and history frame collection
  • optional robot state and safety flag sources
  • OpenAI-compatible multimodal gateway settings: base_url, api_key, model_id
  • planner behavior such as confidence threshold and safety fallback
  • executor bridge mode (default: Python function bridge)

Read references/navigation-schema.md for the expected config structure.
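
As an illustration, a minimal Python sketch of loading such a config; the key names used here (gateway, base_url, api_key, model_id) are assumptions drawn from the list above, and references/navigation-schema.md remains the authoritative contract.

import yaml

def load_config(config_path="config/vln-config.yaml"):
    with open(config_path, "r", encoding="utf-8") as f:
        cfg = yaml.safe_load(f)
    gateway = cfg["gateway"]                      # assumed top-level key
    for key in ("base_url", "api_key", "model_id"):
        if not gateway.get(key):
            raise ValueError(f"missing gateway setting: {key}")
    return cfg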

Internal module design

1) context builder

Build a model input payload from:

  • user instruction
  • historical observations
  • current observation
  • optional robot state
  • optional safety flags

The prompt must explicitly separate:

  • historical observations
  • current observation
  • user instruction
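
A sketch of what such a builder might look like, assuming an OpenAI-style messages payload with base64 data URLs. The helper names (image_to_data_url, build_messages) follow the security review above, but the signatures in the bundled script may differ.

import base64
import mimetypes

SYSTEM_PROMPT = "You are a robot navigation planner."    # see the prompt template below

def image_to_data_url(path):
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def build_messages(instruction, current_frame, history_frames,
                   robot_state=None, safety_flags=None):
    content = [{"type": "text", "text": "Historical observations (oldest first):"}]
    for frame in history_frames:
        content.append({"type": "image_url", "image_url": {"url": image_to_data_url(frame)}})
    content.append({"type": "text", "text": "Current observation:"})
    content.append({"type": "image_url", "image_url": {"url": image_to_data_url(current_frame)}})
    content.append({"type": "text", "text": f"User instruction: {instruction}"})
    if robot_state:
        content.append({"type": "text", "text": f"Robot state: {robot_state}"})
    if safety_flags:
        content.append({"type": "text", "text": f"Safety flags: {safety_flags}"})
    return [{"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": content}]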

2) action planner

Call an OpenAI-compatible multimodal gateway with:

  • one current image
  • historical images
  • planner prompt
  • optional structured context

The model should be asked to return pure JSON for exactly one next action.
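
A sketch of the call using requests (the only HTTP dependency in scripts/requirements.txt). The /chat/completions path and response shape assume a standard OpenAI-compatible gateway, and the config keys come from the config sketch above.

import requests

def call_planner(cfg, messages):
    gateway = cfg["gateway"]                      # assumed key, see the config sketch
    resp = requests.post(
        gateway["base_url"].rstrip("/") + "/chat/completions",
        headers={"Authorization": f"Bearer {gateway['api_key']}"},
        json={"model": gateway["model_id"], "messages": messages, "temperature": 0},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]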

3) action parser

Parse the model result as JSON.

If parsing fails:

  • try safe extraction of the first JSON object
  • if still invalid, fall back to STOP
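
A minimal sketch of that tolerant parsing path; the regex-based extraction is one possible way to pull out the first JSON object, and the fallback matches the Failure handling section.

import json
import re

FALLBACK = {"next_action": {"type": "STOP"}, "task_status": "failed",
            "confidence": 0.0, "notes": "fallback_stop"}

def parse_action(raw):
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # widest {...} span in the reply
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return dict(FALLBACK)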

4) action validator

Validate:

  • action type is one of the four allowed values
  • distance and angle ranges are legal
  • unit matches action type
  • confidence is numeric if present
  • task_status is one of in_progress, completed, failed

Any invalid output falls back to STOP.
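
A sketch of these checks, with the bounds copied from the Core rules section.

BOUNDS = {
    "MOVE_FORWARD": ("cm", 10, 150),
    "TURN_LEFT": ("deg", 5, 90),
    "TURN_RIGHT": ("deg", 5, 90),
}

def is_valid(plan):
    action = plan.get("next_action") or {}
    a_type = action.get("type")
    if plan.get("task_status") not in ("in_progress", "completed", "failed"):
        return False
    if "confidence" in plan and not isinstance(plan["confidence"], (int, float)):
        return False
    if a_type == "STOP":
        return True                               # no value/unit required
    if a_type not in BOUNDS:
        return False
    unit, lo, hi = BOUNDS[a_type]
    value = action.get("value")
    return (action.get("unit") == unit
            and isinstance(value, (int, float))
            and lo <= value <= hi)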

5) executor bridge

Forward the validated mid-level action to a separate execution layer.

Reserved Python bridge interface:

  • execute_move_forward(distance_cm)
  • execute_turn_left(angle_deg)
  • execute_turn_right(angle_deg)
  • execute_stop()
  • get_robot_state()
  • get_safety_flags()

Do not hardcode a robot SDK into the planner logic.
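
A dry-run sketch of that interface, consistent with the note above that the bundled bridge prints actions by default rather than driving hardware; replace the bodies with calls into your own execution layer.

DRY_RUN = True

def execute_move_forward(distance_cm):
    if DRY_RUN:
        print(f"[dry-run] MOVE_FORWARD {distance_cm} cm")
    else:
        raise NotImplementedError("wire this to your motion interface")

def execute_turn_left(angle_deg):
    if DRY_RUN:
        print(f"[dry-run] TURN_LEFT {angle_deg} deg")
    else:
        raise NotImplementedError("wire this to your motion interface")

def execute_turn_right(angle_deg):
    if DRY_RUN:
        print(f"[dry-run] TURN_RIGHT {angle_deg} deg")
    else:
        raise NotImplementedError("wire this to your motion interface")

def execute_stop():
    print("[dry-run] STOP" if DRY_RUN else "STOP")

def get_robot_state():
    return {}                                     # e.g. heading, speed, pose estimate

def get_safety_flags():
    return {}                                     # e.g. blocked, collision_risk, lost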

6) replanning loop

Use the planner in a closed loop:

  1. gather current frame + history frames
  2. gather optional robot state / safety flags
  3. call multimodal planner
  4. parse and validate JSON action
  5. execute through bridge
  6. observe again
  7. repeat until task_status = completed or forced stop
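
Putting the steps together, a sketch of the loop assuming the helpers from the earlier sketches (load_config, build_messages, call_planner, parse_action, is_valid, and the execute_* bridge) are in scope; frame acquisition is a user-supplied callable because it depends on your camera setup.

def run(instruction, get_current_frame, config_path="config/vln-config.yaml",
        max_steps=100, confidence_threshold=0.5):
    cfg = load_config(config_path)
    history = []
    for _ in range(max_steps):
        frame = get_current_frame()               # returns a path to the latest image
        messages = build_messages(instruction, frame, history,
                                  get_robot_state(), get_safety_flags())
        plan = parse_action(call_planner(cfg, messages))
        if not is_valid(plan) or plan.get("confidence", 1.0) < confidence_threshold:
            execute_stop()
            return "failed"
        action = plan["next_action"]
        if action["type"] == "MOVE_FORWARD":
            execute_move_forward(action["value"])
        elif action["type"] == "TURN_LEFT":
            execute_turn_left(action["value"])
        elif action["type"] == "TURN_RIGHT":
            execute_turn_right(action["value"])
        else:
            execute_stop()
        if plan["task_status"] != "in_progress" or action["type"] == "STOP":
            return plan["task_status"]
        history.append(frame)                     # keep for the next observation round
    execute_stop()
    return "failed"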

7) safety fallback

Always stop on:

  • parse failure
  • invalid action
  • confidence below threshold
  • blocked / collision risk / lost / target reached
  • missing visual evidence for safe motion
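
For example, a small guard over the flag-based conditions above; the flag names are assumptions taken from the Optional inputs list.

def must_stop(safety_flags):
    flags = safety_flags or {}
    return any(flags.get(name) for name in
               ("blocked", "collision_risk", "lost", "target_reached", "low_visibility"))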

Prompt template

Use this prompt pattern:

You are a robot navigation planner.
You will receive:
1. historical observations
2. current observation
3. a user instruction
4. optional robot state and safety flags

Your job is to decide the robot's next single mid-level navigation action.
You may output only one of these actions:
- MOVE_FORWARD with distance in cm
- TURN_LEFT with angle in deg
- TURN_RIGHT with angle in deg
- STOP

Rules:
- Plan only the next step, not the whole route.
- If the goal has been reached, output STOP.
- If you are uncertain, the scene is unclear, or there is any safety risk, output STOP.
- MOVE_FORWARD must be 10-150 cm.
- TURN_LEFT and TURN_RIGHT must be 5-90 deg.
- Output pure JSON only, with no extra explanation.

Example user requests

  • "Go down the hallway and stop at the blue door."
  • "Move to the kitchen entrance."
  • "Find the end of the corridor and stop."
  • "Turn right at the next intersection and continue."

Failure handling

If anything is wrong with the output, return:

{
  "next_action": {
    "type": "STOP"
  },
  "task_status": "failed",
  "confidence": 0.0,
  "notes": "fallback_stop"
}

Bundled resources

  • references/navigation-schema.md: schema, bounds, safety fallback, examples, config contract
  • scripts/vln_bridge.py: example OpenAI-compatible multimodal planner + Python executor bridge
  • scripts/requirements.txt: Python dependencies
  • config/vln-config.yaml: runtime config template
