Install
openclaw skills install @tiktokdad/openclaw-vln-plannerPlan the next high-level navigation step for a robot from a user navigation instruction, one current image, and a sequence of historical images. Use when the task is vision-language navigation, closed-loop replanning, multimodal next-action prediction, or converting visual observations into a single structured JSON navigation action for an OpenAI-compatible multimodal gateway and a separate execution bridge.
openclaw skills install @tiktokdad/openclaw-vln-plannerUse this skill when the user wants a robot to follow a natural-language navigation instruction from visual observations.
This skill is a high-level navigation planner. It does not produce motor, joint, torque, or trajectory control. It only produces one structured mid-level navigation action at a time.
Trigger this skill when the task includes one or more of the following:
The planner expects:
user_instruction: natural-language navigation instructioncurrent_frame: exactly one current imagehistory_frames: zero or more previous images in temporal orderOptional inputs:
robot_state: heading, speed, pose estimate, execution feedback, etc.safety_flags: blocked, collision_risk, lost, target_reached, low_visibility, etc.config_path: path to the runtime config fileOutput must be pure JSON only. Do not prepend or append prose.
Allowed action types only:
MOVE_FORWARDTURN_LEFTTURN_RIGHTSTOPExpected JSON shape:
{
"next_action": {
"type": "MOVE_FORWARD",
"value": 75,
"unit": "cm"
},
"task_status": "in_progress",
"confidence": 0.87,
"notes": "continue along the hallway"
}
Completion example:
{
"next_action": {
"type": "STOP"
},
"task_status": "completed",
"confidence": 0.93,
"notes": "goal reached"
}
STOP.MOVE_FORWARD: 10-150 cmTURN_LEFT: 5-90 degTURN_RIGHT: 5-90 degSTOP: no value/unit requiredsafety_flags.target_reached == true, output STOP with task_status = completed.blocked, collision_risk, lost, or severe uncertainty is present, prefer STOP.Before running, load a YAML config file such as config/vln-config.yaml.
The config should define:
base_url, api_key, model_idRead references/navigation-schema.md for the expected config structure.
Build a model input payload from:
The prompt must explicitly separate:
Call an OpenAI-compatible multimodal gateway with:
The model should be asked to return pure JSON for exactly one next action.
Parse the model result as JSON.
If parsing fails:
STOPValidate:
in_progress, completed, failedAny invalid output falls back to STOP.
Forward the validated mid-level action to a separate execution layer.
Reserved Python bridge interface:
execute_move_forward(distance_cm)execute_turn_left(angle_deg)execute_turn_right(angle_deg)execute_stop()get_robot_state()get_safety_flags()Do not hardcode a robot SDK into the planner logic.
Use the planner in a closed loop:
task_status = completed or forced stopAlways stop on:
Use this prompt pattern:
You are a robot navigation planner.
You will receive:
1. historical observations
2. current observation
3. a user instruction
4. optional robot state and safety flags
Your job is to decide the robot's next single mid-level navigation action.
You may output only one of these actions:
- MOVE_FORWARD with distance in cm
- TURN_LEFT with angle in deg
- TURN_RIGHT with angle in deg
- STOP
Rules:
- Plan only the next step, not the whole route.
- If the goal has been reached, output STOP.
- If you are uncertain, the scene is unclear, or there is any safety risk, output STOP.
- MOVE_FORWARD must be 10-150 cm.
- TURN_LEFT and TURN_RIGHT must be 5-90 deg.
- Output pure JSON only, with no extra explanation.
If anything is wrong with the output, return:
{
"next_action": {
"type": "STOP"
},
"task_status": "failed",
"confidence": 0.0,
"notes": "fallback_stop"
}
references/navigation-schema.md: schema, bounds, safety fallback, examples, config contractscripts/vln_bridge.py: example OpenAI-compatible multimodal planner + Python executor bridgescripts/requirements.txt: Python dependenciesconfig/vln-config.yaml: runtime config template