Computer Task Execution

Execute multi-step user tasks on websites or local apps using the most reliable, least intrusive method, verifying each result and reusing learned patterns.

gift-is-coding@gift-is-coding

Install

openclaw skills install @gift-is-coding/computer-task-execution

computer-task-execution

Core idea

Execute the task, not the interface.

Start from the user goal and choose the path with the best combination of:

success rate
verifiability
low user disruption
low repeated trial-and-error

Do not default to local GUI automation. Prefer interfaces that are easier to verify and less intrusive.

Golden rules

Prefer reliability over cleverness. A short visible foreground step is better than an invisible but flaky flow.
Prefer data/interface layers over GUI simulation. If an API, scriptable interface, URL scheme, local file, browser DOM, or official automation surface exists, use that before UI driving.
Prefer browser execution over local-app execution when the same task can be done on the web with adequate login state and verification.
Treat verification as part of execution. A task is not done until the result is checked using the most reliable available evidence.
Minimize focus stealing, but do not worship zero-focus-steal. The correct target is minimum user disruption with high confidence, not purity.
Reuse proven patterns. Once a path has succeeded for a software target or domain, record it and use it first next time.

Decision model

Before acting, classify the task:

A. Read-only task

Examples:

read meeting details
inspect app state
fetch notifications
extract page content
check whether something exists

Default preference:

browser/web task path
local files / local database / app-exported data
official scripting interface
local app UI automation

B. Write-without-send task

Examples:

create a draft
fill a form but do not submit yet
edit a spreadsheet
prepare a document
populate a field

Default preference:

browser/web task path
official scripting / URL scheme / file-based write
local app UI automation with minimum foreground time

C. High-risk action task

Examples:

send a message
submit a form
delete or modify live data
post to social media
trigger an approval flow

Default preference:

browser/web task path if reliable and verifiable
official app interface if supported and verifiable
local app UI automation with explicit pre-check and post-check

For high-risk actions, if correct targeting depends on visible UI state, expect foreground execution for the critical step.
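The classification above can be written down as a small lookup. This is a minimal Python sketch, not part of the skill itself: `TaskClass`, the verb sets, and `classify` are hypothetical names introduced for illustration, and mapping an unrecognized verb to the high-risk class is an assumption made for caution.

```python
from enum import Enum


class TaskClass(Enum):
    READ_ONLY = "A. Read-only task"
    WRITE_WITHOUT_SEND = "B. Write-without-send task"
    HIGH_RISK = "C. High-risk action task"


# Default preference order per class, copied from the lists above.
DEFAULT_PREFERENCE = {
    TaskClass.READ_ONLY: [
        "browser/web task path",
        "local files / local database / app-exported data",
        "official scripting interface",
        "local app UI automation",
    ],
    TaskClass.WRITE_WITHOUT_SEND: [
        "browser/web task path",
        "official scripting / URL scheme / file-based write",
        "local app UI automation with minimum foreground time",
    ],
    TaskClass.HIGH_RISK: [
        "browser/web task path if reliable and verifiable",
        "official app interface if supported and verifiable",
        "local app UI automation with explicit pre-check and post-check",
    ],
}

# Hypothetical verb sets, loosely derived from the examples above.
READ_VERBS = {"read", "inspect", "fetch", "extract", "check"}
WRITE_VERBS = {"create", "fill", "edit", "prepare", "populate"}
HIGH_RISK_VERBS = {"send", "submit", "delete", "modify", "post", "trigger"}


def classify(verb: str) -> TaskClass:
    """Map the leading verb of a request to a task class."""
    v = verb.strip().lower()
    if v in HIGH_RISK_VERBS:
        return TaskClass.HIGH_RISK
    if v in WRITE_VERBS:
        return TaskClass.WRITE_WITHOUT_SEND
    if v in READ_VERBS:
        return TaskClass.READ_ONLY
    # Assumption: anything unrecognized gets the most cautious class.
    return TaskClass.HIGH_RISK
```

With this in place, `DEFAULT_PREFERENCE[classify("submit")][0]` gives the first path to consider for a submit request. A real classifier would need more than a verb, since "edit a spreadsheet" and "modify live data" differ mainly in what they touch.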

Execution-path priority

Always try paths in this order unless the task or accumulated target knowledge strongly suggests otherwise:

1. Browser/web execution
   - Best when the target service is available on the web
   - Prefer DOM-visible, page-verifiable flows
   - Best default for logged-in services when browser access is available
2. Official non-UI interface
   - App scripting
   - URL schemes
   - Built-in automation hooks
   - Import/export surfaces
   - Local data files or supported storage
3. Hybrid execution
   - Prepare in background using data or browser
   - Switch to foreground only for the minimum critical action window
   - Immediately verify and exit
4. Local app UI automation
   - Use only when the task cannot be completed more reliably elsewhere
   - Prefer keyboard-first flows only when target focus can be guaranteed
   - Use visual or state verification for completion
5. Background/local no-focus experimentation
   - Last resort for low-risk tasks or explicit user request
   - Treat as experimental unless already proven for this target
   - If success is not strongly verifiable, do not present it as complete
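The ordering above amounts to a fallback loop. The sketch below is one hedged way to express it in Python; `first_verified_path`, the `attempts` mapping, and the `proven_first` parameter are hypothetical names, and each attempt callable is assumed to return `True` only after it has both executed and verified the task.

```python
from typing import Callable, Dict, Optional

# Path classes in the priority order listed above.
PATH_ORDER = [
    "browser/web execution",
    "official non-UI interface",
    "hybrid execution",
    "local app UI automation",
    "background/local no-focus experimentation",
]


def first_verified_path(
    attempts: Dict[str, Callable[[], bool]],
    proven_first: Optional[str] = None,
) -> Optional[str]:
    """Try available paths in priority order and return the first that succeeds.

    `proven_first` moves a path already known to work for this target to the
    front, reflecting the "unless accumulated target knowledge strongly
    suggests otherwise" clause.
    """
    order = list(PATH_ORDER)
    if proven_first in order:
        order.remove(proven_first)
        order.insert(0, proven_first)
    for name in order:
        attempt = attempts.get(name)
        # Paths with no registered attempt are simply unavailable.
        if attempt is not None and attempt():
            return name
    return None
```

A return value of `None` means no path produced a verified result, so the task should be reported as incomplete rather than assumed done.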

Focus policy

When background/no-focus execution is appropriate

Prefer trying no-focus or low-focus execution when:

the task is read-only
the action operates on data rather than UI state
the target exposes a scriptable surface
verification does not depend on visible window state
the user explicitly requests silent/minimal-disruption mode

When foreground execution is usually necessary

Use foreground execution for the critical step when:

keyboard input must reach a specific target window or field
the result depends on visible UI state
recipient/target selection must be visually confirmed
the task sends, submits, deletes, or approves something
previous background attempts for this target were flaky

Preferred compromise

For local app tasks, default to:

background preparation
shortest possible foreground critical section
post-action verification
return control promptly

This is usually better than forcing a fully background flow.
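The focus policy can be sketched as a pair of checks over task traits. Everything named here (`TaskTraits`, `foreground_needed`, `focus_plan`) is hypothetical, and letting the foreground conditions win when both lists match is an assumption, chosen because it is consistent with the guidance on high-risk actions.

```python
from dataclasses import dataclass


@dataclass
class TaskTraits:
    # Conditions that favor background/no-focus execution.
    read_only: bool = False
    operates_on_data: bool = False
    scriptable_surface: bool = False
    verification_independent_of_window: bool = False
    user_requested_silent: bool = False
    # Conditions that usually call for a foreground critical step.
    keyboard_input_to_specific_target: bool = False
    result_depends_on_visible_ui: bool = False
    target_needs_visual_confirmation: bool = False
    sends_submits_deletes_or_approves: bool = False
    background_previously_flaky: bool = False


COMPROMISE = "background preparation + short foreground critical step"


def foreground_needed(t: TaskTraits) -> bool:
    """True if any condition from the foreground list applies."""
    return any((
        t.keyboard_input_to_specific_target,
        t.result_depends_on_visible_ui,
        t.target_needs_visual_confirmation,
        t.sends_submits_deletes_or_approves,
        t.background_previously_flaky,
    ))


def focus_plan(t: TaskTraits) -> str:
    """Pick a focus strategy; foreground conditions take precedence (assumption)."""
    if foreground_needed(t):
        return COMPROMISE
    if any((
        t.read_only,
        t.operates_on_data,
        t.scriptable_surface,
        t.verification_independent_of_window,
        t.user_requested_silent,
    )):
        return "background/no-focus"
    # Nothing points either way: fall back to the preferred compromise.
    return COMPROMISE
```

For example, a silent-mode request that also sends a message still resolves to the compromise plan, because the send makes the foreground step necessary.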

Verification rules

Use the strongest available verification method:

1. Target-system confirmation
   - message appears in thread
   - record exists
   - page shows success state
   - saved content is readable from source
2. Direct state read-back
   - reread the object after write
   - check the updated field value
   - reload and confirm persistence
3. Visual confirmation
   - screenshot or visible state inspection
   - only acceptable when stronger read-back is unavailable
4. Process-only confirmation
   - use only for low-risk tasks
   - “command ran” is not sufficient evidence for a high-risk task

If you cannot verify confidently, say so and keep the result provisional.
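A minimal sketch of these rules, assuming the four evidence types are ranked in the order listed. The `Evidence` enum and `result_status` function are hypothetical names for illustration only.

```python
from enum import IntEnum
from typing import Iterable


class Evidence(IntEnum):
    """Evidence types, ranked weakest to strongest."""
    NONE = 0
    PROCESS_ONLY = 1   # e.g. "command ran"
    VISUAL = 2         # screenshot or visible state inspection
    READ_BACK = 3      # reread the object after the write
    TARGET_SYSTEM = 4  # the target system itself confirms the result


def result_status(available: Iterable[Evidence], high_risk: bool) -> str:
    """Judge a result by the strongest evidence that was actually gathered."""
    best = max(available, default=Evidence.NONE)
    if best == Evidence.NONE:
        return "provisional"
    # Process-only confirmation is not enough for a high-risk task.
    if best == Evidence.PROCESS_ONLY and high_risk:
        return "provisional"
    return "verified"
```

Here `result_status([Evidence.PROCESS_ONLY], high_risk=True)` evaluates to `"provisional"`, while the same evidence for a low-risk task evaluates to `"verified"`.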

Pattern memory: learn per software target

After each successful or meaningfully failed execution, update target-specific experience. If a relevant pattern already exists, reading it before execution is mandatory.

Storage

Website/domain patterns: references/site-patterns/<domain>.md
Local app patterns: references/site-patterns/<app-name>.md
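As an illustration of the storage layout, the sketch below maps a target to its pattern file path. The skill does not specify how names are normalized, so the lowercasing and slug rules in this hypothetical `pattern_path` helper are assumptions.

```python
import re
from pathlib import Path

PATTERN_DIR = Path("references/site-patterns")


def pattern_path(target: str) -> Path:
    """Return the pattern file path for a domain, URL, or app name."""
    slug = target.strip().lower()
    # For URLs, keep only the host part.
    slug = re.sub(r"^https?://", "", slug).split("/")[0]
    # Assumption: any other run of unusual characters becomes one hyphen.
    slug = re.sub(r"[^a-z0-9.\-]+", "-", slug).strip("-")
    return PATTERN_DIR / f"{slug}.md"
```

Under these assumptions, `pattern_path("https://X.com/home")` and `pattern_path("x.com")` resolve to the same file, which keeps lookups stable across differently written requests.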

What to record

Record only facts learned through execution:

successful path
required preconditions
whether foreground was necessary
whether browser was superior to app
known unstable paths
verification method that worked
date of discovery

Why this matters

Next time, if the target matches a known app or domain:

read that pattern file first
reuse the proven path first
skip previously disproven paths unless the environment changed

This avoids repeated “try everything again” behavior.

Pattern file format

```markdown
---
kind: local-app | website
name: WeChat
domain: x.com
app_id: com.tencent.xinWeChat
aliases: [微信, WeChat]
updated: 2026-03-27
---

## Successful paths
- [2026-03-27] Foreground: open -a WeChat → Command+F → paste contact → Enter → paste message → Enter

## Preconditions
- Main window present
- Search accepts pasted contact names

## Verification
- Sent message visibly appears in the target thread

## Unstable or failed paths
- [2026-03-27] Background-only keyboard injection was not reliably targetable

## Recommended default
- Use background preparation + short foreground execution + post-send verification
```

How to use pattern memory

Step 1: identify the target

Normalize the target to either:

a domain, or
an application name

Step 2: check for prior knowledge

If a matching pattern file exists, you must read it before choosing the execution path. Do not skip this step just because you already have a generic plan.

Step 3: start with the proven route

If the stored preferred route still fits the current request, use it first. Treat stored successful patterns as the default starting point.

Step 4: only explore when needed

Explore alternatives only if:

the preferred route fails
the user requested a different mode
the environment clearly changed
the stored pattern is clearly inapplicable to the current task

Do not re-run previously disproven paths unless there is a specific reason.

Step 5: update after execution

If new facts were learned, update the pattern file. Pattern memory is part of task completion, not optional cleanup.
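Steps 3 and 4 reduce to a small decision function. This is a sketch under stated assumptions: `choose_route` and its parameters are hypothetical names, and the choice to ignore the disproven list when the environment has changed follows the "unless the environment changed" clause.

```python
from typing import Iterable, Optional, Sequence


def choose_route(
    stored_route: Optional[str],
    candidates: Sequence[str],
    disproven: Iterable[str] = (),
    *,
    route_failed: bool = False,
    user_requested_other: bool = False,
    environment_changed: bool = False,
    pattern_inapplicable: bool = False,
) -> Optional[str]:
    """Reuse the stored route unless one of the four explore conditions holds."""
    explore = (
        route_failed
        or user_requested_other
        or environment_changed
        or pattern_inapplicable
    )
    if stored_route is not None and not explore:
        return stored_route
    # A changed environment is the one case where disproven paths get a retry.
    skip = set() if environment_changed else set(disproven)
    if stored_route is not None and (
        route_failed or user_requested_other or pattern_inapplicable
    ):
        skip.add(stored_route)
    for route in candidates:
        if route not in skip:
            return route
    return None
```

Callers pass `candidates` in their own priority order; a `None` result means every known route is ruled out and the situation needs a fresh look rather than a blind retry.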

Choosing browser vs local app

Prefer browser when

the service has a working web app
login state exists in browser
DOM/state inspection improves confidence
you need robust, repeatable verification
avoiding frontmost app disruption matters

Prefer local app when

the task is app-only
the app exposes better native automation than the website
the browser version is missing key capabilities
the task is already known to work reliably through a stored app pattern

Local app operating style

If local app execution is necessary, prefer this sequence:

Determine exact success criteria
Prepare all inputs before touching the app
Open or locate the target app/window
Keep foreground time as short as possible
Execute only the critical path
Verify immediately
Record the winning pattern

Handling silent mode requests

If the user asks to avoid stealing focus:

first see whether browser or non-UI paths can satisfy the task
if local-app background execution is only partially reliable, note this internally during planning and choose it only when the task risk is low or the user explicitly prefers silence over certainty
for high-risk tasks, recommend minimum-foreground execution rather than pretending a background path is equally safe

Failure handling

When a path fails:

identify whether the failure was due to targeting, focus, auth, UI drift, missing permissions, or bad path choice
switch to the next-best path class instead of repeating the same failing method blindly
if the failure teaches something reusable, record it in the target pattern file
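The failure-handling steps can be sketched as follows. The names `FailureCause`, `next_path_class`, and `failure_note` are hypothetical; reading "next-best" as the next untried class in the execution-path priority order is an assumption, and the note format simply imitates the dated bullets used in pattern files.

```python
from datetime import date
from enum import Enum
from typing import Iterable, Optional


class FailureCause(Enum):
    TARGETING = "targeting"
    FOCUS = "focus"
    AUTH = "auth"
    UI_DRIFT = "UI drift"
    MISSING_PERMISSIONS = "missing permissions"
    BAD_PATH_CHOICE = "bad path choice"


# Path classes in the priority order from "Execution-path priority".
PATH_CLASSES = [
    "browser/web execution",
    "official non-UI interface",
    "hybrid execution",
    "local app UI automation",
    "background/local no-focus experimentation",
]


def next_path_class(failed: Iterable[str]) -> Optional[str]:
    """Return the highest-priority path class that has not failed yet."""
    failed_set = set(failed)
    for name in PATH_CLASSES:
        if name not in failed_set:
            return name
    return None


def failure_note(path: str, cause: FailureCause, when: date) -> str:
    """Format a dated bullet for the 'Unstable or failed paths' section."""
    return f"- [{when.isoformat()}] {path} failed ({cause.value})"
```

Recording the cause alongside the path matters because an auth failure may be fixed by logging in, while UI drift suggests the stored pattern itself is stale.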

References

Read references/pattern-memory.md for the pattern-memory policy.
Read references/site-patterns/<target>.md when a known software target or domain already has stored experience.
