computer-task-execution
Core idea
Execute the task, not the interface.
Start from the user goal and choose the path with the best combination of:
- success rate
- verifiability
- low user disruption
- low repeated trial-and-error
Do not default to local GUI automation. Prefer interfaces that are easier to verify and less intrusive.
Golden rules
- Prefer reliability over cleverness. A short visible foreground step is better than an invisible but flaky flow.
- Prefer data/interface layers over GUI simulation. If an API, scriptable interface, URL scheme, local file, browser DOM, or official automation surface exists, use that before UI driving.
- Prefer browser execution over local-app execution when the same task can be done on the web with adequate login state and verification.
- Treat verification as part of execution. A task is not done until the result is checked using the most reliable available evidence.
- Minimize focus stealing, but do not worship zero-focus-steal. The correct target is minimum user disruption with high confidence, not purity.
- Reuse proven patterns. Once a path has succeeded for a software target or domain, record it and use it first next time.
Decision model
Before acting, classify the task:
A. Read-only task
Examples:
- read meeting details
- inspect app state
- fetch notifications
- extract page content
- check whether something exists
Default preference:
- browser/web task path
- local files / local database / app-exported data
- official scripting interface
- local app UI automation
B. Write-without-send task
Examples:
- create a draft
- fill a form but do not submit yet
- edit a spreadsheet
- prepare a document
- populate a field
Default preference:
- browser/web task path
- official scripting / URL scheme / file-based write
- local app UI automation with minimum foreground time
C. High-risk action task
Examples:
- send a message
- submit a form
- delete or modify live data
- post to social media
- trigger an approval flow
Default preference:
- browser/web task path if reliable and verifiable
- official app interface if supported and verifiable
- local app UI automation with explicit pre-check and post-check
For high-risk actions, if correct targeting depends on visible UI state, expect foreground execution for the critical step.
Execution-path priority
Always try paths in this order unless the task or accumulated target knowledge strongly suggests otherwise:
-
Browser/web execution
- Best when the target service is available on the web
- Prefer DOM-visible, page-verifiable flows
- Best default for logged-in services when browser access is available
-
Official non-UI interface
- App scripting
- URL schemes
- built-in automation hooks
- import/export surfaces
- local data files or supported storage
-
Hybrid execution
- Prepare in background using data or browser
- Switch to foreground only for the minimum critical action window
- Immediately verify and exit
-
Local app UI automation
- Use only when the task cannot be completed more reliably elsewhere
- Prefer keyboard-first flows only when target focus can be guaranteed
- Use visual or state verification for completion
-
Background/local no-focus experimentation
- Last resort for low-risk tasks or explicit user request
- Treat as experimental unless already proven for this target
- If success is not strongly verifiable, do not present it as complete
Focus policy
When background/no-focus execution is appropriate
Prefer trying no-focus or low-focus execution when:
- the task is read-only
- the action operates on data rather than UI state
- the target exposes a scriptable surface
- verification does not depend on visible window state
- the user explicitly requests silent/minimal-disruption mode
When foreground execution is usually necessary
Use foreground execution for the critical step when:
- keyboard input must reach a specific target window or field
- the result depends on visible UI state
- recipient/target selection must be visually confirmed
- the task sends, submits, deletes, or approves something
- previous background attempts for this target were flaky
Preferred compromise
For local app tasks, default to:
- background preparation
- shortest possible foreground critical section
- post-action verification
- return control promptly
This is usually better than forcing a fully background flow.
Verification rules
Use the strongest available verification method:
-
target-system confirmation
- message appears in thread
- record exists
- page shows success state
- saved content is readable from source
-
direct state read-back
- reread the object after write
- check the updated field value
- reload and confirm persistence
-
visual confirmation
- screenshot or visible state inspection
- only acceptable when stronger read-back is unavailable
-
process-only confirmation
- use only for low-risk tasks
- “command ran” is not sufficient evidence for a high-risk task
If you cannot verify confidently, say so and keep the result provisional.
Pattern memory: learn per software target
After each successful or meaningfully failed execution, update target-specific experience.
If a relevant pattern already exists, reading it before execution is mandatory.
Storage
- Website/domain patterns:
references/site-patterns/<domain>.md
- Local app patterns:
references/site-patterns/<app-name>.md
What to record
Record only facts learned through execution:
- successful path
- required preconditions
- whether foreground was necessary
- whether browser was superior to app
- known unstable paths
- verification method that worked
- date of discovery
Why this matters
Next time, if the target matches a known app or domain:
- read that pattern file first
- reuse the proven path first
- skip previously disproven paths unless the environment changed
This avoids repeated “try everything again” behavior.
Pattern file format
---
kind: local-app | website
name: WeChat
domain: x.com
app_id: com.tencent.xinWeChat
aliases: [微信, WeChat]
updated: 2026-03-27
---
## Successful paths
- [2026-03-27] Foreground: open -a WeChat → Command+F → paste contact → Enter → paste message → Enter
## Preconditions
- Main window present
- Search accepts pasted contact names
## Verification
- Sent message visibly appears in the target thread
## Unstable or failed paths
- [2026-03-27] Background-only keyboard injection was not reliably targetable
## Recommended default
- Use background preparation + short foreground execution + post-send verification
How to use pattern memory
Step 1: identify the target
Normalize the target to either:
- a domain, or
- an application name
Step 2: check for prior knowledge
If a matching pattern file exists, you must read it before choosing the execution path.
Do not skip this step just because you already have a generic plan.
Step 3: start with the proven route
If the stored preferred route still fits the current request, use it first.
Treat stored successful patterns as the default starting point.
Step 4: only explore when needed
Explore alternatives only if:
- the preferred route fails
- the user requested a different mode
- the environment clearly changed
- the stored pattern is clearly inapplicable to the current task
Do not re-run previously disproven paths unless there is a specific reason.
Step 5: update after execution
If new facts were learned, update the pattern file.
Pattern memory is part of task completion, not optional cleanup.
Choosing browser vs local app
Prefer browser when
- the service has a working web app
- login state exists in browser
- DOM/state inspection improves confidence
- you need robust, repeatable verification
- avoiding frontmost app disruption matters
Prefer local app when
- the task is app-only
- the app exposes better native automation than the website
- the browser version is missing key capabilities
- the task is already known to work reliably through a stored app pattern
Local app operating style
If local app execution is necessary, prefer this sequence:
- Determine exact success criteria
- Prepare all inputs before touching the app
- Open or locate the target app/window
- Keep foreground time as short as possible
- Execute only the critical path
- Verify immediately
- Record the winning pattern
Handling silent mode requests
If the user asks to avoid stealing focus:
- first see whether browser or non-UI paths can satisfy the task
- if local-app background execution is only partially reliable, say so internally in planning and choose it only when the task risk is low or the user explicitly prefers silence over certainty
- for high-risk tasks, recommend minimum-foreground execution rather than pretending a background path is equally safe
Failure handling
When a path fails:
- identify whether the failure was due to targeting, focus, auth, UI drift, missing permissions, or bad path choice
- switch to the next-best path class instead of repeating the same failing method blindly
- if the failure teaches something reusable, record it in the target pattern file
References
- Read
references/pattern-memory.md for the pattern-memory policy.
- Read
references/site-patterns/<target>.md when a known software target or domain already has stored experience.