Midscene Automations Skills for iOS

Review

Audited by ClawScan on May 10, 2026.

Overview

This is a coherent iOS automation skill, but it grants broad device-control authority and uses external model/provider tooling without clear safety boundaries for destructive or sensitive actions.

Install only if you intentionally want an agent to control an iOS device. Prefer test devices, exact-pinned trusted CLI versions, limited model API keys, and explicit human confirmation before deleting data, sending messages, making calls, purchasing, changing settings, or handling sensitive screens.

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Concern (Medium Confidence)

ASI02: Tool Misuse and Exploitation

What this means

If invoked on a real device, the agent could perform irreversible or account-affecting actions through the UI if the task prompt is broad or mistaken.

Why it was flagged

The skill can perform broad, state-changing iOS UI actions and even includes a destructive confirmation example. The provided instructions do not show a separate approval requirement or scope limits for actions that could delete data, change settings, send messages, or affect accounts.

Skill content

Use `act` to interact with the device... It autonomously handles all UI interactions internally — tapping, typing, scrolling, swiping, waiting, and navigating... `npx -y @midscene/ios@1 act --prompt "tap Delete, then confirm in the alert dialog"`

Recommendation

Use only on test devices or with narrowly scoped prompts, and require explicit confirmation before actions such as deleting data, purchasing, sending, calling, changing settings, or authenticating.
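One way to approach the confirmation step is a small gate that runs before a prompt is handed to `act`. The sketch below is not part of the skill or of Midscene. The keyword list and the `needs_confirmation` and `approve` helpers are hypothetical names chosen for illustration, and keyword matching is a coarse heuristic that will miss paraphrased requests.

```python
import re

# Hypothetical keyword list, based on the action types named in this review.
RISKY_PATTERN = re.compile(
    r"\b(delete|remove|erase|purchase|buy|pay|send|call|sign in|log in|password)\b",
    re.IGNORECASE,
)


def needs_confirmation(prompt: str) -> bool:
    """Return True when the prompt mentions an action a human should approve."""
    return RISKY_PATTERN.search(prompt) is not None


def approve(prompt: str, ask=input) -> bool:
    """Pass safe prompts through; for risky ones, require the operator to type 'yes'."""
    if not needs_confirmation(prompt):
        return True
    answer = ask(f"Risky action requested: {prompt!r}. Type 'yes' to proceed: ")
    return answer.strip().lower() == "yes"
```

A wrapper would call `approve(prompt)` first and invoke the CLI only when it returns `True`. The `ask` parameter exists so the gate can be tested without a terminal.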

Concern (Medium Confidence)

ASI02: Tool Misuse and Exploitation

What this means

A broad lower-level device-control interface can bypass the safer visible-UI workflow and may enable unexpected device operations if used with risky endpoints.

Why it was flagged

The skill exposes a lower-level WebDriverAgent request path in addition to normal visible UI automation. The example is read-only, but the instruction describes lower-level device control without showing a bounded endpoint allowlist or approval model.

Skill content

Use this when the task needs lower-level device control instead of a normal visible UI interaction: `npx -y @midscene/ios@1 runwdarequest --method GET --endpoint /wda/screen`

Recommendation

Limit direct WebDriverAgent requests to known read-only or test-approved endpoints, and ask for user confirmation before any request that can change device state.
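An endpoint allowlist of the kind recommended here could be sketched as follows. Only `/wda/screen` is taken from the skill's own example. The `READ_ONLY_ENDPOINTS` set and the `is_request_allowed` helper are hypothetical, and any endpoint added to the set should be verified as read-only first.

```python
# Only /wda/screen comes from the skill's example; extend this set deliberately.
READ_ONLY_ENDPOINTS = frozenset({"/wda/screen"})


def is_request_allowed(method: str, endpoint: str) -> bool:
    """Allow a request only if it is a GET to an explicitly listed endpoint."""
    return method.upper() == "GET" and endpoint in READ_ONLY_ENDPOINTS
```

Anything the check rejects would then go to the user for confirmation rather than being sent directly.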

Note (High Confidence)

ASI04: Agentic Supply Chain Vulnerabilities

What this means

The actual automation code comes from the npm package at use time, so its behavior is outside this instruction-only artifact review.

Why it was flagged

The skill relies on runtime execution of an npm package that is not included in the reviewed artifacts. This is purpose-aligned, but users must trust the external package and its resolved version.

Skill content

Automate iOS devices using `npx -y @midscene/ios@1`.

Recommendation

Use a trusted package source, consider exact-pinning the CLI version, and review or lock dependencies before using it on sensitive devices.
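The spec `@midscene/ios@1` names a major-version range, so npm resolves it to whichever matching release is current at run time. A minimal check for whether a spec is exact-pinned might look like the sketch below. The `is_exact_pinned` helper is hypothetical, and the version `1.2.3` used in the usage note is a placeholder rather than a known release.

```python
import re

# Matches a trailing "@MAJOR.MINOR.PATCH", optionally with a prerelease suffix.
EXACT_VERSION = re.compile(r"@\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$")


def is_exact_pinned(spec: str) -> bool:
    """Return True when the package spec ends in a full three-part version."""
    return EXACT_VERSION.search(spec) is not None
```

Under this check, `is_exact_pinned("@midscene/ios@1")` is `False`, while a spec with the placeholder version, such as `"@midscene/ios@1.2.3"`, passes.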

Note (High Confidence)

ASI03: Identity and Privilege Abuse

What this means

Users may need to provide a paid or privileged model API key, which could incur costs or expose account access if mishandled.

Why it was flagged

The skill requires a model provider API key and endpoint configuration. This is expected for Midscene, but the registry metadata lists no required environment variables or primary credential.

Skill content

MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."

Recommendation

Use a limited-scope provider key where possible, keep `.env` files private, and verify provider billing and data-use settings.
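Because the registry metadata does not declare these variables, a pre-flight check can make the requirement explicit. The sketch below uses the three variable names from the skill content. The `missing_model_settings` helper is hypothetical, and it reports only names, never values, so that keys do not end up in logs.

```python
# Variable names as listed in the skill content.
REQUIRED_SETTINGS = (
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_BASE_URL",
)


def missing_model_settings(env) -> list[str]:
    """Return the names of required settings that are unset or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling `missing_model_settings(os.environ)` (after `import os`) before a run gives a list that is empty when everything is configured.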

Note (Medium Confidence)

ASI07: Insecure Inter-Agent Communication

What this means

Screenshots can contain private messages, account details, financial information, or app data that may be processed by the configured model provider.

Why it was flagged

The workflow depends on screenshots and an external model endpoint. This is aligned with the skill purpose, but the provided text does not spell out privacy, retention, or redaction boundaries for screenshots sent to the configured provider.

Skill content

Operates entirely from screenshots... Midscene requires models with strong visual grounding capabilities... MIDSCENE_MODEL_BASE_URL="https://..."

Recommendation

Avoid using this on screens containing sensitive information unless you trust the configured provider and understand its data-retention policy.
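Trust in the provider can be made explicit by checking the configured base URL against a list of hosts you have vetted. This is a sketch under assumptions: the `is_trusted_provider` helper and the host name `models.example.com` are hypothetical, and the check covers only where screenshots are sent, not what the provider does with them.

```python
from urllib.parse import urlparse


def is_trusted_provider(base_url: str, trusted_hosts: set[str]) -> bool:
    """Return True only for HTTPS URLs whose host is on the vetted list."""
    parsed = urlparse(base_url)
    return parsed.scheme == "https" and parsed.hostname in trusted_hosts
```

For example, with `trusted_hosts = {"models.example.com"}`, a `MIDSCENE_MODEL_BASE_URL` pointing at any other host, or at a plain `http://` address, would be rejected before any screenshot leaves the machine.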