Midscene Automations Skills for iOS

Pass. Audited by VirusTotal on May 12, 2026.

Overview

Type: OpenClaw Skill
Name: midscene-ios-automation
Version: 1.0.5

The skill provides iOS automation using the Midscene framework, which involves high-risk capabilities such as executing code via `npx` and utilizing `Bash` for device interaction. It requires sensitive environment variables for AI model API keys and contains a potential shell injection vulnerability in `SKILL.md`, where natural language prompts are passed directly to CLI arguments (e.g., `act --prompt`) without explicit sanitization. While these features are aligned with the stated purpose of mobile testing, the inherent risks of remote package execution and command construction meet the threshold for a suspicious classification.
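One common way to reduce the shell injection risk described above is to never interpolate the prompt into a shell string, and instead pass it as a single element of an argument list. The sketch below assumes the caller launches the CLI from Python; the helper name `build_act_command` is hypothetical, and only the `npx -y @midscene/ios@1 act --prompt` invocation comes from the reviewed skill.

```python
import shlex

# CLI prefix as quoted in the reviewed SKILL.md.
MIDSCENE_CLI = ["npx", "-y", "@midscene/ios@1"]


def build_act_command(prompt: str) -> list[str]:
    """Return an argv list for `act`; the prompt stays one argument.

    Hypothetical helper: because no shell parses the list, quotes,
    semicolons, and backticks in the prompt are passed through as data.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return [*MIDSCENE_CLI, "act", "--prompt", prompt]


if __name__ == "__main__":
    cmd = build_act_command('open Settings"; rm -rf ~ #')
    # shlex.join shows how the prompt would have to be quoted in a shell.
    print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` without `shell=True`, so the prompt is never re-parsed by a shell.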

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If invoked on a real device, the agent could perform irreversible or account-affecting actions through the UI if the task prompt is broad or mistaken.

Why it was flagged

The skill can perform broad, state-changing iOS UI actions and even includes a destructive confirmation example. The provided instructions do not show a separate approval requirement or scope limits for actions that could delete data, change settings, send messages, or affect accounts.

Skill content

Use `act` to interact with the device... It autonomously handles all UI interactions internally — tapping, typing, scrolling, swiping, waiting, and navigating... `npx -y @midscene/ios@1 act --prompt "tap Delete, then confirm in the alert dialog"`

Recommendation

Use only on test devices or with narrowly scoped prompts, and require explicit confirmation before actions such as deleting data, purchasing, sending, calling, changing settings, or authenticating.
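The explicit-confirmation recommendation could be approximated with a small gate that runs before any prompt reaches the device. This is a minimal sketch, not part of the skill: the keyword list and the function names `requires_confirmation` and `approve` are assumptions, and keyword matching is a coarse heuristic that will miss rephrased requests.

```python
import re

# Hypothetical keyword list; extend it to match your own risk policy.
RISKY_PATTERN = re.compile(
    r"\b(delete|remove|erase|reset|purchase|buy|pay|send|call|"
    r"sign in|log in|authenticate)\b",
    re.IGNORECASE,
)


def requires_confirmation(prompt: str) -> bool:
    """Return True when the prompt mentions a potentially irreversible action."""
    return RISKY_PATTERN.search(prompt) is not None


def approve(prompt: str, ask=input) -> bool:
    """Allow harmless prompts; ask the user before risky ones."""
    if not requires_confirmation(prompt):
        return True
    answer = ask(f"Prompt may change state: {prompt!r}. Proceed? [y/N] ")
    return answer.strip().lower() == "y"
```

The `ask` parameter defaults to `input` so the gate is interactive in normal use, and it can be swapped for a stub in tests or for a UI dialog in an agent runtime.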

What this means

A broad lower-level device-control interface can bypass the safer visible-UI workflow and may enable unexpected device operations if used with risky endpoints.

Why it was flagged

The skill exposes a lower-level WebDriverAgent request path in addition to normal visible UI automation. The example is read-only, but the instruction describes lower-level device control without showing a bounded endpoint allowlist or approval model.

Skill content

Use this when the task needs lower-level device control instead of a normal visible UI interaction: `npx -y @midscene/ios@1 runwdarequest --method GET --endpoint /wda/screen`

Recommendation

Limit direct WebDriverAgent requests to known read-only or test-approved endpoints, and ask for user confirmation before any request that can change device state.
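A bounded endpoint allowlist of the kind recommended here might look like the following sketch. The allowlist contains only the read-only request quoted from the skill; the helper name `build_wda_command` is hypothetical, and any additional entries would need to be vetted by whoever operates the device.

```python
# Only the read-only request quoted in the skill is allowlisted here.
ALLOWED_WDA_REQUESTS = {("GET", "/wda/screen")}


def build_wda_command(method: str, endpoint: str) -> list[str]:
    """Build the `runwdarequest` argv, refusing anything not allowlisted."""
    key = (method.upper(), endpoint)
    if key not in ALLOWED_WDA_REQUESTS:
        raise PermissionError(f"WDA request not allowlisted: {method} {endpoint}")
    return [
        "npx", "-y", "@midscene/ios@1", "runwdarequest",
        "--method", key[0], "--endpoint", endpoint,
    ]
```

Matching on the exact (method, endpoint) pair rather than a prefix keeps the check strict: a state-changing method on an otherwise familiar endpoint is rejected unless it is added deliberately.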

What this means

The actual automation code comes from the npm package at use time, so its behavior is outside this instruction-only artifact review.

Why it was flagged

The skill relies on runtime execution of an npm package that is not included in the reviewed artifacts. This is purpose-aligned, but users must trust the external package and its resolved version.

Skill content

Automate iOS devices using `npx -y @midscene/ios@1`.

Recommendation

Use a trusted package source, consider exact-pinning the CLI version, and review or lock dependencies before using it on sensitive devices.
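To make the exact-pinning advice checkable, a pre-flight step could reject package specifiers that resolve to a range. The sketch below assumes a plain `name@major.minor.patch` form (with an optional prerelease suffix) counts as exact; the version number `1.2.3` in the usage note is purely illustrative and not a real release of the package.

```python
import re

# Matches `pkg@X.Y.Z` or `@scope/pkg@X.Y.Z`, optionally with a prerelease tag.
EXACT_PIN = re.compile(
    r"^(?:@[^/@\s]+/)?[^/@\s]+@\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$"
)


def is_exact_pin(spec: str) -> bool:
    """Return True only when the specifier names one exact version."""
    return EXACT_PIN.match(spec) is not None
```

Under this check, `@midscene/ios@1` (as used in the skill) is treated as a range because it can resolve to any 1.x release, while a hypothetical `@midscene/ios@1.2.3` would pass.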

What this means

Users may need to provide a paid or privileged model API key, which could incur costs or expose account access if mishandled.

Why it was flagged

The skill requires a model provider API key and endpoint configuration. This is expected for Midscene, but the registry metadata lists no required environment variables or primary credential.

Skill content

MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."

Recommendation

Use a limited-scope provider key where possible, keep `.env` files private, and verify provider billing and data-use settings.
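Because the registry metadata does not declare these variables, a wrapper could verify them up front and avoid echoing the key in logs. This is a sketch under assumptions: the three variable names come from the skill content, while `missing_vars` and `mask` are hypothetical helpers.

```python
import os

# Variable names as listed in the reviewed skill content.
REQUIRED_VARS = (
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_BASE_URL",
)


def missing_vars(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters for safe logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Calling `missing_vars()` before launching the CLI gives a clear error for an incomplete `.env`, and `mask` lets diagnostics confirm which key is loaded without printing it in full.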

What this means

Screenshots can contain private messages, account details, financial information, or app data that may be processed by the configured model provider.

Why it was flagged

The workflow depends on screenshots and an external model endpoint. This is aligned with the skill purpose, but the provided text does not spell out privacy, retention, or redaction boundaries for screenshots sent to the configured provider.

Skill content

Operates entirely from screenshots... Midscene requires models with strong visual grounding capabilities... MIDSCENE_MODEL_BASE_URL="https://..."

Recommendation

Avoid using this on screens containing sensitive information unless you trust the configured provider and understand its data-retention policy.
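One way to act on this recommendation is to refuse to run unless the configured model endpoint is on a list of providers you have already vetted. The sketch below is an assumption about how such a check could be wired in; `is_trusted_endpoint` is a hypothetical helper, and the trusted host set is something each user would define for themselves.

```python
from urllib.parse import urlsplit


def is_trusted_endpoint(base_url: str, trusted_hosts: set[str]) -> bool:
    """Accept only HTTPS URLs whose host is in the caller's trusted set."""
    parts = urlsplit(base_url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in trusted_hosts
```

A wrapper could read `MIDSCENE_MODEL_BASE_URL`, call this function with the user's own host list, and stop before any screenshot is captured if the check fails. It does not address retention or redaction, which still depend on the provider's policy.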