Midscene Automations Skills for iOS
Pass. Audited by VirusTotal on May 12, 2026.
Overview
Type: OpenClaw Skill
Name: midscene-ios-automation
Version: 1.0.5

The skill provides iOS automation using the Midscene framework, which involves high-risk capabilities such as executing code via `npx` and using `Bash` for device interaction. It requires sensitive environment variables for AI model API keys and contains a potential shell injection vulnerability in `SKILL.md`, where natural language prompts are passed directly to CLI arguments (e.g., `act --prompt`) without explicit sanitization. While these features are aligned with the stated purpose of mobile testing, the inherent risks of remote package execution and command construction meet the threshold for a suspicious classification.
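The injection concern above arises when a prompt is spliced into a shell command string. A minimal sketch of one mitigation, assuming a POSIX-like environment with `npx` on the PATH: pass the prompt as a single argument-list element so that no shell ever parses it. The helper names `build_act_command` and `run_act` are hypothetical and are not part of the skill.

```python
import subprocess


def build_act_command(prompt: str) -> list[str]:
    """Return the argv list for one `act` call; the prompt stays one element."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Command and flag are taken from the skill's own example.
    return ["npx", "-y", "@midscene/ios@1", "act", "--prompt", prompt]


def run_act(prompt: str) -> subprocess.CompletedProcess:
    # With an argument list and the default shell=False, no shell parses the
    # prompt, so quotes, semicolons, and $(...) in it remain literal text.
    return subprocess.run(build_act_command(prompt), check=True)
```

This only removes shell interpretation of the prompt. It does not limit what the automation does with the prompt once the CLI receives it.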
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If invoked on a real device, the agent could perform irreversible or account-affecting actions through the UI if the task prompt is broad or mistaken.
The skill can perform broad, state-changing iOS UI actions and even includes a destructive confirmation example. The provided instructions do not show a separate approval requirement or scope limits for actions that could delete data, change settings, send messages, or affect accounts.
Use `act` to interact with the device... It autonomously handles all UI interactions internally — tapping, typing, scrolling, swiping, waiting, and navigating... `npx -y @midscene/ios@1 act --prompt "tap Delete, then confirm in the alert dialog"`
Use only on test devices or with narrowly scoped prompts, and require explicit confirmation before actions such as deleting data, purchasing, sending, calling, changing settings, or authenticating.
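The confirmation step recommended above could be sketched as a small gate that runs before any prompt reaches `act`. The keyword list, `needs_confirmation`, and `approve` are all hypothetical. Keyword matching is crude and will miss paraphrased requests, so treat it as a first filter rather than a guarantee.

```python
import re

# Hypothetical keyword list; tune it for the apps under test.
RISKY_WORDS = (
    "delete", "purchase", "buy", "send", "call",
    "settings", "sign in", "log in", "authenticate",
)


def needs_confirmation(prompt: str) -> bool:
    """Return True if the prompt mentions a potentially state-changing action."""
    text = prompt.lower()
    return any(re.search(rf"\b{re.escape(word)}\b", text) for word in RISKY_WORDS)


def approve(prompt: str, ask=input) -> bool:
    """Return True if the prompt may run; risky prompts need an explicit 'yes'."""
    if not needs_confirmation(prompt):
        return True
    answer = ask(f"Prompt may change state: {prompt!r}. Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"
```

The `ask` parameter defaults to `input` so that an interactive user is prompted, and it can be replaced with another callable in unattended runs that should always refuse.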
A broad lower-level device-control interface can bypass the safer visible-UI workflow and may enable unexpected device operations if used with risky endpoints.
The skill exposes a lower-level WebDriverAgent request path in addition to normal visible UI automation. The example is read-only, but the instruction describes lower-level device control without showing a bounded endpoint allowlist or approval model.
Use this when the task needs lower-level device control instead of a normal visible UI interaction: `npx -y @midscene/ios@1 runwdarequest --method GET --endpoint /wda/screen`
Limit direct WebDriverAgent requests to known read-only or test-approved endpoints, and ask for user confirmation before any request that can change device state.
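One way to apply this recommendation is an exact-match allowlist checked before the command is built. The sketch below assumes only the endpoint from the quoted example is approved. `READ_ONLY_ENDPOINTS`, `is_allowed_request`, and `build_wda_command` are hypothetical names, and any further endpoint should be added only after reviewing what it does.

```python
# Only the endpoint shown in the skill's example; extend deliberately.
READ_ONLY_ENDPOINTS = frozenset({"/wda/screen"})


def is_allowed_request(method: str, endpoint: str) -> bool:
    """Allow only GET requests to endpoints that are explicitly listed."""
    return method.upper() == "GET" and endpoint in READ_ONLY_ENDPOINTS


def build_wda_command(method: str, endpoint: str) -> list[str]:
    """Return the argv list for an approved request, or raise PermissionError."""
    if not is_allowed_request(method, endpoint):
        raise PermissionError(f"{method} {endpoint} is not on the allowlist")
    return [
        "npx", "-y", "@midscene/ios@1", "runwdarequest",
        "--method", method.upper(), "--endpoint", endpoint,
    ]
```

Matching on the exact endpoint string, rather than on the HTTP method alone, avoids assuming that every GET endpoint is free of side effects.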
The actual automation code comes from the npm package at use time, so its behavior is outside this instruction-only artifact review.
The skill relies on runtime execution of an npm package that is not included in the reviewed artifacts. This is purpose-aligned, but users must trust the external package and its resolved version.
Automate iOS devices using `npx -y @midscene/ios@1`.
Use a trusted package source, consider exact-pinning the CLI version, and review or lock dependencies before using it on sensitive devices.
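The spec `@midscene/ios@1` names a major-version range, so the package actually executed can change between runs. A minimal sketch of a check that accepts only a full `major.minor.patch` pin follows. `is_exact_pin` is a hypothetical helper, it deliberately rejects pre-release tags, and the version `1.2.3` in the usage note is a placeholder, not a real release recommendation.

```python
import re

# Optional "@scope/" prefix, a package name, then "@" and a full x.y.z version.
EXACT_PIN = re.compile(r"^(@[^/@\s]+/)?[^/@\s]+@\d+\.\d+\.\d+$")


def is_exact_pin(spec: str) -> bool:
    """Return True only if the spec names exactly one numeric x.y.z version."""
    return EXACT_PIN.fullmatch(spec) is not None
```

For example, `is_exact_pin("@midscene/ios@1")` is false, while `is_exact_pin("@midscene/ios@1.2.3")` is true for the placeholder version. An exact pin fixes the top-level package only; transitive dependencies still need a lockfile or a separate review.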
Users may need to provide a paid or privileged model API key, which could incur costs or expose account access if mishandled.
The skill requires a model provider API key and endpoint configuration. This is expected for Midscene, but the registry metadata lists no required environment variables or primary credential.
MIDSCENE_MODEL_API_KEY="your-api-key" MIDSCENE_MODEL_NAME="model-name" MIDSCENE_MODEL_BASE_URL="https://..."
Use a limited-scope provider key where possible, keep `.env` files private, and verify provider billing and data-use settings.
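Because the registry metadata does not declare these variables, a pre-flight check can make the requirement explicit without exposing the key. The sketch below assumes all three names from the quoted configuration are needed. `missing_vars` and `describe` are hypothetical helpers, and they report variable names only, never values.

```python
import os

# Names taken from the configuration quoted above.
REQUIRED_VARS = (
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_BASE_URL",
)


def missing_vars(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def describe(env=None) -> str:
    """Summarize the configuration state without echoing any secret value."""
    missing = missing_vars(env)
    if not missing:
        return "all required variables set"
    return "missing: " + ", ".join(missing)
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real process environment.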
Screenshots can contain private messages, account details, financial information, or app data that may be processed by the configured model provider.
The workflow depends on screenshots and an external model endpoint. This is aligned with the skill purpose, but the provided text does not spell out privacy, retention, or redaction boundaries for screenshots sent to the configured provider.
Operates entirely from screenshots... Midscene requires models with strong visual grounding capabilities... MIDSCENE_MODEL_BASE_URL="https://..."
Avoid using this on screens containing sensitive information unless you trust the configured provider and understand its data-retention policy.
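Since screenshots go to whatever `MIDSCENE_MODEL_BASE_URL` points at, one guard is to verify that URL against a list of providers you have already vetted before starting a session. This is a sketch under assumptions: `provider_is_trusted` is a hypothetical helper, and the host in the usage note is a placeholder.

```python
from urllib.parse import urlparse


def provider_is_trusted(base_url: str, trusted_hosts: set[str]) -> bool:
    """Return True only for HTTPS URLs whose host is on the vetted list."""
    parsed = urlparse(base_url)
    return parsed.scheme == "https" and parsed.hostname in trusted_hosts
```

For instance, `provider_is_trusted("https://models.example.com/v1", {"models.example.com"})` is true, while the same URL with `http` is rejected. The check confirms where screenshots are sent. It says nothing about how the provider stores or uses them, which still requires reading that provider's retention policy.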
