Midscene Automations Skills for Android

Vision-driven Android device automation using Midscene. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all v...

MIT-0 · Free to use, modify, and redistribute. No attribution required.


by Leyang @quanru


Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (high confidence)

Purpose & Capability

The SKILL.md describes vision-driven Android automation via Midscene and ADB, which is internally coherent for the stated purpose. However, the registry metadata claims no required binaries or env vars, while the instructions clearly require Node (npx @midscene/android@1), ADB (adb shell ...), and model credentials. These omitted declarations are a mismatch that reduces transparency and is unexpected for this capability.

Instruction Scope

Instructions direct the agent to run npx commands, take screenshots, read saved image files, and supply model configuration (MIDSCENE_MODEL_*) including a BASE_URL. That implies screenshots and device UI content will be sent to remote model endpoints or Midscene services. Exfiltration of potentially sensitive screen contents to external providers is not called out in the registry metadata and is material to risk. The instructions also advise using ADB (powerful device control), which is consistent with purpose but increases the threat surface.


Install Mechanism

There is no install spec in the registry (instruction-only), which is lower friction. However, the runtime uses npx to fetch @midscene/android at invocation time, which downloads and runs code from npm dynamically. The metadata does not list Node/npm as a required binary. Dynamically pulling code at runtime is normal for npx but worth noting, because it executes third-party code on demand.

Credentials

The SKILL.md requires multiple environment variables (MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_BASE_URL, MIDSCENE_MODEL_FAMILY, etc.) and suggests provider-specific keys (Google, Alibaba, OpenRouter, Doubao). These are appropriate for remote-model driven automation, but the skill registry declared 'none' for required env vars/primary credential. In addition, placing keys in a .env file (as recommended) means the tool will read local secret files; that access is not declared in metadata and could expose unrelated secrets if present.


Persistence & Privilege

The skill is instruction-only, has no install spec, always:false, and does not request to modify other skills or system-wide settings. It does require ADB access at runtime but does not request forced persistent inclusion or elevated platform privileges.

What to consider before installing

- Metadata mismatch: The registry claims no required binaries or environment variables, but the SKILL.md requires Node (npx), ADB, and multiple model API keys/BASE_URLs. Ask the publisher to correct the metadata before trusting the skill.
- Sensitive data exposure: The workflow captures screenshots of your Android device and (by design) sends them to a model endpoint or Midscene service configured by MIDSCENE_MODEL_BASE_URL. Those screenshots can contain passwords, 2FA codes, messages, or other sensitive data. Only use with providers and endpoints whose privacy/security policies you trust.
- Dynamic code execution: npx will fetch and run @midscene/android from npm at runtime. If you want to proceed, inspect the package source (or run in an isolated environment) to verify behavior.
- Secrets handling: The skill suggests storing API keys in a .env file, which Midscene will load. Ensure your .env contains only the intended keys and is not shared. Prefer provider-scoped API keys with minimal privileges and short lifetimes when possible.
- Test safely: If you must use the skill, test on an emulator or a disposable device to avoid leaking personal data. Monitor network traffic and limit which model endpoints you configure.
- Ask for provenance: There is no homepage or source listed. Prefer skills with a verifiable publisher, source repository, and documentation. If you cannot verify origin, exercise caution.


Current version: v1.0.2 (latest: vk971fmqb2t7y1j0k5cd0whjykh82e9jt)

License: MIT-0 (terms: https://spdx.org/licenses/MIT-0.html)

SKILL.md

Android Device Automation

CRITICAL RULES — VIOLATIONS WILL BREAK THE WORKFLOW:

Never run midscene commands in the background. Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.

Run only one midscene command at a time. Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.

Allow enough time for each command to complete. Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex act commands may need even longer.

Always report task results before finishing. After completing the automation task, you MUST proactively summarize the results to the user — including key data found, actions completed, screenshots taken, and any relevant findings. Never silently end after the last automation step; the user expects a complete response in a single interaction.

Automate Android devices using npx @midscene/android@1. Each CLI command maps directly to an MCP tool — you (the AI agent) act as the brain, deciding which actions to take based on screenshots.

Prerequisites

Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured — either as system environment variables or in a .env file in the current working directory (Midscene loads .env automatically):

MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"

Example: Gemini (Gemini-3-Flash)

MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"

Example: Qwen 3.5

MIDSCENE_MODEL_API_KEY="your-aliyun-api-key"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_REASONING_ENABLED="false"
# If using OpenRouter, set:
# MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
# MIDSCENE_MODEL_NAME="qwen/qwen3.5-plus"
# MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"

Example: Doubao Seed 2.0 Lite

MIDSCENE_MODEL_API_KEY="your-doubao-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-2-0-lite"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-seed"

Commonly used models: Doubao Seed 2.0 Lite, Qwen 3.5, Zhipu GLM-4.6V, Gemini-3-Pro, Gemini-3-Flash.

If the model is not configured, ask the user to set it up. See Model Configuration for supported providers.
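Before the first midscene command, a quick pre-flight check can tell you whether the model is configured at all, so you know when to ask the user to set it up. The sketch below assumes only the four generic variables listed above; the function name `check_midscene_env` is hypothetical and not part of Midscene, and its `.env` check is a simplified line match, not a full dotenv parser.

```shell
# Hypothetical pre-flight helper (not part of Midscene). Succeeds when each
# required variable is set in the environment or appears as a VAR=value line
# in the given .env file (default: ./.env); otherwise prints what is missing.
check_midscene_env() {
  local env_file="${1:-.env}" missing="" var value
  for var in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
             MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    # Indirect lookup of the variable named by $var; eval only ever sees
    # the four fixed names listed above.
    eval "value=\${$var:-}"
    [ -n "$value" ] && continue
    # Simplified .env check: a line starting with VAR= followed by at
    # least one character.
    if [ -f "$env_file" ] && grep -Eq "^${var}=.+" "$env_file"; then
      continue
    fi
    missing="${missing:+$missing }$var"
  done
  if [ -n "$missing" ]; then
    echo "Missing model configuration: $missing" >&2
    return 1
  fi
  return 0
}
```

For example, `check_midscene_env && npx @midscene/android@1 connect` connects only when the configuration looks complete. Provider-specific extras such as MIDSCENE_MODEL_REASONING_ENABLED are intentionally not checked.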

Commands

Connect to Device

npx @midscene/android@1 connect
npx @midscene/android@1 connect --deviceId emulator-5554
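When several devices are attached, `--deviceId` removes the ambiguity. A minimal sketch, assuming the usual `adb devices` output (a header line, then one `serial<TAB>state` row per device), that connects to the first device in the ready `device` state. Both function names are hypothetical helpers, not Midscene commands.

```shell
# Hypothetical helpers (not part of Midscene or ADB).

# Reads `adb devices` output on stdin and prints the serial of the first
# device in the "device" state; prints nothing if there is none.
first_ready_device() {
  awk '$2 == "device" { print $1; exit }'
}

# Connects Midscene to that device, passing --deviceId explicitly so the
# target stays unambiguous when several devices are attached.
connect_first_device() {
  local serial
  serial="$(adb devices | first_ready_device)"
  if [ -z "$serial" ]; then
    echo "no device in 'device' state; run 'adb devices' to inspect" >&2
    return 1
  fi
  npx @midscene/android@1 connect --deviceId "$serial"
}
```

Picking the first ready device is only a convenience; if the user names a specific device, pass that serial to `--deviceId` instead.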

Take Screenshot

npx @midscene/android@1 take_screenshot

After taking a screenshot, read the saved image file to understand the current screen state before deciding the next action.

Perform Action

Use act to interact with the device and get the result. It autonomously handles all UI interactions internally — tapping, typing, scrolling, swiping, waiting, and navigating — so you should give it complex, high-level tasks as a whole rather than breaking them into small steps. Describe what you want to do and the desired effect in natural language:

# specific instructions
npx @midscene/android@1 act --prompt "type hello world in the search field and press Enter"
npx @midscene/android@1 act --prompt "long press the message bubble and tap Delete in the popup menu"

# or target-driven instructions
npx @midscene/android@1 act --prompt "open Settings and navigate to Wi-Fi settings, tell me the connected network name"

Disconnect

npx @midscene/android@1 disconnect

Workflow Pattern

Since CLI commands are stateless between invocations, follow this pattern:

1. Connect to establish a session.
2. Launch the target app and take a screenshot to see the current state; make sure the app is launched and visible on the screen.
3. Use act to perform the desired action, given either as specific instructions or as a target-driven instruction.
4. Disconnect when done.
5. Report results: summarize what was accomplished, present key findings and data extracted during the task, and list any generated files (screenshots, logs, etc.) with their paths.

Best Practices

Bring the target app to the foreground before using this skill: For best efficiency, launch the app using ADB (e.g., adb shell am start -n <package/activity>) before invoking any midscene commands. Then take a screenshot to confirm the app is actually in the foreground. Only after visual confirmation should you proceed with UI automation using this skill. ADB commands are significantly faster than using midscene to navigate to and open apps.
Be specific about UI elements: Instead of vague descriptions, provide clear, specific details. Say "the Wi-Fi toggle switch on the right side" instead of "the toggle".
Describe locations when possible: Help target elements by describing their position (e.g., "the search icon at the top right", "the third item in the list").
Never run in background: Every midscene command must run synchronously — background execution breaks the screenshot-analyze-act loop.
Batch related operations into a single act command: When performing consecutive operations within the same app, combine them into one act prompt instead of splitting them into separate commands. For example, "open Settings, tap Wi-Fi, and toggle it on" should be a single act call, not three. This reduces round-trips, avoids unnecessary screenshot-analyze cycles, and is significantly faster.
Always report results after completion: After finishing the automation task, you MUST proactively present the results to the user without waiting for them to ask. This includes: (1) the answer to the user's original question or the outcome of the requested task, (2) key data extracted or observed during execution, (3) screenshots and other generated files with their paths, (4) a brief summary of steps taken. Do NOT silently finish after the last automation command — the user expects complete results in a single interaction.
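The ADB-first launch from the first best practice can be wrapped as a small shell function. This is a sketch: `launch_app` is a hypothetical name, `com.android.settings/.Settings` is only an example component, and `-W` (wait for the launch to complete) is a standard `am start` option rather than anything Midscene-specific. It runs one ADB call followed by one midscene command, both in the foreground.

```shell
# Hypothetical helper (not part of Midscene): launch an app with ADB, then
# take a Midscene screenshot so the foreground app can be confirmed visually.
launch_app() {
  local component="$1"   # <package>/<activity>, e.g. com.android.settings/.Settings
  case "$component" in
    ?*/?*) ;;            # looks like package/activity
    *)
      echo "expected <package>/<activity>, got: '$component'" >&2
      return 2
      ;;
  esac
  # -W makes `am start` wait until the launch completes before returning.
  # This only stops on adb-level failures; `am` may still exit 0 when the
  # activity does not exist, so the screenshot remains the real check.
  adb shell am start -W -n "$component" || return 1
  npx @midscene/android@1 take_screenshot
}
```

Read the saved screenshot and confirm the app is in the foreground before issuing the first act command.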

Example — Popup menu interaction:

npx @midscene/android@1 act --prompt "long press the message bubble and tap Delete in the popup menu"
npx @midscene/android@1 take_screenshot

Example — Form interaction:

npx @midscene/android@1 act --prompt "fill in the username field with 'testuser' and the password field with 'pass123', then tap the Login button"
npx @midscene/android@1 take_screenshot

Troubleshooting

Problem	Solution
ADB not found	Install Android SDK Platform Tools: `brew install android-platform-tools` (macOS) or download from developer.android.com.
Device not listed	Check USB connection, ensure USB debugging is enabled in Developer Options, and run `adb devices`.
Device shows "unauthorized"	Unlock the device and accept the USB debugging authorization prompt. Then run `adb devices` again.
Device shows "offline"	Disconnect and reconnect the USB cable. Run `adb kill-server && adb start-server`.
Command timeout	The device screen may be off or locked. Wake the device with `adb shell input keyevent KEYCODE_WAKEUP` and unlock it.
API key error	Check that the `.env` file (or the shell environment) contains `MIDSCENE_MODEL_API_KEY=<your-key>`. See Model Configuration.
Wrong device targeted	If multiple devices are connected, use `--deviceId <id>` flag with the `connect` command.
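Several rows in the table above can be diagnosed from a single `adb devices` call. The sketch below, with hypothetical function names, maps each reported device state to the matching fix from the table; it assumes the usual `serial<TAB>state` output and ignores the header line and any daemon start-up messages.

```shell
# Hypothetical helpers (not part of Midscene or ADB).

# Reads `adb devices` output on stdin and prints one hint per device,
# following the Troubleshooting table.
adb_device_hints() {
  while read -r serial state _; do
    case "$state" in
      device)       echo "$serial: ready" ;;
      unauthorized) echo "$serial: unlock the device and accept the USB debugging prompt" ;;
      offline)      echo "$serial: reconnect USB, then run: adb kill-server && adb start-server" ;;
      *)            ;;  # header line, daemon messages, blank lines
    esac
  done
}

# Checks that adb is installed and that at least one device is listed.
diagnose_adb() {
  local report
  if ! command -v adb >/dev/null 2>&1; then
    echo "ADB not found: install Android SDK Platform Tools" >&2
    return 1
  fi
  report="$(adb devices | adb_device_hints)"
  if [ -z "$report" ]; then
    echo "no devices listed: check the USB cable and enable USB debugging" >&2
    return 1
  fi
  printf '%s\n' "$report"
}
```

States not covered by the table (for example a Linux "no permissions" row) are silently skipped by this sketch.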
