Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

mobizen-gui

v1.0.0

Helps users set up and run MobiZen-GUI to perform mobile-use tasks — automating Android phone operations via natural language. Use when the user wants to con...

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The SKILL.md content matches the stated purpose (setting up and running MobiZen-GUI to automate Android devices). Recommended actions (adb, install ADBKeyboard, clone GitHub repo, pip install requirements, supply model API info) are expected. Minor metadata mismatch: registry lists no required credentials or env vars, but the runtime instructions explicitly require an API key/base_url/model_name to send prompts to a model endpoint (stored in my_config.yaml).
Instruction Scope
Runtime instructions direct taking screenshots, building prompts that include screenshots, and sending them to whatever model endpoint you configure (OpenAI or third-party). They also instruct the agent to ask the user for an API key and write it into my_config.yaml. This is within the skill's purpose but has clear privacy/exfiltration implications: device screenshots and actions may be transmitted to external services not under your control. The docs also include development-level steps (custom message builders, model client classes) that reference transforming images into data URLs — again implying sending image data to a model.
Install Mechanism
There is no formal install spec in the registry (instruction-only), but SKILL.md tells users to pip install packages, clone a GitHub repo, download models from HuggingFace/ModelScope, and run vLLM with --trust-remote-code. Downloading third-party models and using --trust-remote-code can execute arbitrary code from model packages; model downloads from non-official mirrors or ModelScope increase risk. These are reasonable for running local models but represent a higher-risk install/runtime surface that the user must review/trust.
Credentials
The skill asks the user to provide model credentials (api_key, base_url, model_name) and to store them in a config file, but the registry metadata declared no required credentials. Requesting an API key is proportionate to the feature (remote model calls), but it increases risk because captured screenshots and phone interactions will be sent to that endpoint. The guidance suggests various providers (OpenAI, third-party, or local Ollama) — choose carefully and avoid giving long-lived/high-privilege keys to untrusted endpoints.
Persistence & Privilege
The skill is not always-enabled and does not request system-wide privileges. It writes/uses a local config file (my_config.yaml) and works with adb devices — expected for mobile automation. There is no attempt to modify other skills or system-wide agent settings in the provided instructions.
What to consider before installing
This skill appears to be what it claims (a MobiZen-GUI setup guide) but carries several privacy and code-execution risks you should consider before proceeding:

  • Be careful sharing API keys: the agent expects an api_key/base_url/model_name and will send screenshots and prompts to whichever endpoint you provide. Use a limited-scope or throwaway key, or prefer a local model (Ollama) if you must avoid cloud exposure.
  • Screenshots and device content may contain sensitive data (messages, passwords, notifications). Assume they could be transmitted off-device; disable or mask sensitive apps before testing.
  • The guide instructs downloading third-party models and running vLLM with --trust-remote-code. That can execute arbitrary code from model packages; only download and run models from sources you trust, and inspect code when possible.
  • There is no registry-declared credential field; the config-based API key requirement appears only in SKILL.md. Treat that as a manual consent point: do not paste high-privilege keys into the config without understanding where requests go.
  • Review the cloned repository and requirements.txt before running pip install. Consider running in an isolated environment (VM/container) and revoke keys when finished.

If you want, I can: (a) highlight exactly where screenshots are created/sent in the MobiZen-GUI repo, (b) produce safe example config values for using a local model, or (c) draft a short checklist to minimize exposure when testing this tool.

Like a lobster shell, security has layers — review code before you run it.



SKILL.md

MobiZen-GUI

VLM-based mobile automation framework — control Android devices via natural language.

Repo: https://github.com/alibaba/MobiZen-GUI


1. Environment Setup

1.1 Install ADB

# macOS
brew install android-platform-tools
# Linux
sudo apt-get install android-tools-adb
# Windows: download from https://developer.android.com/studio/releases/platform-tools
adb version  # verify

1.2 Connect Device & Install ADBKeyboard

adb devices                    # USB; or: adb tcpip 5555 && adb connect <ip>:5555
adb install ADBKeyboard.apk    # download from https://github.com/senzhk/ADBKeyBoard

Then on device: Settings → System → Languages & Input → Virtual Keyboard → Enable ADBKeyboard.

1.3 Install Project

git clone https://github.com/alibaba/MobiZen-GUI.git && cd MobiZen-GUI
pip install -r requirements.txt   # openai, pillow, pyyaml

2. Quick Start (Config Only, No Code Changes)

Copy example config:

cp config_example.yaml my_config.yaml

Only 3 fields need to be configured — api_key, base_url, model_name:

api_key: "your-api-key-here"
base_url: "https://api.openai.com/v1"   # your model endpoint
model_name: "gpt-4o"                    # model identifier

How to set these 3 fields: when the user asks to run a phone task but hasn't configured the model yet, the AI should ask the user to provide api_key, base_url, and model_name, then write them into my_config.yaml. The user can also edit the file manually. Any OpenAI-compatible API works.
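The write-back step could be sketched as follows; write_model_config is a hypothetical helper for illustration, not part of the MobiZen-GUI codebase:

```python
from pathlib import Path

def write_model_config(path: str, api_key: str, base_url: str, model_name: str) -> None:
    """Write the three required fields as flat YAML (hypothetical helper)."""
    Path(path).write_text(
        f'api_key: "{api_key}"\n'
        f'base_url: "{base_url}"\n'
        f'model_name: "{model_name}"\n'
    )

# Example:
# write_model_config("my_config.yaml", "sk-...", "https://api.openai.com/v1", "gpt-4o")
```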

Provider examples:

# OpenAI
base_url: "https://api.openai.com/v1"
api_key: "sk-..."
model_name: "gpt-4o"

# DeepSeek / Moonshot / Zhipu AI etc.
base_url: "https://api.deepseek.com/v1"
api_key: "your-key"
model_name: "deepseek-chat"

# Ollama (local)
base_url: "http://localhost:11434/v1"
api_key: "dummy"
model_name: "llava"

Run:

python main.py --config my_config.yaml --instruction "打开微信并发送消息"   # "Open WeChat and send a message"

3. Configuration Reference

Field              Default          Description
device_id          null (auto)      ADB device; null = first available
api_key            ""               Model API key
base_url           null             Model API endpoint
model_name         "gpt-4o"         Model identifier
model_type         "qwen3vl"        Coordinate system (999x999 virtual space)
max_steps          25               Max execution steps
step_delay         2.0              Delay between steps (seconds)
first_step_delay   4.0              Delay after first step
temperature        0.1              Sampling temperature
top_p              0.001            Top-p sampling
max_tokens         1024             Max output tokens
timeout            60               Request timeout (seconds)
use_adbkeyboard    true             Chinese text input via ADBKeyboard
screenshot_dir     "./screenshots"  Screenshot save directory
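Pulling the defaults above together, a complete my_config.yaml might look like this (the api_key, base_url, and model_name values are placeholders to replace with your own):

```yaml
device_id: null
api_key: "your-api-key-here"
base_url: "https://api.openai.com/v1"
model_name: "gpt-4o"
model_type: "qwen3vl"
max_steps: 25
step_delay: 2.0
first_step_delay: 4.0
temperature: 0.1
top_p: 0.001
max_tokens: 1024
timeout: 60
use_adbkeyboard: true
screenshot_dir: "./screenshots"
```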

4. Advanced: Deploy MobiZen-GUI-4B Locally

For best results on Chinese mobile tasks, deploy the dedicated 4B model.

4.1 Download Model

pip install -U huggingface_hub
# China mirror (optional)
export HF_ENDPOINT=https://hf-mirror.com
hf download alibabagroup/MobiZen-GUI-4B --local-dir ./MobiZen-GUI-4B

Alternatively from ModelScope: https://modelscope.cn/models/GUIAgent/MobiZen-GUI-4B

4.2 Serve with vLLM

pip install vllm==0.11.0
vllm serve ./MobiZen-GUI-4B --host 0.0.0.0 --port 8000 --trust-remote-code

4.3 Point Config to Local Model

api_key: "dummy"
base_url: "http://localhost:8000/v1"
model_name: "MobiZen-GUI-4B"
model_type: "qwen3vl"

Then run as usual: python main.py --config my_config.yaml --instruction "..."


5. Customization (Requires Code Changes)

The framework uses a plugin architecture — three components can be swapped via config class paths:

  • MessageBuilder (builds prompt + screenshot for model)
    Base class: core.message_builders.base.BaseMessageBuilder
    Default: core.message_builders.qwen.QwenMessageBuilder
  • ModelClient (calls the model API)
    Base class: core.model_clients.base.BaseModelClient
    Default: core.model_clients.openai.OpenAIClient
  • ResponseParser (parses model output → action)
    Base class: core.response_parsers.base.BaseResponseParser
    Default: core.response_parsers.qwen.QwenResponseParser

5.1 Custom Model Client

For non-OpenAI-compatible APIs:

# core/model_clients/my_client.py
from .base import BaseModelClient

class MyClient(BaseModelClient):
    def __init__(self, api_key: str, base_url: str = None, model: str = "", timeout: int = 60):
        pass  # init your client

    def chat(self, messages, **kwargs):
        pass  # must return obj with .choices[0].message.content

Config:

model_client_class: "core.model_clients.my_client.MyClient"
model_client_kwargs: {}  # extra kwargs passed to __init__

5.2 Custom Message Builder

To change the system prompt or how screenshots/history are formatted:

# core/message_builders/my_builder.py
from .base import BaseMessageBuilder
from utils.image import image_to_data_url

class MyBuilder(BaseMessageBuilder):
    def build_system_prompt(self, **kwargs) -> str:
        return "your system prompt"

    def build_messages(self, instruction, current_screenshot, history, **kwargs):
        return [{"role": "system", "content": [...]}, {"role": "user", "content": [...]}]

Config:

message_builder_class: "core.message_builders.my_builder.MyBuilder"
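The repo's utils.image.image_to_data_url presumably base64-encodes a screenshot for transmission; here is a stdlib sketch of that idea (function names, message shape, and MIME type are assumptions, not the project's actual API):

```python
import base64

def image_to_data_url_sketch(image_bytes: bytes, mime: str = "image/png") -> str:
    # Encode raw image bytes as a data URL suitable for an OpenAI-style image_url part.
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")

def user_message(instruction: str, screenshot_bytes: bytes) -> dict:
    # Hypothetical shape of one user turn containing text plus a screenshot.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image_url",
             "image_url": {"url": image_to_data_url_sketch(screenshot_bytes)}},
        ],
    }
```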

5.3 Custom Response Parser

To parse a different model output format:

# core/response_parsers/my_parser.py
from .base import BaseResponseParser, ParsedResponse

class MyParser(BaseResponseParser):
    def parse(self, response) -> ParsedResponse:
        content = response.choices[0].message.content
        # parse content into structured fields
        return ParsedResponse(
            thought="...",
            summary="...",
            action={"arguments": {"action": "click", "coordinate": [x, y]}},
            subtask="..."
        )

Action dict format: {"arguments": {"action": "<type>", ...}} — supported types: click, long_press, swipe, type, system_button, wait, terminate.

Config:

response_parser_class: "core.response_parsers.my_parser.MyParser"
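A toy parse implementation might look like this, assuming the model emits a single JSON object (the real QwenResponseParser format may differ); ParsedResponseSketch is a stand-in dataclass for illustration, not the project's ParsedResponse:

```python
import json
from dataclasses import dataclass, field

@dataclass
class ParsedResponseSketch:
    # Stand-in for core.response_parsers.base.ParsedResponse.
    thought: str = ""
    summary: str = ""
    action: dict = field(default_factory=dict)
    subtask: str = ""

def parse_json_output(content: str) -> ParsedResponseSketch:
    # Assumes the model replies with plain JSON; real outputs may be tagged text.
    data = json.loads(content)
    return ParsedResponseSketch(
        thought=data.get("thought", ""),
        summary=data.get("summary", ""),
        action={"arguments": data.get("arguments", {})},
        subtask=data.get("subtask", ""),
    )
```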

5.4 Add New Action Type

  1. Add _execute_<action>(self, args) method in core/executor/action_executor.py
  2. Add dispatch branch in ActionExecutor.execute()
  3. Update system prompt in QwenMessageBuilder.build_system_prompt()

6. Troubleshooting

  • Device not found: Run adb devices — check USB/wireless connection
  • ADBKeyboard not working: Ensure enabled in device settings; test: adb shell am broadcast -a ADB_INPUT_TEXT --es msg "test"
  • Model connection error: Verify base_url + api_key; check network
  • Coordinate mismatch: Ensure model_type matches your model; check screen size: adb shell wm size
  • Duplicate action loop: Agent auto-stops after 5 identical actions; may indicate model confusion
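The auto-stop behavior in the last bullet can be sketched as a simple repeat check (the threshold of 5 comes from the bullet above; the actual implementation may differ):

```python
def should_stop(actions: list[dict], limit: int = 5) -> bool:
    # True once the last `limit` actions are identical, signaling a stuck agent.
    if len(actions) < limit:
        return False
    tail = actions[-limit:]
    return all(a == tail[0] for a in tail[1:])
```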

