{"skill":{"slug":"model-deploy","displayName":"Model Deploy Skill","summary":"Use this skill when users request to deploy LLMs (Qwen, DeepSeek, etc.) on specified GPU servers and start the model service. This skill can download models...","description":"---\nname: model-deploy\ndescription: Use this skill when users request to deploy LLMs (Qwen, DeepSeek, etc.) on specified GPU servers and start the model service. This skill can download models using ModelScope and start the vLLM inference service.\n---\n\n# Model Deploy\nDeploy large language models on GPU servers using vLLM. **NOTE: Only the ModelScope platform and the vLLM inference engine are currently supported.**\n\nPlease ensure that the server hosting OpenClaw has passwordless SSH access to the GPU servers. You can set this up by running `ssh-copy-id` on your OpenClaw server.\n\nThis skill assumes that Miniconda is already installed on the target server and is used to manage Python environments. You can use the following commands to create the `vllm` environment with Miniconda:\n\n```bash\nconda create -n vllm python=3.10 -y\nconda activate vllm\npip install vllm\n```\n\n## Quick Start\nOn the ModelScope platform, models are uniquely identified by `<MODEL_ORG>/<MODEL_NAME>`. For example, for `Qwen/Qwen3.5-0.8B`, `MODEL_ORG` is Qwen and `MODEL_NAME` is Qwen3.5-0.8B.\n\n### Deploying Qwen Family Models\nTo deploy Qwen family models, use the deployment script `scripts/deploy.sh`. 
The usage of the script is as follows:\n\n```bash\nUsage: [ENV_VARS] deploy.sh <model_name>\n\nExample:\n  PORT=8001 \\\n  GPU_COUNT=4 \\\n  ./deploy.sh Qwen3.5-0.8B\n\nEnvironment Variables:\n  ENV_NAME        conda environment name (default: vllm)\n  PORT            service port (default: 8000)\n  GPU_COUNT       number of GPUs for tensor parallelism (default: 1)\n  PROXY           proxy address (default: http://{proxyaddress}:{port})\n  MODEL_BASE_PATH local path to store models (default: /home/work/models)\n```\n\n| Variable | Description | Default |\n|----------|-------------|---------|\n| MODEL_ORG | model organization | Qwen |\n| MODEL_NAME | model name | Qwen3.5-0.8B |\n| ENV_NAME | conda environment | vllm |\n| PORT | model service port | 8000 |\n| GPU_COUNT | number of GPUs for tensor parallelism | 1 |\n| PROXY | proxy address | http://{proxyaddress}:{port} |\n| MODEL_BASE_PATH | local storage path for models | /home/work/models |\n\n## Deployment Steps\n- Extract required information from the user request: model name (MODEL_NAME), model organization (MODEL_ORG), target server address (TARGET_HOST), deployment user (TARGET_USER), and other necessary parameters.\n\n- Copy `./skills/model-deploy/scripts/deploy.sh` to the specified path on the target server, e.g., `$HOME/wangwei1237`.\n- Grant execute permission to the deployment script on the target server.\n- Run the deployment script on the target server **using the following format**:\n```bash\nssh ${TARGET_USER}@${TARGET_HOST} \"cd $HOME/wangwei1237 && PORT=8001 ./deploy.sh Qwen3.5-0.8B\"\n```\n- After deployment, test whether the model service has started successfully on the target server by running:\n```bash\ncurl -X POST http://127.0.0.1:8001/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n      \"messages\": [{\"role\": \"user\", \"content\": \"你好\"}],\n      \"max_tokens\": 512\n  }'\n```\n\n## Constraints\n- Commands on the target server **must** be executed in this 
format:\n  `ssh ${TARGET_USER}@${TARGET_HOST} \"${CMD}\"`\n\n## Troubleshooting\n- **Port occupied**: Check with `netstat -tlnp | grep <port>`\n- **Version issues**: Run `pip install vllm --upgrade`\n- **Network issues**: Set a proxy with `export https_proxy=\"http://{proxyaddress}:{port}\"`\n- **Insufficient GPU memory**: Check GPU usage with `nvidia-smi`, pick the index of a GPU with enough free memory (`GPU_INDEX`), set `export CUDA_VISIBLE_DEVICES=$GPU_INDEX` to select that GPU, then rerun the deployment script.","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":701,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1773492808864,"updatedAt":1778999561997},"latestVersion":{"version":"1.0.0","createdAt":1773492808864,"changelog":"Initial release of model-deploy skill.\n\n- Deploy large language models (LLMs) like Qwen and DeepSeek on specified GPU servers.\n- Supports downloading models via ModelScope and launching the vLLM inference service.\n- Uses a deployment script with parameters for environment, port, GPU count, proxy, and model storage path.\n- Requires passwordless SSH and pre-installed Miniconda on the target server.\n- Provides troubleshooting tips for common deployment issues.","license":"MIT-0"},"metadata":null,"owner":{"handle":"wangwei1237","userId":"s1720re4ts26jyz6fexjgpe7s583g3gb","displayName":"Wang Wei","image":"https://avatars.githubusercontent.com/u/3480154?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1780089886121}}