Install
openclaw skills install rocm-vllm-deployment
Production-ready vLLM deployment on AMD ROCm GPUs using Docker Compose. Combines environment auto-check, model parameter detection, Docker Compose deployment, health verification, and functional testing with comprehensive logging and security best practices.
Recommended (for production): Add to ~/.bash_profile:
# HuggingFace authentication token (required for gated models)
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Model cache directory (optional)
export HF_HOME="$HOME/models"
# Apply changes
source ~/.bash_profile
Not required for testing: The skill will proceed without these set:
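- HF_TOKEN: public models work without it
- HF_HOME: defaults to /root/.cache/huggingface/hub
Priority Order:
1. Explicit parameter (e.g., hf_token: "xxx")
2. Environment variable (from the shell or ~/.bash_profile)
3. Default (/root/.cache/huggingface/hub)
| Variable | Required | If Missing |
|---|---|---|
| HF_TOKEN | Conditional | Continue without token (public models work; gated models fail at download with a clear error) |
| HF_HOME | No | Warning, then default to /root/.cache/huggingface/hub |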
Philosophy: Fail fast for configuration errors, fail at download time for authentication errors.
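The priority order and default described above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not the skill's actual implementation: the function name `resolve_hf_home` and its optional argument (standing in for the `hf_home` parameter) are hypothetical.

```shell
# Hypothetical helper: resolve the model cache directory.
# Assumed priority: explicit parameter > HF_HOME environment variable > default.
resolve_hf_home() {
  local param="${1:-}"
  local default_dir="/root/.cache/huggingface/hub"
  if [ -n "$param" ]; then
    printf '%s\n' "$param"
  elif [ -n "${HF_HOME:-}" ]; then
    printf '%s\n' "$HF_HOME"
  else
    # The warning goes to stderr so callers can still capture the path.
    echo "WARNING: HF_HOME not set, using default $default_dir" >&2
    printf '%s\n' "$default_dir"
  fi
}
```

A caller could then write `HF_HOME="$(resolve_hf_home "$hf_home_param")"`, where `hf_home_param` is whatever variable holds the optional parameter.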
Location: <skill-dir>/scripts/
check-env.sh: Validate and load environment variables before deployment.
Usage:
# Basic check (HF_TOKEN optional, HF_HOME optional with default)
./scripts/check-env.sh
# Strict mode (HF_HOME required, fails if not set)
./scripts/check-env.sh --strict
# Quiet mode (minimal output, for automation)
./scripts/check-env.sh --quiet
# Test with environment variables
HF_TOKEN="hf_xxx" HF_HOME="/models" ./scripts/check-env.sh
Exit Codes:
| Code | Meaning |
|---|---|
| 0 | Environment check completed (variables loaded or defaulted) |
| 2 | Critical error (e.g., cannot source ~/.bash_profile) |
Note: This script is optional. You can also directly run source ~/.bash_profile.
generate-report.sh: Generate a human-readable deployment report after a successful deployment.
Usage:
./scripts/generate-report.sh <model-id> <container-name> <port> <status> [model-load-time] [memory-used]
# Example:
./scripts/generate-report.sh \
"Qwen-Qwen3-0.6B" \
"vllm-qwen3-0-6b" \
"8001" \
"✅ Success" \
"3.6" \
"1.2"
Parameters:
| Parameter | Required | Description |
|---|---|---|
| model-id | Yes | Model ID (with / replaced by -) |
| container-name | Yes | Docker container name |
| port | Yes | Host port for API endpoint |
| status | Yes | Deployment status (e.g., "✅ Success") |
| model-load-time | No | Model loading time in seconds |
| memory-used | No | Memory consumption in GiB |
Output: $HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md
Exit Codes:
| Code | Meaning |
|---|---|
| 0 | Report generated successfully |
| 1 | Missing required parameters |
| 2 | Output directory not found |
Integration: This script is automatically called in Phase 7 of the deployment workflow.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model_id | String | Yes | - | HuggingFace model ID |
| docker_image | String | No | rocm/vllm-dev:nightly | vLLM Docker image |
| tensor_parallel_size | Integer | No | 1 | Number of GPUs |
| port | Integer | No | 9999 | API server port |
| hf_home | String | No | ${HF_HOME} or /root/.cache/huggingface/hub | Model cache directory |
| hf_token | Secret | Conditional | ${HF_TOKEN} | HuggingFace token (optional for public models, required for gated models) |
| max_model_len | Integer | No | Auto-detect | Maximum sequence length |
| gpu_memory_utilization | Float | No | 0.85 | GPU memory utilization |
| auto_install | Boolean | No | true | Auto-install dependencies |
| log_level | String | No | INFO | Logging verbosity |
All deployment artifacts MUST be saved to:
$HOME/vllm-compose/<model-id-slash-to-dash>/
Convert model ID to directory name by replacing / with -:
- openai/gpt-oss-20b → $HOME/vllm-compose/openai-gpt-oss-20b/
- Qwen/Qwen3-Coder-Next-FP8 → $HOME/vllm-compose/Qwen-Qwen3-Coder-Next-FP8/
Per-model directory structure:
$HOME/vllm-compose/<model-id>/
├── deployment.log # Full deployment logs (stdout + stderr)
├── test-results.json # Functional test results (JSON format)
├── docker-compose.yml # Generated Docker Compose file
├── .env # HF_TOKEN environment (chmod 600, optional)
└── DEPLOYMENT_REPORT.md # Human-readable deployment summary
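The model-ID-to-directory rule above (replace every / with -) can be sketched in shell as follows. The function names `model_dir_name` and `model_output_dir` are hypothetical and only illustrate the naming convention.

```shell
# Hypothetical helper: replace every "/" in a model ID with "-".
model_dir_name() {
  printf '%s\n' "$1" | tr '/' '-'
}

# Hypothetical helper: full per-model output directory under $HOME/vllm-compose.
model_output_dir() {
  printf '%s/vllm-compose/%s\n' "$HOME" "$(model_dir_name "$1")"
}

model_output_dir "openai/gpt-oss-20b"
```

The result can be passed to `mkdir -p` to create the output directory before any artifacts are written.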
File requirements:
- deployment.log: Capture ALL container logs during deployment
- test-results.json: Save API response from functional test request
- DEPLOYMENT_REPORT.md: Generated in Phase 7
Step 0.1: Load Environment Variables
# Source ~/.bash_profile to load HF_HOME and HF_TOKEN
source ~/.bash_profile
# If HF_HOME is not defined, it defaults to /root/.cache/huggingface/hub
If HF_HOME is not defined in ~/.bash_profile, it defaults to /root/.cache/huggingface/hub.
Step 0.2: Create Output Directory
- $HOME/vllm-compose/<model-id>/
Step 0.3: Initialize Logging
- $HOME/vllm-compose/<model-id>/deployment.log
Step 0.4: System Checks
Use HF_HOME from Phase 0 (environment variable or default):
# Download model to HF_HOME
huggingface-cli download <model_id> --local-dir "$HF_HOME/hub/models--<org>--<model>"
# Or use snapshot_download via Python:
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='<model_id>', cache_dir='$HF_HOME')"
Authentication Handling:
| Scenario | Behavior |
|---|---|
| Public model + no token | ✅ Download succeeds |
| Public model + token provided | ✅ Download succeeds |
| Gated model + no token | ❌ Download fails with "authentication required" error |
| Gated model + invalid token | ❌ Download fails with "invalid token" error |
| Gated model + valid token | ✅ Download succeeds |
On Authentication Failure:
echo "ERROR: Model download failed - authentication required"
echo "This model requires a valid HF_TOKEN."
echo ""
echo "Please add to ~/.bash_profile:"
echo " export HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""
echo "Then run: source ~/.bash_profile"
exit 1
- $HF_HOME/hub/models--<org>--<model-name>/
- deployment.log
Generate files in output directory:
- docker-compose.yml → $HOME/vllm-compose/<model-id>/docker-compose.yml
- .env → $HOME/vllm-compose/<model-id>/.env (optional)
  - HF_TOKEN=<value>
  - chmod 600
Volume mount example:
volumes:
  - ${HF_HOME}:/root/.cache/huggingface/hub:ro
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri:/dev/dri
Important: Docker Compose reads ${HF_HOME} from the host environment at runtime. Run source ~/.bash_profile before docker compose so the variable is set.
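The optional .env file listed above (HF_TOKEN=<value>, chmod 600) could be written along these lines. This is a sketch only: `write_env_file` is a hypothetical helper name, and skipping the file when no token is set is an assumption based on the file being marked optional.

```shell
# Hypothetical helper: write HF_TOKEN to <dir>/.env, readable by the owner only.
write_env_file() {
  local out_dir="$1"
  local token="${2:-}"
  # Assumption: no token means a public model, so no .env file is needed.
  [ -n "$token" ] || return 0
  # Create the file under a restrictive umask so it is never world-readable.
  (
    umask 077
    printf 'HF_TOKEN=%s\n' "$token" > "$out_dir/.env"
  )
  # Enforce 600 explicitly in case the file already existed with looser permissions.
  chmod 600 "$out_dir/.env"
}
```

A typical call would be `write_env_file "$HOME/vllm-compose/<model-id>" "$HF_TOKEN"`, run after the output directory has been created.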
Important: Before deploying, pull the latest image to ensure updates:
docker pull rocm/vllm-dev:nightly
Note: The default port is 9999. Before running docker compose, check whether the port is available: ss -tlnp | grep :<port>. If the port is in use, specify a different port in docker-compose.yml.
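The port check above can be made exact, since a plain grep for :9999 would also match :99990. The sketch below parses `ss -tln` output instead. The helper name `port_listed` is hypothetical, and it assumes the usual `ss` column layout where the fourth field is the local address and port.

```shell
# Hypothetical helper: read `ss -tln` output on stdin and exit 0 only if
# PORT is exactly the port of some local listening address (field 4).
port_listed() {
  awk -v p="$1" '
    { n = split($4, parts, ":"); if (parts[n] == p) found = 1 }
    END { exit !found }
  '
}

# Example usage (the check is skipped when ss is not installed).
PORT=9999
if command -v ss >/dev/null 2>&1 && ss -tln | port_listed "$PORT"; then
  echo "Port $PORT is in use; choose another port in docker-compose.yml"
fi
```

Splitting on ":" and taking the last piece handles both IPv4 (0.0.0.0:9999) and IPv6 ([::]:9999) listeners.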
- /v1/chat/completions API
- $HOME/vllm-compose/<model-id>/test-results.json
- deployment.log
- deployment.log
- test-results.json
Generate a human-readable deployment report using the helper script.
Step 7.1: Extract Deployment Metrics
# Parse deployment.log for metrics
MODEL_LOAD_TIME=$(grep -o "model loading took [0-9.]* seconds" deployment.log | grep -o '[0-9.]*' || echo "N/A")
MEMORY_USED=$(grep -o "took [0-9.]* GiB memory" deployment.log | grep -o '[0-9.]*' || echo "N/A")
Step 7.2: Generate Report
# Execute the report generation script
<skill-dir>/scripts/generate-report.sh \
"<model-id>" \
"<container-name>" \
"<port>" \
"<status>" \
"$MODEL_LOAD_TIME" \
"$MEMORY_USED"
# Example:
./scripts/generate-report.sh \
"Qwen-Qwen3-0.6B" \
"vllm-qwen3-0-6b" \
"8001" \
"✅ Success" \
"3.6" \
"1.2"
Output: $HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md
Report Contents:
Completion Criteria:
- DEPLOYMENT_REPORT.md exists in output directory
- .env to .gitignore
- ${TOKEN:0:10}...
- HF_TOKEN=$HF_TOKEN docker compose up -d
- HF_TOKEN in user's shell config
| Issue | Solution |
|---|---|
| HF_TOKEN not set | Add export HF_TOKEN="hf_xxx" to ~/.bash_profile, then source ~/.bash_profile. Or provide via parameter. |
| HF_HOME not set | Defaults to /root/.cache/huggingface/hub. For production, add export HF_HOME="/path" to ~/.bash_profile. |
| ~/.bash_profile not found | Create ~/.bash_profile and add environment variables. |
| Changes not taking effect | Run source ~/.bash_profile or restart terminal. |
| HF_TOKEN provided but download still fails | Token may be invalid or lack access to the model. Verify token at https://huggingface.co/settings/tokens |
| Issue | Solution |
|---|---|
| Authentication required (gated model) | Set HF_TOKEN in ~/.bash_profile or provide via parameter. Ensure token has access to the model. |
| Model not found | Verify model ID is correct (case-sensitive). Check model exists on HuggingFace. |
| Download timeout | Check network connection. Large models may take time. |
| Issue | Solution |
|---|---|
| hf CLI not found | pip install huggingface_hub |
| Docker Compose fails | Use docker compose (no hyphen) |
| GPU access fails | Add user to render group: sudo usermod -aG render $USER |
| Port in use | Change port parameter |
| OOM | Reduce gpu_memory_utilization |
cd $HOME/vllm-compose/<model-id>
docker compose down
Check deployment status and logs:
# View deployment directory
ls -la $HOME/vllm-compose/<model-id>/
# View live logs
tail -f $HOME/vllm-compose/<model-id>/deployment.log
# View test results
cat $HOME/vllm-compose/<model-id>/test-results.json
# Check container status
docker ps | grep <container-name>
# Verify environment variables
echo "HF_TOKEN: ${HF_TOKEN:0:10}..."
echo "HF_HOME: $HF_HOME"
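The masked echo above prints a bare "..." when HF_TOKEN is unset, which can be mistaken for a real value. A minimal sketch of a helper that distinguishes the two cases is shown below. The name `mask_token` is hypothetical, and the 10-character prefix mirrors the ${HF_TOKEN:0:10} convention used in this document.

```shell
# Hypothetical helper: show only the first 10 characters of a token,
# or a clear marker when the token is empty or unset.
mask_token() {
  local token="${1:-}"
  if [ -z "$token" ]; then
    echo "(not set)"
  else
    printf '%s...\n' "$(printf '%s' "$token" | cut -c1-10)"
  fi
}

echo "HF_TOKEN: $(mask_token "${HF_TOKEN:-}")"
```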
Step 1: Add environment variables to ~/.bash_profile
# HuggingFace token (required for gated models, optional for public models)
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Recommended: Custom model storage path (production)
export HF_HOME="/data/models/huggingface"
# Apply changes
source ~/.bash_profile
Step 2: Verify environment is ready
# Run the environment check (it loads HF_TOKEN and HF_HOME from ~/.bash_profile)
./scripts/check-env.sh
# Expected output:
# === Environment Ready ===
# Summary:
# HF_TOKEN: hf_xxxxxx...
# HF_HOME: /data/models/huggingface
Step 3: Run deployment
# The skill will automatically:
# 1. Source ~/.bash_profile to load HF_HOME and HF_TOKEN
# 2. Use HF_TOKEN and HF_HOME from environment (or ~/.bash_profile, or defaults)
# 3. Proceed without token for public models
# 4. Fail at download time with clear error if gated model requires token
| Version | Changes |
|---|---|
| 1.0.0 | Initial release |