Install
openclaw skills install alibabacloud-compute-provision
Alibaba Cloud Compute Provision - Automatically selects an Alibaba Cloud compute resource (ECS, FC, ACK, PAI) based on user intent, then creates instances and executes scripts. Use this skill when the user needs to run compute jobs, execute scripts, train models, or deploy containerized applications on Alibaba Cloud, or mentions keywords such as cpu_bound, gpu, vCPU, budget, training, A100, or qwen. Provides a full loop of resource selection, pricing, budget control, instance creation, and script execution.
This skill operates by writing and executing Python code that calls Alibaba Cloud APIs. The scripts/ directory contains ready-made Python modules (ECS, FC, ACK, PAI, VPC, etc.) that wrap the Alibaba Cloud OpenAPI. To accomplish any task in this skill, you write Python code snippets that import and call functions from these modules — you do NOT use CLI tools, Terraform, or the web console.
Typical workflow:
… scripts/ modules.
⛔ MUST-READ RULE: Before calling ANY function from scripts/, you MUST first read its reference doc (e.g. references/ecs.md for ECS functions, references/fc.md for FC functions). The reference docs contain exact function signatures, parameter names, constraints, and usage examples. Do NOT guess parameter names: incorrect parameters waste tool calls and may create/leak cloud resources. Use the defaults when in doubt.
Before doing anything else, execute the following code block to set up the Python path and ensure all dependencies are installed. This MUST be the very first code you run in every session — do NOT skip it or defer it.
import sys
sys.path.insert(0, "${SKILL_DIR}/scripts")
from bootstrap import ensure_dependencies
ensure_dependencies()
bootstrap.py is a standalone module with zero third-party dependencies (stdlib only), so it can always be imported even before any pip packages are installed. ensure_dependencies() automatically:
- Checks for required packages (alibabacloud_credentials, alibabacloud_tea_openapi, darabonba-core) and installs them.
If this step fails, fix the reported issue (e.g. install a newer Python) before proceeding; all subsequent steps depend on it.
Credentials are resolved via the Alibaba Cloud default credential provider chain (environment variables, ~/.alibabacloud/credentials, ~/.aliyun/config.json, ECS RAM role, etc.). Do NOT hardcode AK/SK or read them explicitly.
ALIBABA_CLOUD_REGION # optional, defaults to cn-hangzhou
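As a minimal sketch of the region default described above, the snippet below reads ALIBABA_CLOUD_REGION and falls back to cn-hangzhou. The helper name `resolve_region` is hypothetical and not part of the skill's modules. Credentials are deliberately left to the default provider chain and are not read here.

```python
import os

DEFAULT_REGION = "cn-hangzhou"  # default stated in the text above


def resolve_region(env=None):
    """Return ALIBABA_CLOUD_REGION, or the default when it is unset or blank.

    `env` is an optional mapping used for testing; os.environ is used otherwise.
    """
    env = os.environ if env is None else env
    value = (env.get("ALIBABA_CLOUD_REGION") or "").strip()
    return value or DEFAULT_REGION


print(resolve_region({"ALIBABA_CLOUD_REGION": "ap-southeast-1"}))
```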
Extract the following elements from the user's input:
| Element | Description | Example |
|---|---|---|
| Task type | One-shot script / Long-running service / AI training | "deploy nginx" → long-running service |
| Compute requirement | CPU / GPU / memory | "8 vCPU, 16 GB" |
| Budget | Cost cap | "$50" |
| Script / intent | Explicit script or task description | "a.sh" or "deploy an nginx site" |
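One way to hold the four extracted elements is a small record type. The sketch below is illustrative only: the class name `TaskRequest`, its field names, and the task-type strings are assumptions, not something the skill's modules define.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRequest:
    """Hypothetical container for the elements in the table above."""
    task_type: str                  # "one_shot", "long_running" or "ai_training"
    compute: Optional[str] = None   # e.g. "8 vCPU, 16 GB"
    budget: Optional[str] = None    # e.g. "$50"
    script: Optional[str] = None    # explicit script, e.g. "a.sh"
    intent: Optional[str] = None    # task description, e.g. "deploy an nginx site"

    def needs_generated_script(self) -> bool:
        """True when only an intent was given and no explicit script."""
        return self.script is None and self.intent is not None


request = TaskRequest(task_type="long_running", intent="deploy an nginx site")
print(request.needs_generated_script())
```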
When the user provides intent rather than a script (e.g. "deploy an nginx site"), generate the script automatically. Key rules:
- Ubuntu/Debian uses apt-get, CentOS/Alinux uses yum. Finalize the script only after the image is decided; if the image changes later, re-check script compatibility.
- Use non-blocking commands (e.g. systemctl start nginx), not foreground-blocking ones.
If the user explicitly specifies a product, use that product directly and skip selection comparison.
⛔ PRODUCT-LOCK RULE: When the user explicitly specifies a product (e.g. "用 ECS" ("use ECS"), "use FC"), you are locked to that product for the entire task. If you encounter errors (out of stock, quota limits, etc.), you MUST retry within the same product: try different availability zones, regions, or instance types. NEVER silently switch to a different product. If all retries within the specified product are exhausted, report the failure to the user and ask for guidance. Do NOT auto-switch.
For ECS, use ecs.find_available_instance_type() to search across zones/regions for available stock and pricing, then after cost confirmation use ecs.create_instance_with_infra() to create the instance.
When unspecified, follow the decision tree in references/select-resource.md:
User specified a product? → use it directly
Long-running service? → ECS or ACK (FC / PAI-DLC are not suitable for long-running)
AI / ML training? → PAI or FC (GPU) → if both viable, MUST compare in Step 1.5
K8s / containers? → ACK
Multiple products viable? → MUST compare in Step 1.5
Default (single match) → ECS
⛔ ANTI-BIAS: The decision tree only narrows candidates. When 2+ products remain, you MUST proceed to Step 1.5 for real API-based comparison — never assume one is "obviously cheaper" from general knowledge.
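The decision tree can be sketched as a small function that returns the remaining candidates and whether a Step 1.5 comparison is required. This is only an illustration: `narrow_candidates` and its boolean flags are hypothetical, the branches are treated as first-match in the order listed, and references/select-resource.md remains the authoritative tree.

```python
def narrow_candidates(user_product=None, long_running=False,
                      ai_training=False, containers=False):
    """Return (candidates, must_compare) following the tree above.

    Assumption: branches are evaluated top to bottom and the first match wins.
    """
    if user_product:
        # The user specified a product: use it directly, no comparison.
        return [user_product], False
    if long_running:
        candidates = ["ECS", "ACK"]   # FC / PAI-DLC are not suitable
    elif ai_training:
        candidates = ["PAI", "FC"]    # FC with GPU
    elif containers:
        candidates = ["ACK"]
    else:
        candidates = ["ECS"]          # default
    # Two or more candidates always go through the Step 1.5 comparison.
    return candidates, len(candidates) > 1


print(narrow_candidates(ai_training=True))
```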
⛔ HARD RULE: Region selection MUST be performed explicitly as a documented step — not deferred to or assumed during resource creation. The chosen region directly affects network connectivity, package installation success, and end-to-end reliability.
Decision flow (execute in order):
Detect external dependency requirements: scan the script (user-provided or agent-generated) and the task intent for signals that the workload will access overseas sources at runtime:
- pip install, npm install, apt-get install, yum install, go get, cargo build, gem install, composer install
- curl / wget to non-Chinese URLs
Apply region rule:
| Condition | Region | Rationale |
|---|---|---|
| Script installs external dependencies from overseas sources (pip, npm, apt, GitHub, etc.) | Overseas region (prefer ap-southeast-1 Singapore) | Domestic regions have poor/unstable connectivity to overseas package registries, causing timeouts and failures |
| Task deploys a website/service with no overseas dependencies | Domestic region (e.g. cn-hangzhou, cn-shanghai) | Lower latency for end users |
| AI training downloading models/datasets from Hugging Face, GitHub, etc. | Overseas region | Model downloads from China often timeout |
| No external network access needed (pure compute, local data) | Domestic region (e.g. cn-hangzhou) | Default, lowest latency |
| User explicitly specified a region | User's specified region | Respect user choice |
Pitfall: deploying a website seems "domestic", but if the setup script runs npm install / pip install, the packages come from overseas, so choose an overseas region. Always check the script's dependency commands, not just the service purpose.
Output the chosen region and reason to the user before proceeding:
Region: ap-southeast-1 (Singapore)
Reason: The task requires installing packages via pip/npm from overseas sources.
Domestic regions may cause installation timeouts.
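The detection step and the region table can be approximated with a few regular expressions. The sketch below is a rough heuristic under stated assumptions: `choose_region` is a hypothetical helper, the regexes only catch the plain `<tool> install` forms, and "non-Chinese URL" is simplified to "host does not end in .cn".

```python
import re

OVERSEAS = "ap-southeast-1"   # preferred overseas region from the table
DOMESTIC = "cn-hangzhou"      # default domestic region from the table

# Install commands listed in the detection step; the regexes are an assumption.
INSTALL_PATTERNS = [
    r"\bpip3?\s+install\b", r"\bnpm\s+install\b", r"\bapt-get\s+install\b",
    r"\byum\s+install\b", r"\bgo\s+get\b", r"\bcargo\s+build\b",
    r"\bgem\s+install\b", r"\bcomposer\s+install\b",
]
DOWNLOAD_PATTERN = r"\b(?:curl|wget)\b[^\n]*https?://(\S+)"


def choose_region(script, user_region=None):
    """Return (region, reason) for a script, honoring an explicit user choice."""
    if user_region:
        return user_region, "User explicitly specified the region."
    for pattern in INSTALL_PATTERNS:
        if re.search(pattern, script):
            return OVERSEAS, "Script installs packages from overseas sources."
    for match in re.finditer(DOWNLOAD_PATTERN, script):
        host = match.group(1).split("/")[0].split(":")[0].strip("\"'")
        if not host.endswith(".cn"):  # crude stand-in for "non-Chinese URL"
            return OVERSEAS, f"Script downloads from {host}."
    return DOMESTIC, "No overseas dependencies detected."


region, reason = choose_region("pip install torch && python train.py")
print(f"Region: {region}\nReason: {reason}")
```

A heuristic like this only supports the decision; the chosen region and reason still have to be shown to the user before proceeding.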
⛔ HARD RULE: When the user has NOT explicitly specified a product AND the decision tree yields more than one candidate, you MUST launch parallel sub-agents — one per candidate product. It is strictly forbidden to compare in the main thread using documentation knowledge or heuristics alone.
Dispatch rules:
- Each sub-agent must use real API calls: spec queries (DescribeInstanceTypes), inventory checks (DescribeAvailableResource), and pricing queries (DescribePrice or product-specific formulas). Memorized prices are NOT acceptable.
Comparison dimensions (all required): end-to-end time, estimated cost (from API), complexity, resource cleanup.
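The fan-out-and-collect shape of this comparison can be sketched with a thread pool standing in for sub-agents. This is an analogy, not the skill's dispatch mechanism: `compare_candidates`, the `evaluate` callable, and the report keys are hypothetical, and a real `evaluate` would have to call the APIs named above rather than return stored numbers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keys for the four required comparison dimensions.
REQUIRED_DIMENSIONS = (
    "end_to_end_minutes", "estimated_cost_cny", "complexity", "cleanup",
)


def compare_candidates(candidates, evaluate):
    """Evaluate every candidate product in parallel and validate the reports.

    `evaluate(product)` must return a dict covering all required dimensions.
    """
    if len(candidates) < 2:
        raise ValueError("comparison needs at least two candidates")
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        reports = dict(zip(candidates, pool.map(evaluate, candidates)))
    for product, report in reports.items():
        missing = [d for d in REQUIRED_DIMENSIONS if d not in report]
        if missing:
            raise ValueError(f"{product} report is missing: {missing}")
    return reports
```

Rejecting incomplete reports keeps a candidate from winning the comparison simply because one dimension was never measured.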
When uncertain about API usage, search the docs with scripts/doc_search.py:
from doc_search import search_and_format
print(search_and_format("DescribeInstanceTypes", product="ecs"))
After selecting a product, read its reference doc (linked below) for full API usage — especially function signatures and parameter constraints — then create resources. Use the region from Step 1.4; if Step 1.4 is not yet done, go back and complete it first.
| Product | Reference | Workflow summary |
|---|---|---|
| ECS | references/ecs.md | find_available_instance_type() → cost confirmation → create_instance_with_infra() (VPC/SG/image handled internally) |
| FC | references/fc.md | choose spec → cost confirmation → create function → invoke function |
| ACK | references/ack.md | choose node spec → cost confirmation → VPC/SG → create cluster → submit K8s Job |
| PAI | references/pai.md | list_ecs_specs → choose CPU/GPU → cost confirmation → create_training_job |
Network preparation (ACK only; ECS is handled by create_instance_with_infra): see references/vpc.md
⛔ HARD BLOCK: Before calling ANY resource-creation API (RunInstances, CreateFunction, CreateCluster, CreateTrainingJob), you MUST estimate cost and get user confirmation. The agent may NOT self-approve, regardless of how low the cost is.
Flow:
Skip-confirmation exception: if the user has explicitly stated in the current conversation that no confirmation is needed (e.g. "直接执行不用确认" ("execute directly, no confirmation needed"), "skip confirmation", "just do it, no need to ask"), then still output the cost estimate (step 2) for the record, but proceed immediately without waiting (skip steps 3-4).
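The confirmation gate and its exception can be expressed as a single predicate. The sketch below is a simplification under assumptions: `may_create` is a hypothetical helper, skip detection is plain substring matching on the example phrases, and the set of accepted replies is invented for illustration.

```python
# Example phrases from the text; substring matching is an assumption.
SKIP_PHRASES = ("直接执行不用确认", "skip confirmation", "no need to ask")
# Hypothetical set of replies that count as approval.
APPROVALS = {"yes", "y", "ok", "proceed"}


def may_create(estimate_shown, conversation, user_reply=None):
    """Return True only when a resource-creation call is allowed."""
    if not estimate_shown:
        # The estimate is always output, even when confirmation is skipped.
        return False
    text = " ".join(conversation).lower()
    if any(phrase in text for phrase in SKIP_PHRASES):
        return True
    # Otherwise an explicit approval from the user is required.
    return (user_reply or "").strip().lower() in APPROVALS


print(may_create(True, ["run a.sh on ECS"], user_reply="yes"))
```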
Cost display template:
Cost estimate:
Spec: ecs.t6-c1m2.large (2 vCPU, 4 GB)
Unit price: CNY 0.017 / hour
Duration: ~5 minutes
Total: CNY 0.002
Billing: PostPaid (pay-as-you-go)
Proceed with creation?
Exchange-rate reference: $1 ≈ CNY 7.2
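The arithmetic behind the template is unit price times duration, converted to hours. The sketch below reproduces the template's figures under one assumption that the text does not state: the total is rounded up to the nearest CNY 0.001 so the estimate never understates the cost. The function names are hypothetical.

```python
import math

CNY_PER_USD = 7.2  # exchange-rate reference from the text


def estimate_total_cny(unit_price_cny_per_hour, minutes):
    """Pay-as-you-go estimate, rounded up to 0.001 CNY (rounding is an assumption)."""
    raw = unit_price_cny_per_hour * minutes / 60
    return math.ceil(raw * 1000) / 1000


def within_budget(total_cny, budget_usd):
    """Compare a CNY estimate against a budget cap given in USD."""
    return total_cny <= budget_usd * CNY_PER_USD


total = estimate_total_cny(0.017, 5)   # figures from the template above
print(f"Total: CNY {total:.3f}")
print("Within $50 budget:", within_budget(total, 50))
```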
⛔ HARD BLOCK: Before executing any script, the following validation steps are required and non-skippable. If validation fails, you MUST stop the flow and report the error to the user. It is strictly forbidden to generate a placeholder/stub script, fabricate execution output, or silently proceed when a required file is missing.
Validation flow (apply before every execution):
Determine script source type:
- (A) … /home/user/train.py, ./scripts/run.sh).
For type (A), verify file existence and content:
- Use the Read tool or ls / cat to confirm the file exists at the given path and is non-empty. If the file is on a remote instance (ECS), run the check via Cloud Assistant (test -f <path> && wc -l <path>).
❌ Script not found: <path>
The specified script file does not exist or is empty. Please verify the path and try again.
Do NOT create a replacement script, guess the content, or continue execution.
Content completeness check (for all source types):
- … pass/TODO placeholders.
- … model.fit(), trainer.train(), torch.distributed.launch).
Dependency & environment pre-check (best effort):
- … pip install in the startup command, mount data volumes).
Rationale: creating compute resources costs money. Running a missing or placeholder script wastes that cost and misleads the user into thinking the task succeeded.
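For a local file path, the existence and placeholder checks might look like the sketch below. It is a crude heuristic, not the skill's validator: `validate_script` and the placeholder list are assumptions, a script is rejected only when every non-comment line is a placeholder, and remote files would still need the Cloud Assistant check described above.

```python
from pathlib import Path

# Lines treated as placeholders; this list is an assumption.
PLACEHOLDER_LINES = {"pass", "todo", "..."}


def validate_script(path):
    """Return the script text, or raise if it is missing, empty, or a stub."""
    file = Path(path)
    if not file.is_file() or file.stat().st_size == 0:
        raise FileNotFoundError(
            f"❌ Script not found: {path}\n"
            "The specified script file does not exist or is empty. "
            "Please verify the path and try again."
        )
    text = file.read_text(encoding="utf-8")
    # Ignore blank lines and comment lines (including a shebang).
    meaningful = [
        line.strip() for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if all(line.lower() in PLACEHOLDER_LINES for line in meaningful):
        raise ValueError(f"Script at {path} contains only placeholders")
    return text
```

Raising instead of returning a default keeps the flow from continuing with a fabricated or stub script.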
| Product | Task type | Call |
|---|---|---|
| ECS | One-shot (run and release) | ecs.run_command_and_cleanup(instance_id, script, infra=infra) |
| ECS | Long-running (keep alive) | ecs.run_command_and_wait(instance_id, script) |
| FC | One-shot | fc.create_and_invoke(script_path=path) or fc.create_and_invoke(script_content=code, script_type="shell") |
| ACK | K8s Job | ack.run_script_as_job(cluster_id, script) |
| PAI | Training job | script is set at create_training_job time |
⛔ ECS cleanup rule: For one-shot tasks, you MUST use run_command_and_cleanup() with the infra parameter (from create_instance_with_infra()). This releases the instance + security group, and only deletes VSwitch/VPC if they were freshly created (shared resources are preserved). Forgetting to release ECS instances causes ongoing charges.
Use run_command_and_wait() (without cleanup) only when the user explicitly needs the instance to stay running (e.g. "deploy a website", "keep the service online").
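The cleanup rule amounts to "always release, but only delete what you created". The sketch below illustrates that pattern with stand-in callables. It does not reproduce run_command_and_cleanup(): the `infra` keys `vswitch_created` and `vpc_created` are hypothetical, and the real structure is documented in references/ecs.md.

```python
def plan_cleanup(infra):
    """List what a one-shot cleanup should delete, in deletion order.

    `infra` is a hypothetical record of which network resources were freshly
    created; shared (pre-existing) VSwitch/VPC resources are left alone.
    """
    targets = ["instance", "security_group"]
    if infra.get("vswitch_created"):
        targets.append("vswitch")
    if infra.get("vpc_created"):
        targets.append("vpc")
    return targets


def run_one_shot(run_script, release, infra):
    """Run a one-shot task and release resources even if the script fails."""
    try:
        return run_script()
    finally:
        for target in plan_cleanup(infra):
            release(target)


released = []
run_one_shot(lambda: "done", released.append, {"vswitch_created": True})
print(released)
```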
The whole flow uses retry-with-adjustment:
| Error | Strategy |
|---|---|
| Out of stock | Try in order: switch availability zone → switch region → downgrade instance type. For ECS use find_available_instance_type(regions=[...]) which searches across regions automatically. NEVER switch to a different product. |
| Quota exceeded | Prompt user to raise quota |
| Over budget | Downgrade spec or shrink scale |
| Script execution failed | Analyze the error, adjust environment / dependencies, then retry |
| Unknown error | Search docs with doc_search.search(error_message, product) |
Keep adjusting and retrying until the instance is created and the script is running.
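The out-of-stock strategy (zone, then region, then a smaller instance type, never another product) maps onto three nested loops. The sketch below uses hypothetical names throughout: `create_with_retry`, the `OutOfStock` exception, and the `try_create` callable are stand-ins. For ECS, find_available_instance_type() already searches across regions.

```python
class OutOfStock(Exception):
    """Hypothetical error raised when a zone has no stock for a spec."""


def create_with_retry(try_create, zones_by_region, instance_types):
    """Try zones first, then regions, then downgraded instance types.

    Returns (result, attempts). The product is never changed; when every
    combination fails, the error is raised so it can be reported to the user.
    """
    attempts = []
    for instance_type in instance_types:                # downgrade last
        for region, zones in zones_by_region.items():   # switch region second
            for zone in zones:                          # switch zone first
                attempts.append((region, zone, instance_type))
                try:
                    return try_create(region, zone, instance_type), attempts
                except OutOfStock:
                    continue
    raise OutOfStock(f"all {len(attempts)} attempts failed; report to the user")
```

Other error classes in the table (quota, budget, script failure) need user input or analysis, so they are deliberately not caught here.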
| Document | Content |
|---|---|
| references/select-resource.md | Comparison of the four products and selection decision tree |
| references/vpc.md | VPC / VSwitch API quick reference |
| references/ecs.md | Full ECS API quick reference (specs / inventory / pricing / creation / execution) |
| references/fc.md | FC API quick reference + script-packaging method |
| references/ack.md | ACK cluster API quick reference + K8s Job execution |
| references/pai.md | PAI-DLC training-job API quick reference + GPU spec table |
| references/ram-policies.md | RAM least-privilege permission list and policy JSON |