Install
openclaw skills install alibabacloud-pai-rec-diagnosis
Alibaba Cloud PAI-Rec Engine Diagnostic and Configuration Validation Skill. Use for diagnosing PAI-Rec engine interface issues and validating engine configurations. Triggers: "PAI-Rec", "engine diagnosis", "engine config validation", "pairec", "recommendation engine".
This skill provides comprehensive diagnostic and validation capabilities for Alibaba Cloud PAI-Rec (Programmable Recommendation System) engines, including interface troubleshooting and configuration analysis.
PAI-Rec is Alibaba Cloud's programmable recommendation system that provides intelligent recommendation capabilities. This skill helps users:
- Diagnose PAI-Rec Engine Interface Issues: When the engine API returns errors or unexpected results, trace the request through EAS service logs and engine configurations to identify root causes.
- Validate Engine Configurations: Analyze engine configuration files for potential issues, inconsistencies, or misconfigurations before deployment.
Architecture: PAI-EAS Service + PAI-Rec Engine + Engine Configuration Management
Pre-check: Aliyun CLI >= 3.3.3 required
Run the following command and verify that the reported version is >= 3.3.3:
aliyun version
If the CLI is not installed or the version is too low, install or update it with:
curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash
See references/cli-installation-guide.md for installation instructions.
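The version comparison is easy to get wrong with plain string comparison ("3.10.0" sorts before "3.3.3" lexically). The following is a minimal sketch of a numeric check; the helper names `parse_version` and `meets_minimum` are hypothetical, and it assumes the output of `aliyun version` contains a dotted numeric version string.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted numeric version (e.g. '3.3.3') from a string."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(cli_output: str, minimum: str = "3.3.3") -> bool:
    """Compare versions component by component, so 3.10.0 ranks above 3.3.3."""
    return parse_version(cli_output) >= parse_version(minimum)
```

If `meets_minimum` returns False for the captured output, the install/update step applies.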
Pre-check: Aliyun CLI plugin update required
[MUST] Enable automatic plugin installation:
aliyun configure set --auto-plugin-install true
[MUST] Update any existing plugins so that they are always up to date:
aliyun plugin update
Pre-check: Alibaba Cloud Credentials Required
Security Rules:
- NEVER read, echo, or print AK/SK values (e.g., echo $ALIBABA_CLOUD_ACCESS_KEY_ID is FORBIDDEN)
- NEVER ask the user to input AK/SK directly in the conversation or command line
- NEVER use aliyun configure set with literal credential values
- ONLY use aliyun configure list to check credential status
aliyun configure list
Check the output for a valid profile (AK, STS, or OAuth identity).
If no valid profile exists, STOP here.
- Obtain credentials from Alibaba Cloud Console
- Configure credentials outside of this session (via aliyun configure in terminal or environment variables in shell profile)
- Return and re-run after aliyun configure list shows a valid profile
[MUST] Permission Failure Handling: When any command or API call fails due to permission errors at any point during execution, follow this process:
- Read references/ram-policies.md to get the full list of permissions required by this SKILL
- Use the ram-permission-diagnose skill to guide the user through requesting the necessary permissions
- Pause and wait until the user confirms that the required permissions have been granted
For detailed RAM policies required by this skill, see references/ram-policies.md.
IMPORTANT: Parameter Confirmation — Before executing any command or API call, ALL user-customizable parameters (e.g., RegionId, service names, instance IDs, environment names, config names, etc.) MUST be confirmed with the user. Do NOT assume or use default values without explicit user approval.
| Parameter Name | Required/Optional | Description | Default Value |
|---|---|---|---|
| service-name | Required | PAI-EAS service name | N/A |
| cluster-id | Required | EAS cluster/region ID | N/A |
| instance-id | Required | PAI-Rec instance ID | N/A |
| request-id | Required (for diagnosis) | Request ID from engine API response | N/A |
| config-name | Optional | Engine configuration name | N/A |
| environment | Optional | Engine environment (Prod/Pre) | N/A |
| engine-config-id | Optional | Specific engine config version ID | N/A |
| region | Optional | Alibaba Cloud region | N/A |
At the start of the Core Workflow (before any CLI invocation): [MUST] Enable AI-Mode — AI-mode is required for Agent Skill execution. Run the following commands before any CLI invocation:
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-pai-rec-diagnosis"
[MUST] Disable AI-Mode at EVERY exit point — Before delivering the final response for ANY reason, always disable AI-mode first. This applies to ALL exit paths: workflow success, workflow failure, error/exception, user cancellation, session end, or any other scenario where no further CLI commands will be executed. AI-mode is only used for Agent Skill invocation scenarios and MUST NOT remain enabled after the skill stops running.
aliyun configure ai-mode disable
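The "disable at every exit point" requirement maps naturally onto a try/finally block. The sketch below is one possible way to structure it, not part of the skill itself: `with_ai_mode`, `run_cli`, and the injectable `run` parameter are hypothetical names, while the three CLI commands are the ones listed above.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-pai-rec-diagnosis"


def run_cli(argv: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it exits non-zero."""
    subprocess.run(list(argv), check=True)


def with_ai_mode(workflow: Callable[[], None], run: Runner = run_cli) -> None:
    """Enable AI-mode, run the workflow, and disable AI-mode on every exit path."""
    try:
        run(["aliyun", "configure", "ai-mode", "enable"])
        run(["aliyun", "configure", "ai-mode", "set-user-agent",
             "--user-agent", USER_AGENT])
        workflow()
    finally:
        # Runs on success, on exceptions, and on KeyboardInterrupt alike.
        run(["aliyun", "configure", "ai-mode", "disable"])
```

Passing a custom `run` callable makes the ordering testable without invoking the real CLI.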
This workflow helps diagnose issues when a PAI-Rec engine API returns errors or unexpected results.
Input Example:
Service Name: embedding_recall
API Response:
{
"code": 299,
"msg": "items size not enough",
"request_id": "941b4e14-d1c5-489f-a184-b2b17f8b4fdb",
"size": 0,
"experiment_id": "",
"items": []
}
Get the service details to find the EAS service ID and configuration:
aliyun eas describe-service \
--cluster-id <cluster-id> \
--service-name <service-name>
What to extract:
- Resource: EAS service resource ID (e.g., eas-r-1v4qb1yan3qmnjwxqe)
- ServiceConfig.envs: Environment variables containing:
  - REGION: The region
  - INSTANCE_ID: PAI-Rec instance ID
  - CONFIG_NAME: Engine configuration name
  - PAIREC_ENVIRONMENT: Environment (product/prepub)
Parse the API response JSON to get the request_id field. This will be used to search service logs.
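Extracting the request ID can be sketched as follows. The function name `extract_request_id` is hypothetical; the only assumption about the payload is that it is a JSON object with a `request_id` string field, as in the input example above.

```python
import json


def extract_request_id(raw_response: str) -> str:
    """Return the request_id from an engine API response, or raise ValueError."""
    payload = json.loads(raw_response)
    if not isinstance(payload, dict):
        raise ValueError("engine response is not a JSON object")
    request_id = payload.get("request_id")
    if not isinstance(request_id, str) or not request_id.strip():
        raise ValueError("engine response has no usable request_id")
    # Keep the value verbatim apart from surrounding whitespace: the log
    # search is a case-sensitive exact match.
    return request_id.strip()
```

Failing loudly on a missing or empty `request_id` avoids running a log search with an empty keyword.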
Use the request ID as the sole filter to search service logs. Do NOT pass --start-time / --end-time when searching PAI-Rec business logs:
aliyun eas describe-service-log \
--cluster-id <cluster-id> \
--service-name <service-name> \
--keyword <request-id> \
--page-size 500
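The two rules that govern this command in this step (never combine --keyword with a time range, and format time bounds as yyyy-MM-dd HH:mm:ss in UTC) can be enforced when the argument list is built. This is a sketch under those stated rules; `build_log_query` and `format_log_time` are hypothetical helper names, and only flags that appear in this document are used.

```python
from datetime import datetime, timezone
from typing import Optional


def format_log_time(moment: datetime) -> str:
    """Render a timezone-aware datetime as 'yyyy-MM-dd HH:mm:ss' in UTC."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid ambiguity")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def build_log_query(cluster_id: str, service_name: str,
                    keyword: Optional[str] = None,
                    start: Optional[datetime] = None,
                    end: Optional[datetime] = None,
                    page_size: int = 500) -> list[str]:
    """Build the describe-service-log argument list, rejecting keyword + time range."""
    if keyword and (start or end):
        raise ValueError("use --keyword alone; do not add --start-time/--end-time")
    argv = ["aliyun", "eas", "describe-service-log",
            "--cluster-id", cluster_id,
            "--service-name", service_name,
            "--page-size", str(page_size)]
    if keyword:
        argv += ["--keyword", keyword]
    if start:
        argv += ["--start-time", format_log_time(start)]
    if end:
        argv += ["--end-time", format_log_time(end)]
    return argv
```

Returning a list rather than a shell string sidesteps quoting issues with the space inside the timestamp.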
[CRITICAL] Known CLI pitfall — keyword-only lookup is required for business logs:
- When only --keyword is supplied (no time range), the CLI returns the full PAI-Rec application trace (controller.go / feed.go / recall.go / rank_service.go etc.) matching the request_id.
- When --start-time / --end-time are added, even if the window covers the real log timestamp, the CLI silently drops business logs and only returns infrastructure noise (/bin/sh wrapper heartbeats, 502 Bad Gateway retries, postgres.go dbstat).
- For request-level lookups, use --keyword <request-id> alone.
Notes:
- --keyword: Use the full request_id extracted from the API response (case-sensitive exact match).
- --page-size: Raise to 500 to capture the entire trace in a single page; the total number of matched entries for one request is usually < 30.
- --start-time / --end-time: Only use these for broad time-window scans without --keyword (e.g., when investigating non-request-specific issues). Required format is yyyy-MM-dd HH:mm:ss in UTC (space separator, no T / no Z). ISO-8601 forms like 2025-04-28T00:00:00Z will be rejected with InvalidParameter.
Map the environment and list matching configurations:
Environment Mapping:
- product → Prod
- prepub → Pre
aliyun pairecservice list-engine-configs \
--instance-id <instance-id> \
--environment <Prod|Pre> \
--status Released \
--name <config-name>
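The environment mapping and the resulting query can be sketched as below. The names `cli_environment` and `released_config_query` are hypothetical, and the sketch assumes the caller has already collected the service's environment variables into a plain name-to-value dict.

```python
ENVIRONMENT_MAP = {"product": "Prod", "prepub": "Pre"}


def cli_environment(pairec_environment: str) -> str:
    """Translate PAIREC_ENVIRONMENT (product/prepub) into the CLI value (Prod/Pre)."""
    try:
        return ENVIRONMENT_MAP[pairec_environment.strip()]
    except KeyError:
        raise ValueError(
            f"unknown PAIREC_ENVIRONMENT {pairec_environment!r}; "
            f"expected one of {sorted(ENVIRONMENT_MAP)}"
        ) from None


def released_config_query(envs: dict[str, str]) -> list[str]:
    """Build the list-engine-configs arguments from the EAS service env values."""
    return ["aliyun", "pairecservice", "list-engine-configs",
            "--instance-id", envs["INSTANCE_ID"],
            "--environment", cli_environment(envs["PAIREC_ENVIRONMENT"]),
            "--status", "Released",
            "--name", envs["CONFIG_NAME"]]
```

An unrecognized environment value raises instead of silently defaulting, which keeps the parameter-confirmation rule intact.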
What to extract:
- Status: Released
- EngineConfigId and Version
aliyun pairecservice get-engine-config \
--instance-id <instance-id> \
--engine-config-id <engine-config-id>
What to extract:
- ConfigValue: The actual engine configuration (JSON/YAML)
Optionally run scripts/validate.py against the retrieved ConfigValue to quickly
rule out structural / reference / naming errors in the engine configuration
before diving into the log trace. See Workflow 2 § Step 3 and
references/config-validation.md for usage,
exit codes, and the full rule list.
printf '%s' "$CONFIG_VALUE" | python3 scripts/validate.py --stdin
When to run: when the log trace points at a specific configuration element
(e.g. a RecallConfs / FilterConfs / SceneConfs entry), or when the
configuration is being diagnosed for the first time in this skill session.
When to skip: when the log trace already shows a decisive non-config root
cause (e.g. a scene_id not present in SceneConfs, a 5xx from an upstream
EAS dependency, a missing feature table). validate.py is a static checker and
cannot detect request-time mismatches between client input and configuration.
[MUST] Scoping rule for the final report:
- validate.py findings may enter the final diagnosis ONLY when they are
  directly tied to the log evidence for the current request_id
  (e.g. the log blames a RecallConf name that validate.py flags as
  duplicated or dangling).
- Findings that are not tied to the current request_id trace MUST NOT be added to the
  final conclusion. They remain an internal sanity-check signal only. This
  preserves the evidence-only reporting rule in Step 6.
Analyze the following components together:
Common Issues to Check:
[MUST] Evidence-only reporting rule:
The final diagnosis delivered to the user MUST be grounded strictly in what the EAS service logs and the engine configuration directly show. Apply the following constraints:
This workflow validates engine configurations for potential issues.
Input: Configuration name and environment (Prod/Pre)
If the user doesn't provide an engine-config-id, list the available versions:
aliyun pairecservice list-engine-configs \
--instance-id <instance-id> \
--environment <Prod|Pre> \
--name <config-name>
Display to user:
- Version: Version number
- Status: Configuration status (Released/Draft/Archived)
- GmtCreateTime: Creation timestamp
- EngineConfigId: Version ID
Ask the user to select a version or provide the engine-config-id.
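One way to present the four fields is a small aligned text table. The sketch below takes a list of per-version dicts that the caller has already pulled out of the CLI response; the shape of the response envelope is not assumed here, and `format_versions` is a hypothetical helper name.

```python
def format_versions(configs: list[dict]) -> str:
    """Render engine config versions as an aligned, whitespace-separated table."""
    columns = ("Version", "Status", "GmtCreateTime", "EngineConfigId")
    rows = [columns] + [
        tuple(str(config.get(column, "")) for column in columns)
        for config in configs
    ]
    # Pad each column to the widest cell so the rows line up.
    widths = [max(len(row[i]) for row in rows) for i in range(len(columns))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )
```

Missing fields render as empty cells rather than raising, so a partially populated entry still shows up in the list.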
aliyun pairecservice get-engine-config \
--instance-id <instance-id> \
--engine-config-id <engine-config-id>
[MUST] Feed the extracted ConfigValue JSON into scripts/validate.py. The script
enforces JSON Schema (references/schema.json) + reference-consistency rules and exits
with status 0 on pass, 1 on failure.
# From stdin (recommended when ConfigValue is already in memory)
printf '%s' "$CONFIG_VALUE" | python3 scripts/validate.py --stdin
# From a saved JSON file
python3 scripts/validate.py /tmp/engine-config.json
# From an inline JSON string
python3 scripts/validate.py '{"RunMode":"product","RecallConfs":[...]}'
Requires jsonschema (pip install jsonschema); if it is missing, the script falls back to
rule-only validation without Schema checks.
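When the validator is driven from Python rather than a shell pipe, the stdin form can be wrapped as sketched below. `run_validator` is a hypothetical helper; it relies only on the exit statuses stated above (0 on pass, 1 on failure) and, as an assumption of this sketch, treats any other status as an execution error rather than a validation result.

```python
import subprocess
from typing import Sequence


def run_validator(
    config_value: str,
    command: Sequence[str] = ("python3", "scripts/validate.py", "--stdin"),
) -> tuple[bool, str]:
    """Feed ConfigValue to the validator on stdin; return (passed, stdout)."""
    completed = subprocess.run(
        list(command), input=config_value, capture_output=True, text=True
    )
    if completed.returncode not in (0, 1):
        # Neither pass nor fail: the script itself probably did not run cleanly.
        raise RuntimeError(
            f"validator exited with status {completed.returncode}: "
            f"{completed.stderr.strip()}"
        )
    return completed.returncode == 0, completed.stdout
```

The captured stdout is returned unmodified so the report can quote the script's output directly.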
What the script checks (summary):
- RunMode,
  RecallConfs, FilterConfs, SortConfs, AlgoConfs, SceneConfs, RankConf,
  FeatureConfs, UserFeatureConfs, DebugConfs, FeatureLogConfs,
  CallBackConfs, PipelineConfs, etc.
- RecallType / FilterType / SortType / RunMode /
  DebugConfs.OutputType / GeneralRankConfs.ActionConfs[].ActionType
- SceneConfs.RecallNames → RecallConfs;
  FilterNames → FilterConfs; SortNames → SortConfs;
  RankConf.RankAlgoList → AlgoConfs; any DaoConf.AdapterType +
  *Name → the corresponding *Confs (Hologres / Redis / MySQL / TableStore /
  FeatureStore / …)
- User2ItemExposureFilter with WriteLog=true + FeatureStore adapter: must set
  TimeInterval > 0
- PriorityAdjustCountFilter in accumulator mode: Count must be strictly
  increasing (use Type="fix" for independent per-recall caps)
- PipelineConfs.*.Name must be globally unique
- DebugConfs.Rate must be an integer in [0, 100]
- RecallConfs, FilterConfs, SortConfs,
  AlgoConfs
Detailed usage, exit codes, example outputs and the full rule list live in references/config-validation.md.
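To illustrate the kind of rule involved, here is a minimal sketch of two of the checks listed above: SceneConfs.RecallNames entries that do not resolve to a RecallConfs name, and the DebugConfs.Rate range. It is not the implementation in scripts/validate.py. It assumes RecallConfs is a list of objects with a Name field, searches for RecallNames at any depth under SceneConfs, and treats a missing Rate as valid; all function names are hypothetical.

```python
from typing import Any, Iterator


def iter_key_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere inside nested dicts/lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            else:
                yield from iter_key_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_key_values(item, key)


def dangling_recall_names(config: dict) -> list[str]:
    """Return RecallNames used in SceneConfs that no RecallConfs entry defines."""
    defined = {
        entry.get("Name")
        for entry in config.get("RecallConfs", [])
        if isinstance(entry, dict)
    }
    referenced: set[str] = set()
    for names in iter_key_values(config.get("SceneConfs", {}), "RecallNames"):
        if isinstance(names, list):
            referenced.update(name for name in names if isinstance(name, str))
    return sorted(referenced - defined)


def debug_rate_is_valid(config: dict) -> bool:
    """Check that DebugConfs.Rate, when present, is an integer in [0, 100]."""
    rate = config.get("DebugConfs", {}).get("Rate", 0)
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(rate, int) and not isinstance(rate, bool) and 0 <= rate <= 100
```

The same walk-and-compare pattern would extend to FilterNames and SortNames; the authoritative rule list remains the one in references/config-validation.md.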
Report to the user based strictly on the script's output plus any additional
inspection of ConfigValue:
- Issues found through additional inspection of ConfigValue (e.g. naming collisions between RankScore variables
  and model output fields, env/region mismatches)
Do not add speculative fixes or best-practice tangents; suggestions are provided only when the user explicitly asks for them.
For detailed verification steps, see references/verification-method.md.
Quick Verification:
For Diagnosis Workflow:
For Validation Workflow:
This skill performs read-only operations and does not create any resources that require cleanup.
Always capture request_id: When reporting API issues, include the full response with request_id for accurate log correlation.
Log queries — keyword only, no time range: For request-level diagnosis, pass --keyword <request_id> to aliyun eas describe-service-log and leave --start-time / --end-time unset. Combining keyword with a time range filters out business logs due to a CLI quirk (see Workflow 1, Step 3). Only use time ranges for broad non-request scans, and only with the yyyy-MM-dd HH:mm:ss UTC format (no T / no Z).
Environment awareness: Always verify that configurations match the target environment (Prod vs Pre).
Version control: When validating configurations, check multiple versions if issues persist across deployments.
Log retention: EAS service logs are retained for limited periods; diagnose issues promptly after occurrence.
Configuration backup: Before applying changes based on validation results, ensure current configurations are backed up.
Cross-reference: Compare working configurations with problematic ones to identify differences.
Service status: Check EAS service status before diagnosing; service-level issues may mask configuration problems.
Evidence-only conclusions: Ground every statement in the diagnosis on a specific log line or config fragment. Do not speculate, do not propose fixes, and do not volunteer best-practice advice unless the user explicitly asks. If the evidence is insufficient, say what is missing rather than inferring.
Structured analysis: Follow the systematic workflow rather than jumping to conclusions based on error messages alone.
Document findings: Keep track of recurring issues and their resolutions for faster future diagnosis.
| Reference Document | Description |
|---|---|
| RAM Policies | Required RAM permissions for PAI-Rec and EAS APIs |
| Related Commands | Complete CLI command reference |
| Verification Method | Detailed verification procedures |
| CLI Installation Guide | Alibaba Cloud CLI installation instructions |
| Configuration Examples | Sample engine configurations and common patterns |
| Config Validation | scripts/validate.py usage, exit codes, rule catalogue |
| Troubleshooting Guide | Common issues and solutions |