Install
openclaw skills install alibabacloud-emr-starrocks-assistantAlibaba Cloud EMR Serverless StarRocks development & operations assistant. Covers five scenarios: cluster connection, schema design, data ingestion, SQL development & tuning, and cluster health diagnostics. Use this Skill when users ask about StarRocks table design, writing SQL, choosing an ingestion method, query execution plans, materialized views, cluster health checks, FE/BE/CN node status, tablet health, or compaction. Typical scenarios: table design, Stream Load / Routine Load / Broker Load selection, SQL optimization, window functions, CTEs, JOIN tuning, materialized view design, cluster health inspection, node-down diagnosis. Not applicable for: StarRocks instance lifecycle management (create / scale / restart / config change / version upgrade — these are control-plane operations, please use the EMR Serverless console or the corresponding OpenAPI), or other Alibaba Cloud products (EMR Cluster, Spark, Milvus, ClickHouse, Doris, RDS, ECS).
openclaw skills install alibabacloud-emr-starrocks-assistantHelp users perform day-to-day table design, data ingestion, SQL writing & tuning, and health diagnostics on Alibaba Cloud EMR Serverless StarRocks. All cluster access goes through the bundled srsql CLI (pymysql-based, uses the user's own account); no MySQL client required. Non-READ SQL is classified by sqlglot and requires --yes confirmation before execution.
Scope statement: This Skill focuses on using StarRocks — development, diagnostics, and day-to-day data operations. Cluster-internal data and schema operations (DDL, DML, materialized view refresh, GRANT, etc.) are supported and execute under the user's own account, gated by sqlglot classification +
--yesconfirmation. Instance-lifecycle operations (create, scale, restart, configuration change, version upgrade) are control-plane operations and are not in this Skill's scope; please use the EMR Serverless console or the corresponding OpenAPI.
When to use:
When NOT to use:
This Skill ships with the sr-connect Python CLI. See references/connect.md for details.
When this Skill is invoked and you anticipate running any cluster query, ensure srsql is available before asking the user for anything:
which srsql. If it returns a path, skip to step 4.uv tool install <skill-project-root> where <skill-project-root> is the directory containing this SKILL.md and pyproject.toml (the Skill's base directory shown at invocation time; commonly ~/.claude/skills/alibabacloud-emr-starrocks-assistant/, which may be a symlink). Do not ask the user to run this — the bundled CLI is part of the Skill's capability surface, not user infrastructure.uv itself is missing (which uv fails), surface that to the user — uv is a system tool and not auto-installed.~/.starrocks/{profile}.cnf (default profile name: default; respect SR_PROFILE env var if set). If it exists, skip to step 5. If missing:
sr-login --from-env. Safe to call unconditionally — it exits 2 with a clear "missing" message when the environment doesn't have the credentials it needs, and does nothing else. You do not need to inspect environment variables yourself.sr-login --from-env exits non-zero, the user hasn't logged in yet. Give them the sr-login --host ... --user ... command and ask them to run it themselves. Do not run interactive sr-login yourself — it would block on a password prompt you cannot answer.srsql is on PATH and the profile file exists, run queries via srsql -e "..." yourself.If srsql was just installed in this session and PATH hasn't been refreshed in the user's shell, fall back to the absolute path printed by uv tool install (typically ~/.local/bin/srsql).
Chat-style rule after bootstrap succeeds: Do not echo sr-whoami / srsql -e "..." invocation syntax to the user as a "you can now run …" hint. You are the one calling these CLIs on the user's behalf — the user drives the Skill, not the binaries. Skip the post-success "next step" narration entirely and just ask what they want to do, or proceed if their intent is already clear.
# EMR Serverless StarRocks — both internal and public endpoints use the MySQL
# wire protocol over plain TCP; no SSL/TLS. Use the same form for either.
sr-login --host <fe-endpoint> --port 9030 --user <account>
# Verify
sr-whoami
srsql -e "SELECT CURRENT_VERSION()"
Re-running sr-login with the same --profile silently overwrites the stored credential (same semantics as docker login). Use --profile for multi-cluster:
sr-login --profile prod --host fe-prod.xxx --user app_user
SR_PROFILE=prod srsql -e "..."
This Skill has two layers:
srsql is a UX gate, not a security boundary. Every statement is parsed by sqlglot (dialect starrocks):
READ (SELECT / SHOW / DESC / EXPLAIN / WITH / …) executes directly.--yes is passed.UNKNOWN, treated as non-READ, executable with --yes plus a soft warning.When the user asks for a write operation:
srsql --dry-run -e "...".srsql --yes -e "...".For DDL on production tables, or operations that change global cluster state (CREATE/DROP USER, ADMIN SET CONFIG, etc.), prefer to print the SQL and let the user run it themselves — even though the gate would let them run it via --yes. The gate is a safety net, not a license.
SQL passed into srsql -e "..." is assembled by the LLM and must follow these rules:
[A-Za-z0-9_] plus backtick-quoted forms.| Scenario | Handling |
|---|---|
| Profile file content (incl. user password) | Never echoed; mode 600 under a 700 directory |
| Password in error messages | Truncate / replace with ****** |
| Query results contain obvious key / token columns | Warn the user without displaying full content |
aliyun configure list output containing AK | Show only the first 4 chars; replace the rest with **** |
Disambiguation rule: When the user input is ambiguous (e.g. "ingestion is slow", "queries are slow") and context is unclear, ask one clarifying question before acting.
| User intent | Route | Reference |
|---|---|---|
| First-time cluster connection / register or switch credentials / multi-cluster setup | sr-login / sr-whoami / sr-logout | references/connect.md |
| New table / change schema / table model selection / partition+bucket design | Schema design | references/schema.md |
| Choose ingestion method / configure Stream/Broker/Routine Load / Flink/Kafka Connector | Import selection | references/data-import.md |
| Write SQL / optimize SQL / materialized views / function selection / read execution plans | SQL development & tuning | references/sql.md |
| Cluster health check / FE/BE/CN status / unhealthy tablets / compaction lag | Cluster diagnostics | references/diagnostics.md |
| "Ingestion used to be fine, suddenly slow" | Cluster diagnostics (distinct from import selection) | references/diagnostics.md |
| "How should I design a new ingestion pipeline" | Import selection | references/data-import.md |
Four table models and their typical use cases:
| Use case | Model |
|---|---|
| Logs / events / detail records | Duplicate Key |
| Pre-aggregated metrics | Aggregate |
| Real-time upsert / CDC | Primary Key |
| Simple deduplication | Unique Key (for new use cases, prefer Primary Key) |
⚠ Anti-patterns — do not produce these in DDL:
persistent_index_type=CLOUD_NATIVE + datacache.partition_duration — LOCAL index doesn't survive CN rebalance; no hot-data caching window. See schema/storage-properties.md.datacache.partition_duration to an arbitrary "hot window" (e.g. 30 DAY) instead of the user's stated query window — the value MUST be ≥ the query window. If the user says "查询近 N 天" / "queries the last N days", set datacache.partition_duration = "N DAY" (or larger). A value smaller than the query window guarantees cache misses on in-window queries. Do not default to 7/30/60 days when the user has given you a number.storage_cooldown_time/storage_cooldown_ttl/storage_medium/replicated_storage on shared-data — silently stripped or rejected by PropertyAnalyzer; use datacache.partition_duration for the cooldown effect.PRIMARY KEY — not supported; use BIGINT or DECIMAL.See references/schema.md.
| Data source | Recommended method |
|---|---|
| Local files < 10 GB | Stream Load |
| Object storage / HDFS bulk | Broker Load or INSERT INTO ... FROM FILES() |
| Object storage with continuous file arrivals | Pipe + AUTO_INGEST |
| Kafka / Pulsar | Routine Load or Kafka/Flink Connector |
| MySQL CDC | Flink CDC + Flink Connector |
⚠ Anti-patterns — do not produce these in load configs:
__op integer column (0=UPSERT, 1=DELETE) in COLUMNS list + $.__op in jsonpaths — all events are silently treated as UPSERT. The __op contract is a pair and must be taught as a pair: the literal column name is __op, and the integer values are __op=0 for UPSERT and __op=1 for DELETE. Even when the user only asks about DELETE, your response MUST state both mappings (__op=0 → UPSERT, __op=1 → DELETE) — never one without the other. This applies on every ingestion path including Flink Connector and Kafka Connector, where the connector populates __op for the user but they still need both values to debug "DELETE not applied" / "UPSERT not applied" symptoms.partial_update=true as a DELETE enabler — it controls partial-column UPSERT and has nothing to do with DELETE. If a user enables it while asking why DELETE doesn't work, flag it as misconfigured-for-intent and tell them to remove it unless they actually have a partial-column UPSERT use case. Do not validate the existing setting just because it parses.COLUMNS FROM PATH AS (...) in Routine Load — that's Broker Load's Hive-partition path syntax; not valid in Routine Load.__op values as strings ("upsert"/"delete") — must be the integers 0 / 1.INSERT INTO VALUES. Whenever the user's scenario implies high event rate, the recommendation MUST cover: (a) the method-appropriate concurrency cap (desired_concurrent_number ≤ Kafka partitions for Routine Load; sink.parallelism ≤ Kafka partitions for Flink/Kafka Connector), AND (b) an explicit TOO_MANY_VERSION / compaction-pressure warning with the relevant flush-interval guidance.See references/data-import.md.
| Use case | Pattern |
|---|---|
| Period-over-period / cumulative / Top-N | Window functions |
Large fact table JOIN small dimension (right side ≤ broadcast_row_limit, default 15M rows) | Broadcast / Colocate |
| Complex layered logic | CTE |
| Billion-scale deduplication | APPROX_COUNT_DISTINCT / BITMAP / HLL |
| High-frequency repeated query acceleration | Asynchronous materialized view |
| Cross-source query | External Catalog |
⚠ Anti-patterns — do not produce these in query rewrites or tuning advice:
date_format() / date_trunc() / cast() in WHERE — breaks partition pruning; rewrite as a range predicate (col >= '...' AND col < '...').EXPLAIN VERBOSE + checking partitions=N/M and tabletRatio=N/M — pruning failures (numerator == denominator) go undetected; never use plain EXPLAIN for this.cardinality in EXPLAIN as the result row count — it's the CBO's row estimate. Always quantify the staleness gap using the direct comparison cardinality vs the user-stated total table size (e.g. "estimate 5M vs total 500M ≈ 100×"); a ratio > 10× means stats are stale → run ANALYZE TABLE.WHERE create_time > '...' introduces large errors (you don't know the data distribution). When the user gives you a total row count, compare cardinality to that directly; do not divide the total by an assumed time window or selectivity factor.partitions/tabletRatio pruning failures with cardinality deviation — these are two independent diagnostic signals. When both look bad in the same OlapScanNode (e.g. partitions=N/N AND cardinality off from total table size by 10×–100×), report them as separate findings with separate fixes (predicate/type fix vs ANALYZE TABLE). Do not use cardinality deviation to "explain" pruning failure, and do not let pruning failure absorb the stale-stats finding.See references/sql.md.
Diagnostic order:
SHOW WAREHOUSESSHOW FRONTENDSSHOW BACKENDS / SHOW COMPUTE NODESSHOW WAREHOUSESSHOW PROC '/statistic'information_schema.fe_tablet_schedulesinformation_schema.be_compactions / be_cloud_native_compactionsinformation_schema.loads⚠ Anti-patterns — do not produce these in diagnostic conclusions:
information_schema.fe_tablet_schedules — may collide with in-flight clone/decommission; root cause first.UnhealthyTabletNum > 0 — always critical per the severity table, never "medium" or "low" risk; the cluster has unhealthy replicas.CloningTabletNum > 0 as a separate problem — clone is the recovery action triggered by UnhealthyTabletNum, not an independent fault signal.See references/diagnostics.md.
| Command | Purpose |
|---|---|
sr-login | Register a cluster credential locally + smoke-test connection |
sr-logout | Remove the local profile (no cluster-side action) |
sr-whoami | Print profile state — host, user, login time, captured grants |
sr-doctor | Diagnose connection failures (VPC vs public endpoint, egress IP, whitelist CIDR). Invoked automatically by sr-login on failure. |
srsql | Daily query entry point; classifies SQL and gates non-READ behind --yes |
This Skill executes SQL queries only via srsql. The following are prohibited:
curl / wget / pip install / npm install to download and run external codeeval / source to load unaudited contentException: uv tool install <skill-project-root> to install the Skill's own bundled sr-connect CLI from its local project directory is allowed and expected — see the Assistant bootstrap protocol above. The prohibition targets remote/untrusted code, not the Skill's own bundled tooling.
| Operation | Recommended timeout |
|---|---|
| Read-only SQL queries | 30 s |
| Diagnostic queries across many large tables | 60 s |
| Retry | Total operation time ≤ 3 minutes |
srsql --format table or --format markdown--format vertical--format json / tsvLIMIT and offer pagination| Error | Cause | Action |
|---|---|---|
Cannot connect to host:port | Wrong endpoint type / IP not whitelisted | sr-login auto-runs sr-doctor on connection failure. Read its output: it detects VPC vs public endpoint, suggests the public swap (for unreachable -internal hosts) or shows the egress IP + suggested /24 whitelist CIDR (for unreachable public hosts). Pass the recommendation to the user verbatim. See references/connect.md. |
Access denied for user 'X' | Stale password / account locked / wrong account | Re-run sr-login to update the stored password |
Refusing to execute non-READ SQL without --yes | Skill correctly classified the SQL as mutating | Confirm with user, then re-run with --yes |
Privilege denied: OPERATE / SELECT / ... | User account lacks the privilege | Surface the limitation; skip the affected diagnostic; don't retry |
Table not found | Wrong DB / table name | Confirm with SHOW DATABASES / SHOW TABLES FROM db |
| Query returns empty but user expects rows | Over-aggressive predicate / RBAC isolation | Check WHERE clauses; suggest the user verify with admin |
No profile 'X' | srsql --profile X without prior sr-login --profile X | Run sr-login for that profile first |
Principle: Read the full error message before deciding; do not retry blindly on the error code alone.