Alibabacloud Dataworks Data Quality
v0.0.1-beta.2
DataWorks Data Quality (Read-Only): Query rule templates, data quality monitors (scans), alert rules, and scan run records/logs. Uses aliyun CLI to call data...
DataWorks Data Quality (Read-Only)
Query and investigate Rule Templates, Data Quality Monitors, Alert Rules, and Scan Run Records in Alibaba Cloud DataWorks.
Coverage: All 9 Get/List read-only OpenAPIs under DataWorks Data Quality:
- ListDataQualityTemplates / GetDataQualityTemplate
- ListDataQualityScans / GetDataQualityScan
- ListDataQualityAlertRules / GetDataQualityAlertRule
- ListDataQualityScanRuns / GetDataQualityScanRun / GetDataQualityScanRunLog
Excludes write operations: Create / Update / Delete / CreateDataQualityScanRun.
Read-Only Skill: This skill supports query operations only. Any write operation request must be blocked immediately — direct the user to the DataWorks console.
Architecture
DataWorks Data Quality
├── Rule Templates ─── Reusable metric logic definitions (built-in & custom)
│
├── Data Quality Monitors (Scans) ─── Monitor tasks bound to tables, with rules and trigger config
│   └── Alert Rules ─── Notification rules tied to a monitor (channels, recipients, conditions)
│
└── Scan Runs ─── Execution records each time a monitor runs
    └── Scan Run Logs ─── Detailed execution logs for a run
Global Rules
Prerequisites
- Aliyun CLI >= 3.3.3: aliyun version (If it is not installed or the version is too low, run curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash to install or update. See references/cli-installation-guide.md)
- First-time use: aliyun configure set --auto-plugin-install true
- Plugin update: [MUST] run aliyun plugin update to ensure that any existing plugins on your local machine are up to date.
- AI-Mode Configuration: [MUST] Before using aliyun CLI commands, configure AI-Mode:
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality"
aliyun configure ai-mode disable
- jq (recommended for output formatting): which jq
- Credential status: aliyun configure list, verify valid credentials exist
Security Rules: DO NOT read/print/echo AK/SK values. ONLY use aliyun configure list to check credential status.
Command Formatting
- User-Agent (mandatory): All aliyun CLI commands must include --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality.
- Timeout (mandatory): All aliyun CLI commands must include --connect-timeout 5 --read-timeout 10. These match the CLI built-in defaults and make the timeout policy explicit.
- Single-line commands: Construct as a single-line string; do not use \ for line breaks.
- jq step-by-step: First execute the aliyun command to get JSON, then pipe to jq for formatting.
- Endpoint mandatory: When specifying --region, you must also add --endpoint dataworks.<REGION_ID>.aliyuncs.com.
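The formatting rules above can be applied mechanically. The following is a minimal sketch, assuming a hypothetical helper named build_dq_cmd (not part of the aliyun CLI or this skill) that only prints the command string as a dry run and does not execute it:

```shell
# Hypothetical helper: assembles a single-line aliyun command that carries
# the mandatory user-agent, timeout, and (when a region is given) endpoint flags.
UA="AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality"

# Usage: build_dq_cmd REGION_ID SUBCOMMAND [ARGS...]
# Pass an empty REGION_ID ("") to omit --region and --endpoint.
# Limitation: arguments that contain spaces are not re-quoted in the output.
build_dq_cmd() {
  local region=$1 sub=$2
  shift 2
  local cmd="aliyun dataworks-public $sub --user-agent $UA --connect-timeout 5 --read-timeout 10"
  if [ -n "$region" ]; then
    # --region always travels together with the matching --endpoint
    cmd="$cmd --region $region --endpoint dataworks.$region.aliyuncs.com"
  fi
  if [ "$#" -gt 0 ]; then
    cmd="$cmd $*"
  fi
  echo "$cmd"
}
```

For example, build_dq_cmd cn-hangzhou get-data-quality-scan --id <SCAN_ID> prints one line that can be reviewed before it is run.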
Parameter Confirmation
Must be explicitly provided by the user; do not assume or use defaults:
- ProjectId: Core parameter for every query; must be confirmed
- Id-type resource identifiers: template ID, monitor ID, alert rule ID, scan run ID
- region: Affects the endpoint; must be confirmed
Can use default values directly; no user confirmation needed:
- PageNumber: default 1
- PageSize: default 10
- SortBy: default ModifyTime Desc or CreateTime Desc
Ask contextually; only collect when the user has a specific need:
- Name, Table: fuzzy search keywords
- Time range: CreateTimeFrom / CreateTimeTo
- Status: collect only when the user explicitly wants to filter by a specific status
If the user has already provided ProjectId, Id, or region in the conversation, reuse them directly without re-confirmation.
Time Parameter Conversion
When the user describes time in natural language, convert it to millisecond timestamps automatically. Do not ask the user to provide raw timestamps.
- "yesterday" → yesterday 00:00:00 to 23:59:59
- "today" → today 00:00:00 to current time
- "last N days" → current time minus N × 24 hours through current time
- If the time phrase is ambiguous, ask a clarification question and offer a suggested range
Query Result Presentation
After every query, present the result in a decision-friendly way:
- List queries: use a Markdown table for key fields such as ID, name, status, and time; do not dump raw JSON
- Detail queries: present a short summary first, then expand full Spec only if the user asks
- Abnormal status: highlight Fail/Error/Warn, and proactively recommend the next diagnostic step
- Empty result: explain likely causes such as wrong ProjectId, wrong region, or filters that are too strict
Pagination
- First query uses the default PageSize of 10
- If the number of returned rows equals PageSize, proactively offer the next page or a larger PageSize
- Do not fetch more than 100 records in a single request
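As an illustration of these pagination rules, the sketch below decides what to tell the user after a list query. The function name page_advice and its output strings are hypothetical and chosen only for this example:

```shell
# Hypothetical helper: decides the follow-up after a list query.
# Usage: page_advice RETURNED_ROWS [PAGE_SIZE]   (PAGE_SIZE defaults to 10)
page_advice() {
  local returned=$1 page_size=${2:-10}
  if [ "$page_size" -gt 100 ]; then
    # Never request more than 100 records in a single call
    echo "reject: PageSize must not exceed 100"
  elif [ "$returned" -eq "$page_size" ]; then
    # A full page suggests more rows may exist
    echo "offer-next-page"
  else
    echo "last-page"
  fi
}
```

A full page is only a hint that more rows exist, so the result is an offer to the user rather than an automatic fetch.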
⚠️ Read-Only Execution Gate
MANDATORY: Before responding to ANY request, check whether it involves a write operation. If YES: BLOCK immediately. Do NOT call any API. Respond with: "This skill supports query operations only and cannot perform create/update/delete. Please go to the DataWorks Console for configuration."
Quick Reference — All Blocked Operations:
| Operation Type | Blocked APIs |
|---|---|
| Create | CreateDataQualityTemplate, CreateDataQualityScan, CreateDataQualityScanRun, CreateDataQualityAlertRule |
| Update | UpdateDataQualityTemplate, UpdateDataQualityScan, UpdateDataQualityAlertRule |
| Delete | DeleteDataQualityTemplate, DeleteDataQualityScan, DeleteDataQualityAlertRule |
| Trigger | CreateDataQualityScanRun (manual execution trigger) |
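One way to enforce this gate is an allowlist of the nine read-only APIs, so that anything not listed is blocked by default. The sketch below assumes hypothetical helper names (is_allowed_api, gate_api); it illustrates the check and is not the skill's actual implementation:

```shell
# The nine read-only APIs covered by this skill; everything else is blocked.
READ_ONLY_APIS="ListDataQualityTemplates GetDataQualityTemplate ListDataQualityScans GetDataQualityScan ListDataQualityAlertRules GetDataQualityAlertRule ListDataQualityScanRuns GetDataQualityScanRun GetDataQualityScanRunLog"

# Returns 0 when the API name is on the allowlist, 1 otherwise.
is_allowed_api() {
  local api
  for api in $READ_ONLY_APIS; do
    if [ "$api" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Prints ALLOW or BLOCK; a BLOCK result also returns a non-zero status.
gate_api() {
  if is_allowed_api "$1"; then
    echo "ALLOW"
  else
    echo "BLOCK"
    return 1
  fi
}
```

An allowlist is used instead of matching the Create/Update/Delete prefixes so that write APIs added later are blocked without any change to the check.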
RAM Permissions
All operations require dataworks:<APIAction> permissions on the target workspace.
Full permission matrix: references/ram-policies.md
Quick Start: Data Quality Investigation
When the user request is vague, use the following default path:
- Environment check: Confirm CLI and credentials per Prerequisites. After completion, proactively suggest the workspace confirmation step.
- Confirm workspace: Confirm ProjectId and region. If either is missing, use Module 0. After completion, proactively suggest listing monitors.
- List monitors: Call ListDataQualityScans, present a table, and let the user choose a monitor. After completion, proactively suggest monitor detail.
- Check monitor detail: Call GetDataQualityScan, summarize rules, monitored object, and trigger mode. After completion, proactively suggest recent runs.
- Check run history: Call ListDataQualityScanRuns, default to the most recent 10 rows, and highlight abnormal status. After completion, proactively suggest drilling into one run.
- Drill into failed or warned runs: For Fail/Error/Warn, call GetDataQualityScanRun and summarize per-rule results. After completion, proactively suggest log inspection.
- Fetch execution logs: If Results shows failed rules or runtime errors, call GetDataQualityScanRunLog to locate the root cause. After completion, proactively suggest whether further analysis is needed.
Next Step Guidance
| Completed Operation | Recommended Next Step |
|---|---|
| ListDataQualityTemplates | "Would you like to view the full configuration of a specific template? (GetDataQualityTemplate)" |
| GetDataQualityTemplate | "Would you like to view monitors that use this template? (ListDataQualityScans)" |
| ListDataQualityScans | "Select a monitor to view its full configuration? (GetDataQualityScan)" |
| GetDataQualityScan | "View associated alert rules (ListDataQualityAlertRules) or recent run history (ListDataQualityScanRuns)?" |
| ListDataQualityAlertRules | "View details for a specific alert rule? (GetDataQualityAlertRule)" |
| GetDataQualityAlertRule | "Return to view run history for the associated monitor? (ListDataQualityScanRuns)" |
| ListDataQualityScanRuns | "View detailed results for a specific run? (GetDataQualityScanRun)" |
| GetDataQualityScanRun (Pass) | "This run passed. Would you like to view other run records or alert configuration?" |
| GetDataQualityScanRun (Fail/Error/Warn) | "Anomaly detected — recommend viewing execution logs to locate the root cause. (GetDataQualityScanRunLog)" |
| GetDataQualityScanRunLog (NextOffset=-1) | "Log retrieval complete. Is further analysis needed?" |
| GetDataQualityScanRunLog (NextOffset≠-1) | "Log not fully retrieved — continue fetching the next segment. (Retry with Offset)" |
Trigger Rules
Trigger scenarios: Query data quality monitors/rules/templates/alerts/scan runs/logs, diagnose data quality check failures, view quality alert notification configuration, list/get quality scan/rule/template/alert/run
Not triggered:
- Creating/updating/deleting data quality configuration → Use DataWorks Console
- Data source/compute resource/resource group management → alibabacloud-dataworks-infra-manage
- Workspace query/member management → alibabacloud-dataworks-workspace-manage
- Data development node/scheduling configuration → alibabacloud-dataworks-datastudio-develop
Interaction Flow
Identify query intent → Environment check → Module 0 (if ProjectId/region missing) → Collect parameters → Execute command → Present results → Guide next step
Common aliases: DW = DataWorks, DQ = Data Quality, scan = monitor, scan run = execution record
Module 0: Workspace / ProjectId / Region Query
If the alibabacloud-dataworks-workspace-manage skill is available, prefer using it for workspace lookup. The following is only a fallback.
aliyun dataworks-public list-projects --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --status Available --page-size 100
Rules:
- If the user provides only a workspace name, list candidate workspaces and ask the user to confirm the ProjectId
- If ProjectId is unknown, ask for it explicitly and never guess
- If region is unknown, offer common regions for confirmation: cn-hangzhou, cn-shanghai, cn-beijing, cn-shenzhen
- Once ProjectId and region are confirmed in the conversation, reuse them in later steps
Intent guidance:
- "there's a data quality issue" → ask whether the user wants monitor configuration, run records, or alert settings
- "show me this table" → start with list-data-quality-scans --table <TABLE_NAME>
- If the intent is still unclear, ask the user to choose one of four modules: rule templates, monitors, alert rules, or scan runs
Module 1: Rule Templates
Rule templates define reusable metric logic such as null rate, duplicate rate, row count, and custom SQL checks. Use this module when the user wants to know what a template checks, whether it is built-in or workspace-specific, and how its threshold logic is defined.
Task 1.1: List Rule Templates (ListDataQualityTemplates)
Always call ListDataQualityTemplates whenever the user asks about quality rule templates in their workspace. Never answer without invoking the API.
Scope: This API only returns workspace custom templates. It does not support querying system built-in templates.
--project-id is required. If the user has not provided ProjectId, collect it first via Module 0.
aliyun dataworks-public list-data-quality-templates --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --project-id <PROJECT_ID> [--name <FUZZY_NAME>] [--catalog <CATALOG_PATH>] [--page-number 1] [--page-size 10]
How to interpret the result:
- PageInfo.DataQualityTemplates[] is the working set for user selection
- Present a Markdown table of Id, template name (from Spec), Catalog/category, and description; do not dump raw JSON
- Use Catalog and template naming patterns to tell the user what class of checks is available
- After listing, proactively suggest the user pick a template ID to view full configuration via GetDataQualityTemplate
- If the user asks about system built-in templates, explain this API only covers workspace custom templates and direct them to the DataWorks console
Task 1.2: Get Rule Template Details (GetDataQualityTemplate)
aliyun dataworks-public get-data-quality-template --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <TEMPLATE_ID>
How to interpret the result:
- Focus on Spec: summarize the metric logic, parameter definitions, and threshold expression
- Tell the user what this template checks and how pass/fail is decided
- Mention whether the template belongs to a workspace (ProjectId present) or is reused as a generic template
- Expand full Spec only when the user explicitly asks for raw detail
Module 2: Data Quality Monitors
A data quality monitor (scan) is a concrete monitoring task bound to a table or field. Use this module to locate monitors, explain what they check, and understand how they are triggered.
Task 2.1: List Data Quality Monitors (ListDataQualityScans)
aliyun dataworks-public list-data-quality-scans --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --project-id <PROJECT_ID> --page-number 1 --page-size 10 [--name <FUZZY_NAME>] [--table <FUZZY_TABLE_NAME>] [--sort-by "ModifyTime Desc"]
How to interpret the result:
- PageInfo.DataQualityScans[] is the candidate monitor list; show Id, Name, Description, owner, and latest update time
- When --table is used, explicitly tell the user these monitors are the likely matches for that table
- Use the result table to help the user choose one target monitor before moving to the detail query
- When the list is empty, suggest checking ProjectId, region, or relaxing Name / Table
Task 2.2: Get Monitor Details (GetDataQualityScan)
aliyun dataworks-public get-data-quality-scan --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <SCAN_ID>
How to interpret the result:
- Spec: summarize monitored object, rule count, core metrics, and threshold settings
- Trigger: explain whether the monitor is ByManual or BySchedule
- ComputeResource and RuntimeResource: mention them only when they help explain execution behavior
- Parameters and Hooks: summarize only if they affect how the run is triggered or analyzed
- Present a concise monitor summary first, then suggest alert-rule or run-history follow-up
Module 3: Alert Rules
Alert rules define when notifications are sent and to whom. Use this module when the user asks who gets notified, through which channel, and under what condition.
Receiver Type Quick Reference
| ReceiverType | Description |
|---|---|
| AliUid | Specific Alibaba Cloud account UID |
| DataQualityScanOwner | Owner of the data quality monitor task |
| TaskOwner | Owner of the associated scheduling task |
| DingdingUrl | DingTalk custom robot Webhook |
| FeishuUrl | Feishu custom robot Webhook |
| WeixinUrl | WeCom Webhook |
| WebhookUrl | Generic Webhook URL |
| ShiftSchedule | On-call schedule (notify by shift) |
Task 3.1: List Alert Rules (ListDataQualityAlertRules)
aliyun dataworks-public list-data-quality-alert-rules --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --project-id <PROJECT_ID> --page-number 1 --page-size 10 [--data-quality-scan-id <SCAN_ID>] [--sort-by "CreateTime Desc"]
How to interpret the result:
- PageInfo.DataQualityAlertRules[] should be summarized as: rule ID, condition, channels, receivers, and associated monitor IDs
- Translate Notification.Channels into user-friendly channel names such as DingTalk, email, Feishu, SMS, or Webhook
- Summarize Notification.Receivers by receiver type instead of showing nested raw JSON
- If DataQualityScanId is provided, explicitly state these are the alert rules attached to that monitor
Task 3.2: Get Alert Rule Details (GetDataQualityAlertRule)
aliyun dataworks-public get-data-quality-alert-rule --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <ALERT_RULE_ID>
How to interpret the result:
- Explain the alert condition in plain language
- Summarize notification channels and recipients with emphasis on who will be notified and how
- Call out whether the rule targets one monitor or multiple monitors
- If the user is diagnosing missing alerts, suggest returning to recent run history for the associated monitor
Module 4: Scan Runs
A scan run is created every time a monitor executes. Use this module to inspect run history, diagnose failed checks, and read execution logs.
Status Quick Reference
| Status | Meaning | Recommended Path |
|---|---|---|
| Pass | All rules passed | No action needed |
| Fail | At least one rule failed to meet the threshold | GetDataQualityScanRun → Results → GetDataQualityScanRunLog |
| Error | Execution error (engine error, insufficient resources) | GetDataQualityScanRunLog to view error details |
| Warn | Warning triggered but did not reach the blocking threshold | GetDataQualityScanRun → Results to view metric values |
| Running | Execution in progress | Wait for completion before querying |
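The table above can be encoded as a simple lookup. This is a minimal sketch with a hypothetical function name (next_step_for_status); the returned strings paraphrase the Recommended Path column:

```shell
# Hypothetical helper: maps a scan run Status to the recommended diagnostic path.
next_step_for_status() {
  case "$1" in
    Pass)    echo "No action needed" ;;
    Fail)    echo "GetDataQualityScanRun -> Results -> GetDataQualityScanRunLog" ;;
    Error)   echo "GetDataQualityScanRunLog" ;;
    Warn)    echo "GetDataQualityScanRun -> Results" ;;
    Running) echo "Wait for completion before querying" ;;
    *)       echo "Unknown status: $1"; return 1 ;;
  esac
}
```

Unknown values return a non-zero status so that a caller does not silently treat an unexpected state as a pass.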
Task 4.1: List Scan Runs (ListDataQualityScanRuns)
aliyun dataworks-public list-data-quality-scan-runs --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --project-id <PROJECT_ID> [--data-quality-scan-id <SCAN_ID>] [--status <Pass|Running|Error|Fail|Warn>] [--create-time-from <TIMESTAMP_MS>] [--create-time-to <TIMESTAMP_MS>] [--filter '{"TaskInstanceId":"<INSTANCE_ID>"}'] [--sort-by "CreateTime Desc"] [--page-number 1] [--page-size 10]
Filter quick reference:
| Scenario | Filter JSON Example |
|---|---|
| Filter by scheduling instance | {"TaskInstanceId":"123456"} |
| Filter by run number | {"RunNumber":"2"} |
How to interpret the result:
- PageInfo.DataQualityScanRuns[] should be shown as a table with Id, Status, CreateTime, FinishTime, and key runtime parameters
- Sort by recent time first so phrases like "most recent" map naturally to the first row
- Highlight Fail, Error, and Warn, then recommend drilling into GetDataQualityScanRun
- If the user asks for recent failures, combine Status=Fail with a converted time range instead of asking for timestamps
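For the converted time range, the phrases listed under Time Parameter Conversion can be turned into --create-time-from / --create-time-to values with plain shell arithmetic. The sketch below is illustrative only: the function names and the tz_offset_s parameter are hypothetical, and it assumes the caller supplies the current time in epoch milliseconds.

```shell
# Hypothetical helpers. All times are epoch milliseconds.
# tz_offset_s is the local zone's UTC offset in seconds (for example 28800 for UTC+8).
DAY_MS=86400000

# Start (00:00:00) of the local day that contains now_ms.
day_start_ms() {
  local now_ms=$1 tz_offset_s=${2:-0}
  local shifted=$(( now_ms + tz_offset_s * 1000 ))
  echo $(( shifted - shifted % DAY_MS - tz_offset_s * 1000 ))
}

# "today": local midnight through the current time.
range_today() {
  echo "$(day_start_ms "$1" "${2:-0}") $1"
}

# "yesterday": previous local day, 00:00:00 through 23:59:59.
range_yesterday() {
  local start
  start=$(day_start_ms "$1" "${2:-0}")
  echo "$(( start - DAY_MS )) $(( start - 1000 ))"
}

# "last N days": current time minus N x 24 hours through current time.
range_last_n_days() {
  local now_ms=$1 n=$2
  echo "$(( now_ms - n * DAY_MS )) $now_ms"
}
```

Each function prints "FROM TO" on one line. For instance, range_last_n_days "$(( $(date +%s) * 1000 ))" 7 yields the two values for a "last 7 days" query.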
Task 4.2: Get Scan Run Details (GetDataQualityScanRun)
aliyun dataworks-public get-data-quality-scan-run --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <SCAN_RUN_ID>
How to interpret the result:
- Status: state clearly whether the run passed, failed, warned, errored, or is still running
- Results: extract each rule's status, actual metric value, threshold, and whether it caused the overall failure; present this as a table instead of raw JSON
- Scan: use it as configuration snapshot context only when it helps explain the failure
- Parameters: mention runtime parameters when they may have influenced the result
- If any rule is abnormal, proactively suggest GetDataQualityScanRunLog
Task 4.3: Get Scan Run Log (GetDataQualityScanRunLog)
aliyun dataworks-public get-data-quality-scan-run-log --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <SCAN_RUN_ID> [--offset <BYTE_OFFSET>]
How to interpret the result:
- Log is the raw execution trace; summarize the root cause first, then provide key excerpts if needed
- NextOffset = -1 means log retrieval is complete
- If NextOffset != -1, continue querying with the returned offset until completion when the user asks for the full log
- When logs are long, explain the main error path instead of pasting everything by default
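The offset loop can be sketched as follows. Everything here is an assumption made for illustration: fetch_all_log is a hypothetical name, the fetcher argument is a hypothetical adapter that wraps the aliyun call and prints NextOffset on its first line followed by the Log text, and the starting offset of 0 and the call cap are choices of this sketch rather than documented API behavior.

```shell
# Usage: fetch_all_log FETCHER SCAN_RUN_ID [MAX_CALLS]
# FETCHER is called as: FETCHER <scan_run_id> <offset>
# and must print NextOffset on line 1 and the log chunk on the following lines.
fetch_all_log() {
  local fetcher=$1 id=$2 max_calls=${3:-50}
  local offset=0 calls=0 out next
  while [ "$calls" -lt "$max_calls" ]; do
    out=$("$fetcher" "$id" "$offset") || return 1
    next=$(printf '%s\n' "$out" | head -n 1)   # NextOffset
    printf '%s\n' "$out" | tail -n +2          # log chunk
    calls=$(( calls + 1 ))
    if [ "$next" = "-1" ]; then
      return 0                                 # retrieval complete
    fi
    offset=$next
  done
  # Safety cap reached: report instead of looping forever
  echo "stopped after $max_calls calls; log may be incomplete" >&2
  return 2
}
```

The cap keeps a malformed NextOffset from causing an endless loop; the caller can raise it when the user explicitly asks for the full log.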
Best Practices
- List before detail: Do not guess IDs. Use list queries first, then drill into a selected resource.
- Diagnose failures in order: For Fail, check GetDataQualityScanRun results first, then read GetDataQualityScanRunLog.
- Templates scope: ListDataQualityTemplates returns workspace custom templates only; ProjectId is required. Built-in templates must be viewed in the DataWorks console.
- Use a bounded time window: For run-history queries, default to the most recent 24 hours or the most recent 10 rows to avoid oversized result sets.
- Proactively guide the next step: After every query, suggest the most likely follow-up instead of waiting for the user to ask.
- Expand Spec on demand: Spec is often verbose. Summarize first, expand only on request.
Query Result Guidance
- Empty list result: Explain likely causes including wrong ProjectId, wrong region, or overly strict filters; suggest confirming parameters or relaxing filter conditions.
- Spec field handling: First extract monitored object, rule count, key thresholds, and trigger mode; expand full JSON only when the user requests it.
- Abnormal status handling: When encountering Fail/Error/Warn, do not just display the status; proactively provide the next diagnostic path.
- Results field handling: Present status, actual value, threshold, and conclusion per rule in a table; do not dump the raw array.
Common Errors
| Error Code | Solution |
|---|---|
| Forbidden.Access / PermissionDenied | Check RAM permissions, see references/ram-policies.md |
| InvalidParameter | Verify parameter names, JSON shape, and required fields |
| EntityNotExists | Check whether the ID, ProjectId, and region match the target resource |
| InvalidPageSize | PageSize must be within the API-supported range, usually 1-100 |
Region and Endpoint
Common: cn-hangzhou, cn-shanghai, cn-beijing, cn-shenzhen.
Endpoint format: dataworks.<REGION_ID>.aliyuncs.com
Full region and endpoint list: references/related-apis.md
Reference Links
| Reference | Description |
|---|---|
| references/ram-policies.md | RAM permission configuration and policy examples |
| references/related-apis.md | API parameter details and Region Endpoints |
| references/cli-installation-guide.md | Aliyun CLI installation guide |
