Alibabacloud Dlf Manage

Query Catalog, database, and table metadata resources in Alibaba Cloud Data Lake Formation (DLF). Provides read-only queries via the DLF OpenAPI Python SDK, supporting listing and viewing Catalogs, databases, tables with their detailed information and Schema definitions. Use cases: "list available Catalogs", "list databases", "view table schema", "search tables", "search tables by name", "fuzzy search", "view DLF metadata", "what databases are in the data lake", "what columns does a table have", "find tables whose name contains xxx". This Skill only contains read-only operations — no create, modify, or delete operations.

Audits

Pass

Install

openclaw skills install alibabacloud-dlf-manage

DLF Data Lake Metadata Query

Query Catalog, Database, and Table metadata resources in Alibaba Cloud Data Lake Formation (DLF).

CRITICAL: Use only the Python SDK script provided by this Skill. All operations go through the DLF Python SDK (alibabacloud-dlfnext20250310) via scripts/dlf_metadata_query.py. This Skill does not invoke any shell-based command-line client and does not require AI-Mode configuration.

  • DO NOT attempt access via any shell-based command-line client — DLF is not exposed through one in this Skill
  • DO NOT use curl, wget, or other HTTP clients to call the DLF API directly
  • MUST use the scripts/dlf_metadata_query.py script provided by this Skill, which wraps the DLF Python SDK
  • All query operations are executed via python3 scripts/dlf_metadata_query.py <action> [options]

Architecture

Catalog (Data Catalog)
  └── Database
        └── Table
              ├── Schema (column definitions)
              ├── PartitionKeys (partition keys)
              ├── PrimaryKeys (primary keys)
              └── Options (table properties)

Installation

pip install -r requirements.txt

requirements.txt pins the full transitive dependency closure (including alibabacloud-dlfnext20250310==3.0.0) for reproducible installs.

Pre-check: Python SDK dependency

python3 -c "from alibabacloud_dlfnext20250310.client import Client; print('SDK OK')"

If not installed, run pip install -r requirements.txt.

Authentication

Pre-check: Alibaba Cloud Credentials Required

Use the default credential chain (CredentialClient) to obtain credentials automatically. Supported sources (in priority order):

  1. Environment variables (ALIBABA_CLOUD_ACCESS_KEY_ID / ALIBABA_CLOUD_ACCESS_KEY_SECRET)
  2. Configuration file (~/.alibabacloud/credentials)
  3. ECS Instance RAM Role
  4. OIDC Role ARN

Security Rules:

  • NEVER read, echo, or print AK/SK values
  • NEVER ask the user to input AK/SK directly in the conversation or command line
  • NEVER explicitly handle or pass AK/SK in code — rely on the default credential chain

See https://help.aliyun.com/document_detail/378659.html for credential configuration details.

RAM Permissions

This Skill only involves read-only operations (List / Get). See references/ram-policies.md for the full permission list.

[MUST] Permission Failure Handling: When any command or API call fails due to permission errors at any point during execution, follow this process:

  1. Read references/ram-policies.md to get the full list of permissions required by this SKILL
  2. Pause and wait until the user confirms that the required permissions have been granted

Parameter Confirmation

IMPORTANT: Parameter Confirmation — Before invoking the API, the following user-specific parameters must be confirmed with the user; do not assume them. Region defaults to cn-hangzhou; if the user does not specify one, use the default without asking.

ParameterRequiredDescriptionDefault
regionNoRegion IDcn-hangzhou
catalog_nameConditionalCatalog name (--catalog, required for GetCatalog)-
catalog_idConditionalCatalog ID (--catalog-id, required when querying databases/tables, e.g. clg-paimon-xxxx)-
databaseConditionalDatabase name (--database)-
tableConditionalTable name (--table)-

Core Workflow

The script automatically reads AK/SK from environment variables and reports a clear error if they are missing. Region defaults to cn-hangzhou; use the default if the user does not specify one.

You MUST use scripts/dlf_metadata_query.py to query metadata. Do not use shell-based command-line clients or curl. Actions are in kebab-case.

CRITICAL — list vs. list-*-details: pick the lightest action that satisfies the request.

  • For listing names / IDs (including fuzzy search): use list-databases / list-tables. These call the ListDatabases / ListTables API.
  • For full attributes / Schema / properties: use list-database-details / list-table-details / get-database / get-table. These call the heavier *-details / Get* APIs.
  • Default to the lightweight list-* action unless the user explicitly asks for full configuration, Schema, or properties. Calling list-*-details when only names are needed is incorrect.

Query Operations

# ---- Catalog ----

# 1. List all Catalogs (names + minimal info — preferred for listing/searching)
python3 scripts/dlf_metadata_query.py list-catalogs

# 2. Fuzzy-search Catalogs by name (uses ListCatalogs)
python3 scripts/dlf_metadata_query.py list-catalogs --pattern test

# 3. Get Catalog details (by name) — use only when full Catalog config is needed
python3 scripts/dlf_metadata_query.py get-catalog --catalog <catalog_name>

# 4. Get Catalog details (by ID) — use only when full Catalog config is needed
python3 scripts/dlf_metadata_query.py get-catalog-by-id --id <catalog_id>

# ---- Database ----

# 5. List databases (NAMES only — DEFAULT for "list / show / which databases", calls ListDatabases)
python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id>

# 6. List database details (full attributes, calls ListDatabaseDetails) — use ONLY when the user asks for properties / configs / location / owner
python3 scripts/dlf_metadata_query.py list-database-details --catalog-id <catalog_id>

# 7. Get a single database's details (calls GetDatabase) — use when the user asks for ONE specific database's full info
python3 scripts/dlf_metadata_query.py get-database --catalog-id <catalog_id> --database <db_name>

# ---- Table ----

# 8. List tables (NAMES only — DEFAULT for "list / show / which tables", calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name>

# 9. Fuzzy-search tables by name (DEFAULT for "search / find tables matching ...", calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%

# 10. List table details with Schema (calls ListTableDetails) — use ONLY when the user explicitly asks for Schema / columns / properties of all tables
python3 scripts/dlf_metadata_query.py list-table-details --catalog-id <catalog_id> --database <db_name>

# 11. Get a single table's details with Schema (calls GetTable) — use when the user asks for ONE specific table's Schema
python3 scripts/dlf_metadata_query.py get-table --catalog-id <catalog_id> --database <db_name> --table <table_name>

Specify region (defaults to cn-hangzhou): add --region cn-shanghai

Typical Query Flow

1. list-catalogs          → get catalog_name and catalog_id (names only)
2. list-databases         → use catalog_id to view available database names
3. list-tables            → use catalog_id + database to view available table names
4. get-table              → use catalog_id + database + table to view ONE table's Schema

Only step 4 (get-table) is a "details" call, because Schema is what the user actually asked for. Steps 1–3 stay on the lightweight list-* actions.

Fuzzy Search

All list operations support the --pattern argument for fuzzy name matching, using % as the wildcard. Use the lightweight list-* action for pattern search unless the user explicitly asks for the full Schema / properties of every match.

# Search Catalogs whose name contains "test"
python3 scripts/dlf_metadata_query.py list-catalogs --pattern %test%

# Search databases whose name starts with "prod_"
python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id> --pattern prod_%

# Search tables whose name starts with "user" (DEFAULT — calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%

Anti-pattern: do not use list-table-details --pattern ... to search by name. That calls ListTableDetails and is heavier than required. Reach for list-table-details only when the user has explicitly asked for the Schema / columns of every matching table.

Output Format

  • List operations: {"count": N, "items": [...]}
  • Get operations: a single JSON object
  • Errors: {"error": "...", "hint": "..."}

Verification

If list-catalogs returns the Catalog list, the connection and permissions are working:

python3 scripts/dlf_metadata_query.py list-catalogs --region cn-hangzhou

See references/verification-method.md for detailed verification steps.

Best Practices

  1. Prefer the lightweight list-* action over list-*-details / get-*. When the task only requires listing resource names, IDs, or fuzzy matching, you MUST use list-catalogs / list-databases / list-tables (which call ListCatalogs / ListDatabases / ListTables). Only use list-*-details or get-* when the user explicitly asks for full configuration, Schema, columns, properties, owner, or location. Reaching for the heavier API when the lighter one suffices is incorrect.
  2. List before Get: use list-catalogs to obtain catalog_id first, then use catalog_id to query databases and tables.
  3. Use fuzzy search with the lightweight action: the --pattern argument supports fuzzy matching; use it on list-tables (not list-table-details) unless full Schema is also requested.
  4. Pagination: use --max-results and --page-token for paginated queries when there is a lot of data.
  5. Catalog ID vs Name: when querying Database/Table, use catalog_id (e.g. clg-paimon-xxxx), not the catalog name.

References

ReferenceDescription
references/related-apis.mdFull API list and parameter descriptions
references/ram-policies.mdRAM permission policy
references/acceptance-criteria.mdAcceptance criteria
references/verification-method.mdVerification method
DLF API overviewOfficial API documentation
DLF product documentationProduct documentation
Python SDK PyPISDK version info