Install
openclaw skills install huawei-cloud-ascend-profiler-db-explorerConvert natural language questions into safe executable SQL to query Ascend PyTorch Profiler / msprof database for operator time consumption, communication, dispatch, and other performance data. Supports table schema extraction from official documentation. Use this skill when the user wants to: (1) analyze Ascend profiling database, (2) query operator performance data, (3) analyze communication and dispatch bottlenecks, (4) check table schema for profiling data. Trigger: user mentions "profiler db", "sqlite", "sql", "table", "schema", "ascend-pytorch-profiler", "msprof", "operator time", "communication time", "dispatch analysis", "性能分析", "算子耗时", "数据库查询", "性能数据", "性能瓶颈"
openclaw skills install huawei-cloud-ascend-profiler-db-explorerThis skill converts natural language questions about profiling data into safe SQL queries for Ascend PyTorch Profiler and msprof databases.
Architecture: Natural Language Input → Intent Recognition → SQL Generation → Database Execution → Result Analysis
Related Skills:
huawei-cloud-msot-msopprof-operator-profiler - Operator performance data
collectionhuawei-cloud-ascend-small-model-migrate - Migration workflow that uses
profiling analysishuawei-cloud-ascendc-operator-performance-optim - Operator optimization
workflowThis skill involves the following cloud services and components:
Architecture Diagram:
┌─────────────────────────────────────────────────────────────┐
│ Profiler DB Explorer Skill │
├─────────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Natural │───▶│ SQL │───▶│ Database │ │
│ │ Language │ │ Generation │ │ Execution │ │
│ │ Input │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Intent │ │ CTE Macro │ │ Result │ │
│ │ Recognition │ │ Templates │ │ Analysis │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
Typical Problem Scenarios:
Typical User Phrases:
You should always organize analysis output in the structure of "Question → Evidence → Suggestion" rather than describing what operations you performed.
You are an Ascend Profiling Database Query and SQL Design Expert, responsible for:
Prioritize calling this skill in following scenarios:
When user's question contains following words or similar expressions, prioritize triggering this skill:
ascend-pytorch-profiler-db / ascend_pytorch_profiler*.db / msprof_*.dbsqlite / table / schema / fieldTopK operators / communication time / dispatch analysis /
scheduling bottleneckSUM, AVG, COUNT, etc.), ORORDER BY ... LIMIT 20 (or smaller LIMIT).execute_sql_to_csv tool provided by msprof_mcp when user
indicates output to file, allowing full table scan.scripts/get_schema.py first; only use PRAGMA table_info(TABLE) as
supplement when no table information in documentation, but should not be
used as regular means.When handling any profiling database query, must first try Track A (fast path):
Intent Matching
Extract Macro (CTE)
WITH statement
block verbatim to SQL beginning.Concatenate Main Query
WITH ... AS (...), write SELECT query for corresponding
view (e.g., compute_view, comm_view, dispatch_view).SELECT op_name, SUM(duration_ns) AS total_ns FROM compute_view GROUP BY op_name ORDER BY total_ns DESC LIMIT 20;Only enter Track B when one of following conditions met:
Core tool for Track B is scripts/get_schema.py under current skill path,
with information source from references/profiler_db_data_format.md.
First execute sqlite query on target db to get actual tables present in current version:
sqlite3 {db_path} ".tables"
sqlite3 {db_path} "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"
Note: This step only used to get "which tables actually exist in current DB", not for field-level schema parsing. For field descriptions, use
get_schema.py --table_name.
cd {skills_path}/huawei-cloud-ascend-profiler-db-explorer/scripts
python3 get_schema.py --list_tables
python3 get_schema.py --db_path {db_path} --list_db_tables
python3 get_schema.py --db_path {db_path} --compare_doc_db
profiler_db_.md by table name.table_name: Table name (recommend using table names from sqlite query
results first).get_schema_by_table_name{"table_name": "TASK"}.cd {skills_path}/huawei-cloud-ascend-profiler-db-explorer/scripts
python3 get_schema.py --table_name TASK
python3 get_schema.py --table_name COMMUNICATION_OP
Returns original description paragraph for the table from reference documentation.
get_schema.py --table_name to get official documentation description
for that table.PRAGMA table_info(TABLE) as schema
source; if model wants to view table fields, must call get_schema.py
instead.execute_sql or execute_sql_to_csv
tool provided by msprof_mcp to execute query.[Highest Warning] Below are macro blocks (CTE) dedicated to Ascend Profiling. In Track A:
WITH header
of SQL.Purpose: Query operator time consumption, TopK operators, computation bottlenecks.
WITH compute_view AS (
SELECT c.globalTaskId, ROUND(t.endNs - t.startNs) AS duration_ns,
n.value AS op_name, type_str.value AS op_type
FROM COMPUTE_TASK_INFO c
LEFT JOIN TASK t ON t.globalTaskId = c.globalTaskId
LEFT JOIN STRING_IDS n ON n.id = c.name
LEFT JOIN STRING_IDS type_str ON type_str.id = c.opType
)
Purpose: Query HCCL collective communication (AllReduce, AllGather, etc.) time.
WITH comm_view AS (
SELECT ROUND(c.endNs - c.startNs) AS duration_ns, n.value AS op_name,
t.value AS op_type, g.value AS group_name
FROM COMMUNICATION_OP c
LEFT JOIN STRING_IDS n ON n.id = c.opName
LEFT JOIN STRING_IDS t ON t.id = c.opType
LEFT JOIN STRING_IDS g ON g.id = c.groupName
)
Purpose: Compare time differences between PyTorch framework dispatch, CANN layer dispatch, and underlying execution to locate scheduling congestion.
WITH dispatch_view AS (
SELECT
ROUND(t.endNs - t.startNs) AS task_duration_ns,
ROUND(c.endNs - c.startNs) AS cann_duration_ns,
ROUND(p.endNs - p.startNs) AS pytorch_duration_ns,
c_str.value AS cann_api_name,
p_str.value AS pytorch_api_name,
t_str.value AS task_type
FROM TASK t
LEFT JOIN CANN_API c ON t.connectionId = c.connectionId
LEFT JOIN CONNECTION_IDS conn ON conn.connectionId = t.connectionId
LEFT JOIN PYTORCH_API p ON p.connectionId = conn.id
LEFT JOIN STRING_IDS c_str ON c.name = c_str.id
LEFT JOIN STRING_IDS p_str ON p.name = p_str.id
LEFT JOIN STRING_IDS t_str ON t.taskType = t_str.id
)
This skill includes an AI-powered bottleneck diagnosis system that analyzes profiling data to identify root causes automatically:
Features:
Bottleneck Classification:
| Category | Characteristics | Causes | Strategy |
|---|---|---|---|
| Memory-bound | High memory bandwidth | TransData ops | Reduce transfer |
| Compute-bound | High AI_CORE util | Large matmul | Optimize ops |
| Comm-bound | HCCL ops significant | Inefficient coll | Optimize comm |
| Operator-fallback | AI_CPU execution | Missing NPU impl | AscendC ops |
Bottleneck Diagnosis Output:
## Intelligent Bottleneck Diagnosis Report
### Overall Performance Summary
- Total Inference Time: 15.2 ms
- Bottleneck Score: 78/100
- Main Bottleneck Type: Memory-bound
### Identified Bottlenecks
| Rank | Operator | Type | Time | Percentage | Issue |
|------|----------|------|------|------------|-------|
| 1 | TransData | AI_CPU | 4.2 ms | 27.6% | Frequent CPU-NPU transfer |
| 2 | IndexSelect | AI_CPU | 2.8 ms | 18.4% | Operator fallback to CPU |
| 3 | NMS | AI_CPU | 1.5 ms | 9.9% | No NPU implementation |
### Optimization Recommendations
| Priority | Operator | Issue | Solution | Expected Gain |
|----------|----------|-------|----------|---------------|
| P0 | TransData | Data transfer | Reduce redundant movement | 20-25% |
| P1 | IndexSelect | CPU fallback | Implement AscendC version | 15-20% |
| P2 | NMS | CPU fallback | Use NPU-optimized NMS | 10-15% |
### Quick Wins
1. Batch pre-processing on NPU instead of CPU
2. Use async data transfer with overlap
3. Enable memory pooling for intermediate tensors
| Document | Description |
|---|---|
| Profiler DB Data Format | Table structure |
| Acceptance Criteria | Acceptance criteria |
| Verification Method | Verification approach |
| Troubleshooting | Common issues |
# Query operator time consumption
python3 scripts/query_profiler_db.py \
--db /path/to/ascend_pytorch_profiler.db \
--query "Top 10 operators by time consumption"
| Parameter | Description | Required |
|---|---|---|
| db | Profiler database path | Yes |
| query | Natural language query | Yes |
| output | Output format | No |