Hologres Query Optimizer

Other

Hologres Query Execution Plan Analyzer and Optimizer. Use for analyzing SQL performance issues, understanding EXPLAIN/EXPLAIN ANALYZE output, interpreting query operators, and providing optimization recommendations for Hologres queries. Triggers: "hologres explain", "query plan", "execution plan", "sql optimization", "query performance", "hologres performance", "slow query", "query optimizer", "explain analyze"

Install

openclaw skills install hologres-query-optimizer

Prerequisites

This skill requires hologres-cli to be installed first:

pip install hologres-cli
export HOLOGRES_SKILL=hologres-query-optimizer

All SQL execution and GUC parameter operations depend on hologres-cli commands (hologres sql run, hologres guc show/set).

Hologres Query Execution Plan Analyzer

This skill helps analyze and optimize Hologres SQL query execution plans using EXPLAIN and EXPLAIN ANALYZE commands.

Version Note: This documentation is based on Hologres V1.3.4x+. Upgrade your instance for better execution plan readability.

Overview

CommandDescription
EXPLAIN <sql>Shows estimated execution plan from Query Optimizer (QO). Reference only.
EXPLAIN ANALYZE <sql>Shows actual execution plan with real runtime metrics. Use for optimization.

Quick Start

-- Estimated plan (no execution)
EXPLAIN SELECT * FROM my_table WHERE id > 100;

-- Actual plan with runtime metrics (executes query)
EXPLAIN ANALYZE SELECT * FROM my_table WHERE id > 100;

Reading EXPLAIN Output

Read execution plans bottom-up. Each arrow (->) represents a node/operator.

ParameterDescription
costEstimated cost: startup_cost..total_cost. Parent includes child costs.
rowsEstimated output rows. rows=1000 indicates missing statistics — run ANALYZE <table>.
widthEstimated average output width (bytes).

Reading EXPLAIN ANALYZE Output

EXPLAIN ANALYZE includes four sections: Query Plan, ADVICE, Cost, and Resource.

Query Plan Metrics

Format: [dop_in:dop_out id=X dop=N time=max/avg/min rows=total(max/avg/min) mem=max/avg/min open=X get_next=Y]

MetricDescription
dop_in:dop_outParallelism ratio (e.g., 21:1 for gather, 21:21 for shuffle)
dopActual parallelism degree (matches shard count)
timeTotal time = open + get_next (ms). Cumulative from children.
rowsOutput rows: total(max/avg/min). Large variance = data skew.
memMemory: max/avg/min
openInitialization time. Hash operators build tables here.
get_nextData fetch time. Called repeatedly until complete.

Important: time is cumulative. Current operator time = current time - child time.

ADVICE Section

System-generated suggestions:

  • Missing indexes: Table xxx misses bitmap index
  • Missing statistics: Table xxx Miss Stats! please run 'analyze xxx';
  • Data skew: shuffle data skew! max rows is X, min rows is Y

Cost Breakdown

MetricDescription
Total costQuery total time (ms)
Optimizer costQO plan generation time
Start query costPre-execution init (schema sync, locking)
Get the first block costTime to first record batch
Get result costTime to all results

Resource Consumption

Format: total(max_worker/avg_worker/min_worker)

MetricDescription
MemoryTotal and per-worker memory
CPU timeCumulative CPU time across cores
Physical read bytesDisk reads (cache miss)
Read bytesTotal reads (disk + cache)

Common Operators

For detailed operator reference, see references/operators.md.

Scan Operators

OperatorDescription
Seq ScanFull table scan
Index Scan using Clustering_indexColumn-store index scan
Index Seek (pk_index)Row-store primary key scan

Filter Operators

OperatorDescription
FilterNo index hit — add indexes
Segment FilterSegment key hit
Cluster FilterClustering key hit
Bitmap FilterBitmap index hit

Data Movement

OperatorDescription
Local GatherMerge files within shard
GatherMerge shards to final result
RedistributionData shuffle — check distribution_key
BroadcastSmall table broadcast to all shards

Join Operators

OperatorDescription
Hash JoinHash-based join (ensure small table is hash table)
Nested LoopNested loop join (avoid for large data)
Cross JoinOptimized non-equi join (V3.0+)

Aggregation

OperatorDescription
HashAggregateHash-based aggregation
Partial/Final HashAggregateMulti-stage aggregation

Other

OperatorDescription
SortORDER BY
LimitRow limit (check if pushed to scan)
ExecuteExternalSQLPQE execution — rewrite for HQE

Optimization Workflow

  1. Run EXPLAIN ANALYZE on slow query
  2. Check ADVICE section for immediate fixes
  3. Identify bottleneck operators (highest time)
  4. Apply targeted optimizations:
IssueSymptomSolution
Missing statsrows=1000ANALYZE <table>
Data shuffleRedistributionFix distribution_key
Wrong hash tableLarge table as hashUpdate statistics
No indexFilter onlyAdd clustering/bitmap index
PQE executionExecuteExternalSQLRewrite to HQE functions
Data skewLarge max/min varianceReview distribution

Key GUC Parameters

-- Multi-stage aggregation
SET optimizer_force_multistage_agg = on;

-- Join order control (for complex multi-table joins)
SET optimizer_join_order = 'query';  -- Follow SQL order
SET optimizer_join_order = 'greedy'; -- Greedy algorithm

-- Disable Cross Join
SET hg_experimental_enable_cross_join_rewrite = off;

To persist these settings at database level, use the CLI:

hologres guc set optimizer_force_multistage_agg on
hologres guc set optimizer_join_order query

Best Practices

  1. Always use EXPLAIN ANALYZE for production analysis
  2. Run ANALYZE after significant data changes
  3. Design distribution_key based on JOIN/GROUP BY patterns
  4. Set clustering_key for range query columns
  5. Use bitmap indexes for low-cardinality filters
  6. Ensure small table is hash table in joins
  7. Avoid non-equi joins when possible
  8. Rewrite PQE functions to HQE alternatives

Reference Links

ReferenceDescription
references/operators.mdDetailed operator descriptions
references/optimization-patterns.mdCommon optimization patterns
references/guc-parameters.mdQuery tuning parameters