Hologres Schema Generator

Other

Hologres DDL schema design and table creation expert. Use for generating CREATE TABLE statements, choosing storage formats (column/row/row-column), configuring table properties (distribution_key, clustering_key, bitmap_columns, event_time_column), designing partition tables, selecting data types, and optimizing table schemas for different query patterns. Triggers: "hologres建表", "hologres create table", "hologres schema", "hologres DDL", "distribution_key", "clustering_key", "bitmap_columns", "event_time_column", "列存", "行存", "行列共存", "分区表", "hologres table design", "表设计", "hologres数据类型"

Install

openclaw skills install hologres-schema-generator

Prerequisites

This skill requires hologres-cli to be installed first:

pip install hologres-cli
export HOLOGRES_SKILL=hologres-schema-generator

All SQL execution depends on hologres-cli commands (hologres sql run --write, hologres table create).

Hologres Schema Design & Table Creation

Generate optimized Hologres DDL statements with proper storage format, indexing, and distribution strategies.

Information Gathering (IMPORTANT)

Before generating DDL, check whether the user has provided enough context. Many Hologres table properties are immutable after creation (orientation, distribution_key, clustering_key, event_time_column, primary key, partition column) — getting them wrong means recreating the table.

Required Information

If ANY of the following are unclear, ask the user before generating DDL:

DimensionWhy CriticalExample Question
Query patternDetermines orientation (column/row/row-column)"This table is mainly used for OLAP analytics, KV point lookups, or both?"
Column definitionsCore schema structure"Can you list the main columns and their rough types (text, number, timestamp, etc.)?"
Primary key / unique keyDetermines UPSERT capability; immutable"Does this table need deduplication or updates by a unique key? If so, which column(s)?"
Main query conditionsDetermines distribution_key, clustering_key, bitmap"What columns are most often used in WHERE, JOIN, or GROUP BY?"

Conditionally Required

Ask these only when relevant signals appear:

SignalFollow-up Question
Table looks like time-series or log data"What is the estimated daily data volume? (determines whether to use partitioning)"
User mentions JOIN with other tables"Which column will this table JOIN on, and what is the distribution_key of the other table?"
Large data volume or mentions retention"How long should data be retained? (recommend dynamic partition management for lifecycle)"
User mentions both analytics and serving"What is the read pattern — batch scans, point lookups by ID, or both?"

Can Use Defaults (No Need to Ask)

PropertyDefault Behavior
bitmap_columnsAuto-enabled for TEXT columns; can ALTER later
dictionary_encoding_columnsUse :auto mode; can ALTER later
storage_modeDefault hot; can ALTER later
time_to_live_in_secondsNot recommended. Deletion time is non-deterministic. Use dynamic partition management instead

Gathering Flow

User request → Check available info
  |
  +-- Columns + query pattern + key info all clear?
  |     → Generate DDL directly
  |
  +-- Missing critical info?
  |     → Ask 1-3 focused questions (batch them, don't ask one at a time)
  |
  +-- Very vague request (e.g., "help me create a user table")?
        → Ask: 1) main columns  2) query pattern  3) key/dedup needs

Principle: Batch questions into a single round. Never ask more than 3-4 questions at once. If the user provides partial info, fill in reasonable defaults for the rest and explain your assumptions in comments.

Storage Formats

Choose the storage format based on your primary query pattern.

FormatSyntaxBest ForPrimary Key
Column storeorientation = 'column'OLAP, aggregation, scan-heavy queriesOptional
Row storeorientation = 'row'Point lookups, high-QPS KV queriesRequired
Row-column storeorientation = 'row,column'Mixed workloads (OLAP + point lookup)Required

Default recommendation: Use row,column (row-column store) when the workload is unclear. It handles both OLAP and point queries well.

Quick Start

-- Column store table (OLAP / analytics)
CREATE TABLE orders (
  order_id BIGINT NOT NULL,
  user_id BIGINT,
  amount DECIMAL(18,2),
  status TEXT,
  order_time TIMESTAMPTZ NOT NULL,
  ds TEXT NOT NULL,
  PRIMARY KEY (order_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'column',
  distribution_key = 'order_id',
  clustering_key = 'order_time:asc',
  event_time_column = 'order_time',
  bitmap_columns = 'status,user_id'
);

-- Row store table (high-QPS point lookup)
CREATE TABLE user_profile (
  user_id BIGINT NOT NULL,
  name TEXT,
  email TEXT,
  tags JSONB,
  updated_at TIMESTAMPTZ,
  PRIMARY KEY (user_id)
)
WITH (
  orientation = 'row',
  distribution_key = 'user_id',
  clustering_key = 'user_id'
);

CREATE TABLE Syntax (V2.1+)

CREATE TABLE [IF NOT EXISTS] [schema.]table_name (
  column_name data_type [NOT NULL] [DEFAULT expr],
  ...
  [PRIMARY KEY (col1 [, col2, ...])]
)
[PARTITION BY LIST (partition_column)]
WITH (
  orientation = '{column | row | row,column}',
  distribution_key = 'col1[,col2]',
  clustering_key = 'col1[:asc|:desc][,col2[:asc|:desc]]',
  event_time_column = 'col',
  bitmap_columns = 'col1[,col2,...]',
  dictionary_encoding_columns = 'col1[:auto|:on|:off][,...]',
  time_to_live_in_seconds = 'N',          -- NOT recommended, see below
  storage_mode = '{hot | cold}',
  table_group = 'group_name'
);

Legacy Syntax (All Versions)

BEGIN;
CREATE TABLE table_name (...);
CALL set_table_property('table_name', 'orientation', 'column');
CALL set_table_property('table_name', 'distribution_key', 'col1');
CALL set_table_property('table_name', 'clustering_key', 'col1:asc');
CALL set_table_property('table_name', 'event_time_column', 'col1');
CALL set_table_property('table_name', 'bitmap_columns', 'col1,col2');
COMMIT;

Table Properties Reference

distribution_key (Distribution Key)

Controls how data is hash-distributed across shards. Critical for JOIN and GROUP BY performance.

RuleDescription
Choose high-cardinality columnsAvoid data skew
Use JOIN/GROUP BY columnsEnable local computation, avoid shuffle
Max 2 columnsMore columns reduce distribution effectiveness
Must be subset of PKIf table has a primary key
Immutable after creationCannot ALTER, must recreate table
-- Single column
distribution_key = 'user_id'

-- Two columns (join on both)
distribution_key = 'order_id,user_id'

clustering_key (Clustering Key / Sorted Index)

Physically sorts data within files. Accelerates range queries and filters.

RuleDescription
Use range-query columnse.g., timestamp, date columns
Max 2-3 columnsMore columns dilute sort benefit
Put high-selectivity column firstMost-filtered column goes first
Specify sort order:asc (default) or :desc
Column store onlyRow store uses PK as sort key
clustering_key = 'order_time:asc'
clustering_key = 'ds:asc,order_time:asc'

event_time_column (Segment Key)

Organizes data files by time ranges. Enables file-level pruning for time-range queries.

RuleDescription
Use time/date columnsTimestamp of data ingestion or event time
At most 1 columnOnly one segment key per table
Column store onlyNot applicable to row store
Combine with partitionPartition for coarse pruning, segment key for fine pruning
event_time_column = 'event_time'

bitmap_columns (Bitmap Index)

Builds bitmap indexes for fast equality filtering on low-to-medium cardinality columns.

RuleDescription
Use filter columnsColumns frequently in WHERE clause
Low-medium cardinalitystatus, type, region — NOT user_id
TEXT columns auto-enabledDefault bitmap for text columns
Can be added after creationALTER TABLE ... SET (bitmap_columns = ...)
bitmap_columns = 'status,payment_type,region'

dictionary_encoding_columns (Dictionary Encoding)

Compresses text columns by mapping values to integers. Speeds up GROUP BY and aggregations.

RuleDescription
Use :auto modeLet Hologres decide based on cardinality
Good for low-cardinality textcountry, status, category
Avoid high-cardinality columnsuser_id, order_id — no compression benefit
Don't set if unsureIncorrect setting may hurt performance
dictionary_encoding_columns = 'country:auto,status:auto'

time_to_live_in_seconds (TTL) — NOT RECOMMENDED

Do NOT use time_to_live_in_seconds. The actual deletion time is non-deterministic — data will be deleted at an arbitrary time after the specified TTL, not at a precise point. This makes it unreliable for data lifecycle management.

Recommended alternative: Use dynamic partition management — create daily/hourly partitions and drop old partitions on a schedule (via cron or scheduling system). This gives you precise, predictable data lifecycle control.

-- BAD: TTL-based lifecycle (deletion time unpredictable)
-- time_to_live_in_seconds = '2592000'

-- GOOD: Partition-based lifecycle (precise control)
-- 1. Create table with daily partitions
CREATE TABLE events (
  event_id BIGINT NOT NULL,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, event_id)
) PARTITION BY LIST (ds)
WITH (orientation = 'column', distribution_key = 'event_id');

-- 2. Drop old partitions on schedule (e.g., retain 30 days)
DROP TABLE IF EXISTS events_20251101;  -- drop partition older than 30 days

Primary Key Design

Storage FormatPK RequirementNotes
Column storeOptionalAdd PK only if needed for UPSERT
Row storeRequiredPK drives the row-store index
Row-column storeRequiredPK serves both point lookup and analytics

Rules:

  • Max 32 columns in composite PK
  • PK columns must be NOT NULL and UNIQUE
  • Prohibited types: FLOAT, DOUBLE, NUMERIC, ARRAY, JSON, JSONB
  • PK cannot be altered after creation — must recreate table
  • Avoid SERIAL as PK — causes table-level locks on write

Partition Table Design

Use partitions for large tables with time-based or categorical data.

-- Parent table
CREATE TABLE events (
  event_id BIGINT NOT NULL,
  user_id BIGINT,
  event_type TEXT,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, event_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'column',
  distribution_key = 'event_id',
  clustering_key = 'ds:asc',
  event_time_column = 'ds'
);

-- Child partitions
CREATE TABLE events_20251201 PARTITION OF events FOR VALUES IN ('20251201');
CREATE TABLE events_20251202 PARTITION OF events FOR VALUES IN ('20251202');

Rules:

  • Only LIST partitioning is supported
  • Partition column must be part of PK (if PK exists)
  • Supported partition column types: TEXT, VARCHAR, INT, DATE (V1.3.22+)
  • Skip daily partitions if daily data < 100M rows — use event_time_column instead
  • Always filter on partition column in queries for pruning

Scenario-Based Templates

1. High-QPS Point Lookup (KV)

CREATE TABLE user_kv (
  user_id BIGINT NOT NULL PRIMARY KEY,
  profile JSONB,
  updated_at TIMESTAMPTZ
)
WITH (
  orientation = 'row',
  distribution_key = 'user_id'
);

2. Prefix Range Scan

CREATE TABLE order_lines (
  order_id BIGINT NOT NULL,
  line_no INT NOT NULL,
  product_id BIGINT,
  quantity INT,
  PRIMARY KEY (order_id, line_no)
)
WITH (
  orientation = 'row',
  distribution_key = 'order_id',
  clustering_key = 'order_id'
);

3. Time-Range Analytics (Partitioned)

CREATE TABLE page_views (
  view_id BIGINT NOT NULL,
  user_id BIGINT,
  page_url TEXT,
  view_time TIMESTAMPTZ,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, view_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'column',
  distribution_key = 'view_id',
  clustering_key = 'view_time:asc',
  event_time_column = 'view_time',
  bitmap_columns = 'user_id,page_url'
);

4. Dimension Filtering (Non-Time)

CREATE TABLE product_stats (
  product_id BIGINT NOT NULL PRIMARY KEY,
  category TEXT,
  brand TEXT,
  sales_count BIGINT,
  revenue DECIMAL(18,2)
)
WITH (
  orientation = 'column',
  distribution_key = 'product_id',
  clustering_key = 'category:asc',
  bitmap_columns = 'category,brand'
);

5. JOIN-Optimized Fact + Dimension

-- Fact table: distribute by join key
CREATE TABLE fact_orders (
  order_id BIGINT NOT NULL PRIMARY KEY,
  customer_id BIGINT,
  amount DECIMAL(18,2),
  order_date TEXT
)
WITH (
  orientation = 'column',
  distribution_key = 'customer_id'
);

-- Dimension table: same distribution key
CREATE TABLE dim_customers (
  customer_id BIGINT NOT NULL PRIMARY KEY,
  name TEXT,
  region TEXT
)
WITH (
  orientation = 'row,column',
  distribution_key = 'customer_id'
);
-- JOIN on customer_id → local join, no shuffle

6. Mixed Workload (OLAP + Serving)

CREATE TABLE realtime_metrics (
  metric_id BIGINT NOT NULL,
  device_id BIGINT NOT NULL,
  value DOUBLE PRECISION,
  ts TIMESTAMPTZ NOT NULL,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, device_id, metric_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'row,column',
  distribution_key = 'device_id',
  clustering_key = 'ts:asc',
  event_time_column = 'ts',
  bitmap_columns = 'device_id'
);

References

DocumentContent
data-types.mdComplete data type reference
table-properties.mdDetailed table property guide with selection flowcharts
partition-guide.mdPartition table design and management

Best Practices

  1. Choose storage format first — column for OLAP, row for KV, row-column when unsure
  2. Set distribution_key to JOIN/GROUP BY columns — avoids cross-shard shuffle
  3. Set clustering_key for range-query columns — improves time-range and filter queries
  4. Set event_time_column for time-series data — enables file-level pruning
  5. Use bitmap_columns for low-cardinality filter columns — status, type, region
  6. Partition by date only if daily data > 100M rows — otherwise use segment key alone
  7. PK must include partition column — required by Hologres for partition tables
  8. Align distribution_key across JOINed tables — same column enables local join
  9. Use WITH syntax (V2.1+) — cleaner than CALL set_table_property in transactions
  10. Avoid SERIAL as primary key — causes table-level write locks