Install
openclaw skills install hologres-schema-generatorHologres DDL schema design and table creation expert. Use for generating CREATE TABLE statements, choosing storage formats (column/row/row-column), configuring table properties (distribution_key, clustering_key, bitmap_columns, event_time_column), designing partition tables, selecting data types, and optimizing table schemas for different query patterns. Triggers: "hologres建表", "hologres create table", "hologres schema", "hologres DDL", "distribution_key", "clustering_key", "bitmap_columns", "event_time_column", "列存", "行存", "行列共存", "分区表", "hologres table design", "表设计", "hologres数据类型"
openclaw skills install hologres-schema-generatorThis skill requires hologres-cli to be installed first:
pip install hologres-cli
export HOLOGRES_SKILL=hologres-schema-generator
All SQL execution depends on hologres-cli commands (hologres sql run --write, hologres table create).
Generate optimized Hologres DDL statements with proper storage format, indexing, and distribution strategies.
Before generating DDL, check whether the user has provided enough context. Many Hologres table properties are immutable after creation (orientation, distribution_key, clustering_key, event_time_column, primary key, partition column) — getting them wrong means recreating the table.
If ANY of the following are unclear, ask the user before generating DDL:
| Dimension | Why Critical | Example Question |
|---|---|---|
| Query pattern | Determines orientation (column/row/row-column) | "This table is mainly used for OLAP analytics, KV point lookups, or both?" |
| Column definitions | Core schema structure | "Can you list the main columns and their rough types (text, number, timestamp, etc.)?" |
| Primary key / unique key | Determines UPSERT capability; immutable | "Does this table need deduplication or updates by a unique key? If so, which column(s)?" |
| Main query conditions | Determines distribution_key, clustering_key, bitmap | "What columns are most often used in WHERE, JOIN, or GROUP BY?" |
Ask these only when relevant signals appear:
| Signal | Follow-up Question |
|---|---|
| Table looks like time-series or log data | "What is the estimated daily data volume? (determines whether to use partitioning)" |
| User mentions JOIN with other tables | "Which column will this table JOIN on, and what is the distribution_key of the other table?" |
| Large data volume or mentions retention | "How long should data be retained? (recommend dynamic partition management for lifecycle)" |
| User mentions both analytics and serving | "What is the read pattern — batch scans, point lookups by ID, or both?" |
| Property | Default Behavior |
|---|---|
bitmap_columns | Auto-enabled for TEXT columns; can ALTER later |
dictionary_encoding_columns | Use :auto mode; can ALTER later |
storage_mode | Default hot; can ALTER later |
time_to_live_in_seconds | Not recommended. Deletion time is non-deterministic. Use dynamic partition management instead |
User request → Check available info
|
+-- Columns + query pattern + key info all clear?
| → Generate DDL directly
|
+-- Missing critical info?
| → Ask 1-3 focused questions (batch them, don't ask one at a time)
|
+-- Very vague request (e.g., "help me create a user table")?
→ Ask: 1) main columns 2) query pattern 3) key/dedup needs
Principle: Batch questions into a single round. Never ask more than 3-4 questions at once. If the user provides partial info, fill in reasonable defaults for the rest and explain your assumptions in comments.
Choose the storage format based on your primary query pattern.
| Format | Syntax | Best For | Primary Key |
|---|---|---|---|
| Column store | orientation = 'column' | OLAP, aggregation, scan-heavy queries | Optional |
| Row store | orientation = 'row' | Point lookups, high-QPS KV queries | Required |
| Row-column store | orientation = 'row,column' | Mixed workloads (OLAP + point lookup) | Required |
Default recommendation: Use
row,column(row-column store) when the workload is unclear. It handles both OLAP and point queries well.
-- Column store table (OLAP / analytics)
CREATE TABLE orders (
order_id BIGINT NOT NULL,
user_id BIGINT,
amount DECIMAL(18,2),
status TEXT,
order_time TIMESTAMPTZ NOT NULL,
ds TEXT NOT NULL,
PRIMARY KEY (order_id)
)
PARTITION BY LIST (ds)
WITH (
orientation = 'column',
distribution_key = 'order_id',
clustering_key = 'order_time:asc',
event_time_column = 'order_time',
bitmap_columns = 'status,user_id'
);
-- Row store table (high-QPS point lookup)
CREATE TABLE user_profile (
user_id BIGINT NOT NULL,
name TEXT,
email TEXT,
tags JSONB,
updated_at TIMESTAMPTZ,
PRIMARY KEY (user_id)
)
WITH (
orientation = 'row',
distribution_key = 'user_id',
clustering_key = 'user_id'
);
CREATE TABLE [IF NOT EXISTS] [schema.]table_name (
column_name data_type [NOT NULL] [DEFAULT expr],
...
[PRIMARY KEY (col1 [, col2, ...])]
)
[PARTITION BY LIST (partition_column)]
WITH (
orientation = '{column | row | row,column}',
distribution_key = 'col1[,col2]',
clustering_key = 'col1[:asc|:desc][,col2[:asc|:desc]]',
event_time_column = 'col',
bitmap_columns = 'col1[,col2,...]',
dictionary_encoding_columns = 'col1[:auto|:on|:off][,...]',
time_to_live_in_seconds = 'N', -- NOT recommended, see below
storage_mode = '{hot | cold}',
table_group = 'group_name'
);
BEGIN;
CREATE TABLE table_name (...);
CALL set_table_property('table_name', 'orientation', 'column');
CALL set_table_property('table_name', 'distribution_key', 'col1');
CALL set_table_property('table_name', 'clustering_key', 'col1:asc');
CALL set_table_property('table_name', 'event_time_column', 'col1');
CALL set_table_property('table_name', 'bitmap_columns', 'col1,col2');
COMMIT;
Controls how data is hash-distributed across shards. Critical for JOIN and GROUP BY performance.
| Rule | Description |
|---|---|
| Choose high-cardinality columns | Avoid data skew |
| Use JOIN/GROUP BY columns | Enable local computation, avoid shuffle |
| Max 2 columns | More columns reduce distribution effectiveness |
| Must be subset of PK | If table has a primary key |
| Immutable after creation | Cannot ALTER, must recreate table |
-- Single column
distribution_key = 'user_id'
-- Two columns (join on both)
distribution_key = 'order_id,user_id'
Physically sorts data within files. Accelerates range queries and filters.
| Rule | Description |
|---|---|
| Use range-query columns | e.g., timestamp, date columns |
| Max 2-3 columns | More columns dilute sort benefit |
| Put high-selectivity column first | Most-filtered column goes first |
| Specify sort order | :asc (default) or :desc |
| Column store only | Row store uses PK as sort key |
clustering_key = 'order_time:asc'
clustering_key = 'ds:asc,order_time:asc'
Organizes data files by time ranges. Enables file-level pruning for time-range queries.
| Rule | Description |
|---|---|
| Use time/date columns | Timestamp of data ingestion or event time |
| At most 1 column | Only one segment key per table |
| Column store only | Not applicable to row store |
| Combine with partition | Partition for coarse pruning, segment key for fine pruning |
event_time_column = 'event_time'
Builds bitmap indexes for fast equality filtering on low-to-medium cardinality columns.
| Rule | Description |
|---|---|
| Use filter columns | Columns frequently in WHERE clause |
| Low-medium cardinality | status, type, region — NOT user_id |
| TEXT columns auto-enabled | Default bitmap for text columns |
| Can be added after creation | ALTER TABLE ... SET (bitmap_columns = ...) |
bitmap_columns = 'status,payment_type,region'
Compresses text columns by mapping values to integers. Speeds up GROUP BY and aggregations.
| Rule | Description |
|---|---|
Use :auto mode | Let Hologres decide based on cardinality |
| Good for low-cardinality text | country, status, category |
| Avoid high-cardinality columns | user_id, order_id — no compression benefit |
| Don't set if unsure | Incorrect setting may hurt performance |
dictionary_encoding_columns = 'country:auto,status:auto'
Do NOT use
time_to_live_in_seconds. The actual deletion time is non-deterministic — data will be deleted at an arbitrary time after the specified TTL, not at a precise point. This makes it unreliable for data lifecycle management.Recommended alternative: Use dynamic partition management — create daily/hourly partitions and drop old partitions on a schedule (via cron or scheduling system). This gives you precise, predictable data lifecycle control.
-- BAD: TTL-based lifecycle (deletion time unpredictable)
-- time_to_live_in_seconds = '2592000'
-- GOOD: Partition-based lifecycle (precise control)
-- 1. Create table with daily partitions
CREATE TABLE events (
event_id BIGINT NOT NULL,
ds TEXT NOT NULL,
PRIMARY KEY (ds, event_id)
) PARTITION BY LIST (ds)
WITH (orientation = 'column', distribution_key = 'event_id');
-- 2. Drop old partitions on schedule (e.g., retain 30 days)
DROP TABLE IF EXISTS events_20251101; -- drop partition older than 30 days
| Storage Format | PK Requirement | Notes |
|---|---|---|
| Column store | Optional | Add PK only if needed for UPSERT |
| Row store | Required | PK drives the row-store index |
| Row-column store | Required | PK serves both point lookup and analytics |
Rules:
FLOAT, DOUBLE, NUMERIC, ARRAY, JSON, JSONBSERIAL as PK — causes table-level locks on writeUse partitions for large tables with time-based or categorical data.
-- Parent table
CREATE TABLE events (
event_id BIGINT NOT NULL,
user_id BIGINT,
event_type TEXT,
ds TEXT NOT NULL,
PRIMARY KEY (ds, event_id)
)
PARTITION BY LIST (ds)
WITH (
orientation = 'column',
distribution_key = 'event_id',
clustering_key = 'ds:asc',
event_time_column = 'ds'
);
-- Child partitions
CREATE TABLE events_20251201 PARTITION OF events FOR VALUES IN ('20251201');
CREATE TABLE events_20251202 PARTITION OF events FOR VALUES IN ('20251202');
Rules:
LIST partitioning is supportedTEXT, VARCHAR, INT, DATE (V1.3.22+)event_time_column insteadCREATE TABLE user_kv (
user_id BIGINT NOT NULL PRIMARY KEY,
profile JSONB,
updated_at TIMESTAMPTZ
)
WITH (
orientation = 'row',
distribution_key = 'user_id'
);
CREATE TABLE order_lines (
order_id BIGINT NOT NULL,
line_no INT NOT NULL,
product_id BIGINT,
quantity INT,
PRIMARY KEY (order_id, line_no)
)
WITH (
orientation = 'row',
distribution_key = 'order_id',
clustering_key = 'order_id'
);
CREATE TABLE page_views (
view_id BIGINT NOT NULL,
user_id BIGINT,
page_url TEXT,
view_time TIMESTAMPTZ,
ds TEXT NOT NULL,
PRIMARY KEY (ds, view_id)
)
PARTITION BY LIST (ds)
WITH (
orientation = 'column',
distribution_key = 'view_id',
clustering_key = 'view_time:asc',
event_time_column = 'view_time',
bitmap_columns = 'user_id,page_url'
);
CREATE TABLE product_stats (
product_id BIGINT NOT NULL PRIMARY KEY,
category TEXT,
brand TEXT,
sales_count BIGINT,
revenue DECIMAL(18,2)
)
WITH (
orientation = 'column',
distribution_key = 'product_id',
clustering_key = 'category:asc',
bitmap_columns = 'category,brand'
);
-- Fact table: distribute by join key
CREATE TABLE fact_orders (
order_id BIGINT NOT NULL PRIMARY KEY,
customer_id BIGINT,
amount DECIMAL(18,2),
order_date TEXT
)
WITH (
orientation = 'column',
distribution_key = 'customer_id'
);
-- Dimension table: same distribution key
CREATE TABLE dim_customers (
customer_id BIGINT NOT NULL PRIMARY KEY,
name TEXT,
region TEXT
)
WITH (
orientation = 'row,column',
distribution_key = 'customer_id'
);
-- JOIN on customer_id → local join, no shuffle
CREATE TABLE realtime_metrics (
metric_id BIGINT NOT NULL,
device_id BIGINT NOT NULL,
value DOUBLE PRECISION,
ts TIMESTAMPTZ NOT NULL,
ds TEXT NOT NULL,
PRIMARY KEY (ds, device_id, metric_id)
)
PARTITION BY LIST (ds)
WITH (
orientation = 'row,column',
distribution_key = 'device_id',
clustering_key = 'ts:asc',
event_time_column = 'ts',
bitmap_columns = 'device_id'
);
| Document | Content |
|---|---|
| data-types.md | Complete data type reference |
| table-properties.md | Detailed table property guide with selection flowcharts |
| partition-guide.md | Partition table design and management |
distribution_key to JOIN/GROUP BY columns — avoids cross-shard shuffleclustering_key for range-query columns — improves time-range and filter queriesevent_time_column for time-series data — enables file-level pruningbitmap_columns for low-cardinality filter columns — status, type, regiondistribution_key across JOINed tables — same column enables local joinWITH syntax (V2.1+) — cleaner than CALL set_table_property in transactionsSERIAL as primary key — causes table-level write locks