VeloDB / Apache Doris — Complete Best Practice Reference
All 37 rules, 7 use case templates, and sizing guides.
Use Case Templates
Use Case: CDC / Operational Data Sync
For replicating operational databases (MySQL, PostgreSQL) where rows are frequently updated or deleted.
Template
CREATE TABLE users_sync (
user_id BIGINT NOT NULL,
tenant_id INT NOT NULL,
update_time DATETIME NOT NULL,
name VARCHAR(100),
email VARCHAR(200),
status TINYINT
) ENGINE=OLAP
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
PROPERTIES (
"enable_unique_key_merge_on_write" = "true",
"function_column.sequence_col" = "update_time",
"light_schema_change" = "true"
);
Why This Design
| Decision | Choice | Why |
|---|---|---|
| Model | UNIQUE (MoW) | Enables UPDATE/DELETE. MoW gives fast reads without merge-sort. |
| Sequence col | update_time | Guarantees out-of-order CDC events don't overwrite newer data. |
| Bucket | HASH on user_id (= primary key) | Point lookups on PK are pruned to one tablet. |
| Partition | None | CDC tables are often small-to-medium; partition only if > 100GB. |
Customization Points
- Large CDC tables (> 100 GB): Add PARTITION BY RANGE(update_time) () with dynamic partitioning
- Composite primary key: Use UNIQUE KEY(tenant_id, user_id) for multi-tenant data
- Partial column updates: Set "enable_unique_key_partial_update" = "true" (see the sketch after this list)
Use Case: Dashboard / Pre-Aggregated Metrics
For pre-computing dashboard numbers: daily sales, ad click totals, user activity summaries.
Template
CREATE TABLE daily_sales_metrics (
dt DATE NOT NULL, store_id INT NOT NULL,
total_revenue DECIMAL(15,2) SUM DEFAULT "0",
max_transaction DECIMAL(15,2) MAX DEFAULT "0",
order_count BIGINT SUM DEFAULT "0",
unique_buyers BITMAP BITMAP_UNION
) ENGINE=OLAP AGGREGATE KEY(dt, store_id)
PARTITION BY RANGE(dt) ()
DISTRIBUTED BY HASH(store_id) BUCKETS AUTO
PROPERTIES ("dynamic_partition.enable"="true","dynamic_partition.time_unit"="MONTH",
"dynamic_partition.start"="-12","dynamic_partition.end"="1","dynamic_partition.prefix"="p");
Why This Design
| Decision | Choice | Why |
|---|---|---|
| Model | AGGREGATE | Values auto-aggregate (SUM, MAX, BITMAP_UNION) on ingestion. |
| BITMAP | BITMAP_UNION for unique_buyers | Exact count-distinct without storing raw user IDs. |
Anti-Pattern
-- BAD: Using DUPLICATE for dashboard data you'll always aggregate
CREATE TABLE daily_sales_metrics (dt DATE, store_id INT, revenue DECIMAL(15,2))
DUPLICATE KEY(dt, store_id);
-- Every query must re-aggregate billions of rows at query time.
Use Case: Small Dimension / Lookup Table
For small reference tables (< 1 GB): products, stores, countries, config values.
Template
CREATE TABLE stores (
store_id INT NOT NULL, region VARCHAR(20) NOT NULL,
city VARCHAR(50), manager_name VARCHAR(100)
) ENGINE=OLAP DUPLICATE KEY(store_id)
DISTRIBUTED BY RANDOM BUCKETS 3
PROPERTIES ("replication_num" = "3"); -- Use "1" for cloud mode
Why This Design
| Decision | Choice | Why |
|---|---|---|
| Model | DUPLICATE | Dimension data is reference data — rarely updated. |
| Partition | None | Table is tiny (< 1 GB). Partitioning adds overhead, not benefit. |
| Bucket | RANDOM, 3 buckets | Perfectly even distribution for small data. |
Customization Points
- Updated dimensions: Switch to UNIQUE KEY(store_id) with MoW if dimension data changes
- Colocation JOINs: Switch to HASH(store_id) and match the fact table's bucket count
Use Case: Time-Series Logs & Events
For immutable, append-only data: application logs, click events, IoT sensor readings, audit trails.
Template
CREATE TABLE app_events (
event_time DATETIME NOT NULL,
app_id INT NOT NULL,
event_type VARCHAR(50) NOT NULL,
user_id BIGINT,
payload STRING
) ENGINE=OLAP
DUPLICATE KEY(event_time, app_id, event_type)
PARTITION BY RANGE(event_time) ()
DISTRIBUTED BY HASH(app_id) BUCKETS AUTO
PROPERTIES (
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-7",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "AUTO",
"compression" = "zstd"
);
Why This Design
| Decision | Choice | Why |
|---|---|---|
| Model | DUPLICATE | Logs are immutable — no updates/deletes needed. Fastest scan speed. |
| Partition | Dynamic RANGE by DAY | Auto-creates daily partitions, auto-drops old ones (TTL via start). |
| Bucket | HASH on app_id | High cardinality, commonly filtered in WHERE. |
| Sort Key | (event_time, app_id, event_type) | Time-first enables range scans; app_id enables prefix pruning. |
| Compression | ZSTD | Log data has high redundancy — ZSTD compresses 2-3x better than LZ4. |
Customization Points
- Retention: Change dynamic_partition.start (e.g., "-30" for 30 days)
- Partition granularity: "HOUR" for very high volume or "MONTH" for low volume
- Bucket key: If you mostly filter by user_id, use HASH(user_id) instead
- Add text search: Add INDEX idx_payload(payload) USING INVERTED for log message search
Use Case: Observability Platform
For building a full observability stack: logs + metrics + traces.
Table 1: Logs (DUPLICATE)
CREATE TABLE otel_logs (
log_time DATETIME NOT NULL, service_name VARCHAR(100) NOT NULL,
severity VARCHAR(10) NOT NULL, trace_id VARCHAR(32), span_id VARCHAR(16),
body TEXT, resource_attributes STRING,
INDEX idx_body(body) USING INVERTED PROPERTIES("parser" = "unicode"),
INDEX idx_trace(trace_id) USING BLOOM FILTER
) ENGINE=OLAP DUPLICATE KEY(log_time, service_name, severity)
PARTITION BY RANGE(log_time) ()
DISTRIBUTED BY HASH(service_name) BUCKETS AUTO
PROPERTIES ("dynamic_partition.enable"="true","dynamic_partition.time_unit"="DAY",
"dynamic_partition.start"="-7","dynamic_partition.end"="3",
"dynamic_partition.prefix"="p","compression"="zstd");
Table 2: Traces (DUPLICATE)
CREATE TABLE otel_traces (
start_time DATETIME NOT NULL, service_name VARCHAR(100) NOT NULL,
trace_id VARCHAR(32) NOT NULL, span_id VARCHAR(16) NOT NULL,
parent_span_id VARCHAR(16), operation_name VARCHAR(200),
duration_ms BIGINT, status_code TINYINT,
INDEX idx_trace(trace_id) USING BLOOM FILTER
) ENGINE=OLAP DUPLICATE KEY(start_time, service_name, trace_id)
PARTITION BY RANGE(start_time) ()
DISTRIBUTED BY HASH(service_name) BUCKETS AUTO
PROPERTIES ("dynamic_partition.enable"="true","dynamic_partition.time_unit"="DAY",
"dynamic_partition.start"="-7","dynamic_partition.end"="3",
"dynamic_partition.prefix"="p","compression"="zstd");
Table 3: Metrics (AGGREGATE)
CREATE TABLE otel_metrics (
metric_time DATETIME NOT NULL, service_name VARCHAR(100) NOT NULL,
metric_name VARCHAR(200) NOT NULL,
value DOUBLE SUM DEFAULT "0", count BIGINT SUM DEFAULT "0"
) ENGINE=OLAP AGGREGATE KEY(metric_time, service_name, metric_name)
PARTITION BY RANGE(metric_time) ()
DISTRIBUTED BY HASH(service_name) BUCKETS AUTO
PROPERTIES ("dynamic_partition.enable"="true","dynamic_partition.time_unit"="DAY",
"dynamic_partition.start"="-30","dynamic_partition.end"="3","dynamic_partition.prefix"="p");
Design Principles
- Shared bucket key: All tables use HASH(service_name) for potential colocation JOINs
- Short retention for logs/traces: 7-day TTL; longer for metrics: 30-day TTL
- Text search on logs: Inverted index on body with unicode parser
- Trace correlation: BloomFilter on trace_id for fast trace lookup
Use Case: User-Facing Point Query Analytics
For serving real-time analytics through APIs with low-latency, high-concurrency requirements.
Template
CREATE TABLE user_profiles (
user_id BIGINT NOT NULL, tenant_id INT NOT NULL,
name VARCHAR(100), email VARCHAR(200), last_login DATETIME,
total_orders INT, lifetime_value DECIMAL(12,2),
INDEX idx_tenant (tenant_id) USING BLOOM FILTER
) ENGINE=OLAP UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
PROPERTIES (
"enable_unique_key_merge_on_write" = "true",
"store_row_column" = "true",
"light_schema_change" = "true"
);
Why This Design
| Decision | Choice | Why |
|---|---|---|
| Model | UNIQUE MoW | Fast reads (no merge-sort at query time) |
| store_row_column | true | Enables row-store mode for point queries — reads full row from one I/O |
| BloomFilter | On tenant_id | Skips tablets that don't contain the tenant |
Optimized Point Query Pattern
SELECT * FROM user_profiles WHERE user_id = 12345;
-- With Prepared Statement for high concurrency:
PREPARE stmt FROM 'SELECT * FROM user_profiles WHERE user_id = ?';
SET @uid = 12345; EXECUTE stmt USING @uid;
Use Case: Star Schema / JOIN-Heavy Analytics
Uses colocation to ensure JOINs execute locally without network shuffle.
Fact Table
CREATE TABLE fact_orders (
order_date DATE NOT NULL, order_id BIGINT NOT NULL,
user_id INT NOT NULL, store_id INT NOT NULL, product_id INT NOT NULL,
amount DECIMAL(12,2), quantity INT
) ENGINE=OLAP DUPLICATE KEY(order_date, order_id)
PARTITION BY RANGE(order_date) ()
DISTRIBUTED BY HASH(store_id) BUCKETS 16
PROPERTIES ("dynamic_partition.enable"="true","dynamic_partition.time_unit"="MONTH",
"dynamic_partition.start"="-24","dynamic_partition.end"="1",
"dynamic_partition.prefix"="p","dynamic_partition.buckets"="16",
"colocate_with"="group_orders");
Dimension Table (colocated)
CREATE TABLE dim_stores (
store_id INT NOT NULL, region VARCHAR(20), city VARCHAR(50), manager_name VARCHAR(100)
) ENGINE=OLAP DUPLICATE KEY(store_id)
DISTRIBUTED BY HASH(store_id) BUCKETS 16
PROPERTIES ("colocate_with" = "group_orders");
Colocation Rules — ALL must match:
- Same colocate_with group name
- Same bucket key column(s) and same column types
- Same bucket count
- Same replication_num
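One way to verify the group is healthy (a sketch; assumes access to SHOW PROC, and output columns vary by version):
SHOW PROC '/colocation_group';
-- Check that IsStable is true for group_orders; an unstable group falls back
-- to shuffle JOINs until replica repair finishes.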
Data Model Rules
AGGREGATE Model Cannot UPDATE or DELETE
Impact: CRITICAL — AGG tables do not support UPDATE or DELETE statements.
The AGGREGATE model auto-merges rows with the same key using aggregation functions (SUM, MAX, REPLACE, etc). This means:
- No UPDATE statement support
- No DELETE statement support
- No CDC pipeline compatibility (cannot reflect deletes)
Exception: REPLACE_IF_NOT_NULL — The one case where AGG can do partial updates. Columns with REPLACE_IF_NOT_NULL will only update when the incoming value is non-NULL. This allows partial column updates in AGG model, but still no DELETE.
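A minimal sketch of REPLACE_IF_NOT_NULL (hypothetical table; the columns must be declared nullable for the NULL-skip behavior to apply):
CREATE TABLE user_latest (
user_id BIGINT NOT NULL,
last_ip VARCHAR(39) REPLACE_IF_NOT_NULL NULL,
last_device VARCHAR(50) REPLACE_IF_NOT_NULL NULL
) AGGREGATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO;
-- Loading (42, '10.0.0.1', NULL) and then (42, NULL, 'ios')
-- yields (42, '10.0.0.1', 'ios'): NULLs leave existing values untouched.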
Incorrect:
-- BAD: AGG for data that needs updates
CREATE TABLE orders (
order_id BIGINT NOT NULL,
status VARCHAR(20) REPLACE,
amount DECIMAL(12,2) REPLACE
) AGGREGATE KEY(order_id);
-- Cannot DELETE cancelled orders!
Correct:
-- GOOD: UNIQUE MoW for updatable data
CREATE TABLE orders (
order_id BIGINT NOT NULL,
status VARCHAR(20),
amount DECIMAL(12,2)
) UNIQUE KEY(order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS AUTO
PROPERTIES ("enable_unique_key_merge_on_write" = "true");
Reference: Aggregate Model
Choose Data Model for Workload
Impact: CRITICAL — Data model cannot be changed after table creation.
| Model | Use When | Key Trait |
|---|---|---|
| DUPLICATE | Append-only data (logs, events, clicks) | Fastest scan, keeps all rows |
| UNIQUE (MoW) | Rows are updated/deleted (CDC, user data) | Dedup on primary key, supports DELETE |
| AGGREGATE | Pre-aggregated metrics only (counters, sums) | Auto-aggregates on ingest |
Decision rule: If you need UPDATE or DELETE → UNIQUE MoW. If append-only → DUPLICATE. Only use AGGREGATE when you will never query raw rows.
Incorrect:
-- BAD: Using AGGREGATE for a table that needs updates
CREATE TABLE users (
user_id BIGINT NOT NULL,
name VARCHAR(100) REPLACE,
email VARCHAR(200) REPLACE
) AGGREGATE KEY(user_id);
-- Cannot DELETE rows. Cannot run UPDATE statements.
Correct:
-- GOOD: UNIQUE MoW for updatable data
CREATE TABLE users (
user_id BIGINT NOT NULL,
name VARCHAR(100),
email VARCHAR(200)
) UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
PROPERTIES ("enable_unique_key_merge_on_write" = "true");
Reference: Data Model Overview
Always Use Merge-on-Write (MoW) for UNIQUE Tables
Impact: CRITICAL — MoR requires runtime merge-sort, making reads 2-10× slower.
Since Doris 2.1, MoW is the default. For older versions, always set explicitly:
PROPERTIES ("enable_unique_key_merge_on_write" = "true")
MoW vs MoR:
| Feature | MoW | MoR |
|---|---|---|
| Read speed | Fast (pre-merged) | Slow (merge at query) |
| Write speed | Slightly slower | Faster writes |
| DELETE support | Yes | Yes |
| Partial update | Yes | Limited |
Always use MoW unless you have a write-heavy workload with minimal reads.
Reference: Unique Key Model
Sequence Column for Out-of-Order CDC
Impact: HIGH — CDC events may arrive out of order; older records can overwrite newer ones.
When using UNIQUE KEY with CDC (Flink CDC, Debezium, Canal), set a sequence column to guarantee ordering:
CREATE TABLE users (
user_id BIGINT NOT NULL,
update_time DATETIME NOT NULL,
name VARCHAR(100),
email VARCHAR(200)
) UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
PROPERTIES (
"enable_unique_key_merge_on_write" = "true",
"function_column.sequence_col" = "update_time"
);
How it works: When two rows have the same primary key, Doris keeps the one with the higher sequence column value, regardless of arrival order.
When to use: Any CDC pipeline where events may arrive out of order (network delays, reprocessing, multiple sources).
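To see the guarantee in action, a minimal sketch against the table above (values hypothetical):
-- The newer event arrives first, the older one second
INSERT INTO users VALUES (1, '2025-01-02 10:00:00', 'Alice', 'alice@new.example');
INSERT INTO users VALUES (1, '2025-01-01 09:00:00', 'Alice', 'alice@old.example');
SELECT email FROM users WHERE user_id = 1;
-- Returns 'alice@new.example': the row with the higher update_time wins.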
Reference: Sequence Column
Partition Rules
AUTO PARTITION for Sporadic Data
Impact: HIGH — Avoids creating empty partitions for dates with no data.
Use AUTO PARTITION when data arrival is unpredictable (not every day/month has data):
CREATE TABLE sparse_events (
event_time DATETIME NOT NULL,
user_id BIGINT,
event_type VARCHAR(50)
) DUPLICATE KEY(event_time, user_id)
AUTO PARTITION BY RANGE(date_trunc(event_time, 'month'))
()
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO;
When to use AUTO vs DYNAMIC:
- Dynamic: Continuous data (logs, metrics) — pre-creates future partitions
- Auto: Sporadic data (user uploads, batch jobs) — creates only when data arrives
Reference: Auto Partition
Dynamic Partition for Automated Data Lifecycle
Impact: HIGH — Automates partition creation and TTL-based data cleanup.
Key properties:
- time_unit: DAY, WEEK, MONTH
- start: Negative number = how many units to keep (TTL). E.g., "-7" keeps 7 days.
- end: Positive number = how many future partitions to pre-create
- buckets: AUTO or fixed number per partition
PROPERTIES (
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-7",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "AUTO"
);
Warning: Do not use dynamic partition for tables with < 20 million rows. For small tables, it creates wasteful empty partitions that go unnoticed and add unnecessary metadata overhead. Use a single partition or no partition instead.
Reference: Dynamic Partition
RANGE Partition on Time Column for Time-Series
Impact: CRITICAL — Partition key cannot be changed after table creation.
For any table with a time column (logs, events, metrics), use RANGE partition:
PARTITION BY RANGE(event_time) ()
Combined with dynamic partitioning, this enables:
- Partition pruning: WHERE event_time > '2025-01-01' only scans relevant partitions
- TTL: Old partitions are auto-dropped
- Parallel loading: Each partition can be loaded independently
Incorrect:
-- BAD: No partition on a time-series table
CREATE TABLE events (
event_time DATETIME, user_id BIGINT, data STRING
) DUPLICATE KEY(event_time)
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO;
-- Every query scans ALL data, no TTL possible
Correct:
CREATE TABLE events (
event_time DATETIME NOT NULL, user_id BIGINT, data STRING
) DUPLICATE KEY(event_time, user_id)
PARTITION BY RANGE(event_time) ()
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
PROPERTIES (
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-30",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p"
);
Reference: Range Partition
Skip Partitioning for Small Tables (< 1 GB)
Impact: HIGH — Unnecessary partitioning on small tables wastes metadata and hurts performance.
If a table is under 1 GB total, do not partition it. Just use bucketing:
-- GOOD: Small dimension table, no partition needed
CREATE TABLE dim_countries (
country_code VARCHAR(3) NOT NULL,
country_name VARCHAR(100),
region VARCHAR(50)
) DUPLICATE KEY(country_code)
DISTRIBUTED BY HASH(country_code) BUCKETS 3;
Rule of thumb:
- < 1 GB → No partition
- 1-100 GB → Consider partition if time-series
- > 100 GB → Always partition
Reference: Data Partitioning
Bucket Rules
Cloud Mode Requires HASH Bucketing for MoW
Impact: HIGH — UNIQUE MoW tables in cloud mode must use HASH bucketing.
In VeloDB Cloud (storage-compute separation), RANDOM bucketing is not supported for UNIQUE KEY tables with MoW enabled.
-- GOOD: Cloud MoW with HASH
CREATE TABLE users (...)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
PROPERTIES (
"enable_unique_key_merge_on_write" = "true",
"replication_num" = "1" -- cloud mode
);
Reference: VeloDB Cloud Documentation
Use Composite Bucket Key to Fix Data Skew
Impact: HIGH (Single skewed bucket key concentrates data; composite key distributes it evenly)
When a single column has uneven distribution (some values appear much more than others), use a composite bucket key.
Incorrect (single skewed key):
-- BAD: site_id 54321 has 80% of traffic → bucket_4 is 80% of data
CREATE TABLE site_access (
site_id INT, city_code INT, pv BIGINT
) DUPLICATE KEY(site_id, city_code)
DISTRIBUTED BY HASH(site_id) BUCKETS 16;
Correct (composite key):
-- GOOD: combining site_id + city_code distributes the hot site across cities
CREATE TABLE site_access (
site_id INT, city_code INT, pv BIGINT
) DUPLICATE KEY(site_id, city_code)
DISTRIBUTED BY HASH(site_id, city_code) BUCKETS 16;
-- Data for site_id=54321 is now spread across multiple buckets by city.
When to use composite keys:
- One column dominates traffic (e.g., one large customer)
- Single-column hash creates visible hot tablets
- Detect with: SHOW TABLETS FROM table_name and check size distribution
Reference: Data Distribution
HASH Bucket for Pruning; RANDOM Only for DUP Full-Scans
Impact: CRITICAL — Bucket key cannot be changed after table creation.
- HASH bucket: Filters on the bucket column prune to a single tablet → fast point queries
- RANDOM bucket: Guarantees perfectly even distribution but no query pruning
| Feature | HASH | RANDOM |
|---|---|---|
| Query pruning | Yes (WHERE on bucket col) | No |
| Data distribution | Depends on column cardinality | Always even |
| Supported models | ALL | DUPLICATE only |
30% Skew Rule: If your chosen bucket column has >30% data skew (one value dominates), you must either use RANDOM bucketing or switch to a composite key. Detect with SHOW TABLETS FROM table_name and check size distribution.
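A quick way to measure skew (a sketch; SHOW TABLETS output columns vary by version):
SHOW TABLETS FROM site_access;
-- Compare DataSize across tablets: if one tablet holds more than ~30% of the
-- table's total size, switch to a composite HASH key or RANDOM bucketing.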
Incorrect:
-- BAD: RANDOM on a UNIQUE table (not supported)
CREATE TABLE users (...) UNIQUE KEY(user_id)
DISTRIBUTED BY RANDOM BUCKETS AUTO;
Correct:
-- GOOD: HASH on primary key for UNIQUE
CREATE TABLE users (...) UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO;
-- GOOD: RANDOM for DUP when no filter column is obvious
CREATE TABLE raw_logs (...) DUPLICATE KEY(log_time)
DISTRIBUTED BY RANDOM BUCKETS AUTO;
Reference: Data Distribution
Choose High-Cardinality, Low-NULL Bucket Key
Impact: HIGH — Low-cardinality keys cause uneven data distribution.
Good bucket key properties:
- High cardinality — many distinct values (user_id, order_id)
- Low NULL rate — NULLs all hash to the same bucket
- Used in WHERE/JOIN — enables partition pruning
Bad choices: status (3 values), gender (2 values), country (200 values for global data)
Good choices: user_id, order_id, device_id, session_id
Reference: Data Distribution
Target 1-10 GB Per Tablet
Impact: HIGH — Tablet size directly affects query parallelism and metadata overhead.
Rules:
- Target: 1-10 GB per tablet (compressed)
- Max buckets per partition: ≤ 64
- Preferred: Use BUCKETS AUTO to let Doris calculate
-- GOOD: Let Doris decide
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
-- Manual: 100 GB partition / target 5 GB per tablet = 20 buckets
DISTRIBUTED BY HASH(user_id) BUCKETS 20
Warning signs:
- Tablets < 100 MB → Too many buckets, merge or reduce
- Tablets > 20 GB → Too few buckets, increase count
Reference: Data Distribution
Sort Key Rules
Avoid FLOAT/DOUBLE in Sort Key
Impact: HIGH — FLOAT/DOUBLE columns disable prefix index and ZoneMap pruning.
- FLOAT and DOUBLE cannot participate in the prefix index
- They also disable ZoneMap statistics for that column
- Use DECIMAL instead for numeric precision
-- BAD: FLOAT in key breaks prefix index
DUPLICATE KEY(price, product_id) -- price is FLOAT
-- GOOD: Use DECIMAL
CREATE TABLE products (
product_id INT NOT NULL,
price DECIMAL(10,2) NOT NULL,
name VARCHAR(200)
) DUPLICATE KEY(product_id, price);
Reference: Prefix Index
Cluster Key to Decouple Sort from Primary Key
Impact: HIGH — UNIQUE tables sort data by primary key, but queries may filter on other columns.
For UNIQUE MoW tables, the primary key determines dedup but the Cluster Key determines physical sort order:
CREATE TABLE users (
user_id BIGINT NOT NULL,
region VARCHAR(20),
name VARCHAR(100)
) UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
PROPERTIES (
"enable_unique_key_merge_on_write" = "true",
"cluster_key" = "region, user_id"
);
-- Dedup by user_id, but data sorted by region first (better for region-filtered queries)
When to use: When your most common WHERE clause filters on columns that are NOT your primary key.
Reference: Cluster Key
Fixed-Length Types (INT/DATE) Before VARCHAR in Key
Impact: HIGH — A VARCHAR column terminates the prefix index.
Prefix index rules:
- Max 3 columns OR 36 bytes (whichever comes first)
- VARCHAR terminates the index — only the first 20 bytes of the VARCHAR are included, and no more columns after it
-- GOOD: Fixed-length types first
DUPLICATE KEY(user_id, event_date, event_type)
-- INT(4B) DATE(4B) VARCHAR(→ terminates)
-- BAD: VARCHAR first wastes the prefix index
DUPLICATE KEY(event_type, user_id, event_date)
-- VARCHAR(→ terminates after 20B, user_id and event_date NOT in index)
Reference: Prefix Index
Prefix Index Limits — 3 Columns, 36 Bytes
Impact: HIGH — Columns beyond the limit get no prefix index benefit.
| Rule | Detail |
|---|---|
| Max columns | 3 |
| Max bytes | 36 |
| VARCHAR | Terminates the index (only first 20 bytes included) |
| FLOAT/DOUBLE | Cannot be in prefix index — breaks ZoneMap too |
Example:
-- user_id(BIGINT=8B) + event_date(DATE=4B) + app_id(INT=4B) = 16B, 3 cols ✓
DUPLICATE KEY(user_id, event_date, app_id)
-- If you need a 4th filter column, add a separate index (BloomFilter, Inverted)
Reference: Prefix Index
High-Selectivity Columns First in Sort Key
Impact: CRITICAL — Sort key order determines query performance through prefix index.
Place columns in this order:
- Most frequently filtered columns first
- Higher selectivity (more distinct values) before lower
- Equality filters before range filters
-- GOOD: user_id (high selectivity, common filter) first
DUPLICATE KEY(user_id, event_time, event_type)
-- BAD: event_type (low selectivity) first
DUPLICATE KEY(event_type, event_time, user_id)
The first 3 columns (or first 36 bytes) form the prefix index, which is the primary lookup structure in Doris.
Reference: Sort Key
Data Type Rules
BITMAP_UNION for Exact Count-Distinct and Funnel Analysis
Impact: HIGH — COUNT(DISTINCT col) on high-cardinality columns is extremely expensive. BITMAP gives exact results with O(1) merge.
CREATE TABLE daily_uv (
dt DATE NOT NULL,
page VARCHAR(200) NOT NULL,
uv BITMAP BITMAP_UNION
) AGGREGATE KEY(dt, page)
DISTRIBUTED BY HASH(page) BUCKETS AUTO;
-- Insert with to_bitmap():
INSERT INTO daily_uv SELECT '2025-01-01', '/home', to_bitmap(user_id) FROM events;
-- Query exact UV:
SELECT dt, bitmap_count(uv) AS unique_visitors FROM daily_uv GROUP BY dt;
Funnel / Retention with bitmap_intersect():
SELECT bitmap_count(bitmap_intersect(uv)) AS retained_users
FROM daily_uv WHERE dt IN ('2025-01-01', '2025-01-02');
Orthogonal analysis: Use INTERSECT_COUNT for multi-dimensional user segmentation. Reference: BITMAP
Doris-Specific Type Gotchas
DATETIME precision: Default DATETIME is seconds. Use DATETIME(3) for milliseconds, DATETIME(6) for microseconds.
STRING vs VARCHAR: STRING cannot be used as key, partition, or bucket column. Use VARCHAR for keyed columns.
VARCHAR(65533) has identical performance to VARCHAR(255) — Doris uses variable-length storage, so there's no penalty for a larger max. When unsure, use VARCHAR(65533).
CHAR vs VARCHAR: CHAR pads with spaces to fixed length. Only use CHAR for truly fixed-width codes (country_code CHAR(3)). Otherwise, prefer VARCHAR.
-- GOOD: Appropriate types
CREATE TABLE t (
ts DATETIME(3), -- millisecond precision
country CHAR(3), -- fixed 3-char code
name VARCHAR(65533), -- variable, no perf penalty
big_text STRING -- only for non-key, non-partitioned columns
);
Use Native Types, Not STRING for Everything
Impact: HIGH — STRING columns cannot be used as partition, bucket, or sort key columns. Always use the most specific type: INT for numbers, DATE/DATETIME for timestamps, DECIMAL for money.
-- BAD
CREATE TABLE t (id STRING, ts STRING, amount STRING);
-- GOOD
CREATE TABLE t (id BIGINT, ts DATETIME(3), amount DECIMAL(12,2));
Reference: Data Types
VARIANT Type for Semi-Structured JSON Data
Impact: HIGH — VARIANT provides columnar storage for JSON with automatic type inference. Use VARIANT instead of JSON or STRING for semi-structured data:
CREATE TABLE events (
event_time DATETIME NOT NULL,
event_id BIGINT NOT NULL,
data VARIANT
) DUPLICATE KEY(event_time, event_id)
DISTRIBUTED BY HASH(event_id) BUCKETS AUTO;
-- Query nested fields directly:
SELECT data['user']['name'], data['action'] FROM events;
Schema Template: Pre-define expected fields for better columnar storage:
ALTER TABLE events SET ("variant_schema_template" = '{"user.name": "STRING", "action": "STRING", "amount": "DOUBLE"}');
Inverted Index on VARIANT fields:
ALTER TABLE events ADD INDEX idx_action(CAST(data['action'] AS VARCHAR)) USING INVERTED;
MATCH search on VARIANT text fields:
SELECT * FROM events WHERE CAST(data['message'] AS VARCHAR) MATCH 'error timeout';
Reference: VARIANT Type
JSON/ARRAY/Complex Types Disable ZoneMap
Impact: HIGH — ZoneMap statistics are not generated for JSON, ARRAY, MAP, STRUCT, or STRING columns. ZoneMap stores min/max values per data page, enabling skip scanning; complex types disable this optimization.
Workaround: Extract frequently filtered fields into dedicated columns with native types.
-- BAD: Filtering on JSON field — no ZoneMap, full scan
SELECT * FROM events WHERE payload->'$.status' = 'error';
-- GOOD: Extract to a native column
CREATE TABLE events (
..., status VARCHAR(20), payload JSON,
INDEX idx_status(status) USING INVERTED
);
SELECT * FROM events WHERE status = 'error';
Table Properties
Properties Cloud Mode Forces
In VeloDB Cloud (storage-compute separation):
- replication_num is forced to 1 (data stored in object storage)
- HASH bucketing required for UNIQUE MoW
- File cache controls performance (see schema-cache-file-cache)
PROPERTIES ("replication_num" = "1");
LZ4 Default vs ZSTD for Logs/Cold Data
| Algorithm | Speed | Ratio | Use When |
|---|---|---|---|
| LZ4 | Fastest | Lower | Default, hot data, low-latency |
| ZSTD | Slower | 2-3× better | Logs, cold data, archival |
-- For log/event tables with high redundancy:
PROPERTIES ("compression" = "zstd");
-- For hot analytical tables (default):
PROPERTIES ("compression" = "lz4");
Index Rules
Bitmap Index for Medium-Cardinality Dimensions (100-100K distinct values)
Impact: MEDIUM — Efficient for columns with moderate cardinality used in filters.
CREATE INDEX idx_city ON table_name(city) USING BITMAP;
Sweet spot: 100 to 100,000 distinct values (status codes, cities, categories).
Restrictions:
- Only one bitmap index can be created at a time (sequential schema change)
- DROP INDEX is also a schema change and takes significant time
- Only on value columns, not key columns
- Segment V2 format required
BloomFilter for High-Cardinality Equality Filters
Impact: HIGH — Skips data pages that definitely don't contain the filtered value.
Use for columns with ≥ 5000 distinct values, filtered with = or IN.
-- Add BloomFilter index
PROPERTIES ("bloom_filter_columns" = "trace_id, session_id");
-- Or per-column:
INDEX idx_trace(trace_id) USING BLOOM FILTER
Constraints:
- NOT supported on TINYINT, FLOAT, or DOUBLE columns
- Only accelerates = and IN filters (not LIKE, not range)
- Minimum recommended cardinality: 5000+ distinct values
- False positive rate ~1% (configurable via bloom_filter_fpp)
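A hedged example combining the column list with an explicit false-positive target (0.01 is an illustrative value; property names as cited above):
PROPERTIES (
"bloom_filter_columns" = "trace_id, session_id",
"bloom_filter_fpp" = "0.01" -- lower fpp = larger filter, fewer wasted page reads
);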
Inverted Index for Text Search and Range on Non-Key Columns
Impact: HIGH — Enables full-text search and efficient range filters without modifying the sort key.
-- Text search with parser
INDEX idx_body(body) USING INVERTED PROPERTIES("parser" = "unicode")
-- Equality/range on non-key column
INDEX idx_status(status) USING INVERTED
Parser options: none (exact), english, unicode (multilingual), chinese (CJK).
Supported filter types: =, IN, >, <, >=, <=, MATCH_ALL, MATCH_ANY, MATCH_PHRASE.
Reference: Inverted Index
NGram BloomFilter for LIKE '%pattern%' Queries
Impact: HIGH — LIKE '%pattern%' causes full table scan without NGram index.
INDEX idx_url(url) USING NGRAM_BF PROPERTIES("gram_size" = "3", "bf_size" = "1024")
The NGram BloomFilter breaks text into 3-character grams and uses a bloom filter to skip non-matching pages.
When to use: LIKE '%keyword%' queries on VARCHAR/STRING columns.
When NOT to use: Exact match (=) → use regular BloomFilter. Full-text search → use Inverted Index.
Reference: NGram BloomFilter
Full-Text Search with MATCH Functions and BM25 Scoring
Impact: HIGH — Enables search-engine-like text queries with relevance ranking. 7 MATCH operators: MATCH_ALL, MATCH_ANY, MATCH_PHRASE, MATCH_PHRASE_PREFIX, MATCH_PHRASE_EDGE, MATCH_REGEXP, MATCH_ELEMENT_EQ.
-- Setup: Inverted index with parser
INDEX idx_content(content) USING INVERTED PROPERTIES(
"parser" = "unicode", "support_phrase" = "true"
)
-- Queries (WHERE clause only):
WHERE content MATCH_ALL 'database analytics' -- all terms
WHERE content MATCH_ANY 'database analytics' -- any term
WHERE content MATCH_PHRASE 'real time analytics' -- exact phrase
SEARCH() unified DSL: Combine operators in one function:
WHERE SEARCH(content, '"real time" +analytics -legacy', 'parser=unicode')
BM25 scoring: Rank results by relevance:
SELECT doc_id, score() AS relevance FROM docs
WHERE content MATCH_ANY 'doris analytics' ORDER BY relevance DESC;
Custom analyzers, hybrid text+vector search, and VARIANT text search are all supported. Reference: Full-Text Search
HNSW/IVF Vector Index for ANN Search
Impact: HIGH — Enables approximate nearest neighbor search on embeddings. Requires Doris 4.0+ or VeloDB with vector support.
CREATE TABLE embeddings (
doc_id BIGINT NOT NULL,
content VARCHAR(65533),
embedding ARRAY<FLOAT> NOT NULL
) DUPLICATE KEY(doc_id)
DISTRIBUTED BY HASH(doc_id) BUCKETS AUTO;
-- Add HNSW index:
CREATE INDEX idx_vec ON embeddings(embedding) USING HNSW
PROPERTIES("dim" = "768", "metric" = "cosine", "m" = "16", "ef_construction" = "200");
-- Query:
SELECT doc_id, l2_distance(embedding, [0.1, 0.2, ...]) AS dist
FROM embeddings ORDER BY dist LIMIT 10;
Index types: HNSW (fast, memory-heavy), IVF_FLAT (balanced), IVF_PQ (compressed).
Materialized Views
Async MV for Multi-Table JOIN Acceleration
Impact: HIGH — Pre-computes JOINs so queries read a flat table instead of joining at runtime.
CREATE MATERIALIZED VIEW mv_order_details
REFRESH SCHEDULE EVERY 10 MINUTES
AS SELECT o.order_id, o.amount, p.product_name, c.customer_name
FROM orders o
JOIN products p ON o.product_id = p.product_id
JOIN customers c ON o.customer_id = c.customer_id;
Refresh modes: SCHEDULE (periodic), ON COMMIT (on base table change, limit: ≤5 updates/hr), MANUAL. Reference: Async Materialized View
Async MV Operational Limits
| Limit | Value |
|---|---|
| Max rows per MV | ~50 million |
| Max JOINs | 2 |
| Max partitions | 30 |
| Max concurrent refreshes | 3 |
| Cluster resource cap | 40% |
| ON COMMIT limit | ≤ 5 updates/hour |
Capacity estimation: ~20-30 active async MVs on a 3-node cluster.
Layered design pattern: Build MVs on top of other MVs (Layer 1: base aggregations, Layer 2: cross-table joins); see the sketch after the partition_sync_limit example below.
partition_sync_limit: Focus refresh on recent data only:
CREATE MATERIALIZED VIEW mv_recent
PROPERTIES ("partition_sync_limit" = "7")
REFRESH SCHEDULE EVERY 1 HOUR
AS SELECT ... FROM orders ...;
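A sketch of the layered pattern mentioned above, reusing this section's refresh syntax and the star-schema tables from earlier (MV names hypothetical):
-- Layer 1: base aggregation over the fact table
CREATE MATERIALIZED VIEW mv_daily_store_sales
REFRESH SCHEDULE EVERY 1 HOUR
AS SELECT order_date, store_id, SUM(amount) AS revenue
FROM fact_orders GROUP BY order_date, store_id;
-- Layer 2: joins the Layer-1 MV with a dimension table (stays under the 2-JOIN limit)
CREATE MATERIALIZED VIEW mv_daily_region_sales
REFRESH SCHEDULE EVERY 1 HOUR
AS SELECT m.order_date, d.region, SUM(m.revenue) AS revenue
FROM mv_daily_store_sales m JOIN dim_stores d ON m.store_id = d.store_id
GROUP BY m.order_date, d.region;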
Sync MV (Rollup) for Single-Table Aggregation
Impact: HIGH — Pre-aggregates data; optimizer rewrites queries automatically.
CREATE MATERIALIZED VIEW mv_daily_sales AS
SELECT dt, store_id, SUM(amount) AS total, COUNT(*) AS cnt
FROM orders GROUP BY dt, store_id;
Restriction: NOT supported on UNIQUE KEY tables. Use async MVs instead. Sync MVs are maintained synchronously with the base table — zero lag. Reference: Sync Materialized View
Caching Rules
File Cache Strategies (Cloud / Storage-Compute Separation)
Key metric: 60% of data cached locally = 95% cache hit rate. Aim to maintain a 90%+ hit rate.
Cache modes:
- LRU (default): Evicts least recently used. Good for uniform access.
- TTL: Time-based eviction. Good for time-series with a clear hot/cold boundary.
Table-level cache control:
-- Keep dimension tables cached forever
ALTER TABLE dim_stores SET ("file_cache_ttl_seconds" = "0"); -- 0 = never evict
-- Hot window for fact tables
ALTER TABLE fact_orders SET ("file_cache_ttl_seconds" = "604800"); -- 7 days
IOPS guidance: SSD cache: ~500 IOPS/disk. HDD: ~200 IOPS/disk.
Query Cache and Partition Cache
Query cache: Identical SQL → instant response (no computation).
SET enable_query_cache = true;
Partition cache: Only recomputes partitions with new data. Ideal for time-series dashboards where most partitions are historical and unchanged.
SET enable_partition_cache = true;
When to use:
- Query cache: Repeated identical queries (dashboard auto-refresh)
- Partition cache: Time-series with mostly historical data
Sizing Guides
BE Sizing — Cloud / Storage-Compute Separation
In cloud mode, data lives in object storage (S3/GCS); BEs only cache hot data locally.
- CPU: Same as integrated mode — determines query parallelism.
- Memory: 32-128 GB. Used for query execution, not data storage.
- Local disk: SSD for the file cache. Size it from the hot data ratio (60% cached = 95% hit rate).
- Elasticity: Scale BE nodes independently of data volume. Add BEs for more compute, not more storage.
BE Sizing — Integrated Storage Mode
In integrated mode, each BE stores data locally on disk.
- CPU: 16-64 cores per BE. More cores = more concurrent query/scan threads.
- Memory: 64-256 GB per BE. Rule of thumb: 4 GB RAM per 1 TB of stored data.
- Disk: SSD recommended, 1-10 TB per BE. Use RAID or multiple disks for throughput.
- Node count: Start with 3 nodes for HA. Scale horizontally for more throughput.
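Worked example using the rule of thumb above: a BE holding 20 TB of compressed data needs roughly 20 × 4 GB = 80 GB of RAM, comfortably inside the 64-256 GB range.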
FE Node Sizing
| Cluster Size | FE Nodes | Memory | CPU |
|---|---|---|---|
| Small (< 10 BE) | 1 Leader + 2 Follower | 16 GB | 8 cores |
| Medium (10-50 BE) | 1 Leader + 2 Follower | 32 GB | 16 cores |
| Large (50+ BE) | 1 Leader + 4 Follower | 64 GB | 32 cores |
FE nodes store metadata (table schemas, partitions, tablets) in memory. Scale FE memory with table count and partition count rather than data volume.
Storage Calculation Formula
Raw data ÷ compression ratio × replication_num = Required storage
Compression ratios (typical):
| Data Type | LZ4 Ratio | ZSTD Ratio |
|---|---|---|
| Structured (numeric) | 3-5× | 5-8× |
| Logs/text | 5-10× | 10-20× |
| JSON/semi-structured | 3-8× | 5-12× |
Example: 1 TB/day raw logs × 30 days retention ÷ 10× ZSTD compression × 1 replica = 3 TB storage
Include overhead: Add 20-30% for metadata, compaction temp space, and safety margin.
Getting Started
Getting Started — VeloDB Cloud
Connection Info
You'll need: Host, Port (usually 9030 for MySQL protocol), User, Password, Warehouse name.
Connect via MySQL Client
mysql -h <host> -P 9030 -u <user> -p<password>
Connect via JDBC
jdbc:mysql://<host>:9030/<database>?useSSL=false
First Steps
- Create a database: CREATE DATABASE IF NOT EXISTS my_db;
- Use the database: USE my_db;
- Create your first table (see the use case templates for guidance)
- Load data via Stream Load, INSERT, or external connectors (see the sketch after this list)
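For example, a hedged Stream Load sketch (assumes the FE HTTP port is 8030 and a comma-separated CSV; adjust host, credentials, and database/table names):
curl --location-trusted -u <user>:<password> \
-H "label:first_load_001" \
-H "column_separator:," \
-T data.csv \
http://<host>:8030/api/my_db/my_table/_stream_load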
Cloud-Specific Properties
Always set these for cloud mode:
PROPERTIES ("replication_num" = "1");
Reference: VeloDB Cloud Docs
Getting Started — Self-Hosted / BYOC / On-Prem
Prerequisites
- FE nodes: Java 8+ runtime
- BE nodes: Linux with sufficient disk, memory
- Network: FE and BE nodes must be able to communicate
Deployment Steps
- Deploy FE nodes (1 Leader + 2 Followers for HA)
- Deploy BE nodes (3+ for production)
- Register BEs with FE: ALTER SYSTEM ADD BACKEND "<be_host>:9050";
- Create database and tables
Connect
mysql -h <fe_host> -P 9030 -u root
Self-Hosted Properties
PROPERTIES ("replication_num" = "3"); -- 3 replicas for HA
Reference: Apache Doris Installation