name: data-lake-platform
description: "Data lake and lakehouse platform patterns: ingestion/CDC, transformations, open table formats (Iceberg/Delta/Hudi), query and serving engines (Trino/ClickHouse/DuckDB), orchestration, governance/lineage, cost and operations. Self-hosted and cloud options."
Data Lake Platform
Build and operate production data lakes and lakehouses: ingest, transform, store in open formats, and serve analytics reliably.
When to Use
Design data lake/lakehouse architecture
Set up ingestion pipelines (batch, incremental, CDC)
Build SQL transformation layers (SQLMesh, dbt)
Choose table formats and catalogs (Iceberg, Delta, Hudi)
Weigh platform constraints: self-hosted vs cloud, preferred engines, and team strengths
Default Baseline (Good Starting Point)
Storage: object storage + open table format (usually Iceberg)
Catalog: REST/Hive/Glue/Nessie/Unity (match your platform)
Transforms: SQLMesh or dbt (pick one and standardize)
Lake query: Trino (or Spark for heavy compute/ML workloads)
Serving (optional): ClickHouse/StarRocks/Doris for low-latency BI
Governance: DataHub/OpenMetadata + OpenLineage
Orchestration: Dagster/Airflow/Prefect
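One way to make the baseline explicit is to record the choices in a small, checkable structure. The sketch below is illustrative only: the `Baseline` class, the `ALLOWED` table, and the lowercase option names are hypothetical, and the defaults for catalog, transforms, and orchestration are arbitrary picks from the options listed above, not recommendations from this document.

```python
from dataclasses import dataclass
from typing import Optional

# Options taken from the baseline list above; names are lowercased for this sketch.
ALLOWED = {
    "table_format": {"iceberg", "delta", "hudi"},
    "catalog": {"rest", "hive", "glue", "nessie", "unity"},
    "transforms": {"sqlmesh", "dbt"},
    "lake_query": {"trino", "spark"},
    "serving": {"clickhouse", "starrocks", "doris", None},  # None = no serving layer
    "orchestration": {"dagster", "airflow", "prefect"},
}


@dataclass(frozen=True)
class Baseline:
    """Hypothetical record of one platform's stack choices."""
    table_format: str = "iceberg"      # "usually Iceberg" per the baseline
    catalog: str = "rest"              # assumption: match your platform
    transforms: str = "sqlmesh"        # assumption: pick one and standardize
    lake_query: str = "trino"
    serving: Optional[str] = None      # serving layer is optional
    orchestration: str = "dagster"     # assumption: any of the three works

    def problems(self) -> list:
        """Return one message per field whose value is not a listed option."""
        return [
            f"{name}={getattr(self, name)!r} is not a listed option"
            for name, allowed in ALLOWED.items()
            if getattr(self, name) not in allowed
        ]
```

Calling `Baseline(transforms="dbt", serving="clickhouse").problems()` returns an empty list, while a value outside the listed options (for example, running two transform tools at once) shows up as a message that can fail a review or CI check.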
Workflow
Pick table format + catalog: references/storage-formats.md (use assets/cross-platform/template-schema-evolution.md and assets/cross-platform/template-partitioning-strategy.md)
Design ingestion (batch/incremental/CDC): references/ingestion-patterns.md (use assets/cross-platform/template-ingestion-governance-checklist.md and assets/cross-platform/template-incremental-loading.md)
Design transformations (bronze/silver/gold or data products): references/transformation-patterns.md (use assets/cross-platform/template-data-pipeline.md)
Choose lake query vs serving engines: references/query-engine-patterns.md
Add governance, lineage, and quality gates: references/governance-catalog.md (use assets/cross-platform/template-data-quality-governance.md and assets/cross-platform/template-data-quality.md)
Plan operations + cost controls: references/operational-playbook.md and references/cost-optimization.md (use assets/cross-platform/template-data-quality-backfill-runbook.md and assets/cross-platform/template-cost-optimization.md)
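The incremental ingestion step in the workflow above can be sketched with a high-watermark load. This is a minimal illustration, not the content of references/ingestion-patterns.md: it uses SQLite from the Python standard library in place of a real source database and lake table, and the `events` and `load_state` tables, their columns, and both function names are hypothetical.

```python
import sqlite3


def setup_target(dst: sqlite3.Connection) -> None:
    """Create the target table and the watermark bookkeeping table."""
    dst.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(id INTEGER PRIMARY KEY, payload TEXT, updated_at INTEGER)"
    )
    dst.execute(
        "CREATE TABLE IF NOT EXISTS load_state "
        "(table_name TEXT PRIMARY KEY, watermark INTEGER)"
    )


def incremental_load(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy rows changed since the stored watermark; return how many were copied."""
    row = dst.execute(
        "SELECT watermark FROM load_state WHERE table_name = 'events'"
    ).fetchone()
    watermark = row[0] if row else 0  # first run loads everything

    changed = src.execute(
        "SELECT id, payload, updated_at FROM events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    if not changed:
        return 0

    # Upsert the rows and advance the watermark in one transaction, so a
    # failure leaves both untouched and the next run simply retries.
    with dst:
        dst.executemany(
            "INSERT OR REPLACE INTO events (id, payload, updated_at) "
            "VALUES (?, ?, ?)",
            changed,
        )
        dst.execute(
            "INSERT OR REPLACE INTO load_state (table_name, watermark) "
            "VALUES ('events', ?)",
            (changed[-1][2],),  # rows are ordered, so the last is the newest
        )
    return len(changed)
```

Because rows are upserted by primary key, rerunning a load is idempotent. Two limits of this sketch are worth noting: a strict `updated_at >` comparison can miss late-arriving rows that share the watermark timestamp, and deletes in the source are never seen, which is one reason to consider CDC instead.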