Overview

Modern table formats (Delta Lake, Apache Iceberg, Apache Hudi) bring ACID transactions, schema evolution, and time travel to data lakes, enabling lakehouse architectures.

Format Comparison

Delta Lake

  • Origin: Databricks (2019)
  • Storage: Parquet + transaction log
  • Strengths: Mature ecosystem, streaming
  • Best For: Spark-heavy workloads
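
A minimal PySpark sketch of the Parquet-plus-transaction-log model and version-based time travel, assuming Spark is launched with the delta-spark package; the /tmp/events path is illustrative.

```python
from pyspark.sql import SparkSession

# Enable Delta Lake's SQL extensions and catalog (assumes delta-spark is installed).
spark = (
    SparkSession.builder
    .appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Each write becomes a commit in the _delta_log transaction log.
spark.range(100).write.format("delta").mode("overwrite").save("/tmp/events")  # version 0
spark.range(200).write.format("delta").mode("overwrite").save("/tmp/events")  # version 1

# Time travel: read the table as of the first commit.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/events")
print(v0.count())  # 100
```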

Apache Iceberg

  • Origin: Netflix → Apache (2018)
  • Storage: Parquet/ORC/Avro data files + metadata tree
  • Strengths: Engine agnostic, hidden partitioning
  • Best For: Multi-engine environments
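
A minimal PySpark sketch of Iceberg's hidden partitioning, assuming Spark is launched with the iceberg-spark-runtime package; the local Hadoop catalog, warehouse path, and table/column names are illustrative.

```python
from pyspark.sql import SparkSession

# Configure a local Hadoop catalog for Iceberg tables.
spark = (
    SparkSession.builder
    .appName("iceberg-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Hidden partitioning: partition by days(event_ts). Writers and readers only
# see event_ts; Iceberg derives the partition values and prunes automatically.
spark.sql("""
    CREATE TABLE IF NOT EXISTS local.db.events (
        id BIGINT,
        event_ts TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Filtering on event_ts is enough for partition pruning -- no derived column needed.
spark.sql("""
    SELECT * FROM local.db.events
    WHERE event_ts >= TIMESTAMP '2024-01-01 00:00:00'
""").show()
```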

Apache Hudi

  • Origin: Uber → Apache (2016)
  • Storage: Base + log files
  • Strengths: Incremental processing, CDC
  • Best For: Real-time updates
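
A minimal PySpark sketch of Hudi's upsert path, assuming Spark is launched with the hudi-spark bundle; the table name, /tmp/rides path, and columns are illustrative.

```python
from pyspark.sql import SparkSession

# Hudi relies on Kryo serialization; assumes the hudi-spark bundle is on the classpath.
spark = (
    SparkSession.builder
    .appName("hudi-demo")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Record key, partition path, and precombine field drive Hudi's upsert semantics.
hudi_options = {
    "hoodie.table.name": "rides",
    "hoodie.datasource.write.recordkey.field": "ride_id",
    "hoodie.datasource.write.partitionpath.field": "city",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

cols = ["ride_id", "city", "status", "updated_at"]

# Initial load creates the table.
inserts = spark.createDataFrame([(1, "sf", "started", "2024-01-01 11:00:00")], cols)
inserts.write.format("hudi").options(**hudi_options).mode("overwrite").save("/tmp/rides")

# Upsert: the existing ride_id is updated in place rather than duplicated.
updates = spark.createDataFrame([(1, "sf", "completed", "2024-01-01 12:00:00")], cols)
updates.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/rides")
```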

Key Capabilities

| Feature | Delta Lake | Apache Iceberg | Apache Hudi |
| --- | --- | --- | --- |
| ACID Transactions | ✅ Full ACID | ✅ Full ACID | ✅ Full ACID |
| Time Travel | ✅ Version & timestamp | ✅ Snapshot-based | ✅ Point-in-time |
| Schema Evolution | ✅ Add/rename columns | ✅ Full evolution | ✅ Schema registry |
| Hidden Partitioning | ❌ Manual partitioning | ✅ Automatic | ❌ Explicit partition fields |
| Streaming Support | ✅ Native streaming | ✅ Via Flink/Spark | ✅ Real-time ingestion |
| Engine Support | Spark, Presto, Athena | Spark, Flink, Trino, Athena | Spark, Presto, Hive |
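
As a concrete instance of the schema-evolution row, a sketch of additive column evolution using Delta Lake's mergeSchema write option; it reuses the Spark-with-Delta session from the Delta Lake sketch above, and the /tmp/users path and column names are illustrative.

```python
from pyspark.sql import Row

# Initial table with two columns.
spark.createDataFrame([Row(id=1, name="ada")]) \
    .write.format("delta").mode("overwrite").save("/tmp/users")

# A later batch carries a new column; mergeSchema widens the table schema
# instead of failing the write, and earlier rows read the new column as null.
spark.createDataFrame([Row(id=2, name="bob", country="DE")]) \
    .write.format("delta").mode("append").option("mergeSchema", "true").save("/tmp/users")

spark.read.format("delta").load("/tmp/users").printSchema()  # now includes country
```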

Architecture Patterns

Common Patterns

  • Medallion Architecture: Bronze→Silver→Gold
  • Compaction Jobs: Small file optimization (see the sketch after this list)
  • Vacuum Operations: Old version cleanup
  • Z-Ordering: Data clustering
  • Liquid Clustering: Auto-optimization
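
A sketch of the compaction, Z-ordering, and vacuum patterns on a Delta table, reusing the Spark-with-Delta session and /tmp/events table from the Delta Lake sketch above; the clustering column and retention window are illustrative (the SQL forms are available in open-source Delta Lake 2.0+).

```python
# Compaction + Z-ordering: rewrite small files and co-locate rows by a filter column.
spark.sql("OPTIMIZE delta.`/tmp/events` ZORDER BY (id)")

# Vacuum: delete data files no longer referenced by versions older than 7 days (168 hours).
spark.sql("VACUUM delta.`/tmp/events` RETAIN 168 HOURS")
```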

Use Case Fit

  • Delta Lake: Databricks ecosystem, streaming
  • Iceberg: Multi-cloud, engine flexibility
  • Hudi: CDC, incremental processing
  • All Formats: Replace Hive tables

Implementation Considerations

Key Decisions

  • File Size: Target 128 MB-1 GB per file
  • Partitioning: Balance partition pruning against small-file overhead
  • Compaction: Schedule during low-usage windows
  • Retention: Configure vacuum/expire policies
  • Clustering: Choose columns for Z-order/liquid clustering
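
A sketch of expressing the retention decision as Delta table properties, again reusing the Spark-with-Delta session and /tmp/events table from above; the 30-day and 7-day windows are illustrative starting points, not recommendations.

```python
# logRetentionDuration bounds how far back time travel can reach;
# deletedFileRetentionDuration bounds how soon VACUUM may remove unreferenced data files.
spark.sql("""
    ALTER TABLE delta.`/tmp/events` SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 30 days',
        'delta.deletedFileRetentionDuration' = 'interval 7 days'
    )
""")
```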

Real-World Examples
