Modern table formats such as Delta Lake, Apache Iceberg, and Apache Hudi bring ACID transactions, schema evolution, and time travel to data lakes, enabling lakehouse architectures.
Delta Lake
Origin: Databricks (2019)
Storage: Parquet + transaction log
Strengths: Mature ecosystem, streaming
Best For: Spark-heavy workloads
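The "Parquet + transaction log" storage model listed above can be pictured as a set of immutable data files plus an ordered log of commits that add or remove files. The sketch below is a toy model under that assumption, not Delta Lake's actual log protocol or API: the `ToyLogTable` class, the `_toy_log` directory, and the JSON entry layout are all hypothetical. It shows how replaying the log up to a given version yields both the current table state and older versions (time travel).

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Sequence, Set


class ToyLogTable:
    """Toy model: a table is a set of data files plus an ordered commit log."""

    def __init__(self, root: Path) -> None:
        # "_toy_log" is a made-up directory name, not Delta Lake's real layout.
        self.log_dir = Path(root) / "_toy_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _commits(self):
        # Zero-padded file names make lexical order equal to version order.
        return sorted(self.log_dir.glob("*.json"))

    def commit(self, add: Sequence[str] = (), remove: Sequence[str] = ()) -> int:
        """Record one atomic change and return its version number."""
        version = len(self._commits())
        path = self.log_dir / f"{version:020d}.json"
        # Mode "x" raises FileExistsError if another writer already took this version.
        with open(path, "x") as f:
            json.dump({"add": list(add), "remove": list(remove)}, f)
        return version

    def files_at(self, version: Optional[int] = None) -> Set[str]:
        """Replay the log up to `version` (inclusive) to find the live data files."""
        commits = self._commits()
        if version is not None:
            commits = commits[: version + 1]
        live = set()
        for path in commits:
            entry = json.loads(path.read_text())
            live -= set(entry["remove"])
            live |= set(entry["add"])
        return live


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        table = ToyLogTable(Path(tmp))
        table.commit(add=["part-0.parquet"])                             # version 0
        table.commit(add=["part-1.parquet"])                             # version 1
        table.commit(add=["part-2.parquet"], remove=["part-0.parquet"])  # version 2
        return sorted(table.files_at()), sorted(table.files_at(version=0))


if __name__ == "__main__":
    latest, v0 = demo()
    print(latest)  # files visible at the latest version
    print(v0)      # files visible when reading version 0
```

Because readers only ever see files named by a committed log entry, a half-written data file is invisible until its commit lands, which is the intuition behind the atomicity guarantee.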
Apache Iceberg
Origin: Netflix → Apache (2018)
Storage: Any format + metadata tree
Strengths: Engine agnostic, hidden partitioning
Best For: Multi-engine environments
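Hidden partitioning, named among the strengths above, means the table derives partition values from a source column through a declared transform, so writers and readers never handle a separate partition column. The sketch below is a toy illustration of that idea in plain Python, not Iceberg's API: `ToyHiddenPartitionTable`, the `event_ts` column, and the day transform are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


class ToyHiddenPartitionTable:
    """Toy table partitioned by day(event_ts); callers only ever see event_ts."""

    def __init__(self):
        # Maps a derived partition value (a date) to the rows stored under it.
        self.partitions = defaultdict(list)

    @staticmethod
    def _day(ts: datetime):
        # The partition transform: the table applies it, not the user.
        return ts.date()

    def append(self, row: dict) -> None:
        self.partitions[self._day(row["event_ts"])].append(row)

    def scan(self, start: datetime, end: datetime):
        """Return rows with start <= event_ts <= end and how many partitions were read."""
        first, last = self._day(start), self._day(end)
        # Prune: apply the same transform to the filter bounds.
        touched = [d for d in self.partitions if first <= d <= last]
        rows = [
            r
            for d in touched
            for r in self.partitions[d]
            if start <= r["event_ts"] <= end
        ]
        return rows, len(touched)


def demo():
    table = ToyHiddenPartitionTable()
    table.append({"id": 1, "event_ts": datetime(2024, 1, 1, 9, 0)})
    table.append({"id": 2, "event_ts": datetime(2024, 1, 2, 10, 0)})
    table.append({"id": 3, "event_ts": datetime(2024, 1, 2, 18, 30)})
    table.append({"id": 4, "event_ts": datetime(2024, 1, 3, 7, 15)})
    # The query filters on the timestamp only; pruning happens behind the scenes.
    return table.scan(datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 23, 59, 59))


if __name__ == "__main__":
    rows, partitions_read = demo()
    print([r["id"] for r in rows], partitions_read)
```

With manual partitioning, by contrast, the query author would have to know about and filter on a separate date column to get the same pruning.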
Apache Hudi
Origin: Uber → Apache (2016)
Storage: Base + log files
Strengths: Incremental processing, CDC
Best For: Real-time updates
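The "base + log files" layout above can be sketched as a base snapshot plus an append-only log of row-level changes that readers merge at query time. The code below is a toy model under that assumption, not Hudi's API or file format: `ToyFileGroup`, its method names, and the integer commit times are hypothetical. It also shows why this layout suits incremental processing and CDC: a consumer can ask only for changes after a given commit.

```python
class ToyFileGroup:
    """Toy model: one base snapshot plus an append-only log of row-level changes."""

    def __init__(self, base_records: dict):
        self.base = dict(base_records)  # key -> row; stands in for a columnar base file
        self.log = []                   # (commit_time, key, row), row=None means delete

    def upsert(self, commit_time: int, key: str, row: dict) -> None:
        # Writes are cheap: append to the log instead of rewriting the base.
        self.log.append((commit_time, key, row))

    def delete(self, commit_time: int, key: str) -> None:
        self.log.append((commit_time, key, None))

    def snapshot(self) -> dict:
        """Merge on read: apply log entries over the base, in commit order."""
        merged = dict(self.base)
        for _, key, row in self.log:
            if row is None:
                merged.pop(key, None)
            else:
                merged[key] = row
        return merged

    def changes_since(self, commit_time: int) -> list:
        """Incremental read: only the changes committed after `commit_time`."""
        return [entry for entry in self.log if entry[0] > commit_time]

    def compact(self) -> None:
        # Fold the log into a new base. This toy drops the log afterwards,
        # so incremental reads across a compaction are not modeled here.
        self.base = self.snapshot()
        self.log = []


def demo():
    group = ToyFileGroup({"u1": {"city": "SF"}, "u2": {"city": "NY"}})
    group.upsert(101, "u1", {"city": "LA"})
    group.delete(102, "u2")
    group.upsert(103, "u3", {"city": "TX"})
    return group.snapshot(), group.changes_since(101)


if __name__ == "__main__":
    snapshot, changes = demo()
    print(snapshot)
    print(changes)
```

The trade-off this illustrates: writes stay fast because they only append, while reads pay a merge cost until compaction folds the log back into the base.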
Feature | Delta Lake | Apache Iceberg | Apache Hudi |
---|---|---|---|
ACID Transactions | Full ACID | Full ACID | Full ACID |
Time Travel | Version & timestamp | Snapshot-based | Point-in-time |
Schema Evolution | Add/rename columns | Full evolution | Schema registry |
Hidden Partitioning | Manual partitioning | Automatic | Timeline-based |
Streaming Support | Native streaming | Via Flink/Spark | Real-time ingestion |
Engine Support | Spark, Presto, Athena | Spark, Flink, Trino, Athena | Spark, Presto, Hive |