Delta Lake
Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs for Scala, Java, Rust, Ruby, and Python.
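Delta Lake is essentially a metadata layer on top of Parquet: table data is stored as ordinary Parquet files, and a transaction log (the `_delta_log` directory of JSON commit files, periodically summarized into Parquet checkpoints) records which files make up each version of the table.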
The file layout looks like:
![[Assets/delta-lake-file-format.png|500]]
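To make the "metadata layer" idea concrete, here is a minimal Python sketch of how a reader could reconstruct a table's current file list by replaying the log. It assumes a heavily simplified log: one JSON-lines file per commit, named by a 20-digit zero-padded version, containing only `add` and `remove` actions that carry a `path`. The real protocol has more action types (`metaData`, `protocol`, `commitInfo`, ...) and checkpoints; the helper names `write_commit` and `live_files` are hypothetical. Stopping the replay at an earlier version is also, in essence, how time travel works.

```python
import json
import os
import tempfile


def write_commit(table_dir, version, actions):
    """Write one commit as a JSON-lines file in _delta_log/ (simplified, hypothetical)."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    # Zero-padding makes lexical order of file names equal to numeric version order.
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def live_files(table_dir, as_of_version=None):
    """Replay commits in version order and return the set of data files still live."""
    log_dir = os.path.join(table_dir, "_delta_log")
    files = set()
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        version = int(name[: -len(".json")])
        if as_of_version is not None and version > as_of_version:
            break  # time travel: ignore commits newer than the requested version
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


with tempfile.TemporaryDirectory() as table:
    write_commit(table, 0, [{"add": {"path": "part-0000.parquet"}}])
    write_commit(table, 1, [{"add": {"path": "part-0001.parquet"}}])
    # A compaction-style commit: two small files are replaced by one larger file.
    write_commit(table, 2, [
        {"remove": {"path": "part-0000.parquet"}},
        {"remove": {"path": "part-0001.parquet"}},
        {"add": {"path": "part-0002.parquet"}},
    ])
    print(sorted(live_files(table)))                   # ['part-0002.parquet']
    print(sorted(live_files(table, as_of_version=1)))  # ['part-0000.parquet', 'part-0001.parquet']
```

Note that the Parquet files themselves are never modified in place; only the log decides which of them belong to a given version.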
[Delta Lake Official Documentation](https://docs.delta.io/latest/index.html)
Pros:
- ACID transactions with optimistic concurrency control.
- Efficient streaming I/O.
- Caching.
- Time travel.
- Data layout optimization, e.g. Z-ordering.
- Schema enforcement & evolution.
- UPSERT & MERGE statements.
- Audit logging.
Cons:
- Inherits the disadvantages of the underlying Parquet format.
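- Maintenance operations are required to keep performance up, e.g. `OPTIMIZE` to compact many small files into fewer larger ones.
- There is a learning curve when using advanced features, e.g. `VACUUM`, which deletes data files no longer referenced by the table once they are older than the retention threshold, after which time travel to the versions that used them is no longer possible.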
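The "ACID transactions with optimistic concurrency control" item rests on one primitive: a writer may create commit file N+1 only if nobody else has created it first. The Python sketch below models that with `os.O_CREAT | os.O_EXCL` on a local directory, assuming the simplified one-file-per-version log layout. It is illustrative only: real Delta writers also check whether the winning commit logically conflicts with theirs before retrying, and on object stores the put-if-absent guarantee comes from the storage layer. `CommitConflict`, `try_commit`, and `commit_with_retry` are hypothetical names.

```python
import json
import os
import tempfile


class CommitConflict(Exception):
    """Raised when another writer already created the target commit file."""


def latest_version(log_dir):
    # Commit files are named <20-digit version>.json; -1 means an empty log.
    versions = [int(n[:-5]) for n in os.listdir(log_dir) if n.endswith(".json")]
    return max(versions, default=-1)


def try_commit(log_dir, version, actions):
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if the file exists: a put-if-absent.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        raise CommitConflict(version) from None
    # Simplification: a concurrent reader could see this file half-written.
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return version


def commit_with_retry(log_dir, actions, max_attempts=5):
    for _ in range(max_attempts):
        version = latest_version(log_dir) + 1
        try:
            return try_commit(log_dir, version, actions)
        except CommitConflict:
            continue  # another writer won this version: re-read the log and retry
    raise CommitConflict("gave up after retries")


with tempfile.TemporaryDirectory() as log_dir:
    first = commit_with_retry(log_dir, [{"add": {"path": "a.parquet"}}])
    # A stale writer that read the log before `first` landed still targets version 0.
    stale_ok = True
    try:
        try_commit(log_dir, 0, [{"add": {"path": "b.parquet"}}])
    except CommitConflict:
        stale_ok = False
    second = commit_with_retry(log_dir, [{"add": {"path": "b.parquet"}}])
    print(first, stale_ok, second)  # 0 False 1
```

The design choice here is that writers never lock: they do their work, then race to claim the next version, and only the loser pays the cost of retrying.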
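The "UPSERT & MERGE statements" item refers to SQL of the form `MERGE INTO target USING source ON ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...`. The toy Python function below shows only the row-level semantics of that upsert on in-memory lists of dicts, assuming keys are unique on both sides. It says nothing about how Delta executes MERGE (rewriting the affected Parquet files and committing matching `remove`/`add` actions), and `merge_upsert` is a hypothetical helper.

```python
def merge_upsert(target, source, key):
    """Toy upsert: update target rows whose key matches, insert the rest."""
    # Index target rows by key (copies, so the inputs are not mutated).
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            # WHEN MATCHED THEN UPDATE: overwrite the matched row's columns.
            # Unlike Delta, which rejects multiple source rows matching one
            # target row, this toy version silently lets the last one win.
            merged[row[key]].update(row)
        else:
            # WHEN NOT MATCHED THEN INSERT: append the new row.
            merged[row[key]] = dict(row)
    return list(merged.values())


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
source = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(merge_upsert(target, source, "id"))
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'B'}, {'id': 3, 'v': 'c'}]
```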
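The "Z-ordering" item relies on a Z-order (Morton) curve: interleaving the bits of several columns so that rows close in every column tend to land in the same data file, which lets per-file min/max statistics skip more files. The Python sketch below illustrates that general technique on a 4x4 grid of `(x, y)` rows split into four "files" of four rows each. It is not Delta's `OPTIMIZE ... ZORDER BY` implementation, which is more involved; `z_value`, `chunk_stats`, and `files_to_read` are hypothetical helpers, and values are assumed to be small non-negative integers.

```python
from itertools import product


def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y: x in even positions, y in odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def chunk_stats(rows, rows_per_file):
    """Split rows into fixed-size 'files' and record (min_x, max_x, min_y, max_y)."""
    stats = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        stats.append((min(xs), max(xs), min(ys), max(ys)))
    return stats


def files_to_read(stats, x_max, y_max):
    """Indexes of files whose min/max ranges may hold rows with x <= x_max and y <= y_max."""
    return [i for i, (x_lo, _, y_lo, _) in enumerate(stats)
            if x_lo <= x_max and y_lo <= y_max]


rows = list(product(range(4), range(4)))
linear = sorted(rows)                               # sorted by x, then y
zorder = sorted(rows, key=lambda r: z_value(*r))    # sorted along the Z-curve

print(files_to_read(chunk_stats(linear, 4), 1, 1))  # [0, 1]
print(files_to_read(chunk_stats(zorder, 4), 1, 1))  # [0]
```

With a plain sort on `x`, each file spans the full range of `y`, so the filter `x <= 1 AND y <= 1` must read two files; along the Z-curve each file covers a compact 2x2 block and only one file qualifies.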