Apache Parquet
Apache Parquet is an open-source data file format designed for efficient bulk handling of [[Column-oriented Database|column-oriented data]]. Its columnar design enables efficient compression and encoding schemes, which makes it a common interchange format for both batch and interactive workloads, alongside other columnar storage file formats in the Hadoop ecosystem such as RCFile and ORC.
Extension: .parquet
https://parquet.apache.org/docs/
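A minimal write/read sketch, assuming the pyarrow library (pandas, Spark, DuckDB, and others can read and write Parquet as well); the file and column names are illustrative:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build a small in-memory table (column names are illustrative).
table = pa.table({
    "event_id": [1, 2, 3, 4],
    "country": ["DE", "US", "US", "FR"],
    "amount": [9.99, 24.50, 3.10, 80.00],
})

# Write it out as Parquet; compression is applied per column chunk.
pq.write_table(table, "events.parquet", compression="snappy")

# Read the whole file back and inspect the preserved schema.
roundtrip = pq.read_table("events.parquet")
print(roundtrip.schema)
```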
Advantages:
- Reduces IO operations, since queries can read only the columns they need (see the projection sketch after this list).
- The column-based layout saves storage space and speeds up analytics queries.
- Highly efficient data compression and decompression.
- Supports type-specific encodings (e.g. dictionary and run-length encoding).
- Supports several data types and nested data structures.
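To illustrate the reduced IO and faster analytics queries noted above, the sketch below (again assuming pyarrow and the hypothetical events.parquet file from the earlier example) reads only selected columns, pushes a filter down to row-group statistics, and inspects the footer metadata:

```python
import pyarrow.parquet as pq

# Column projection: only the listed columns are read from disk, which is
# where the columnar layout saves IO compared to row-based formats.
subset = pq.read_table("events.parquet", columns=["country", "amount"])

# Predicate pushdown: row groups whose min/max statistics cannot match the
# filter are skipped without being decoded.
us_rows = pq.read_table("events.parquet", filters=[("country", "==", "US")])

# The file footer describes row groups, encodings, and compressed sizes,
# so this inspection does not read any data pages.
metadata = pq.ParquetFile("events.parquet").metadata
print(metadata.num_row_groups, metadata.row_group(0).column(0).compression)
```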
Trade-offs:
- Not human-readable (binary format).
- Reading can require more memory than row-based formats.
- Writes can be slower than with row-based file formats because of the metadata overhead (see the batched-writer sketch after this list).
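The write-side overhead can be kept in check by streaming data as row groups with a batched writer; this is a sketch assuming pyarrow, with an illustrative file name and batch size:

```python
import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema([("event_id", pa.int64()), ("country", pa.string())])

# Writing in batches bounds the memory buffered per row group; each closed
# row group adds column-chunk metadata to the footer, which is part of the
# write overhead relative to simpler row-based formats.
with pq.ParquetWriter("events_stream.parquet", schema) as writer:
    for start in range(0, 1_000_000, 100_000):
        batch = pa.table(
            {
                "event_id": list(range(start, start + 100_000)),
                "country": ["US"] * 100_000,
            },
            schema=schema,
        )
        writer.write_table(batch)
```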