Data Engineering Hub

Deequ

Deequ is a library built on top of Apache Spark for defining “unit tests for data”: declarative checks that measure data quality in large datasets. Python users may also be interested in PyDeequ, a Python interface for Deequ, which is available on GitHub, Read the Docs, and PyPI.
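To make the idea of “unit tests for data” concrete, here is a minimal plain-Python sketch. It is not Deequ's or PyDeequ's API: the helpers `completeness`, `is_unique`, and `run_checks`, along with the sample rows, are hypothetical names made up for illustration. Deequ expresses similar constraints (for example `isComplete` and `isUnique` on a `Check`) and evaluates them as Spark jobs over a DataFrame, whereas this sketch simply loops over an in-memory list.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def completeness(rows: List[Row], column: str) -> float:
    """Fraction of rows in which `column` is present and not None."""
    if not rows:
        return 1.0  # assumption: an empty dataset counts as fully complete
    non_null = sum(1 for row in rows if row.get(column) is not None)
    return non_null / len(rows)


def is_unique(rows: List[Row], column: str) -> bool:
    """True if no non-null value of `column` appears more than once."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return len(values) == len(set(values))


def run_checks(
    rows: List[Row], checks: Dict[str, Callable[[List[Row]], bool]]
) -> Dict[str, bool]:
    """Evaluate every named check against the dataset and collect pass/fail."""
    return {name: check(rows) for name, check in checks.items()}


# Hypothetical sample data: one row has a missing name.
rows = [
    {"id": 1, "name": "Thingy A", "views": 0},
    {"id": 2, "name": "Thingy B", "views": 12},
    {"id": 3, "name": None, "views": 5},
]

# Each check is a named predicate over the whole dataset.
checks = {
    "id is complete": lambda data: completeness(data, "id") == 1.0,
    "id is unique": lambda data: is_unique(data, "id"),
    "name is complete": lambda data: completeness(data, "name") == 1.0,
    "views is non-negative": lambda data: all(
        row["views"] >= 0 for row in data if row.get("views") is not None
    ),
}

results = run_checks(rows, checks)
for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

Here only the "name is complete" check fails, because one of the three rows has no name. The point of the pattern is that constraints are declared once as data and evaluated together, so a failing dataset can be caught before it is passed on to downstream consumers.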

Deequ Official Documentation

https://github.com/awslabs/deequ/tree/master

Deequ Advantages

  • Built on [[Apache Spark]] for large datasets

Deequ Disadvantages

#placeholder/description