Deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. Python users may also be interested in PyDeequ, a Python interface to Deequ, which is available on GitHub, Read the Docs, and PyPI.
https://github.com/awslabs/deequ/tree/master
- Built on [[Apache Spark]] for large datasets
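Deequ's own API expresses "unit tests for data" through `VerificationSuite` and `Check`, with constraints such as `isComplete`, `isUnique`, `isContainedIn`, and `isNonNegative`, and it computes the underlying metrics as Spark jobs. The sketch below does not use Deequ or Spark. It is a hypothetical, plain-Scala illustration of the same rough idea: compute a metric over a dataset, then assert something about that metric. Every name in it (`Row`, `Constraint`, `DataChecks`, `Example`) is made up for illustration and is not part of Deequ.

```scala
// Hypothetical plain-Scala sketch of "unit tests for data".
// It does NOT use Deequ or Spark: every name here is made up for illustration.

// One record of a toy dataset; None plays the role of a SQL NULL.
final case class Row(id: Option[Int], priority: Option[String], numViews: Option[Long])

// A named check: a metric computed over all rows plus an assertion on that metric.
final case class Constraint(name: String, metric: Seq[Row] => Double, assertion: Double => Boolean)

// Outcome of evaluating one constraint.
final case class ConstraintResult(name: String, value: Double, passed: Boolean)

object DataChecks {
  // Number of rows, as a Double so it shares the common metric type.
  val size: Seq[Row] => Double = rows => rows.size.toDouble

  // Fraction of rows where the column is present (1.0 for empty input in this sketch).
  def completeness[A](col: Row => Option[A]): Seq[Row] => Double = { rows =>
    if (rows.isEmpty) 1.0
    else rows.count(r => col(r).isDefined).toDouble / rows.size
  }

  // Fraction of non-null values that occur exactly once in the column.
  def uniqueness[A](col: Row => Option[A]): Seq[Row] => Double = { rows =>
    val values = rows.flatMap(r => col(r))
    if (values.isEmpty) 1.0
    else values.groupBy(identity).values.count(_.size == 1).toDouble / values.size
  }

  // Fraction of rows satisfying a row-level predicate (1.0 for empty input).
  def compliance(p: Row => Boolean): Seq[Row] => Double = { rows =>
    if (rows.isEmpty) 1.0 else rows.count(p).toDouble / rows.size
  }

  // Evaluate every constraint against the same dataset.
  def run(rows: Seq[Row], constraints: Seq[Constraint]): Seq[ConstraintResult] =
    constraints.map { c =>
      val value = c.metric(rows)
      ConstraintResult(c.name, value, c.assertion(value))
    }
}

object Example {
  val rows: Seq[Row] = Seq(
    Row(Some(1), Some("high"), Some(3L)),
    Row(Some(2), Some("low"), Some(10L)),
    Row(Some(3), None, Some(0L)),
    Row(Some(3), Some("high"), Some(-5L))
  )

  val constraints: Seq[Constraint] = Seq(
    Constraint("size is 4", DataChecks.size, _ == 4.0),
    Constraint("id is complete", DataChecks.completeness(_.id), _ == 1.0),
    Constraint("id is unique", DataChecks.uniqueness(_.id), _ == 1.0),
    Constraint("priority is complete", DataChecks.completeness(_.priority), _ == 1.0),
    // A missing priority counts as compliant here, because forall on None is true.
    Constraint("priority in {high, low}",
      DataChecks.compliance(_.priority.forall(Set("high", "low"))), _ == 1.0),
    Constraint("numViews is non-negative",
      DataChecks.compliance(_.numViews.forall(_ >= 0L)), _ == 1.0)
  )

  val results: Seq[ConstraintResult] = DataChecks.run(rows, constraints)
}
```

With these toy rows, `Example.results` marks three checks as failed: uniqueness of `id` (value 0.5, because `3` appears twice), completeness of `priority` (0.75), and non-negativity of `numViews` (0.75). The other three checks pass. Keeping each metric separate from its assertion means the same computed value can be reused against different thresholds.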
#placeholder/description