DQX - Data Quality Framework
Provided by Databricks Labs
DQX is a data quality framework for Apache Spark that enables you to define, monitor, and react to data quality issues in your data pipelines.
Capabilities
Details of Failed Checks
Get detailed insights into why a check has failed.
Data Format Agnostic
Operates on Spark DataFrames, regardless of the underlying data source or file format.
Spark Batch & Streaming Support
Works with both batch and streaming pipelines, including Delta Live Tables (DLT) integration.
Custom Reactions to Failed Checks
Choose whether to drop, mark, or quarantine invalid data.
Check Levels
Assign a warning or error level to each check.
Row & Column Level Rules
Define quality rules at both row and column levels.
Profiling & Rule Generation
Automatically profile and generate data quality rule candidates.
Code or Config Checks
Define checks as code or configuration.
Validation Summary & Dashboard
Summarize validation results and track data quality issues in a dashboard.
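To make several of these capabilities concrete (checks defined as configuration, warning vs. error levels, and dropping, marking, or quarantining rows), here is a minimal plain-Python sketch. It is not the DQX API: lists of dicts stand in for Spark DataFrame rows, and every name used (`is_not_null`, `is_in_range`, `criticality`, `build_checks`, `apply_checks`, the `_errors`/`_warnings` fields) is hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical sketch, not the DQX API: rows are plain dicts standing in
# for Spark DataFrame rows.
Row = Dict[str, Any]
CheckFn = Callable[[Row], Optional[str]]  # returns a failure message or None


def is_not_null(col: str) -> CheckFn:
    """Row-level check: fails when `col` is missing or None."""
    def check(row: Row) -> Optional[str]:
        return None if row.get(col) is not None else f"{col} is null"
    return check


def is_in_range(col: str, low: float, high: float) -> CheckFn:
    """Row-level check: fails when `col` is outside [low, high]; nulls pass."""
    def check(row: Row) -> Optional[str]:
        value = row.get(col)
        if value is None or low <= value <= high:
            return None
        return f"{col}={value} not in [{low}, {high}]"
    return check


# Hypothetical registry that lets checks be declared as configuration
# (plain dicts) instead of code.
REGISTRY: Dict[str, Callable[..., CheckFn]] = {
    "is_not_null": is_not_null,
    "is_in_range": is_in_range,
}


def build_checks(config: List[Dict[str, Any]]) -> List[Tuple[str, CheckFn]]:
    """Turn config entries into (criticality, check function) pairs."""
    return [
        (entry["criticality"], REGISTRY[entry["function"]](**entry["arguments"]))
        for entry in config
    ]


def apply_checks(
    rows: List[Row], checks: List[Tuple[str, CheckFn]]
) -> Tuple[List[Row], List[Row]]:
    """Annotate each row with failure messages and split by criticality.

    Failures of 'error' checks send the row to quarantine; failures of any
    other criticality are kept as warnings, and the row stays valid (marked).
    """
    valid: List[Row] = []
    quarantined: List[Row] = []
    for row in rows:
        errors: List[str] = []
        warnings: List[str] = []
        for criticality, check in checks:
            message = check(row)
            if message is None:
                continue
            (errors if criticality == "error" else warnings).append(message)
        annotated = {**row, "_errors": errors, "_warnings": warnings}
        (quarantined if errors else valid).append(annotated)
    return valid, quarantined


# Usage: checks declared as configuration, one at each level.
config = [
    {"criticality": "error", "function": "is_not_null", "arguments": {"col": "id"}},
    {"criticality": "warn", "function": "is_in_range",
     "arguments": {"col": "age", "low": 0, "high": 120}},
]
rows = [
    {"id": 1, "age": 30},
    {"id": None, "age": 40},
    {"id": 3, "age": 150},
]
valid, quarantined = apply_checks(rows, build_checks(config))
print([r["id"] for r in valid])   # [1, 3]
print(quarantined[0]["_errors"])  # ['id is null']
```

In this sketch, dropping invalid data means keeping only `valid`, marking corresponds to the `_warnings` annotations left on valid rows, and quarantining means writing `quarantined` to a separate location for later inspection.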
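The profiling and rule-generation idea can be pictured with a second plain-Python sketch: collect simple per-column statistics, then propose candidate rules from what was observed. The statistics chosen here (null counts, numeric min/max), the function names `profile` and `generate_candidates`, and the candidate format are assumptions for illustration only; DQX's own profiler and rule generator are not shown.

```python
from typing import Any, Dict, List

# Hypothetical sketch, not the DQX API: rows are plain dicts.
Row = Dict[str, Any]
Stats = Dict[str, Dict[str, Any]]


def profile(rows: List[Row]) -> Stats:
    """Per-column null count and min/max of non-null numeric values."""
    columns = sorted({col for row in rows for col in row})
    stats: Stats = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        # bool is a subclass of int, so exclude it from numeric statistics.
        numeric = [v for v in non_null
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        stats[col] = {
            "null_count": len(values) - len(non_null),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return stats


def generate_candidates(stats: Stats) -> List[Dict[str, Any]]:
    """Propose rule candidates from profile statistics for human review."""
    candidates: List[Dict[str, Any]] = []
    for col, s in stats.items():
        if s["null_count"] == 0:
            candidates.append({"criticality": "error", "function": "is_not_null",
                               "arguments": {"col": col}})
        if s["min"] is not None:
            candidates.append({"criticality": "warn", "function": "is_in_range",
                               "arguments": {"col": col, "low": s["min"],
                                             "high": s["max"]}})
    return candidates


rows = [
    {"id": 1, "age": 34, "city": "Oslo"},
    {"id": 2, "age": 51, "city": None},
    {"id": 3, "age": 19, "city": "Lima"},
]
stats = profile(rows)
candidates = generate_candidates(stats)
for c in candidates:
    print(c)
```

The generated rules are only candidates: a range taken from a small sample (here 19 to 51 for `age`) is usually too narrow, so a reviewer would widen or discard it before promoting it to a real check.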