databricks.labs.dqx.executor
DQCheckResult Objects
@dataclass(frozen=True)
class DQCheckResult()
Represents the result of applying a data quality rule to a DataFrame.
This class holds:
- The Spark Column containing the result of the data quality check (typically a struct with metadata and the outcome of the rule).
- The DataFrame that was evaluated or generated by the check.
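As a rough illustration of what a frozen result container implies, the sketch below mirrors the class with a stand-in dataclass. The name `CheckResultSketch` is hypothetical, the field names `condition` and `check_df` are taken from the `apply` return descriptions in this module, and plain Python values stand in for the Spark Column and DataFrame.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Any


@dataclass(frozen=True)
class CheckResultSketch:
    """Hypothetical stand-in for DQCheckResult; not the library class."""

    condition: Any  # stand-in for the Spark Column holding the check outcome
    check_df: Any   # stand-in for the evaluated or generated DataFrame


result = CheckResultSketch(condition="name_is_null", check_df=[{"id": 1}])

# frozen=True makes instances read-only: reassigning a field raises.
try:
    result.condition = "other"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the instance cannot be modified after creation, downstream code can pass the result around without worrying that another step swapped the condition or the DataFrame.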
DQRuleExecutor Objects
class DQRuleExecutor(abc.ABC)
Abstract base class for executing a data quality rule on a DataFrame.
The executor is responsible for:
- Applying the rule's logic to the provided DataFrame (and optional reference DataFrames).
- Returning the raw result of the check, without applying filters or attaching metadata.
Executors are specialized for different types of rules:
- DQRowRuleExecutor: Handles row-level rules (rules that produce a condition per row).
- DQDatasetRuleExecutor: Handles dataset-level rules (rules that may aggregate or join DataFrames).
Subclasses must implement the apply method, which performs the rule check and returns a DQCheckResult containing:
- The raw condition produced by the rule.
- The DataFrame used for reporting (original or transformed).
apply
@abc.abstractmethod
def apply(df: DataFrame,
spark: SparkSession,
ref_dfs: dict[str, DataFrame] | None = None) -> DQCheckResult
Apply a rule and return the result as a DQCheckResult.
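The abstract-method contract can be sketched with the standard `abc` module. The class names `ExecutorSketch` and `EchoExecutor` are hypothetical, the types are simplified to `Any`, and the tuple returned here only stands in for a DQCheckResult.

```python
import abc
from typing import Any, Optional


class ExecutorSketch(abc.ABC):
    """Hypothetical stand-in for DQRuleExecutor."""

    @abc.abstractmethod
    def apply(self, df: Any, spark: Any, ref_dfs: Optional[dict] = None) -> Any:
        """Apply a rule and return the raw result."""


class EchoExecutor(ExecutorSketch):
    """Minimal concrete subclass: returns a placeholder condition and the input."""

    def apply(self, df: Any, spark: Any, ref_dfs: Optional[dict] = None) -> Any:
        return ("condition", df)


# The abstract base cannot be instantiated because apply is not implemented.
try:
    ExecutorSketch()
    instantiated = True
except TypeError:
    instantiated = False

outcome = EchoExecutor().apply(df=[1, 2], spark=None)
```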
DQRowRuleExecutor Objects
class DQRowRuleExecutor(DQRuleExecutor)
Executor for row-level data quality rules.
This executor applies a DQRowRule to the provided DataFrame. Row-level rules generate a condition (Spark Column) that evaluates to a message string if the rule is violated, or null otherwise, for each row of the DataFrame.
Responsibilities:
- Obtain the condition column.
- Return a DQCheckResult containing:
  - The condition column.
  - The input DataFrame.
apply
def apply(df: DataFrame,
spark: SparkSession,
ref_dfs: dict[str, DataFrame] | None = None) -> DQCheckResult
Apply the row-level data quality rule to the provided DataFrame.
The rule produces a condition (Spark Column) that indicates whether each row satisfies or violates the check. The condition contains a message when the check fails, or null when it passes.
Arguments:
- df: The input DataFrame to which the rule is applied.
- spark: The SparkSession used for executing the rule (unused for row rules).
- ref_dfs: Optional dictionary of reference DataFrames (unused for row rules).
Returns:
DQCheckResult containing:
- condition: Spark Column representing the check condition.
- check_df: The input (main) DataFrame (used for downstream processing).
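The "message when the check fails, null when it passes" behaviour can be pictured without Spark. The sketch below is a plain-Python analogy, not the library's implementation: rows are dictionaries, `None` plays the role of null, and the helper `not_null_condition` and its message text are hypothetical.

```python
from typing import Callable, Optional

Row = dict
RowCondition = Callable[[Row], Optional[str]]


def not_null_condition(column: str) -> RowCondition:
    """Hypothetical row-level rule: build a per-row condition for one column."""

    def condition(row: Row) -> Optional[str]:
        # Return a message when the row violates the rule, None otherwise.
        if row.get(column) is None:
            return f"Column '{column}' is null"
        return None

    return condition


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
condition = not_null_condition("name")

# One outcome per input row; the input rows themselves are left unchanged.
messages = [condition(row) for row in rows]
```

In this analogy the executor's result would pair `condition` with the unchanged `rows`, which matches the description of `check_df` as the input DataFrame.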
DQDatasetRuleExecutor Objects
class DQDatasetRuleExecutor(DQRuleExecutor)
Executor for dataset-level data quality rules.
This executor applies a DQDatasetRule to the provided DataFrame (and optional reference DataFrames). Dataset-level rules can produce conditions that involve multiple rows, aggregations, or comparisons across datasets.
Responsibilities:
- Obtain the condition column and the check function closure that contains the computation logic.
- Return a DQCheckResult containing:
  - The condition column.
  - The resulting DataFrame produced by the rule.
apply
def apply(df: DataFrame,
spark: SparkSession,
ref_dfs: dict[str, DataFrame] | None = None) -> DQCheckResult
Apply the dataset-level data quality rule to the provided DataFrame. This method runs the rule's logic by calling the check function closure. By convention, the closure's first argument is the DataFrame and its second argument is the optional dictionary of reference DataFrames.
The rule produces:
- A condition (Spark Column) that represents the overall dataset check status.
- A DataFrame with the results of the dataset-level evaluation.
Arguments:
- df: The input DataFrame to which the rule is applied.
- spark: The SparkSession used for executing the rule.
- ref_dfs: Optional dictionary of reference DataFrames for dataset-level checks.
Returns:
DQCheckResult containing:
- condition: Spark Column representing the check condition.
- check_df: DataFrame produced by the check, containing evaluation results.
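The closure convention described above (DataFrame first, reference DataFrames second) can be sketched in plain Python. This is an analogy under stated assumptions rather than the library's code: lists of dictionaries stand in for DataFrames, a column name stands in for the Spark Column, and `duplicate_check` is a hypothetical check function.

```python
from typing import Callable, Optional, Tuple

Rows = list  # stand-in for a DataFrame: a list of row dictionaries
Closure = Callable[[Rows, Optional[dict]], Rows]


def duplicate_check(column: str) -> Tuple[str, Closure]:
    """Hypothetical dataset-level check: return (condition name, closure)."""
    condition = f"{column}_is_duplicate"

    def closure(df: Rows, ref_dfs: Optional[dict] = None) -> Rows:
        # Aggregate across all rows, then attach the outcome to each row.
        counts: dict = {}
        for row in df:
            counts[row[column]] = counts.get(row[column], 0) + 1
        return [{**row, condition: counts[row[column]] > 1} for row in df]

    return condition, closure


rows = [{"id": 1}, {"id": 1}, {"id": 2}]
condition, closure = duplicate_check("id")

# The executor would call the closure and pair its output with the condition.
check_df = closure(rows, None)
flags = [row[condition] for row in check_df]
```

Unlike the row-level case, the returned rows are a new dataset produced by the check, because the outcome depends on more than one row.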
DQRuleExecutorFactory Objects
class DQRuleExecutorFactory()
Factory for creating the appropriate DQRuleExecutor instance for a given DQRule.
This class encapsulates the logic for selecting the correct executor type (row-level or dataset-level) based on the rule instance provided.
Responsibilities:
- Determine the type of rule (DQRowRule or DQDatasetRule).
- Return the corresponding executor (DQRowRuleExecutor or DQDatasetRuleExecutor).
- Raise an error if the rule type is unsupported.
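The selection logic listed above amounts to a type dispatch. The sketch below shows one way it could look. All class names are hypothetical stand-ins, the function name `create_executor` is an assumption (the source does not name the factory method), and `TypeError` is assumed because the source does not say which error is raised.

```python
class RowRuleSketch:
    """Stand-in for DQRowRule."""


class DatasetRuleSketch:
    """Stand-in for DQDatasetRule."""


class RowExecutorSketch:
    """Stand-in for DQRowRuleExecutor."""

    def __init__(self, rule):
        self.rule = rule


class DatasetExecutorSketch:
    """Stand-in for DQDatasetRuleExecutor."""

    def __init__(self, rule):
        self.rule = rule


def create_executor(rule):
    """Hypothetical factory function: pick the executor from the rule type."""
    if isinstance(rule, RowRuleSketch):
        return RowExecutorSketch(rule)
    if isinstance(rule, DatasetRuleSketch):
        return DatasetExecutorSketch(rule)
    # Assumed error type; the source only says an error is raised.
    raise TypeError(f"Unsupported rule type: {type(rule).__name__}")


row_executor = create_executor(RowRuleSketch())
dataset_executor = create_executor(DatasetRuleSketch())

try:
    create_executor(object())
    rejected = False
except TypeError:
    rejected = True
```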