databricks.labs.dqx.rule

Criticality Objects

class Criticality(Enum)

Enum class to represent criticality of the check.

DefaultColumnNames Objects

class DefaultColumnNames(Enum)

Enum class to represent columns in the dataframe that will be used for error and warning reporting.

ColumnArguments Objects

class ColumnArguments(Enum)

Enum class used to parse input arguments for custom column naming.

SingleColumnMixin Objects

class SingleColumnMixin()

Mixin to handle functionality related to a single column.

MultipleColumnsMixin Objects

class MultipleColumnsMixin()

Mixin to handle functionality related to multiple columns.

DQRule Objects

@dataclass(frozen=True)
class DQRule(abc.ABC, DQRuleTypeMixin, SingleColumnMixin,
MultipleColumnsMixin)

Represents a data quality rule that applies a quality check function to column(s) or column expression(s). This class includes the following attributes:

  • check_func - The function used to perform the quality check.
  • name (optional) - A custom name for the check; autogenerated if not provided.
  • criticality (optional) - Defines the severity level of the check:
    • error: Critical issues.
    • warn: Potential issues.
  • column (optional) - A single column to which the check function is applied.
  • columns (optional) - A list of columns to which the check function is applied.
  • filter (optional) - A filter expression to apply the check only to rows meeting specific conditions.
  • check_func_args (optional) - Positional arguments for the check function (excluding column).
  • check_func_kwargs (optional) - Keyword arguments for the check function (excluding column).
  • user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
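The attribute list above can be pictured as a frozen dataclass. The sketch below is a simplified stand-in built only on the standard library, not the DQX class itself: `RuleSketch`, the `is_not_null` placeholder, the default criticality of `error`, and the `<column>_<function>` naming scheme are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


def is_not_null(column: str) -> str:
    """Placeholder check: returns a condition string, not a Spark Column."""
    return f"{column} IS NOT NULL"


@dataclass(frozen=True)
class RuleSketch:
    """Hypothetical stand-in mirroring the attributes listed above."""

    check_func: Callable[..., Any]
    name: str = ""
    criticality: str = "error"  # assumed default, not confirmed by the source
    column: Optional[str] = None
    columns: Optional[list] = None
    filter: Optional[str] = None
    check_func_args: list = field(default_factory=list)
    check_func_kwargs: dict = field(default_factory=dict)
    user_metadata: Optional[dict] = None

    def __post_init__(self) -> None:
        if self.criticality not in ("error", "warn"):
            raise ValueError(f"unsupported criticality: {self.criticality}")
        if not self.name:
            # Assumed naming scheme. A frozen dataclass blocks normal
            # assignment, so object.__setattr__ is used during initialisation.
            target = self.column or "_".join(self.columns or []) or "dataset"
            generated = f"{target}_{self.check_func.__name__}"
            object.__setattr__(self, "name", generated)


rule = RuleSketch(check_func=is_not_null, column="order_id", criticality="warn")
```

Because the dataclass is frozen, a rule cannot be modified after construction, which makes instances safe to share between checks.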

columns

Some checks require a list of columns instead of a single column.

get_check_condition

@abc.abstractmethod
def get_check_condition() -> Column

Compute the check condition for the rule.

Returns:

The Spark Column representing the check condition.
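Since `get_check_condition` is abstract, the base class cannot be instantiated directly and each concrete rule type supplies its own implementation. A minimal sketch of that pattern follows, with hypothetical class names and a plain string standing in for the Spark Column:

```python
import abc
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleBase(abc.ABC):
    """Hypothetical base class; only the abstract-method pattern matters here."""

    column: str

    @abc.abstractmethod
    def get_check_condition(self) -> str:
        """Return the check condition (a string here, a Column in Spark)."""


@dataclass(frozen=True)
class NotNullRule(RuleBase):
    """Concrete rule that implements the abstract method."""

    def get_check_condition(self) -> str:
        return f"{self.column} IS NOT NULL"


# Instantiating the abstract base raises TypeError.
try:
    RuleBase(column="id")
    base_instantiated = True
except TypeError:
    base_instantiated = False

condition = NotNullRule(column="id").get_check_condition()
```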

columns_as_string_expr

@ft.cached_property
def columns_as_string_expr() -> Column

Spark Column expression representing the column(s) as a string (not normalized).

Returns:

A Spark Column object representing the column(s) as a string (not normalized).

prepare_check_func_args_and_kwargs

def prepare_check_func_args_and_kwargs() -> tuple[list, dict]

Prepares positional arguments and keyword arguments for the check function. Includes only arguments supported by the check function and skips empty values.
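One way to keep only the arguments a check function supports is to inspect its signature. The sketch below shows that idea for keyword arguments only. It is not the DQX implementation: `filter_supported_kwargs` and `is_in_range` are hypothetical names, and treating `None` and empty lists as "empty" is an assumption.

```python
import inspect
from typing import Any, Callable


def filter_supported_kwargs(func: Callable[..., Any], candidates: dict) -> dict:
    """Keep only non-empty candidates that `func` declares as parameters."""
    params = inspect.signature(func).parameters
    return {
        key: value
        for key, value in candidates.items()
        # Compare against None and [] explicitly so that valid falsy
        # values such as 0 are not dropped by accident.
        if key in params and value is not None and value != []
    }


def is_in_range(column: str, min_limit: int, max_limit: int) -> str:
    """Placeholder check function used only for this illustration."""
    return f"{column} BETWEEN {min_limit} AND {max_limit}"


candidates = {
    "column": "age",
    "columns": None,   # empty value: skipped
    "min_limit": 0,    # falsy but meaningful: kept
    "max_limit": 120,
    "unused": 1,       # not a parameter of is_in_range: skipped
}
kwargs = filter_supported_kwargs(is_in_range, candidates)
condition = is_in_range(**kwargs)
```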

to_dict

def to_dict() -> dict

Converts a DQRule instance into a structured dictionary.
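To illustrate what a structured dictionary for a rule might look like, the sketch below builds a nested, JSON-friendly dictionary from a rule's fields. The key names (`criticality`, `check`, `function`, `arguments`, `name`, `filter`) and the helper `rule_to_dict` are assumptions for this example; the actual output of `to_dict` may differ.

```python
import json
from typing import Any, Callable, Optional


def is_not_null(column: str) -> str:
    """Placeholder check function; the real ones return Spark Columns."""
    return f"{column} IS NOT NULL"


def rule_to_dict(
    check_func: Callable[..., Any],
    criticality: str = "error",
    name: Optional[str] = None,
    column: Optional[str] = None,
    filter: Optional[str] = None,
    check_func_kwargs: Optional[dict] = None,
) -> dict:
    """Build a nested dictionary describing a rule (assumed layout)."""
    arguments = dict(check_func_kwargs or {})
    if column is not None:
        arguments["column"] = column
    result = {
        "criticality": criticality,
        "check": {"function": check_func.__name__, "arguments": arguments},
    }
    # Optional fields are added only when set, keeping the output compact.
    if name:
        result["name"] = name
    if filter:
        result["filter"] = filter
    return result


as_dict = rule_to_dict(
    is_not_null, criticality="warn", column="email", filter="country = 'DE'"
)
as_json = json.dumps(as_dict, sort_keys=True)
```

Storing the function by name rather than by reference keeps the dictionary serializable, so it can be written to JSON or YAML.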

DQRowRule Objects

@dataclass(frozen=True)
class DQRowRule(DQRule)

Represents a row-level data quality rule that applies a quality check function to a column or column expression. Works with check functions that take a single column or no column as input.

get_check_condition

def get_check_condition() -> Column

Compute the check condition for this rule.

Returns:

The Spark Column representing the check condition.

DQDatasetRule Objects

@dataclass(frozen=True)
class DQDatasetRule(DQRule)

Represents a dataset-level data quality rule that applies a quality check function to a column, a column expression, or a list of columns, depending on the check function. Either column or columns can be provided, but not both. The rule is applied to the entire dataset or to a group of rows rather than to individual rows. Failed checks are appended to the result columns in the same way as for row-level rules.
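The difference from a row-level rule is that the outcome for one row depends on other rows. The plain-Python sketch below makes this concrete with a duplicate check over a list of dictionaries. It uses no Spark or DQX code; `flag_duplicates` is a hypothetical helper, and requiring at least one key column is specific to this example.

```python
from collections import Counter
from typing import Optional


def flag_duplicates(
    rows: list,
    column: Optional[str] = None,
    columns: Optional[list] = None,
) -> list:
    """Return one boolean per row: True if the row's key occurs more than once."""
    if column is not None and columns is not None:
        raise ValueError("provide either 'column' or 'columns', not both")
    keys = [column] if column is not None else list(columns or [])
    if not keys:
        raise ValueError("this example check needs at least one key column")
    # The counts are computed over the whole dataset first ...
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    # ... and then reported per row, like a row-level result.
    return [counts[tuple(row[k] for k in keys)] > 1 for row in rows]


rows = [
    {"id": 1, "region": "eu"},
    {"id": 2, "region": "eu"},
    {"id": 1, "region": "us"},
]
by_id = flag_duplicates(rows, column="id")
by_id_and_region = flag_duplicates(rows, columns=["id", "region"])
```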

get_check_condition

def get_check_condition() -> Column

Compute the check condition for this rule.

Returns:

The Spark Column representing the check condition.

DQForEachColRule Objects

@dataclass(frozen=True)
class DQForEachColRule(DQRuleTypeMixin)

Represents a data quality rule that applies a quality check function repeatedly to each column in the provided list of columns. This class includes the following attributes:

  • columns - A list of column names or expressions to which the check function should be applied.
  • check_func - The function used to perform the quality check.
  • name (optional) - A custom name for the check; autogenerated if not provided.
  • criticality - The severity level of the check:
    • warn for potential issues.
    • error for critical issues.
  • filter (optional) - A filter expression to apply the check only to rows meeting specific conditions.
  • check_func_args (optional) - Positional arguments for the check function (excluding column names).
  • check_func_kwargs (optional) - Keyword arguments for the check function (excluding column names).
  • user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.

get_rules

def get_rules() -> list[DQRule]

Build a list of rules for a set of columns.

Returns:

A list of DQ rules.
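Conceptually, `get_rules` fans a single definition out into one rule per column. The sketch below shows that expansion with hypothetical stand-in classes (`ColumnRuleSketch`, `ForEachColSketch`) and a placeholder check function. It does not reproduce how DQX chooses between rule types or generates names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


def is_not_null(column: str) -> str:
    """Placeholder check function used only for this illustration."""
    return f"{column} IS NOT NULL"


@dataclass(frozen=True)
class ColumnRuleSketch:
    """Hypothetical single-column rule produced by the expansion."""

    column: str
    check_func: Callable[..., Any]
    criticality: str = "error"
    check_func_kwargs: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ForEachColSketch:
    """Hypothetical stand-in that repeats one check over several columns."""

    columns: list
    check_func: Callable[..., Any]
    criticality: str = "error"
    check_func_kwargs: dict = field(default_factory=dict)

    def get_rules(self) -> list:
        # Every generated rule shares the check function and settings;
        # only the target column differs. The kwargs dict is copied so
        # that the generated rules do not share one mutable object.
        return [
            ColumnRuleSketch(
                column=col,
                check_func=self.check_func,
                criticality=self.criticality,
                check_func_kwargs=dict(self.check_func_kwargs),
            )
            for col in self.columns
        ]


rules = ForEachColSketch(
    columns=["id", "email", "created_at"],
    check_func=is_not_null,
    criticality="warn",
).get_rules()
```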