databricks.labs.dqx.checks_serializer

serialize_checks_from_dataframe

def serialize_checks_from_dataframe(
    df: DataFrame,
    run_config_name: str = "default"
) -> list[dict]

Converts a list of quality checks defined in a DataFrame to a list of quality checks defined as Python dictionaries.

Arguments:

  • df - DataFrame with data quality check rules. Each row should define a check. Rows should have the following columns:
    • name - Name given to the resulting column. Autogenerated if not provided.
    • criticality (optional) - Possible values are error (data goes only into the "bad" DataFrame) and warn (data goes into both DataFrames).
    • check - A StructType column defining the data quality check, including the DQX check function to use.
    • filter - Expression for filtering data quality checks
    • run_config_name (optional) - Run configuration name for storing checks across runs
    • user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
  • run_config_name - Run configuration name for filtering quality rules

Returns:

List of data quality check specifications as Python dictionaries.
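
For orientation, a returned list might look like the sketch below. The top-level field names come from the column list above. The nested `function` and `arguments` keys, the `is_not_null` function name, and the column names and metadata values are assumptions made for illustration; this page does not spell out the layout of the `check` struct.

```python
# Illustrative shape of the list returned by serialize_checks_from_dataframe.
# Assumption: the "check" struct holds a function name and its arguments.
checks = [
    {
        "name": "id_is_not_null",
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"column": "id"}},
        "filter": "country = 'US'",
        "user_metadata": {"owner": "data-team"},
    },
    {
        # Optional fields left out.
        "check": {"function": "is_not_null", "arguments": {"column": "email"}},
    },
]

# Each element is a plain dictionary, so optional fields are read with .get().
for check in checks:
    print(check.get("name", "<autogenerated>"), check.get("criticality"))
```

Because the result is a list of plain dictionaries, it can be handed to the other functions in this module that accept `list[dict]`.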

deserialize_checks_to_dataframe

def deserialize_checks_to_dataframe(
    spark: SparkSession,
    checks: list[dict],
    run_config_name: str = "default"
) -> DataFrame

Converts a list of quality checks defined as Python dictionaries to a DataFrame.

Arguments:

  • spark - Spark session.
  • checks - List of check specifications as Python dictionaries. Each check consists of the following fields:
    • check - Column expression to evaluate. The expression should return a string value if it evaluates to true (the string is used as the error/warning message) or null if it evaluates to false.
    • name - Name given to the resulting column. Autogenerated if not provided.
    • criticality (optional) - Possible values are error (data goes only into the "bad" DataFrame) and warn (data goes into both DataFrames).
    • filter (optional) - Expression for filtering data quality checks
    • user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
  • run_config_name - Run configuration name for storing quality checks across runs

Returns:

DataFrame with data quality check rules, one row per check.

Raises:

  • InvalidCheckError - If any check is invalid or unsupported.
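
The conversion from dictionaries to a tabular layout can be pictured with a small pure-Python sketch. It is not the library's implementation: `checks_to_rows` and `COLUMNS` are hypothetical names, plain tuples stand in for DataFrame rows, `ValueError` stands in for `InvalidCheckError`, and treating a missing `check` field as invalid is an assumption.

```python
from typing import Any

# Column order taken from the DataFrame layout documented for
# serialize_checks_from_dataframe.
COLUMNS = ("name", "criticality", "check", "filter", "run_config_name", "user_metadata")


def checks_to_rows(checks: list[dict[str, Any]], run_config_name: str = "default") -> list[tuple]:
    """Hypothetical helper: flatten check dictionaries into row tuples."""
    rows = []
    for check in checks:
        if "check" not in check:
            # Stand-in for the documented InvalidCheckError (assumed condition).
            raise ValueError(f"'check' field is missing: {check}")
        # Every row is tagged with the same run configuration name.
        row = {**check, "run_config_name": run_config_name}
        # Optional fields that were not supplied become None.
        rows.append(tuple(row.get(column) for column in COLUMNS))
    return rows


rows = checks_to_rows(
    [{"criticality": "warn", "check": {"function": "is_not_null", "arguments": {"column": "id"}}}],
    run_config_name="nightly",
)
print(rows)
```

The point of the sketch is that `run_config_name` is attached to every row, which is what lets checks for several run configurations share one table.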

deserialize_checks

def deserialize_checks(
    checks: list[dict],
    custom_checks: dict[str, Callable] | None = None
) -> list[DQRule]

Converts a list of quality checks defined as Python dictionaries to a list of DQRule objects.

Arguments:

  • checks - List of dictionaries describing checks. Each check is a dictionary consisting of the following fields:
    • check - Column expression to evaluate. The expression should return a string value if it evaluates to true or null if it evaluates to false.
    • name - Name given to the resulting column. Autogenerated if not provided.
    • criticality (optional) - Possible values are error (data goes only into the "bad" DataFrame) and warn (data goes into both DataFrames).
    • filter (optional) - Expression for filtering data quality checks
    • user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
  • custom_checks - Dictionary mapping names to custom check functions (e.g. globals() of the calling module). If not specified, only built-in functions are used for the checks.

Returns:

List of data quality check rules as DQRule objects.

Raises:

  • InvalidCheckError - If any dictionary is invalid or unsupported.
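
The role of `custom_checks` can be illustrated with a name lookup. The sketch below is an assumption about how such resolution might work, not the library's code: `BUILT_IN_CHECKS`, `resolve_check_function`, and both check functions are hypothetical, the stand-in checks return strings rather than Spark columns, the lookup order (built-ins first) is a guess, and `ValueError` stands in for `InvalidCheckError`.

```python
from typing import Callable, Optional


def is_not_null(column: str) -> str:
    """Stand-in for a built-in check; returns a description, not a Column."""
    return f"{column} is not null"


# Hypothetical registry of built-in check functions.
BUILT_IN_CHECKS: dict[str, Callable] = {"is_not_null": is_not_null}


def resolve_check_function(name: str, custom_checks: Optional[dict[str, Callable]] = None) -> Callable:
    """Look a function up among built-ins, then among custom checks."""
    if name in BUILT_IN_CHECKS:
        return BUILT_IN_CHECKS[name]
    if custom_checks and name in custom_checks:
        return custom_checks[name]
    # Stand-in for the documented InvalidCheckError.
    raise ValueError(f"function '{name}' is not defined")


def is_even_length(column: str) -> str:
    """Hypothetical user-defined check."""
    return f"length({column}) % 2 = 0"


print(resolve_check_function("is_not_null").__name__)
print(resolve_check_function("is_even_length", {"is_even_length": is_even_length}).__name__)
```

Passing `globals()` works for the same reason an explicit dictionary does: it maps names to the objects defined in the calling module, so any function defined there can be found by name.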

serialize_checks

def serialize_checks(checks: list[DQRule]) -> list[dict]

Converts a list of quality checks defined as DQRule objects to a list of quality checks defined as Python dictionaries.

Arguments:

  • checks - List of DQRule instances to convert.

Returns:

List of dictionaries representing the DQRule instances.

Raises:

  • InvalidCheckError - If any item in the list is not a DQRule instance.

serialize_checks_to_bytes

def serialize_checks_to_bytes(checks: list[dict], file_path: Path) -> bytes

Serializes a list of checks to bytes in JSON or YAML (default) format.

Arguments:

  • checks - List of checks to serialize.
  • file_path - Path to the file where the checks will be serialized.

Returns:

Serialized checks as bytes.
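
A minimal sketch of the JSON case follows, assuming the format is selected from the file extension of `file_path` and that the bytes are UTF-8 encoded; neither detail is stated on this page. `to_bytes` is a hypothetical helper, and the YAML default is left out because it would need a third-party YAML package.

```python
import json
from pathlib import Path


def to_bytes(checks: list[dict], file_path: Path) -> bytes:
    """Hypothetical helper: JSON branch only, selected by file extension."""
    if file_path.suffix.lower() != ".json":
        # The documented function defaults to YAML; that branch is omitted here.
        raise NotImplementedError("only the JSON branch is sketched")
    return json.dumps(checks, indent=2).encode("utf-8")


checks = [{"criticality": "error", "check": {"function": "is_not_null", "arguments": {"column": "id"}}}]
payload = to_bytes(checks, Path("checks.json"))

# Decoding and parsing the payload gives back the original list.
print(json.loads(payload.decode("utf-8")) == checks)
```

Returning bytes rather than writing to disk leaves the caller free to upload the payload to whichever storage it uses.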

get_file_deserializer

def get_file_deserializer(filepath: str) -> Callable

Gets the deserializer function appropriate for the given file.

Arguments:

  • filepath - Path to the file.

Returns:

Deserializer function.
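
One way to picture such a lookup is a table keyed by file extension, as sketched below. This is an assumption rather than the library's implementation: `DESERIALIZERS` and `pick_deserializer` are hypothetical names, selection by extension is a guess, and only JSON is covered because a YAML loader would need a third-party package.

```python
import io
import json
from pathlib import Path
from typing import Callable

# Hypothetical mapping from file extension to a loader that reads a file object.
DESERIALIZERS: dict[str, Callable] = {".json": json.load}


def pick_deserializer(filepath: str) -> Callable:
    """Return a loader for the file, chosen by its (case-insensitive) extension."""
    ext = Path(filepath).suffix.lower()
    if ext not in DESERIALIZERS:
        raise ValueError(f"no deserializer sketched for '{ext}'")
    return DESERIALIZERS[ext]


load = pick_deserializer("checks.JSON")
# An in-memory text stream stands in for an opened file.
checks = load(io.StringIO('[{"criticality": "warn", "check": {"function": "is_not_null"}}]'))
print(checks[0]["criticality"])
```

Returning a callable instead of parsed data lets the caller decide how and when the file is opened.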