databricks.labs.dqx.checks_serializer
serialize_checks_from_dataframe
def serialize_checks_from_dataframe(df: DataFrame,
run_config_name: str = "default"
) -> list[dict]
Converts a list of quality checks defined in a DataFrame to a list of quality checks defined as Python dictionaries.
Arguments:
df
- DataFrame with data quality check rules. Each row should define a check. Rows should have the following columns:
- name - Name that will be given to a resulting column. Autogenerated if not provided
- criticality (optional) - Possible values are error (data going only into "bad" dataframe) and warn (data is going into both dataframes)
- check - A StructType column defining the data quality check, including the DQX check function to use
- filter - Expression for filtering data quality checks
- run_config_name (optional) - Run configuration name for storing checks across runs
- user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
run_config_name
- Run configuration name for filtering quality rules
Returns:
List of data quality check specifications as Python dictionaries
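The conversion described above can be sketched without Spark. This is a minimal illustration, not the library's implementation: `rows_to_checks` is a hypothetical helper, plain dictionaries stand in for DataFrame rows, and the `function`/`arguments` layout of the `check` column as well as the check names are assumptions made for the example.

```python
from typing import Any


def rows_to_checks(rows: list[dict[str, Any]], run_config_name: str = "default") -> list[dict[str, Any]]:
    """Hypothetical helper: keep rows for one run configuration and turn them into check dictionaries."""
    optional_fields = ("name", "criticality", "filter", "user_metadata")
    checks = []
    for row in rows:
        # Assumption: a row without run_config_name belongs to "default".
        if row.get("run_config_name", "default") != run_config_name:
            continue
        check = {"check": row["check"]}
        # Copy optional columns only when they carry a value.
        for field_name in optional_fields:
            if row.get(field_name) is not None:
                check[field_name] = row[field_name]
        checks.append(check)
    return checks


# Plain dictionaries standing in for DataFrame rows (illustrative values).
rows = [
    {
        "name": "id_not_null",
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"column": "id"}},
        "filter": None,
        "run_config_name": "default",
        "user_metadata": None,
    },
    {
        "name": "age_in_range",
        "criticality": "warn",
        "check": {"function": "is_in_range", "arguments": {"column": "age"}},
        "filter": "country = 'DE'",
        "run_config_name": "nightly",
        "user_metadata": {"owner": "data-team"},
    },
]

default_checks = rows_to_checks(rows)
nightly_checks = rows_to_checks(rows, run_config_name="nightly")
```

Only the rows whose `run_config_name` matches the argument survive, which mirrors the filtering role the parameter has in this function.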
deserialize_checks_to_dataframe
def deserialize_checks_to_dataframe(
spark: SparkSession,
checks: list[dict],
run_config_name: str = "default") -> DataFrame
Converts a list of quality checks defined as Python dictionaries to a DataFrame.
Arguments:
spark
- Spark session.
checks
- List of check specifications as Python dictionaries. Each check consists of the following fields:
- check - Column expression to evaluate. This expression should return a string value if it evaluates to true (the string is used as the error/warning message) or null if it evaluates to false
- name - Name that will be given to a resulting column. Autogenerated if not provided
- criticality (optional) - Possible values are error (data going only into "bad" dataframe) and warn (data is going into both dataframes)
- filter (optional) - Expression for filtering data quality checks
- user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
run_config_name
- Run configuration name for storing quality checks across runs
Returns:
DataFrame with data quality check rules
Raises:
InvalidCheckError
- If any check is invalid or unsupported.
deserialize_checks
def deserialize_checks(
checks: list[dict],
custom_checks: dict[str, Callable] | None = None) -> list[DQRule]
Converts a list of quality checks defined as Python dictionaries to a list of DQRule objects.
Arguments:
checks
- List of dictionaries describing checks. Each check is a dictionary consisting of the following fields:
- check - Column expression to evaluate. This expression should return a string value if it evaluates to true or null if it evaluates to false
- name - Name that will be given to a resulting column. Autogenerated if not provided
- criticality (optional) - Possible values are error (data going only into "bad" dataframe) and warn (data is going into both dataframes)
- filter (optional) - Expression for filtering data quality checks
- user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
custom_checks
- dictionary with custom check functions (e.g. globals() of the calling module). If not specified, then only built-in functions are used for the checks.
Returns:
List of data quality check rules
Raises:
InvalidCheckError
- If any dictionary is invalid or unsupported.
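The role of `custom_checks` can be illustrated with a small lookup sketch. Everything here is hypothetical and self-contained: `resolve_check_function`, `BUILT_IN_CHECKS`, and the toy check functions are not DQX APIs, the local `InvalidCheckError` is a stand-in for the library exception, and the precedence of built-in names over custom ones is an assumption.

```python
from __future__ import annotations

from typing import Callable


class InvalidCheckError(Exception):
    """Local stand-in for the library exception of the same name."""


def is_not_null(value: object) -> bool:
    """Toy built-in check, defined only for this illustration."""
    return value is not None


# Hypothetical registry of built-in check functions.
BUILT_IN_CHECKS: dict[str, Callable] = {"is_not_null": is_not_null}


def resolve_check_function(name: str, custom_checks: dict[str, Callable] | None = None) -> Callable:
    """Look a check function up by name among built-ins and optional custom checks."""
    # Assumption: built-in names take precedence; the real lookup order may differ.
    candidates = {**(custom_checks or {}), **BUILT_IN_CHECKS}
    func = candidates.get(name)
    if not callable(func):
        raise InvalidCheckError(f"function '{name}' is not defined")
    return func


def is_even(value: int) -> bool:
    """Toy custom check living in the calling module."""
    return value % 2 == 0


# Passing globals() exposes every function of the calling module as a custom check.
resolved = resolve_check_function("is_even", custom_checks=globals())
```

Passing `globals()` is convenient but exposes every module-level name, so the sketch rejects anything that is not callable.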
serialize_checks
def serialize_checks(checks: list[DQRule]) -> list[dict]
Converts a list of quality checks defined as DQRule objects to a list of quality checks defined as Python dictionaries.
Arguments:
checks
- List of DQRule instances to convert.
Returns:
List of dictionaries representing the DQRule instances.
Raises:
InvalidCheckError
- If any item in the list is not a DQRule instance.
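A rough sketch of the object-to-dictionary direction follows. The `Rule` dataclass is a hypothetical stand-in for `DQRule`, `rules_to_dicts` is not a library function, and the default criticality of `error` plus the `function`/`arguments` layout are assumptions for the example.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


class InvalidCheckError(Exception):
    """Local stand-in for the library exception of the same name."""


@dataclass
class Rule:
    """Hypothetical stand-in for DQRule, holding only the documented fields."""

    check: dict
    name: Optional[str] = None
    criticality: str = "error"  # assumed default
    filter: Optional[str] = None
    user_metadata: Optional[dict] = None


def rules_to_dicts(rules: list) -> list[dict[str, Any]]:
    """Convert Rule instances to dictionaries, rejecting any other type."""
    serialized = []
    for rule in rules:
        if not isinstance(rule, Rule):
            raise InvalidCheckError(f"expected Rule, got {type(rule).__name__}")
        # Leave out fields that were never set.
        serialized.append({key: value for key, value in asdict(rule).items() if value is not None})
    return serialized


rules = [Rule(check={"function": "is_not_null", "arguments": {"column": "id"}}, name="id_not_null")]
as_dicts = rules_to_dicts(rules)
```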
serialize_checks_to_bytes
def serialize_checks_to_bytes(checks: list[dict], file_path: Path) -> bytes
Serializes a list of checks to bytes in JSON or YAML (default) format.
Arguments:
checks
- List of checks to serialize.
file_path
- Path to the file where the checks will be serialized.
Returns:
Serialized checks as bytes.
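One way such a serializer might work is sketched below. It assumes the format is picked from the file extension, with YAML as the fallback, and that PyYAML is installed for the YAML branch; `checks_to_bytes` is a hypothetical name, not the library function.

```python
import json
from pathlib import Path


def checks_to_bytes(checks: list, file_path: Path) -> bytes:
    """Hypothetical sketch: encode checks as JSON for .json paths, YAML otherwise."""
    if file_path.suffix.lower() == ".json":
        return json.dumps(checks, indent=2).encode("utf-8")
    # Assumption: every other extension falls back to YAML, the documented default.
    import yaml  # PyYAML, imported lazily so the JSON path has no extra dependency

    return yaml.safe_dump(checks, sort_keys=False).encode("utf-8")


checks = [{"name": "id_not_null", "criticality": "error"}]
payload = checks_to_bytes(checks, Path("checks.json"))
```

The function only inspects the path and returns bytes, so writing the file is left to the caller.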
get_file_deserializer
def get_file_deserializer(filepath: str) -> Callable
Gets the deserializer function based on the file path.
Arguments:
filepath
- Path to the file.
Returns:
Deserializer function.
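A dispatch of this kind could look like the following sketch. It assumes the choice is driven by the file extension and that YAML is the fallback; `pick_deserializer` is a hypothetical name, and the YAML branch requires PyYAML.

```python
import io
import json
from pathlib import Path
from typing import Any, Callable


def pick_deserializer(filepath: str) -> Callable[[Any], Any]:
    """Hypothetical sketch: return a loader that reads checks from an open file object."""
    if Path(filepath).suffix.lower() == ".json":
        return json.load
    # Assumption: any other extension is treated as YAML.
    import yaml  # PyYAML, imported lazily

    return yaml.safe_load


deserializer = pick_deserializer("checks.json")
# An in-memory stream stands in for an open file.
loaded = deserializer(io.StringIO('[{"name": "id_not_null", "criticality": "error"}]'))
```

Returning the loader instead of the parsed content lets the caller decide how and when to open the file.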