databricks.labs.dqx.checks_serializer

ChecksNormalizer Objects

class ChecksNormalizer()

Handles normalization and denormalization of check dictionaries, for example converting Decimal values to and from a serializable format.

normalize

@staticmethod
def normalize(checks: list[dict]) -> list[dict]

Recursively normalize check dictionaries so that they are JSON/YAML serializable.

Arguments:

  • checks - List of check dictionaries that may contain non-serializable values.

Returns:

List of normalized check dictionaries.
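A minimal sketch of the recursive normalization idea, assuming Decimal values are replaced by a marker dictionary. The marker key `"__decimal__"`, the helper `normalize_value`, and the contents of the example check are hypothetical, not the library's actual format.

```python
import json
from decimal import Decimal
from typing import Any

# Hypothetical marker key; the format ChecksNormalizer actually uses may differ.
DECIMAL_MARKER = "__decimal__"


def normalize_value(val: Any) -> Any:
    """Recursively replace Decimal values with a JSON/YAML-friendly marker dict."""
    if isinstance(val, Decimal):
        # Keep the exact string form so no precision is lost.
        return {DECIMAL_MARKER: str(val)}
    if isinstance(val, dict):
        return {key: normalize_value(item) for key, item in val.items()}
    if isinstance(val, (list, tuple)):
        return [normalize_value(item) for item in val]
    return val


def normalize(checks: list[dict]) -> list[dict]:
    # Builds new dictionaries; the input checks are left unchanged.
    return [normalize_value(check) for check in checks]


# Illustrative check; the field names inside "check" are assumptions.
checks = [
    {
        "criticality": "error",
        "check": {"function": "is_in_range", "arguments": {"column": "price", "min_limit": Decimal("0.01")}},
    }
]
normalized = normalize(checks)
print(json.dumps(normalized))  # json.dumps(checks) would raise TypeError on the raw Decimal
```

Storing the Decimal as a string rather than a float avoids the rounding that a float conversion could introduce.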

denormalize_value

@staticmethod
def denormalize_value(val: Any) -> Any

Recursively convert special markers (e.g., for Decimal values) back to the original objects.

denormalize

@staticmethod
def denormalize(checks: list[dict]) -> list[dict]

Recursively convert special markers back to objects after deserialization, for example converting decimal markers back to Decimal objects.

Arguments:

  • checks - List of check dictionaries that may contain special markers.

Returns:

List of check dictionaries with special markers converted to objects.
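The reverse direction can be sketched the same way, again assuming a hypothetical `"__decimal__"` marker key: any dictionary whose only key is the marker is turned back into a Decimal, and everything else is walked recursively. This illustrates the technique and does not reproduce the library's code.

```python
from decimal import Decimal
from typing import Any

# Hypothetical marker key; the real marker format may differ.
DECIMAL_MARKER = "__decimal__"


def denormalize_value(val: Any) -> Any:
    """Recursively convert marker dicts back into Decimal objects."""
    if isinstance(val, dict):
        # A dict whose only key is the marker represents a Decimal.
        if set(val) == {DECIMAL_MARKER}:
            return Decimal(val[DECIMAL_MARKER])
        return {key: denormalize_value(item) for key, item in val.items()}
    if isinstance(val, list):
        return [denormalize_value(item) for item in val]
    return val


def denormalize(checks: list[dict]) -> list[dict]:
    return [denormalize_value(check) for check in checks]


loaded = [{"criticality": "warn", "check": {"arguments": {"max_limit": {DECIMAL_MARKER: "99.95"}}}}]
restored = denormalize(loaded)
print(repr(restored[0]["check"]["arguments"]["max_limit"]))  # Decimal('99.95')
```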

FileFormatSerializer Objects

class FileFormatSerializer(ABC)

Abstract base class for file format serializers.

serialize

@abstractmethod
def serialize(data: list[dict]) -> str

Serialize data to string format.

deserialize

@abstractmethod
def deserialize(file_like: TextIO) -> list[dict]

Deserialize data from a file-like object.

JsonSerializer Objects

class JsonSerializer(FileFormatSerializer)

JSON format serializer implementation.

serialize

def serialize(data: list[dict]) -> str

Serialize data to JSON string.

deserialize

def deserialize(file_like: TextIO) -> list[dict]

Deserialize data from JSON file.
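A minimal sketch of the serializer interface and a JSON implementation built on the standard `json` module. The rendered signatures above omit `self`, so the sketch writes them as instance methods; the `indent=2` formatting is an assumption.

```python
import io
import json
from abc import ABC, abstractmethod
from typing import TextIO


class FileFormatSerializer(ABC):
    """Interface mirroring the documented serialize/deserialize methods."""

    @abstractmethod
    def serialize(self, data: list[dict]) -> str: ...

    @abstractmethod
    def deserialize(self, file_like: TextIO) -> list[dict]: ...


class JsonSerializer(FileFormatSerializer):
    def serialize(self, data: list[dict]) -> str:
        # Indentation is an assumption; the library may format output differently.
        return json.dumps(data, indent=2)

    def deserialize(self, file_like: TextIO) -> list[dict]:
        # json.load reads the whole stream and parses it.
        return json.load(file_like)


serializer = JsonSerializer()
text = serializer.serialize([{"criticality": "error", "name": "id_not_null"}])
round_trip = serializer.deserialize(io.StringIO(text))
print(round_trip)
```

Because `FileFormatSerializer` declares both methods abstract, Python refuses to instantiate it or any subclass that forgets one of them.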

YamlSerializer Objects

class YamlSerializer(FileFormatSerializer)

YAML format serializer implementation.

serialize

def serialize(data: list[dict]) -> str

Serialize data to YAML string.

deserialize

def deserialize(file_like: TextIO) -> list[dict]

Deserialize data from YAML file.

SerializerFactory Objects

class SerializerFactory()

Factory for creating appropriate serializers based on file extension.

get_supported_extensions

@classmethod
def get_supported_extensions(cls) -> tuple[str, ...]

Get a tuple of the supported file extensions.

Returns:

Tuple of supported file extensions (e.g., (".json", ".yaml", ".yml")).

create_serializer

@classmethod
def create_serializer(cls, extension: str | None = None) -> FileFormatSerializer

Create a serializer based on file extension.

Arguments:

  • extension - File extension (e.g., ".json", ".yaml", ".yml"). If None or empty, defaults to YAML.

Returns:

Appropriate serializer instance. Defaults to YAML if the extension is not recognized or not provided.

register_format

@classmethod
def register_format(cls, extension: str, serializer_class: type[FileFormatSerializer]) -> None

Register a new file format serializer.

Arguments:

  • extension - File extension to associate with the serializer.
  • serializer_class - Serializer class implementing FileFormatSerializer interface.
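A minimal sketch of an extension-keyed factory covering the three class methods above. The registry dictionary, the placeholder serializer classes, and the `TomlSerializer` / `".toml"` registration are hypothetical; only the documented behavior (YAML fallback for missing or unknown extensions) is taken from this page.

```python
from __future__ import annotations


class FileFormatSerializer:
    """Simplified stand-in for the documented abstract base class."""


class JsonSerializer(FileFormatSerializer):
    """Placeholder; a real implementation would serialize with json."""


class YamlSerializer(FileFormatSerializer):
    """Placeholder; a real implementation would serialize with a YAML library."""


class TomlSerializer(FileFormatSerializer):
    """Hypothetical serializer used only to demonstrate register_format."""


class SerializerFactory:
    # Extension -> serializer class registry (an assumed internal structure).
    _registry: dict[str, type[FileFormatSerializer]] = {
        ".json": JsonSerializer,
        ".yaml": YamlSerializer,
        ".yml": YamlSerializer,
    }

    @classmethod
    def get_supported_extensions(cls) -> tuple[str, ...]:
        return tuple(cls._registry)

    @classmethod
    def create_serializer(cls, extension: str | None = None) -> FileFormatSerializer:
        # None, empty, or unrecognized extensions fall back to YAML, as documented.
        serializer_class = cls._registry.get(extension or "", YamlSerializer)
        return serializer_class()

    @classmethod
    def register_format(cls, extension: str, serializer_class: type[FileFormatSerializer]) -> None:
        cls._registry[extension] = serializer_class


print(SerializerFactory.get_supported_extensions())  # ('.json', '.yaml', '.yml')
SerializerFactory.register_format(".toml", TomlSerializer)
print(type(SerializerFactory.create_serializer(".toml")).__name__)  # TomlSerializer
print(type(SerializerFactory.create_serializer(None)).__name__)  # YamlSerializer
```

Keeping the registry at class level means a format registered once is visible to every later `create_serializer` call in the process.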

ChecksSerializer Objects

class ChecksSerializer()

Handles serialization of DQRule objects to dictionaries and file formats.

serialize

@staticmethod
def serialize(checks: list[DQRule]) -> list[dict]

Converts a list of quality checks defined as DQRule objects to a list of quality checks defined as Python dictionaries.

Arguments:

  • checks - List of DQRule instances to convert.

Returns:

List of dictionaries representing the DQRule instances.

Raises:

  • InvalidCheckError - If any item in the list is not a DQRule instance.
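A minimal sketch of the DQRule-to-dictionary conversion and its type check. `DQRule` and `InvalidCheckError` here are simplified, hypothetical stand-ins (the real classes have more fields and behavior), and the default criticality of `"error"` is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


class InvalidCheckError(Exception):
    """Stand-in for the library's InvalidCheckError."""


@dataclass
class DQRule:
    """Hypothetical, simplified stand-in for the library's DQRule class."""

    check: dict[str, Any]
    name: str | None = None
    criticality: str = "error"  # default value is an assumption
    filter: str | None = None
    user_metadata: dict[str, str] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        # Omit optional fields that are unset to keep the output compact.
        result: dict[str, Any] = {"criticality": self.criticality, "check": self.check}
        if self.name:
            result["name"] = self.name
        if self.filter:
            result["filter"] = self.filter
        if self.user_metadata:
            result["user_metadata"] = self.user_metadata
        return result


def serialize(checks: list[Any]) -> list[dict]:
    serialized = []
    for item in checks:
        # Reject anything that is not a DQRule, mirroring the documented Raises clause.
        if not isinstance(item, DQRule):
            raise InvalidCheckError(f"Expected a DQRule instance, got {type(item).__name__}")
        serialized.append(item.to_dict())
    return serialized


rule = DQRule(check={"function": "is_not_null", "arguments": {"column": "id"}}, name="id_is_not_null")
print(serialize([rule]))
```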

serialize_to_bytes

@staticmethod
def serialize_to_bytes(checks: list[dict], extension: str) -> bytes

Serializes a list of checks to bytes in JSON or YAML (default) format.

Arguments:

  • checks - List of checks to serialize.
  • extension - File extension (e.g., ".json", ".yaml", ".yml").

Returns:

Serialized checks as bytes.
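A minimal sketch of the extension-based dispatch, written as a standalone function rather than the library's method. The YAML branch assumes the third-party PyYAML package (`yaml.safe_dump`) and is imported lazily so the JSON path runs without it; the UTF-8 encoding is also an assumption.

```python
import json


def serialize_to_bytes(checks: list[dict], extension: str) -> bytes:
    if extension == ".json":
        text = json.dumps(checks, indent=2)
    else:
        # YAML is the documented default; this branch assumes PyYAML is installed.
        import yaml

        text = yaml.safe_dump(checks, sort_keys=False)
    # UTF-8 is assumed as the byte encoding.
    return text.encode("utf-8")


payload = serialize_to_bytes(
    [{"criticality": "warn", "check": {"function": "is_not_null", "arguments": {"column": "email"}}}],
    ".json",
)
print(payload.decode("utf-8"))
```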

ChecksDeserializer Objects

class ChecksDeserializer()

Handles deserialization of check dictionaries into DQRule objects, and of checks from file formats.

__init__

def __init__(custom_checks: dict[str, Callable] | None = None)

Initialize the deserializer.

Arguments:

  • custom_checks - Dictionary mapping custom check function names to the functions.

deserialize

def deserialize(checks: list[dict]) -> list[DQRule]

Converts a list of quality checks defined as Python dictionaries to a list of DQRule objects.

Arguments:

  • checks - List of dictionaries describing checks. Each check is a dictionary consisting of the following fields:
    • check - Column expression to evaluate. The expression should return a string value when it evaluates to true, or null when it evaluates to false.
    • name - Name that will be given to the resulting column. Autogenerated if not provided.
    • criticality (optional) - Possible values are error (failing rows go only into the "bad" DataFrame) and warn (failing rows go into both DataFrames).
    • filter (optional) - Expression for filtering data quality checks.
    • user_metadata (optional) - User-defined key-value pairs added to the metadata generated by the check.

Returns:

List of data quality check rules.

Raises:

  • InvalidCheckError - If any dictionary is invalid or unsupported.
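A minimal sketch of the kind of structural validation implied by the documented fields: `check` is required, `criticality` must be `error` or `warn`, and `user_metadata` must be key-value pairs. The helper `validate_check_dict`, the `InvalidCheckError` stand-in, treating a missing criticality as `"error"`, and the inner shape of the `check` value are all assumptions; the library's real validation may be stricter or different.

```python
class InvalidCheckError(Exception):
    """Stand-in for the library's InvalidCheckError."""


ALLOWED_CRITICALITIES = {"error", "warn"}


def validate_check_dict(check: dict) -> None:
    """Hypothetical structural validation based on the documented fields."""
    if not isinstance(check, dict):
        raise InvalidCheckError(f"Expected a dict, got {type(check).__name__}")
    if "check" not in check:
        raise InvalidCheckError(f"Missing required 'check' field: {check}")
    # Treating a missing criticality as "error" is an assumption.
    criticality = check.get("criticality", "error")
    if criticality not in ALLOWED_CRITICALITIES:
        raise InvalidCheckError(f"Invalid criticality {criticality!r}; expected 'error' or 'warn'")
    metadata = check.get("user_metadata", {})
    if not isinstance(metadata, dict):
        raise InvalidCheckError("'user_metadata' must be a dictionary of key-value pairs")


valid_check = {
    "name": "email_not_null",
    "criticality": "warn",
    "check": {"function": "is_not_null", "arguments": {"column": "email"}},
    "filter": "country = 'US'",
    "user_metadata": {"owner": "data-eng"},
}
validate_check_dict(valid_check)  # passes silently

try:
    validate_check_dict({"criticality": "fatal", "check": {"function": "is_not_null"}})
except InvalidCheckError as err:
    print(err)
```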

deserialize_from_file

@staticmethod
def deserialize_from_file(extension: str, file_like: TextIO) -> list[dict]

Deserialize checks from a file-like object based on file extension. Automatically denormalizes special markers back to objects.

Arguments:

  • extension - File extension (e.g., ".json", ".yaml", ".yml").
  • file_like - File-like object to read from.

Returns:

List of check dictionaries with special markers converted to objects.
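A JSON-only sketch of reading checks from a file-like object and denormalizing markers in the same pass, using the `object_hook` parameter of the standard `json.load`. The function name `deserialize_json_checks` and the `"__decimal__"` marker key are hypothetical; YAML input would need a YAML parser plus a separate denormalization step.

```python
from __future__ import annotations

import io
import json
from decimal import Decimal
from typing import Any, TextIO

# Hypothetical marker key; the library's actual marker format may differ.
DECIMAL_MARKER = "__decimal__"


def _decode_markers(obj: dict[str, Any]) -> Any:
    # json calls this hook for every decoded JSON object, innermost first.
    if set(obj) == {DECIMAL_MARKER}:
        return Decimal(obj[DECIMAL_MARKER])
    return obj


def deserialize_json_checks(file_like: TextIO) -> list[dict]:
    """JSON-only sketch: parse the stream and convert markers while decoding."""
    return json.load(file_like, object_hook=_decode_markers)


source = io.StringIO('[{"criticality": "error", "check": {"arguments": {"min_limit": {"__decimal__": "0.5"}}}}]')
checks = deserialize_json_checks(source)
print(checks[0]["check"]["arguments"]["min_limit"])  # 0.5
```

Using `object_hook` avoids a second recursive walk over the parsed data, because the conversion happens as each JSON object is decoded.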

DataFrameConverter Objects

class DataFrameConverter()

Handles conversion between DataFrames and check dictionaries.

from_dataframe

@staticmethod
def from_dataframe(df: DataFrame, run_config_name: str = "default") -> list[dict]

Converts a list of quality checks defined in a DataFrame to a list of quality checks defined as Python dictionaries.

Arguments:

  • df - DataFrame with data quality check rules. Each row defines one check and should have the following columns:
    • name - Name that will be given to the resulting column. Autogenerated if not provided.
    • criticality (optional) - Possible values are error (failing rows go only into the "bad" DataFrame) and warn (failing rows go into both DataFrames).
    • check - StructType column defining the DQX check function used in the check.
    • filter - Expression for filtering data quality checks.
    • run_config_name (optional) - Run configuration name for storing checks across runs.
    • user_metadata (optional) - User-defined key-value pairs added to the metadata generated by the check.
  • run_config_name - Run configuration name for filtering quality rules, e.g. input table or job name ("default" if not provided).

Returns:

List of data quality check specifications as Python dictionaries.
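A minimal sketch of the filter-and-convert step, using plain dictionaries in place of DataFrame rows so it runs without Spark. With a real DataFrame, one possible source for such rows is `df.collect()` followed by PySpark's `Row.asDict(recursive=True)`. The helper `rows_to_checks`, the example rows, and the decision to drop `None` values are assumptions, not the library's implementation.

```python
from typing import Any

# Plain dicts stand in for DataFrame rows so the sketch runs without Spark.
rows = [
    {
        "name": "id_not_null",
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"column": "id"}},
        "filter": None,
        "run_config_name": "default",
        "user_metadata": None,
    },
    {
        "name": "amount_positive",
        "criticality": "warn",
        "check": {"function": "is_not_less_than", "arguments": {"column": "amount", "limit": 0}},
        "filter": "status = 'active'",
        "run_config_name": "orders_job",
        "user_metadata": {"owner": "finance"},
    },
]


def rows_to_checks(rows: list[dict[str, Any]], run_config_name: str = "default") -> list[dict]:
    checks = []
    for row in rows:
        # Keep only checks stored under the requested run configuration.
        if row.get("run_config_name") != run_config_name:
            continue
        # Drop the run_config_name column and unset (None) fields; this cleanup is an assumption.
        checks.append({key: value for key, value in row.items() if key != "run_config_name" and value is not None})
    return checks


print(rows_to_checks(rows))
```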

to_dataframe

@staticmethod
def to_dataframe(spark: SparkSession, checks: list[dict], run_config_name: str = "default") -> DataFrame

Converts a list of quality checks defined as Python dictionaries to a DataFrame.

Arguments:

  • spark - Spark session.
  • checks - List of check specifications as Python dictionaries. Each check consists of the following fields:
    • check - Column expression to evaluate. The expression should return a string value when it evaluates to true (it will be used as the error/warning message), or null when it evaluates to false.
    • name - Name that will be given to the resulting column. Autogenerated if not provided.
    • criticality (optional) - Possible values are error (failing rows go only into the "bad" DataFrame) and warn (failing rows go into both DataFrames).
    • filter (optional) - Expression for filtering data quality checks.
    • user_metadata (optional) - User-defined key-value pairs added to the metadata generated by the check.
  • run_config_name - Run configuration name for storing quality checks across runs, e.g. input table or job name ("default" if not provided).

Returns:

DataFrame with data quality check rules

Raises:

  • InvalidCheckError - If any check is invalid or unsupported.

serialize_checks

def serialize_checks(checks: list[DQRule]) -> list[dict]

Converts a list of quality checks defined as DQRule objects to a list of quality checks defined as Python dictionaries.

This is a convenience function that wraps ChecksSerializer.serialize.

Arguments:

  • checks - List of DQRule instances to convert.

Returns:

List of dictionaries representing the DQRule instances.

Raises:

  • InvalidCheckError - If any item in the list is not a DQRule instance.

deserialize_checks

def deserialize_checks(checks: list[dict], custom_checks: dict[str, Callable] | None = None) -> list[DQRule]

Converts a list of quality checks defined as Python dictionaries to a list of DQRule objects.

This is a convenience function that wraps ChecksDeserializer.deserialize.

Arguments:

  • checks - List of dictionaries describing checks. Each check is a dictionary consisting of the following fields:
    • check - Column expression to evaluate. The expression should return a string value when it evaluates to true, or null when it evaluates to false.
    • name - Name that will be given to the resulting column. Autogenerated if not provided.
    • criticality (optional) - Possible values are error (failing rows go only into the "bad" DataFrame) and warn (failing rows go into both DataFrames).
    • filter (optional) - Expression for filtering data quality checks.
    • user_metadata (optional) - User-defined key-value pairs added to the metadata generated by the check.
  • custom_checks - Dictionary mapping custom check function names to the functions.

Returns:

List of data quality check rules.

Raises:

  • InvalidCheckError - If any dictionary is invalid or unsupported.
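A minimal sketch of how a `custom_checks` mapping could be used to resolve a check function by name during deserialization. The helper `resolve_check_function`, the `BUILT_IN_CHECKS` table, both example functions, and the rule that custom checks are consulted before built-ins are all hypothetical. The example functions return plain strings only for illustration; in the library, check functions produce column expressions.

```python
from __future__ import annotations

from typing import Callable


class InvalidCheckError(Exception):
    """Stand-in for the library's InvalidCheckError."""


def is_not_null(column: str) -> str:
    # Hypothetical built-in; real check functions build Spark column expressions.
    return f"{column} is null"


BUILT_IN_CHECKS: dict[str, Callable] = {"is_not_null": is_not_null}


def resolve_check_function(name: str, custom_checks: dict[str, Callable] | None = None) -> Callable:
    # Consulting custom checks before built-ins is an assumed precedence rule.
    if custom_checks and name in custom_checks:
        return custom_checks[name]
    if name in BUILT_IN_CHECKS:
        return BUILT_IN_CHECKS[name]
    raise InvalidCheckError(f"Unknown check function: {name!r}")


def ends_with_domain(column: str, domain: str) -> str:
    # Hypothetical custom check supplied by the caller.
    return f"{column} does not end with {domain}"


custom_checks = {"ends_with_domain": ends_with_domain}
func = resolve_check_function("ends_with_domain", custom_checks)
print(func("email", "@example.com"))
```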