databricks.labs.dqx.checks_serializer
ChecksNormalizer Objects
class ChecksNormalizer()
Handles normalization and denormalization of check dictionaries. For example, it converts Decimal values to and from a serializable format.
normalize
@staticmethod
def normalize(checks: list[dict]) -> list[dict]
Recursively normalize check dictionaries so that they are JSON/YAML serializable.
Arguments:
checks- List of check dictionaries that may contain non-serializable values.
Returns:
List of normalized check dictionaries.
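Python's `json` module raises a `TypeError` for `Decimal` values, which is why a normalization pass is needed before writing checks out. The following is a minimal sketch of such a recursive pass, assuming Decimals are replaced by a marker dictionary. The marker key `__decimal__` and the helper `normalize_value` are hypothetical; the marker format the library actually uses is not documented here.

```python
import json
from decimal import Decimal
from typing import Any

# Hypothetical marker key: the marker format DQX actually uses is not documented here.
DECIMAL_MARKER = "__decimal__"


def normalize_value(val: Any) -> Any:
    """Recursively replace Decimal values with a JSON/YAML-safe marker dict."""
    if isinstance(val, Decimal):
        # Keep the exact string form so no precision is lost.
        return {DECIMAL_MARKER: str(val)}
    if isinstance(val, dict):
        return {key: normalize_value(item) for key, item in val.items()}
    if isinstance(val, list):
        return [normalize_value(item) for item in val]
    return val


def normalize(checks: list[dict]) -> list[dict]:
    return [normalize_value(check) for check in checks]


# Illustrative check dictionary; the keys are placeholders, not a DQX schema.
checks = [{"name": "price_check", "arguments": {"min_limit": Decimal("0.01")}}]
print(json.dumps(normalize(checks)))
```

Storing the Decimal as a string rather than a float keeps values such as `0.01` exact through the round trip.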
denormalize_value
@staticmethod
def denormalize_value(val: Any) -> Any
Recursively convert special markers (e.g. Decimal) back to original objects.
denormalize
@staticmethod
def denormalize(checks: list[dict]) -> list[dict]
Recursively convert special markers back to objects after deserialization. For example, markers in decimal format are converted back to Decimal objects.
Arguments:
checks- List of check dictionaries that may contain special markers.
Returns:
List of check dictionaries with special markers converted to objects.
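The reverse direction can be sketched the same way: walk the loaded structure and turn any dictionary that consists only of the marker key back into a `Decimal`. As before, the `__decimal__` marker key is a hypothetical stand-in for whatever format the library uses.

```python
from decimal import Decimal
from typing import Any

DECIMAL_MARKER = "__decimal__"  # hypothetical marker key


def denormalize_value(val: Any) -> Any:
    if isinstance(val, dict):
        # A dict holding only the marker key is turned back into a Decimal.
        if set(val) == {DECIMAL_MARKER}:
            return Decimal(val[DECIMAL_MARKER])
        return {key: denormalize_value(item) for key, item in val.items()}
    if isinstance(val, list):
        return [denormalize_value(item) for item in val]
    # Plain scalars (str, int, float, bool, None) pass through unchanged.
    return val


def denormalize(checks: list[dict]) -> list[dict]:
    return [denormalize_value(check) for check in checks]


loaded = [{"name": "price_check", "arguments": {"min_limit": {DECIMAL_MARKER: "0.01"}}}]
restored = denormalize(loaded)
print(restored)
```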
FileFormatSerializer Objects
class FileFormatSerializer(ABC)
Abstract base class for file format serializers.
serialize
@abstractmethod
def serialize(data: list[dict]) -> str
Serialize data to string format.
deserialize
@abstractmethod
def deserialize(file_like: TextIO) -> list[dict]
Deserialize data from file-like object.
JsonSerializer Objects
class JsonSerializer(FileFormatSerializer)
JSON format serializer implementation.
serialize
def serialize(data: list[dict]) -> str
Serialize data to JSON string.
deserialize
def deserialize(file_like: TextIO) -> list[dict]
Deserialize data from a JSON file-like object.
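The abstract interface and its JSON implementation can be pictured roughly as follows. This is a sketch built only on the standard library, not the library's source; the `indent=2` formatting choice is an assumption.

```python
import io
import json
from abc import ABC, abstractmethod
from typing import TextIO


class FileFormatSerializer(ABC):
    @abstractmethod
    def serialize(self, data: list[dict]) -> str: ...

    @abstractmethod
    def deserialize(self, file_like: TextIO) -> list[dict]: ...


class JsonSerializer(FileFormatSerializer):
    def serialize(self, data: list[dict]) -> str:
        # Pretty-printing is a choice of this sketch.
        return json.dumps(data, indent=2)

    def deserialize(self, file_like: TextIO) -> list[dict]:
        # json.load reads directly from any text file-like object.
        return json.load(file_like)


serializer = JsonSerializer()
text = serializer.serialize([{"name": "id_not_null", "criticality": "error"}])
round_trip = serializer.deserialize(io.StringIO(text))
print(round_trip)
```

Because `deserialize` takes a file-like object rather than a path, the same code works for open files, `io.StringIO` buffers, or any other text stream.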
YamlSerializer Objects
class YamlSerializer(FileFormatSerializer)
YAML format serializer implementation.
serialize
def serialize(data: list[dict]) -> str
Serialize data to YAML string.
deserialize
def deserialize(file_like: TextIO) -> list[dict]
Deserialize data from a YAML file-like object.
SerializerFactory Objects
class SerializerFactory()
Factory for creating appropriate serializers based on file extension.
get_supported_extensions
@classmethod
def get_supported_extensions(cls) -> tuple[str, ...]
Get tuple of supported file extensions.
Returns:
Tuple of supported file extensions (e.g., (".json", ".yaml", ".yml")).
create_serializer
@classmethod
def create_serializer(cls,
extension: str | None = None) -> FileFormatSerializer
Create a serializer based on file extension.
Arguments:
extension- File extension (e.g., ".json", ".yaml", ".yml"). If None or empty, defaults to YAML.
Returns:
Appropriate serializer instance. Defaults to YAML if the extension is not recognized or not provided.
register_format
@classmethod
def register_format(cls, extension: str,
serializer_class: type[FileFormatSerializer]) -> None
Register a new file format serializer.
Arguments:
extension- File extension.
serializer_class- Serializer class implementing the FileFormatSerializer interface.
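The factory behaviour described above (lookup by extension, YAML fallback, and runtime registration) can be sketched as a class-level registry. This is an illustrative reconstruction, not the library's code: the `_registry` attribute, the placeholder `YamlSerializer`, and the `JsonLinesSerializer` format registered at the end are all hypothetical.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class FileFormatSerializer(ABC):
    @abstractmethod
    def serialize(self, data: list[dict]) -> str: ...


class JsonSerializer(FileFormatSerializer):
    def serialize(self, data: list[dict]) -> str:
        return json.dumps(data)


class YamlSerializer(FileFormatSerializer):
    """Placeholder: a real implementation would call a YAML library here."""

    def serialize(self, data: list[dict]) -> str:
        raise NotImplementedError("YAML output is out of scope for this sketch")


class SerializerFactory:
    # Hypothetical registry mapping a file extension to its serializer class.
    _registry: dict[str, type[FileFormatSerializer]] = {
        ".json": JsonSerializer,
        ".yaml": YamlSerializer,
        ".yml": YamlSerializer,
    }

    @classmethod
    def get_supported_extensions(cls) -> tuple[str, ...]:
        return tuple(cls._registry)

    @classmethod
    def create_serializer(cls, extension: Optional[str] = None) -> FileFormatSerializer:
        # None, "" and unrecognized extensions all fall back to YAML.
        serializer_class = cls._registry.get(extension or "", YamlSerializer)
        return serializer_class()

    @classmethod
    def register_format(cls, extension: str, serializer_class: type[FileFormatSerializer]) -> None:
        cls._registry[extension] = serializer_class


# Hypothetical extra format: one JSON object per line.
class JsonLinesSerializer(FileFormatSerializer):
    def serialize(self, data: list[dict]) -> str:
        return "\n".join(json.dumps(row) for row in data)


SerializerFactory.register_format(".jsonl", JsonLinesSerializer)
print(SerializerFactory.get_supported_extensions())
```

Keeping the registry on the class means a format registered once is visible to every later `create_serializer` call. The trade-off is that registration is global state shared across callers.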
ChecksSerializer Objects
class ChecksSerializer()
Handles serialization of DQRule objects to dictionaries and file formats.
serialize
@staticmethod
def serialize(checks: list[DQRule]) -> list[dict]
Converts a list of quality checks defined as DQRule objects to a list of quality checks defined as Python dictionaries.
Arguments:
checks- List of DQRule instances to convert.
Returns:
List of dictionaries representing the DQRule instances.
Raises:
InvalidCheckError- If any item in the list is not a DQRule instance.
serialize_to_bytes
@staticmethod
def serialize_to_bytes(checks: list[dict], extension: str) -> bytes
Serializes a list of checks to bytes in JSON or YAML (default) format.
Arguments:
checks- List of checks to serialize.
extension- File extension (e.g., ".json", ".yaml", ".yml").
Returns:
Serialized checks as bytes.
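Here is a rough sketch of what producing bytes from a list of checks involves: render text in the chosen format, then encode it. The UTF-8 encoding, the `indent=2` JSON layout, and the use of PyYAML's `safe_dump` for the default branch are assumptions of this sketch, not documented behaviour.

```python
import json


def serialize_to_bytes(checks: list[dict], extension: str) -> bytes:
    """Sketch: render checks as JSON or YAML (default) text, then encode it."""
    if extension == ".json":
        text = json.dumps(checks, indent=2)
    else:
        # Default branch; assumes PyYAML is installed. sort_keys=False keeps key order.
        import yaml

        text = yaml.safe_dump(checks, sort_keys=False)
    # UTF-8 is an assumption of this sketch.
    return text.encode("utf-8")


payload = serialize_to_bytes([{"name": "id_not_null", "criticality": "warn"}], ".json")
print(payload.decode("utf-8"))
```

Returning bytes rather than a string leaves the choice of destination (local file, workspace file, or volume) to the caller.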
ChecksDeserializer Objects
class ChecksDeserializer()
Handles deserialization of dictionaries to DQRule objects and from file formats.
__init__
def __init__(custom_checks: dict[str, Callable] | None = None)
Initialize the deserializer.
Arguments:
custom_checks- Dictionary with custom check functions.
deserialize
def deserialize(checks: list[dict]) -> list[DQRule]
Converts a list of quality checks defined as Python dictionaries to a list of DQRule objects.
Arguments:
checks- List of dictionaries describing checks. Each check is a dictionary consisting of the following fields:
- check - Column expression to evaluate. This expression should return a string value if it evaluates to true or null if it evaluates to false.
- name - Name that will be given to the resulting column. Autogenerated if not provided.
- criticality (optional) - Possible values are error (data goes only into the "bad" dataframe) and warn (data goes into both dataframes).
- filter (optional) - Expression for filtering data quality checks
- user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
Returns:
List of data quality check rules.
Raises:
InvalidCheckError- If any dictionary is invalid or unsupported.
deserialize_from_file
@staticmethod
def deserialize_from_file(extension: str, file_like: TextIO) -> list[dict]
Deserialize checks from a file-like object based on file extension. Automatically denormalizes special markers back to objects.
Arguments:
extension- File extension (e.g., ".json", ".yaml", ".yml").
file_like- File-like object to read from.
Returns:
List of check dictionaries with special markers converted to objects.
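The two steps described above, picking a parser by extension and then denormalizing markers, can be combined in a short sketch. It assumes JSON for ".json" and PyYAML's `safe_load` otherwise, and reuses the hypothetical `__decimal__` marker. None of these details are confirmed by this reference.

```python
import io
import json
from decimal import Decimal
from typing import Any, TextIO

DECIMAL_MARKER = "__decimal__"  # hypothetical marker key


def _denormalize(val: Any) -> Any:
    # Turn {"__decimal__": "..."} marker dicts back into Decimal, recursing into containers.
    if isinstance(val, dict):
        if set(val) == {DECIMAL_MARKER}:
            return Decimal(val[DECIMAL_MARKER])
        return {key: _denormalize(item) for key, item in val.items()}
    if isinstance(val, list):
        return [_denormalize(item) for item in val]
    return val


def deserialize_from_file(extension: str, file_like: TextIO) -> list[dict]:
    if extension == ".json":
        raw = json.load(file_like)
    else:
        # YAML is the fallback format; assumes PyYAML is installed.
        import yaml

        raw = yaml.safe_load(file_like)
    return [_denormalize(check) for check in raw]


source = io.StringIO('[{"name": "price_check", "arguments": {"min_limit": {"__decimal__": "9.99"}}}]')
checks = deserialize_from_file(".json", source)
print(checks)
```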
DataFrameConverter Objects
class DataFrameConverter()
Handles conversion between DataFrames and check dictionaries.
from_dataframe
@staticmethod
def from_dataframe(df: DataFrame,
run_config_name: str = "default") -> list[dict]
Converts quality checks defined in a DataFrame to a list of quality checks defined as Python dictionaries.
Arguments:
df- DataFrame with data quality check rules. Each row should define a check. Rows should have the following columns:
- name - Name that will be given to the resulting column. Autogenerated if not provided.
- criticality (optional) - Possible values are error (data goes only into the "bad" dataframe) and warn (data goes into both dataframes).
- check - A StructType column defining the DQX check function used in the check.
- filter - Expression for filtering data quality checks.
- run_config_name (optional) - Run configuration name for storing checks across runs.
- user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
run_config_name- Run configuration name for filtering quality rules, e.g. input table or job name (use "default" if not provided).
Returns:
List of data quality check specifications as Python dictionaries.
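The run_config_name filtering can be illustrated without Spark. The following plain-Python stand-in models each DataFrame row as a dictionary. The helper `checks_from_rows` is hypothetical, and dropping the `run_config_name` column and unset (None) fields from the output is an assumption rather than documented behaviour. The nested `check` values are illustrative placeholders.

```python
from typing import Any


def checks_from_rows(rows: list[dict[str, Any]], run_config_name: str = "default") -> list[dict]:
    """Plain-Python stand-in for the filtering step, with rows modeled as dicts."""
    checks = []
    for row in rows:
        # Keep only rows that belong to the requested run configuration.
        if row.get("run_config_name") != run_config_name:
            continue
        # Drop the run_config_name column and any unset (None) fields; both are assumptions.
        checks.append({key: val for key, val in row.items() if key != "run_config_name" and val is not None})
    return checks


rows = [
    {"name": "id_not_null", "criticality": "error",
     "check": {"function": "is_not_null", "arguments": {"column": "id"}},
     "filter": None, "run_config_name": "default", "user_metadata": None},
    {"name": "amount_positive", "criticality": "warn",
     "check": {"function": "is_not_null", "arguments": {"column": "amount"}},
     "filter": None, "run_config_name": "orders_job", "user_metadata": None},
]
print(checks_from_rows(rows, run_config_name="orders_job"))
```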
to_dataframe
@staticmethod
def to_dataframe(spark: SparkSession,
checks: list[dict],
run_config_name: str = "default") -> DataFrame
Converts a list of quality checks defined as Python dictionaries to a DataFrame.
Arguments:
spark- Spark session.
checks- List of check specifications as Python dictionaries. Each check consists of the following fields:
- check - Column expression to evaluate. This expression should return a string value if it evaluates to true (it will be used as an error/warning message) or null if it evaluates to false.
- name - Name that will be given to the resulting column. Autogenerated if not provided.
- criticality (optional) - Possible values are error (data goes only into the "bad" dataframe) and warn (data goes into both dataframes).
- filter (optional) - Expression for filtering data quality checks
- user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
run_config_name- Run configuration name for storing quality checks across runs, e.g. input table or job name (use "default" if not provided)
Returns:
DataFrame with data quality check rules
Raises:
InvalidCheckError- If any check is invalid or unsupported.
serialize_checks
def serialize_checks(checks: list[DQRule]) -> list[dict]
Converts a list of quality checks defined as DQRule objects to a list of quality checks defined as Python dictionaries.
This is a user-friendly convenience function that wraps ChecksSerializer.serialize.
Arguments:
checks- List of DQRule instances to convert.
Returns:
List of dictionaries representing the DQRule instances.
Raises:
InvalidCheckError- If any item in the list is not a DQRule instance.
deserialize_checks
def deserialize_checks(
checks: list[dict],
custom_checks: dict[str, Callable] | None = None) -> list[DQRule]
Converts a list of quality checks defined as Python dictionaries to a list of DQRule objects.
This is a user-friendly convenience function that wraps ChecksDeserializer.deserialize.
Arguments:
checks- List of dictionaries describing checks. Each check is a dictionary consisting of the following fields:
- check - Column expression to evaluate. This expression should return a string value if it evaluates to true or null if it evaluates to false.
- name - Name that will be given to the resulting column. Autogenerated if not provided.
- criticality (optional) - Possible values are error (data goes only into the "bad" dataframe) and warn (data goes into both dataframes).
- filter (optional) - Expression for filtering data quality checks
- user_metadata (optional) - User-defined key-value pairs added to metadata generated by the check.
custom_checks- Dictionary with custom check functions.
Returns:
List of data quality check rules.
Raises:
InvalidCheckError- If any dictionary is invalid or unsupported.