
databricks.labs.dqx.datacontract.contract_rules_generator

Data Contract to DQX Rules Generator.

This module generates DQX quality rules from data contract specifications such as ODCS (Open Data Contract Standard).

DataContractRulesGenerator Objects

class DataContractRulesGenerator(DQEngineBase)

Generator for creating DQX quality rules from ODCS v3.x data contracts.

This class natively processes Open Data Contract Standard (ODCS) v3.x contracts: it extracts constraints from logicalTypeOptions and turns them into DQX quality rules. It supports three rule sources: predefined rules derived from schema properties, explicit rules from quality sections, and text-based expectations processed via an LLM.
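To illustrate how logicalTypeOptions constraints could become rules, the sketch below translates one ODCS property (given as a plain dict) into DQX-style rule dictionaries with `criticality` and `check` keys. This is a hypothetical mapping, not the generator's actual code: `property_to_rules` is an invented helper, it handles only `required`, `unique`, `minimum`/`maximum`, and `pattern`, and the argument names (`column`, `columns`, `min_limit`, `max_limit`, `limit`, `regex`) are assumed from DQX's built-in check functions.

```python
from typing import Any


def property_to_rules(prop: dict[str, Any], criticality: str = "error") -> list[dict[str, Any]]:
    """Hypothetical: map one ODCS v3 schema property to DQX-style rule dicts."""
    column = prop["name"]
    options = prop.get("logicalTypeOptions", {})
    rules: list[dict[str, Any]] = []

    def rule(function: str, **arguments: Any) -> dict[str, Any]:
        # DQX metadata rules pair a criticality with a check function and its arguments
        return {"criticality": criticality, "check": {"function": function, "arguments": arguments}}

    if prop.get("required"):
        rules.append(rule("is_not_null", column=column))
    if prop.get("unique"):
        # Assumed argument name: is_unique takes a list of columns
        rules.append(rule("is_unique", columns=[column]))

    minimum, maximum = options.get("minimum"), options.get("maximum")
    if minimum is not None and maximum is not None:
        rules.append(rule("is_in_range", column=column, min_limit=minimum, max_limit=maximum))
    elif minimum is not None:
        rules.append(rule("is_not_less_than", column=column, limit=minimum))
    elif maximum is not None:
        rules.append(rule("is_not_greater_than", column=column, limit=maximum))

    if "pattern" in options:
        rules.append(rule("regex_match", column=column, regex=options["pattern"]))
    return rules
```

For example, a required integer property with `minimum: 0` and `maximum: 150` yields an `is_not_null` rule followed by an `is_in_range` rule, both with criticality `error`.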

__init__

def __init__(workspace_client: WorkspaceClient,
             llm_engine: "DQLLMEngine | None" = None,
             custom_check_functions: dict[str, Callable] | None = None)

Initialize the DataContractRulesGenerator.

Arguments:

  • workspace_client - Databricks WorkspaceClient instance.
  • llm_engine - Optional LLM engine for processing text-based quality expectations.
  • custom_check_functions - Optional dictionary of custom check functions.

Raises:

  • ImportError - If LLM dependencies are missing when llm_engine is provided.
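The ImportError condition can be pictured as a guard that runs only when an LLM engine is supplied. The sketch below is an assumption about how such a check might look, not DQX's implementation: `check_llm_dependencies` is a hypothetical helper, and because the packages behind DQX's LLM support are not listed here, `required_modules` is a caller-supplied list of top-level module names.

```python
import importlib.util
from typing import Any


def check_llm_dependencies(llm_engine: Any, required_modules: list[str]) -> None:
    """Hypothetical guard: raise ImportError if an LLM engine is given but extras are missing."""
    if llm_engine is None:
        return  # No LLM requested, so the optional dependencies are not needed
    # find_spec returns None for a top-level module that is not installed
    missing = [name for name in required_modules if importlib.util.find_spec(name) is None]
    if missing:
        raise ImportError(f"LLM support requires missing packages: {', '.join(missing)}")
```

Checking lazily like this lets users without the optional extras still use the generator for predefined and explicit rules.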

generate_rules_from_contract

@telemetry_logger("datacontract", "generate_rules_from_contract")
def generate_rules_from_contract(
        contract: DataContract | None = None,
        contract_file: str | None = None,
        contract_format: str = "odcs",
        generate_predefined_rules: bool = True,
        process_text_rules: bool = True,
        default_criticality: str = "error") -> list[dict]

Generate DQX quality rules from an ODCS v3.x data contract.

Parses an ODCS v3.x contract natively and generates rules based on schema properties, logicalTypeOptions constraints, explicit quality definitions, and text-based expectations.

Arguments:

  • contract - Pre-loaded DataContract object from datacontract-cli. Either contract or contract_file must be provided. Can be created with:
    • DataContract(data_contract_file=path) - from a file path
    • DataContract(data_contract_str=yaml_string) - from a YAML/JSON string
  • contract_file - Path to contract YAML/JSON file (local, volume, or workspace). Either contract or contract_file must be provided.
  • contract_format - Contract format specification (default is "odcs"). Only "odcs" is supported.
  • generate_predefined_rules - Whether to generate rules from schema properties (default True). Set to False to only generate explicit rules.
  • process_text_rules - Whether to process text-based expectations using the LLM (default True). Requires llm_engine to be passed to __init__.
  • default_criticality - Default criticality level for generated rules (default is "error").
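The two boolean flags decide which parts of the contract are turned into rules. The sketch below groups the pieces of an ODCS v3 contract (loaded as a plain dict) by rule source under those flags. It is illustrative only: `collect_rule_sources` and `has_llm_engine` are hypothetical names, and silently skipping text expectations when no LLM engine is available is an assumption (the real class may warn or raise instead). It relies on the ODCS v3 convention that a quality entry without a `type` is treated as `library`.

```python
from typing import Any


def collect_rule_sources(
    contract: dict[str, Any],
    generate_predefined_rules: bool = True,
    process_text_rules: bool = True,
    has_llm_engine: bool = False,
) -> dict[str, list[dict[str, Any]]]:
    """Hypothetical: group contract parts by how they would become DQX rules."""
    sources: dict[str, list[dict[str, Any]]] = {"predefined": [], "explicit": [], "text": []}
    for schema_object in contract.get("schema", []):
        # Quality entries can sit on the schema object or on individual properties
        quality_entries = list(schema_object.get("quality", []))
        for prop in schema_object.get("properties", []):
            if generate_predefined_rules:
                sources["predefined"].append(prop)
            quality_entries.extend(prop.get("quality", []))
        for entry in quality_entries:
            if entry.get("type", "library") == "text":
                # Assumption: text expectations are skipped without an LLM engine
                if process_text_rules and has_llm_engine:
                    sources["text"].append(entry)
            else:
                sources["explicit"].append(entry)
    return sources
```

With `generate_predefined_rules=False`, only the quality sections contribute, which matches the "explicit rules only" mode described above.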

Returns:

A list of dictionaries representing the generated DQX quality rules.
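Each returned dictionary follows DQX's metadata rule shape, with a `check` entry naming the function and its arguments plus a `criticality` of `error` or `warn`. The sketch below shows one way `default_criticality` could be applied to rules that do not set their own; `finalize_rules` is a hypothetical helper and raising `ValueError` for bad input is an assumption.

```python
from typing import Any

# Criticality levels accepted by DQX rule metadata
ALLOWED_CRITICALITIES = {"error", "warn"}


def finalize_rules(rules: list[dict[str, Any]], default_criticality: str = "error") -> list[dict[str, Any]]:
    """Hypothetical: fill in missing criticality values and sanity-check rule structure."""
    if default_criticality not in ALLOWED_CRITICALITIES:
        raise ValueError(f"default_criticality must be one of {sorted(ALLOWED_CRITICALITIES)}")
    finalized = []
    for rule in rules:
        if "function" not in rule.get("check", {}):
            raise ValueError(f"rule is missing check.function: {rule!r}")
        # An explicit criticality on the rule wins; otherwise use the default
        finalized.append({**rule, "criticality": rule.get("criticality", default_criticality)})
    return finalized
```

Copying each rule with `{**rule, ...}` leaves the caller's input list unmodified.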

Raises:

  • contract0 - If neither or both contract parameters are provided, or format not supported.

Notes:

Exactly one of 'contract' or 'contract_file' must be provided.
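The "exactly one of" rule and the format restriction can be expressed as a short input check. This is a minimal sketch: `validate_contract_inputs` is a hypothetical name, and `ValueError` stands in for whatever exception type the generator actually raises.

```python
from typing import Any, Optional

SUPPORTED_FORMATS = ("odcs",)


def validate_contract_inputs(
    contract: Any = None,
    contract_file: Optional[str] = None,
    contract_format: str = "odcs",
) -> None:
    """Hypothetical: enforce exactly one contract source and a supported format."""
    # True == True (neither given) or False == False (both given) are both invalid
    if (contract is None) == (contract_file is None):
        raise ValueError("Provide exactly one of 'contract' or 'contract_file'.")
    if contract_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported contract_format {contract_format!r}; only 'odcs' is supported.")
```

Comparing the two `is None` results handles both failure cases (neither and both) in a single condition.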