databricks.labs.dqx.config

InputConfig Objects

@dataclass
class InputConfig()

Configuration class for input data sources (e.g. tables or files).

OutputConfig Objects

@dataclass
class OutputConfig()

Configuration class for output data sinks (e.g. tables or files).

__post_init__

def __post_init__()

Normalize the trigger configuration by converting string representations of booleans to actual booleans. This is required because of a limitation in the config deserializer.
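The normalization described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `OutputConfigSketch` is a hypothetical stand-in class, and the `location` and `trigger` field names and the example trigger keys are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class OutputConfigSketch:
    """Hypothetical stand-in; not the real OutputConfig."""

    location: str  # assumed field name
    trigger: dict = field(default_factory=dict)  # assumed field name

    def __post_init__(self):
        # A text-based deserializer may hand back "true"/"false" strings
        # where booleans were intended, so convert them in place.
        for key, value in list(self.trigger.items()):
            if isinstance(value, str) and value.lower() in ("true", "false"):
                self.trigger[key] = value.lower() == "true"


cfg = OutputConfigSketch(
    location="main.dq.output",
    trigger={"availableNow": "True", "processingTime": "10 seconds"},
)
print(cfg.trigger)
```

Only strings that spell a boolean are converted; other string values, such as a time interval, are left untouched.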

ProfilerConfig Objects

@dataclass
class ProfilerConfig()

Configuration class for profiler.

summary_stats_file

file containing profile summary statistics

sample_fraction

fraction of data to sample (30%, i.e. 0.3)

sample_seed

seed for sampling

limit

limit the number of records to profile

filter

filter to apply to the data before profiling
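To make the interaction of these options concrete, here is a small sketch on plain Python rows. It is only an illustration under stated assumptions: `ProfilerOptionsSketch` and `select_rows` are hypothetical names, the filter is modelled as a Python callable rather than whatever expression type the library accepts, and the order (filter, then sample, then limit) is assumed rather than taken from the source.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProfilerOptionsSketch:
    """Hypothetical stand-in mirroring the option names documented above."""

    sample_fraction: Optional[float] = None
    sample_seed: Optional[int] = None
    limit: Optional[int] = None
    filter: Optional[Callable[[dict], bool]] = None  # assumption: a callable


def select_rows(rows: list, opts: ProfilerOptionsSketch) -> list:
    # Assumed order: filter first, then sample, then cap the row count.
    if opts.filter is not None:
        rows = [r for r in rows if opts.filter(r)]
    if opts.sample_fraction is not None:
        # A fixed seed makes the sample reproducible across calls.
        rng = random.Random(opts.sample_seed)
        rows = [r for r in rows if rng.random() < opts.sample_fraction]
    if opts.limit is not None:
        rows = rows[: opts.limit]
    return rows


rows = [{"id": i, "country": "PL" if i % 2 else "DE"} for i in range(1000)]
opts = ProfilerOptionsSketch(
    sample_fraction=0.3,
    sample_seed=42,
    limit=50,
    filter=lambda r: r["country"] == "PL",
)
first = select_rows(rows, opts)
second = select_rows(rows, opts)
```

With a fixed `sample_seed`, repeated runs select the same rows, which is the usual reason to set a seed when comparing profiles.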

RunConfig Objects

@dataclass
class RunConfig()

Configuration class for the data quality checks.

name

name of the run configuration

quarantine_config

quarantined data table

metrics_config

summary metrics table

checks_user_requirements

user input for AI-assisted rule generation

warehouse_id

warehouse id to use in the dashboard

reference_tables

reference tables to use in the checks

LLMModelConfig Objects

@dataclass
class LLMModelConfig()

Configuration for LLM model

api_key

when used with the Profiler Workflow, this should be a secret reference in the format secret_scope/secret_key

api_base

when used with the Profiler Workflow, this should be a secret reference in the format secret_scope/secret_key
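As an illustration of the `secret_scope/secret_key` format, the sketch below splits such a reference into its two parts. `split_secret_reference` is a hypothetical helper written for this example; how the library itself resolves the secret is not shown in the source.

```python
def split_secret_reference(reference: str) -> tuple:
    """Split 'secret_scope/secret_key' into (scope, key).

    Hypothetical helper: it only validates the textual format and does
    not look anything up in a secret store.
    """
    scope, sep, key = reference.partition("/")
    if not sep or not scope or not key or "/" in key:
        raise ValueError(f"expected 'secret_scope/secret_key', got {reference!r}")
    return scope, key


scope, key = split_secret_reference("llm_secrets/api_key")
print(scope, key)
```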

LLMConfig Objects

@dataclass(frozen=True)
class LLMConfig()

Configuration for LLM usage

ExtraParams Objects

@dataclass(frozen=True)
class ExtraParams()

Class to represent extra parameters for DQEngine.

WorkspaceConfig Objects

@dataclass
class WorkspaceConfig()

Configuration class for the workspace

extra_params

extra parameters to pass to the jobs, e.g. result_column_names

profiler_max_parallelism

max parallelism for profiling multiple tables

quality_checker_max_parallelism

max parallelism for quality checking multiple tables

custom_metrics

custom summary metrics tracked by the observer when applying checks

as_dict

def as_dict() -> dict

Convert the WorkspaceConfig to a dictionary for serialization. This method ensures that all fields, including boolean False values, are properly serialized. Used by blueprint's installation when saving the config (Installation.save()).

Returns:

A dictionary representation of the WorkspaceConfig.
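The reason explicit `False` values matter can be sketched with a stand-in dataclass. Everything here is hypothetical (`WorkspaceConfigSketch`, its fields, and `drop_falsy`); it shows one plausible failure mode, a serializer that skips falsy values, and is not a description of how the library or blueprint actually behaves.

```python
from dataclasses import asdict, dataclass


@dataclass
class WorkspaceConfigSketch:
    """Hypothetical stand-in with made-up fields."""

    log_level: str = "INFO"
    use_feature: bool = True  # made-up flag that defaults to True

    def as_dict(self) -> dict:
        # asdict() emits every field, so False is written out explicitly.
        return asdict(self)


def drop_falsy(data: dict) -> dict:
    # A serializer that skips falsy values would lose the False setting.
    return {k: v for k, v in data.items() if v}


config = WorkspaceConfigSketch(use_feature=False)
full = config.as_dict()
lossy = drop_falsy(full)

# Reloading from the lossy form silently restores the default (True).
reloaded = WorkspaceConfigSketch(**lossy)
```

When a flag defaults to `True`, dropping an explicit `False` during a save changes the configuration on the next load, which is why a full dictionary is the safer serialization form.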

get_run_config

def get_run_config(run_config_name: str | None = "default") -> RunConfig

Get the run configuration for a given run name, or the default configuration if no run name is provided.

Arguments:

  • run_config_name - The name of the run configuration to get, e.g. input table or job name (use "default" if not provided).

Returns:

The run configuration.

Raises:

  • InvalidConfigError - If no run configurations are available or if the specified run configuration name is not found.
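The lookup and its error cases can be sketched as below. This is a simplified model, not the library code: `WorkspaceSketch`, `RunConfigSketch`, the `run_configs` field, and the local `InvalidConfigError` class are stand-ins, and treating a missing name as the name "default" is an assumption drawn from the argument description.

```python
from dataclasses import dataclass, field
from typing import Optional


class InvalidConfigError(Exception):
    """Local stand-in for the error named in the text."""


@dataclass
class RunConfigSketch:
    name: str = "default"


@dataclass
class WorkspaceSketch:
    run_configs: list = field(default_factory=list)  # assumed field name

    def get_run_config(self, run_config_name: Optional[str] = "default") -> RunConfigSketch:
        if not self.run_configs:
            raise InvalidConfigError("No run configurations available")
        # Assumption: a missing name falls back to the name "default".
        wanted = run_config_name or "default"
        for run_config in self.run_configs:
            if run_config.name == wanted:
                return run_config
        raise InvalidConfigError(f"Run configuration '{wanted}' not found")


workspace = WorkspaceSketch(
    run_configs=[RunConfigSketch(), RunConfigSketch(name="orders")]
)
selected = workspace.get_run_config("orders")
fallback = workspace.get_run_config()
```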

BaseChecksStorageConfig Objects

@dataclass
class BaseChecksStorageConfig(abc.ABC)

Marker base class for storage configuration.

Arguments:

  • location - The file path or table name where checks are stored.

FileChecksStorageConfig Objects

@dataclass
class FileChecksStorageConfig(BaseChecksStorageConfig)

Configuration class for storing checks in a file.

Arguments:

  • location - The file path where the checks are stored.

WorkspaceFileChecksStorageConfig Objects

@dataclass
class WorkspaceFileChecksStorageConfig(BaseChecksStorageConfig)

Configuration class for storing checks in a workspace file.

Arguments:

  • location - The workspace file path where the checks are stored.

TableChecksStorageConfig Objects

@dataclass
class TableChecksStorageConfig(BaseChecksStorageConfig)

Configuration class for storing checks in a table.

Arguments:

  • location - The table name where the checks are stored.
  • run_config_name - The name of the run configuration to use for checks, e.g. input table or job name (use "default" if not provided).
  • mode - The mode for writing checks to a table (e.g., 'append' or 'overwrite'). The overwrite mode will only replace checks for the specific run config and not all checks in the table.

run_config_name

to filter checks by run config
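The difference between the two write modes can be illustrated with an in-memory list standing in for the checks table. `write_checks` and the row layout are hypothetical; the sketch only models the documented rule that overwrite replaces checks for one run config and leaves the others alone.

```python
def write_checks(table, checks, run_config_name="default", mode="append"):
    """Return new table contents; `table` is a list of row dicts.

    Hypothetical model of the documented modes, not the library's writer.
    """
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unsupported mode: {mode!r}")
    tagged = [{**check, "run_config_name": run_config_name} for check in checks]
    if mode == "overwrite":
        # Keep rows that belong to other run configs; replace only ours.
        kept = [row for row in table if row["run_config_name"] != run_config_name]
        return kept + tagged
    return table + tagged


table = [
    {"name": "id_not_null", "run_config_name": "default"},
    {"name": "old_orders_check", "run_config_name": "orders"},
]
new_checks = [{"name": "amount_positive"}]

overwritten = write_checks(table, new_checks, "orders", mode="overwrite")
appended = write_checks(table, new_checks, "orders", mode="append")
```

After the overwrite, the `default` check is still present and only the `orders` rows have been replaced; after the append, all three rows exist.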

LakebaseChecksStorageConfig Objects

@dataclass
class LakebaseChecksStorageConfig(BaseChecksStorageConfig)

Configuration class for storing checks in a Lakebase table.

Arguments:

  • instance_name - Name of the Lakebase instance.
  • user - Name of the user for the Lakebase connection.
  • location - Fully qualified name of the Lakebase table to store checks in the format 'database.schema.table'.
  • port - The Lakebase port (default is '5432').
  • run_config_name - Name of the run configuration to use for checks (default is 'default').
  • mode - The mode for writing checks to a table (e.g., 'append' or 'overwrite'). The overwrite mode only replaces checks for the specific run config and not all checks in the table (default is 'overwrite').
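A small sketch of validating the 'database.schema.table' format for `location` follows. `parse_location` and `LakebaseTableName` are hypothetical names introduced for this example, and the sketch assumes identifiers contain no dots or quoting.

```python
from typing import NamedTuple


class LakebaseTableName(NamedTuple):
    database: str
    schema: str
    table: str


def parse_location(location: str) -> LakebaseTableName:
    # Assumption: plain identifiers, no quoted names containing dots.
    parts = location.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected 'database.schema.table', got {location!r}"
        )
    return LakebaseTableName(*parts)


name = parse_location("dqx.config.checks")
print(name.database, name.schema, name.table)
```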

VolumeFileChecksStorageConfig Objects

@dataclass
class VolumeFileChecksStorageConfig(BaseChecksStorageConfig)

Configuration class for storing checks in a Unity Catalog volume file.

Arguments:

  • location - The Unity Catalog volume file path where the checks are stored.

InstallationChecksStorageConfig Objects

@dataclass
class InstallationChecksStorageConfig(WorkspaceFileChecksStorageConfig,
                                      TableChecksStorageConfig,
                                      VolumeFileChecksStorageConfig,
                                      LakebaseChecksStorageConfig)

Configuration class for storing checks in an installation.

Arguments:

  • location - The installation path where the checks are stored (e.g., table name, file path). Ignored when the installation method is used, because the location is then read from the installation config, unless overwrite_location is enabled.
  • run_config_name - The name of the run configuration to use for checks, e.g. input table or job name (use "default" if not provided).
  • product_name - The product name for retrieving checks from the installation (default is 'dqx').
  • assume_user - Whether to assume the user is the owner of the checks (default is True).
  • install_folder - The installation folder where DQX is installed. DQX will be installed in a default directory if no custom folder is provided:
    • User's home directory: "/Users/<your_user>/.dqx"
    • Global directory if DQX_FORCE_INSTALL=global: "/Applications/dqx"
  • overwrite_location - Whether to overwrite the location from run config if provided (default is False).
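The default folder rules listed above can be sketched as a small resolver. `resolve_install_folder` is a hypothetical helper; the two default paths and the `DQX_FORCE_INSTALL=global` switch come from the text, while the precedence of a custom folder over the environment variable is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_install_folder(
    user_name: str,
    install_folder: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper mirroring the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: an explicit custom folder takes precedence.
    if install_folder:
        return install_folder
    if env.get("DQX_FORCE_INSTALL") == "global":
        return "/Applications/dqx"
    return f"/Users/{user_name}/.dqx"


user_default = resolve_install_folder("jane@example.com", env={})
global_default = resolve_install_folder(
    "jane@example.com", env={"DQX_FORCE_INSTALL": "global"}
)
custom = resolve_install_folder("jane@example.com", "/Shared/dqx", env={})
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.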

location

retrieved from the installation config

run_config_name

to retrieve run config