databricks.labs.dqx.config
InputConfig Objects
@dataclass
class InputConfig()
Configuration class for input data sources (e.g. tables or files).
OutputConfig Objects
@dataclass
class OutputConfig()
Configuration class for output data sinks (e.g. tables or files).
__post_init__
def __post_init__()
Normalize the trigger configuration by converting string boolean representations to actual booleans. This is required because of a limitation of the config deserializer.
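The normalization described above can be sketched as a small stand-alone function. This is a minimal illustration, assuming the trigger is stored as a flat dict of keyword arguments for Spark's DataStreamWriter.trigger (e.g. availableNow, processingTime); normalize_trigger is a hypothetical helper, not the DQX implementation.

```python
def normalize_trigger(trigger: dict) -> dict:
    """Return a copy of trigger with "true"/"false" strings turned into bools (hypothetical helper)."""
    normalized = {}
    for key, value in trigger.items():
        # The deserializer may return booleans as strings such as "True" or "false".
        if isinstance(value, str) and value.strip().lower() in ("true", "false"):
            normalized[key] = value.strip().lower() == "true"
        else:
            # Non-boolean values (e.g. "10 seconds") are passed through unchanged.
            normalized[key] = value
    return normalized


print(normalize_trigger({"availableNow": "true"}))  # {'availableNow': True}
```

Values such as processingTime="10 seconds" are left as strings, so only genuine boolean flags are converted.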
ProfilerConfig Objects
@dataclass
class ProfilerConfig()
Configuration class for the profiler.
summary_stats_file
file containing profile summary statistics
sample_fraction
fraction of data to sample (30%)
sample_seed
seed for sampling
limit
limit the number of records to profile
filter
filter to apply to the data before profiling
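How these profiler settings narrow the input can be sketched in plain Python. This is a hedged illustration, not DQX code: ProfilerSettings and select_records are hypothetical, the filter is modeled as a Python predicate instead of a SQL expression, and the order (filter, then sample, then limit) is an assumption. Sampling is Bernoulli-style, similar in spirit to Spark's DataFrame.sample(fraction, seed).

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProfilerSettings:
    """Hypothetical stand-in for ProfilerConfig; only the 0.3 sample fraction comes from the docs."""
    sample_fraction: float = 0.3                      # fraction of records to keep
    sample_seed: int | None = None                    # makes sampling reproducible
    limit: int | None = None                          # cap on the number of profiled records
    filter: Callable[[dict], bool] | None = None      # stand-in for a SQL filter expression


def select_records(records: list[dict], cfg: ProfilerSettings) -> list[dict]:
    # Assumed order: filter first, then sample, then apply the limit.
    if cfg.filter is not None:
        records = [r for r in records if cfg.filter(r)]
    rng = random.Random(cfg.sample_seed)
    # Each record is kept independently with probability sample_fraction.
    sampled = [r for r in records if rng.random() < cfg.sample_fraction]
    if cfg.limit is not None:
        sampled = sampled[: cfg.limit]
    return sampled
```

With a fixed sample_seed, repeated calls on the same input return the same sample, which keeps profiles comparable across runs.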
RunConfig Objects
@dataclass
class RunConfig()
Configuration class for the data quality checks.
name
name of the run configuration
quarantine_config
quarantined data table
metrics_config
summary metrics table
checks_user_requirements
user input for AI-assisted rule generation
warehouse_id
warehouse ID to use in the dashboard
reference_tables
reference tables to use in the checks
LLMModelConfig Objects
@dataclass
class LLMModelConfig()
Configuration for the LLM model.
api_key
when used with the profiler workflow, this should be a secret reference in the form secret_scope/secret_key
api_base
when used with the profiler workflow, this should be a secret reference in the form secret_scope/secret_key
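A secret reference in the secret_scope/secret_key form has to be split before the secret can be looked up. A minimal sketch of that parsing step follows; parse_secret_ref is a hypothetical helper, not part of DQX.

```python
from __future__ import annotations


def parse_secret_ref(ref: str) -> tuple[str, str]:
    """Split a 'secret_scope/secret_key' reference into (scope, key) (hypothetical helper)."""
    scope, sep, key = ref.partition("/")
    # Both parts must be present and non-empty.
    if not sep or not scope or not key:
        raise ValueError(f"Expected 'secret_scope/secret_key', got {ref!r}")
    return scope, key


print(parse_secret_ref("llm/api_key"))  # ('llm', 'api_key')
```

In a Databricks notebook, the resulting pair could then be passed to dbutils.secrets.get(scope=..., key=...) to resolve the actual value.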
LLMConfig Objects
@dataclass(frozen=True)
class LLMConfig()
Configuration for LLM usage.
ExtraParams Objects
@dataclass(frozen=True)
class ExtraParams()
Class to represent extra parameters for DQEngine.
WorkspaceConfig Objects
@dataclass
class WorkspaceConfig()
Configuration class for the workspace.
extra_params
extra parameters to pass to the jobs, e.g. result_column_names
profiler_max_parallelism
max parallelism for profiling multiple tables
quality_checker_max_parallelism
max parallelism for quality checking multiple tables
custom_metrics
custom summary metrics tracked by the observer when applying checks
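The profiler_max_parallelism and quality_checker_max_parallelism settings cap how many tables are processed at once. A minimal sketch of that idea with a bounded thread pool follows; it assumes the per-table work is I/O bound (for example, submitting Spark jobs), and run_for_tables is a hypothetical helper rather than a DQX function.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_for_tables(tables: list[str], task: Callable[[str], object], max_parallelism: int = 4) -> dict:
    """Apply task to each table with at most max_parallelism concurrent workers (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        # pool.map preserves input order, so results line up with tables.
        results = list(pool.map(task, tables))
    return dict(zip(tables, results))


print(run_for_tables(["main.sales.orders", "main.sales.customers"], lambda t: f"checked {t}", 2))
```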
as_dict
def as_dict() -> dict
Convert the WorkspaceConfig to a dictionary for serialization. This method ensures that all fields, including boolean False values, are properly serialized. It is used by the blueprint installation when saving the config (Installation.save()).
Returns:
A dictionary representation of the WorkspaceConfig.
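Why explicit False values need care can be illustrated with a serializer that skips falsy values. This is only an illustration of the failure mode; ExampleConfig, its fields, and drop_falsy are hypothetical, and it makes no claim about how blueprint serializes internally.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ExampleConfig:
    """Hypothetical stand-in, not the real WorkspaceConfig."""
    log_level: str = "INFO"
    enable_feature: bool = True
    extra: dict = field(default_factory=dict)


def drop_falsy(d: dict) -> dict:
    # A naive serializer that omits "empty" values also loses explicit False flags.
    return {k: v for k, v in d.items() if v}


cfg = ExampleConfig(enable_feature=False)
print(drop_falsy(asdict(cfg)))  # {'log_level': 'INFO'}  (the False flag is lost)
print(asdict(cfg))              # {'log_level': 'INFO', 'enable_feature': False, 'extra': {}}
```

Once the False value is dropped, reloading the config would silently fall back to the default True, which is the kind of round-trip error a complete as_dict avoids.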
get_run_config
def get_run_config(run_config_name: str | None = "default") -> RunConfig
Get the run configuration for a given run name, or the default configuration if no run name is provided.
Arguments:
run_config_name - The name of the run configuration to get, e.g. input table or job name (use "default" if not provided).
Returns:
The run configuration.
Raises:
InvalidConfigError - If no run configurations are available or if the specified run configuration name is not found.
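The lookup behavior described above can be sketched with simplified stand-in classes. This is a hedged sketch: it assumes the run configurations are held in a list field (called run_configs here) and that an empty name falls back to the first entry; the stub classes are hypothetical and InvalidConfigError is redefined locally so the example runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class InvalidConfigError(Exception):
    """Local stand-in for the error named in the docs."""


@dataclass
class RunConfigStub:
    name: str = "default"


@dataclass
class WorkspaceConfigStub:
    run_configs: list[RunConfigStub] = field(default_factory=list)

    def get_run_config(self, run_config_name: str | None = "default") -> RunConfigStub:
        if not self.run_configs:
            raise InvalidConfigError("No run configurations available")
        if not run_config_name:
            # Assumption: with no name given, fall back to the first run config.
            return self.run_configs[0]
        for run_config in self.run_configs:
            if run_config.name == run_config_name:
                return run_config
        raise InvalidConfigError(f"No run configuration named {run_config_name!r}")


ws = WorkspaceConfigStub([RunConfigStub("default"), RunConfigStub("orders")])
print(ws.get_run_config("orders").name)  # orders
```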
BaseChecksStorageConfig Objects
@dataclass
class BaseChecksStorageConfig(abc.ABC)
Marker base class for storage configuration.
Arguments:
location - The file path or table name where checks are stored.
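A marker base class carries no abstract methods; it exists so that code can accept "any storage config" through one type and dispatch on the concrete subclass. A minimal sketch of the pattern follows; StorageConfig, FileStorage, TableStorage, and describe are simplified, hypothetical stand-ins for the classes documented here.

```python
import abc
from dataclasses import dataclass


@dataclass
class StorageConfig(abc.ABC):
    """Marker base: no abstract methods, only the shared location field."""
    location: str


@dataclass
class FileStorage(StorageConfig):
    pass


@dataclass
class TableStorage(StorageConfig):
    run_config_name: str = "default"


def describe(config: StorageConfig) -> str:
    # Callers type-hint the base class and branch on the concrete subclass.
    if isinstance(config, TableStorage):
        return f"table {config.location} (run config {config.run_config_name})"
    if isinstance(config, FileStorage):
        return f"file {config.location}"
    raise TypeError(f"Unsupported storage config: {type(config).__name__}")


print(describe(FileStorage(location="checks.yml")))  # file checks.yml
```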
FileChecksStorageConfig Objects
@dataclass
class FileChecksStorageConfig(BaseChecksStorageConfig)
Configuration class for storing checks in a file.
Arguments:
location - The file path where the checks are stored.
WorkspaceFileChecksStorageConfig Objects
@dataclass
class WorkspaceFileChecksStorageConfig(BaseChecksStorageConfig)
Configuration class for storing checks in a workspace file.
Arguments:
location - The workspace file path where the checks are stored.
TableChecksStorageConfig Objects
@dataclass
class TableChecksStorageConfig(BaseChecksStorageConfig)
Configuration class for storing checks in a table.
Arguments:
location - The table name where the checks are stored.
run_config_name - The name of the run configuration to use for checks, e.g. input table or job name (use "default" if not provided).
mode - The mode for writing checks to a table (e.g., 'append' or 'overwrite'). The overwrite mode only replaces checks for the specific run config, not all checks in the table.
run_config_name
to filter checks by run config
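The scoped overwrite semantics can be simulated on an in-memory list of rows. This is a sketch under the assumption that each stored check carries a run_config_name column; write_checks is a hypothetical helper and does not touch a real table.

```python
from __future__ import annotations


def write_checks(existing: list[dict], new_checks: list[dict],
                 run_config_name: str = "default", mode: str = "append") -> list[dict]:
    """Simulate writing checks for one run config into a shared table (hypothetical helper)."""
    tagged = [{**check, "run_config_name": run_config_name} for check in new_checks]
    if mode == "append":
        return existing + tagged
    if mode == "overwrite":
        # Only rows belonging to this run config are replaced; other run configs are kept.
        kept = [row for row in existing if row["run_config_name"] != run_config_name]
        return kept + tagged
    raise ValueError(f"Unsupported mode: {mode!r}")


table = [{"name": "id_not_null", "run_config_name": "orders"},
         {"name": "email_valid", "run_config_name": "customers"}]
table = write_checks(table, [{"name": "amount_positive"}], run_config_name="orders", mode="overwrite")
print(table)
# [{'name': 'email_valid', 'run_config_name': 'customers'}, {'name': 'amount_positive', 'run_config_name': 'orders'}]
```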
LakebaseChecksStorageConfig Objects
@dataclass
class LakebaseChecksStorageConfig(BaseChecksStorageConfig)
Configuration class for storing checks in a Lakebase table.
Arguments:
instance_name - Name of the Lakebase instance.
user - Name of the user for the Lakebase connection.
location - Fully qualified name of the Lakebase table to store checks in, in the format 'database.schema.table'.
port - The Lakebase port (default is '5432').
run_config_name - Name of the run configuration to use for checks (default is 'default').
mode - The mode for writing checks to a table (e.g., 'append' or 'overwrite'). The overwrite mode only replaces checks for the specific run config, not all checks in the table (default is 'overwrite').
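The location must be a three-part 'database.schema.table' name. A minimal validation sketch follows; split_lakebase_location is a hypothetical helper, and it deliberately does not handle quoted identifiers that themselves contain dots.

```python
from __future__ import annotations


def split_lakebase_location(location: str) -> tuple[str, str, str]:
    """Split 'database.schema.table' into its three parts (hypothetical helper)."""
    parts = location.split(".")
    # Exactly three non-empty parts are required.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected 'database.schema.table', got {location!r}")
    database, schema, table = parts
    return database, schema, table


print(split_lakebase_location("dqx.config.checks"))  # ('dqx', 'config', 'checks')
```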
VolumeFileChecksStorageConfig Objects
@dataclass
class VolumeFileChecksStorageConfig(BaseChecksStorageConfig)
Configuration class for storing checks in a Unity Catalog volume file.
Arguments:
location - The Unity Catalog volume file path where the checks are stored.
InstallationChecksStorageConfig Objects
@dataclass
class InstallationChecksStorageConfig(WorkspaceFileChecksStorageConfig,
TableChecksStorageConfig,
VolumeFileChecksStorageConfig,
LakebaseChecksStorageConfig)
Configuration class for storing checks in an installation.
Arguments:
location - The installation path where the checks are stored (e.g., table name, file path). Not used with the installation method, because it is retrieved from the installation config, unless overwrite_location is enabled.
run_config_name - The name of the run configuration to use for checks, e.g. input table or job name (use "default" if not provided).
product_name - The product name for retrieving checks from the installation (default is 'dqx').
assume_user - Whether to assume the user is the owner of the checks (default is True).
install_folder - The installation folder where DQX is installed. If no custom folder is provided, DQX is installed in a default directory:
  - User's home directory: "/Users/<your_user>/.dqx"
  - Global directory if DQX_FORCE_INSTALL=global: "/Applications/dqx"
overwrite_location - Whether to overwrite the location from the run config if provided (default is False).
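The folder selection rules for install_folder can be expressed as a small function. This sketch assumes that a custom folder, when given, takes precedence over the DQX_FORCE_INSTALL setting; resolve_install_folder is hypothetical, and the environment is passed in explicitly so the example can be tested without touching os.environ.

```python
from __future__ import annotations

import os


def resolve_install_folder(user_name: str, install_folder: str | None = None,
                           env: dict | None = None) -> str:
    """Pick the DQX install folder following the documented defaults (hypothetical helper)."""
    env = os.environ if env is None else env
    if install_folder:
        # Assumption: an explicit custom folder wins over the defaults.
        return install_folder
    if env.get("DQX_FORCE_INSTALL") == "global":
        return "/Applications/dqx"
    return f"/Users/{user_name}/.dqx"


print(resolve_install_folder("jane@example.com", env={}))  # /Users/jane@example.com/.dqx
```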
location
retrieved from the installation config
run_config_name
to retrieve run config