Additional Configuration

Adding user metadata to the results of all checks

You can attach user metadata to the results of all checks by passing extra parameters when creating the engine. The custom key-value metadata is included in every quality check result, inside the user_metadata field.

from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.config import ExtraParams

user_metadata = {"key1": "value1", "key2": "value2"}

# use ExtraParams to configure one or more optional parameters
extra_parameters = ExtraParams(user_metadata=user_metadata)

ws = WorkspaceClient()
dq_engine = DQEngine(ws, extra_params=extra_parameters)

Adding user metadata to the results of specific checks

You can also provide user metadata for specific checks when defining those checks programmatically or via configuration. The custom key-value metadata is included in the results of those checks, inside the user_metadata field.

When the same key is defined in both the engine-level and the check-level user metadata, the check-level value overrides the value set in the engine.

from databricks.labs.dqx.rule import DQRowRule
from databricks.labs.dqx import check_funcs


# define the checks programmatically using DQX classes with user metadata for an individual check
checks = [
    DQRowRule(  # check with user metadata
        name="col_5_is_null_or_empty",
        criticality="warn",
        check_func=check_funcs.is_not_null_and_not_empty,
        column="col5",
        user_metadata={"key1": "value1", "key2": "value2"},
    ),
    ...
]

import yaml

# define the checks using yaml with user metadata for an individual check
checks = yaml.safe_load("""
# check with user metadata
- criticality: warn
  check:
    function: is_not_null_and_not_empty
    arguments:
      column: col5
  user_metadata:
    key1: value1
    key2: value2
""")
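
The override rule described above can be pictured as a plain dictionary merge. The sketch below is only an illustration of the precedence, not DQX's internal implementation; the helper name merge_user_metadata and the sample values are hypothetical.

```python
def merge_user_metadata(engine_metadata, check_metadata):
    """Return the effective metadata, with check-level values taking precedence.

    Hypothetical helper for illustration only; it is not part of DQX.
    """
    # In a dict merge the later mapping wins, so check-level entries
    # override engine-level entries that share the same key.
    return {**(engine_metadata or {}), **(check_metadata or {})}


# Sample values (assumed for this illustration).
engine_metadata = {"key1": "engine_value1", "key2": "engine_value2"}
check_metadata = {"key2": "check_value2", "key3": "check_value3"}

effective = merge_user_metadata(engine_metadata, check_metadata)
print(effective)
```

Here key1 keeps its engine-level value, key2 takes the check-level value, and key3 appears only because the check defines it.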

Customizing result columns

By default, DQX appends _error and _warning result columns to the output DataFrame or table to flag quality issues. You can rename these result columns by passing extra parameters when creating the engine.

from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.config import ExtraParams

custom_column_names = {"errors": "dq_errors", "warnings": "dq_warnings"}

# use ExtraParams to configure one or more optional parameters
extra_parameters = ExtraParams(result_column_names=custom_column_names)

ws = WorkspaceClient()
dq_engine = DQEngine(ws, extra_params=extra_parameters)
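
Before passing a custom mapping to the engine, it can help to sanity-check it. The following is a minimal sketch that assumes the only relevant keys are the two used in the example above ("errors" and "warnings"); the helper validate_result_column_names is hypothetical and is not part of DQX.

```python
# Assumption: the mapping uses only the two keys shown in the example above.
ALLOWED_KEYS = {"errors", "warnings"}


def validate_result_column_names(names):
    """Check a custom result-column mapping and return a copy of it.

    Hypothetical helper for illustration only; it is not part of DQX.
    """
    unknown = set(names) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    values = list(names.values())
    # Two result columns with the same name would collide in the output.
    if len(set(values)) != len(values):
        raise ValueError("result column names must be distinct")
    return dict(names)


custom_column_names = validate_result_column_names(
    {"errors": "dq_errors", "warnings": "dq_warnings"}
)
print(custom_column_names)
```

The validated dictionary can then be passed as result_column_names, as in the example above.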