Additional Configuration
Adding user metadata to the results of all checks
You can attach user metadata to the results of all checks by specifying extra parameters when creating the engine.
The custom key-value metadata will be included in every quality check result inside the user_metadata field.
- Python
- Workflows
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.config import ExtraParams
user_metadata = {"key1": "value1", "key2": "value2"}
# use ExtraParams to configure one or more optional parameters
extra_parameters = ExtraParams(user_metadata=user_metadata)
ws = WorkspaceClient()
dq_engine = DQEngine(ws, extra_params=extra_parameters)
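For illustration, a minimal sketch of applying checks with this engine (assuming an existing Spark DataFrame df and a checks list defined elsewhere):
# every issue reported by a check will include {"key1": "value1", "key2": "value2"}
# in its user_metadata field
checked_df = dq_engine.apply_checks(df, checks)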
You can set the following fields in the configuration file to provide user metadata when using DQX workflows:
extra_params:
  user_metadata:
    custom_metadata: custom_value
Adding user metadata to the results of specific checks
You can also provide user metadata for specific checks when defining those checks programmatically or via configuration.
The custom key-value metadata will be included in every quality check result inside the user_metadata field.
When the same properties are defined in both the engine and check-level user metadata, the check-level values will override the values set in the engine.
- Python
from databricks.labs.dqx.rule import DQRowRule
from databricks.labs.dqx import check_funcs
# define the checks programmatically using DQX classes with user metadata for an individual check
checks = [
DQRowRule( # check with user metadata
name="col_5_is_null_or_empty",
criticality="warn",
check_func=check_funcs.is_not_null_and_not_empty,
column="col5",
user_metadata={"key1": "value1", "key2": "value2"}
),
...
]
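For illustration, a sketch of the precedence rule with hypothetical engine-level values: engine-level metadata supplies defaults, and check-level metadata wins on conflicting keys.
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.config import ExtraParams
# hypothetical engine-level metadata overlapping with the check above
extra_parameters = ExtraParams(user_metadata={"key1": "engine_value", "key3": "value3"})
dq_engine = DQEngine(WorkspaceClient(), extra_params=extra_parameters)
# per the override rule, results of "col_5_is_null_or_empty" would carry:
#   {"key1": "value1", "key2": "value2", "key3": "value3"}  (check-level key1 wins)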
import yaml

# define the checks using yaml with user metadata for an individual check
checks = yaml.safe_load("""
# check with user metadata
- criticality: warn
  check:
    function: is_not_null_and_not_empty
    arguments:
      column: col5
  user_metadata:
    key1: value1
    key2: value2
""")
Customizing result columns
By default, DQX appends _error and _warning result columns to the output DataFrame or Table to flag quality issues.
You can customize the names of these result columns by specifying extra parameters when creating the engine.
- Python
- Workflows
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.config import ExtraParams
custom_column_names = {"errors": "dq_errors", "warnings": "dq_warnings"}
# use ExtraParams to configure one or more optional parameters
extra_parameters = ExtraParams(result_column_names=custom_column_names)
ws = WorkspaceClient()
dq_engine = DQEngine(ws, extra_params=extra_parameters)
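For illustration, a minimal sketch of how the custom names surface in the output (assuming an existing DataFrame df and a checks list, and that a result column is null for rows with no issues):
checked_df = dq_engine.apply_checks(df, checks)
# issues are now reported under dq_errors / dq_warnings instead of _error / _warning
failed_rows = checked_df.filter("dq_errors is not null")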
You can set the following fields in the configuration file to customize the result columns when using DQX workflows:
extra_params:
  result_column_names:
    errors: dq_errors
    warnings: dq_warnings