Additional Configuration
Profiling Options
The profiler samples the input data by default with a fraction of 0.3 (30%) and limits the input to 1000 records. You can adjust these and other parameters as follows:
- Python
from databricks.labs.dqx.profiler.profiler import DQProfiler
from databricks.labs.dqx.profiler.generator import DQGenerator
from databricks.labs.dqx.engine import DQEngine
from databricks.sdk import WorkspaceClient
input_df = spark.read.table("catalog1.schema1.table1")
default_profile_options = {
    "round": True,  # round the min/max values
    "max_in_count": 10,  # generate is_in rule only if the column has fewer distinct values than this count
    "distinct_ratio": 0.05,  # generate is_in rule only if the ratio of distinct values is below this threshold (5%)
    "max_null_ratio": 0.01,  # generate a null check rule if less than 1 percent of the values are null
    "remove_outliers": True,  # remove outliers
    "outlier_columns": [],  # columns to remove outliers from
    "num_sigmas": 3,  # number of sigmas to use when remove_outliers is True
    "trim_strings": True,  # trim whitespace from strings
    "max_empty_ratio": 0.01,  # generate an empty-string check rule if less than 1 percent of the strings are empty
    "sample_fraction": 0.3,  # fraction of data to sample (30%)
    "sample_seed": None,  # seed for sampling
    "limit": 1000,  # limit the number of sampled records
}
columns_to_profile = ["col1", "col2"]
# profile input data
ws = WorkspaceClient()
profiler = DQProfiler(ws)
summary_stats, profiles = profiler.profile(input_df, cols=columns_to_profile, opts=default_profile_options)
# generate DQX quality rules/checks
generator = DQGenerator(ws)
checks = generator.generate_dq_rules(profiles) # with default level "error"
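The generated checks can then be applied with the engine. Below is a minimal sketch, assuming the generated checks are in the metadata (dict) format accepted by the engine's apply_checks_by_metadata_and_split method and reusing the input_df defined above:
- Python
dq_engine = DQEngine(ws)

# apply the generated checks and split the data into valid and quarantined records
valid_df, quarantine_df = dq_engine.apply_checks_by_metadata_and_split(input_df, checks)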
Adding User Metadata to the Results of All Checks
You can add user metadata to all check results by specifying extra parameters when creating the engine.
The custom key-value metadata will be included in every quality check result inside the user_metadata
field.
- Python
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.rule import ExtraParams
user_metadata = {"key1": "value1", "key2": "value2"}
# use ExtraParams to configure one or more optional parameters
extra_parameters = ExtraParams(user_metadata=user_metadata)
ws = WorkspaceClient()
dq_engine = DQEngine(ws, extra_params=extra_parameters)
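Any checks applied with this engine then carry the metadata in every result. Below is a minimal sketch; the check definition and input_df are illustrative:
- Python
import yaml

# illustrative check; input_df is assumed to be an existing DataFrame
checks = yaml.safe_load("""
- criticality: warn
  check:
    function: is_not_null_and_not_empty
    arguments:
      column: col1
""")

# each entry in the reporting columns now includes
# user_metadata: {"key1": "value1", "key2": "value2"}
result_df = dq_engine.apply_checks_by_metadata(input_df, checks)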
Adding User Metadata to the Results of Specific Checks
You can also provide user metadata for specific checks when defining those checks in code or configuration.
The custom key-value metadata will be included in every quality check result inside the user_metadata
field.
When the same properties are defined in both the engine and check-level user metadata, the check-level values will override the values set in the engine.
- Python
import yaml

from databricks.labs.dqx.rule import DQRowRule
from databricks.labs.dqx import check_funcs
# define the checks programmatically using DQX classes with user metadata for an individual check
checks = [
    DQRowRule(  # check with user metadata
        name="col_5_is_null_or_empty",
        criticality="warn",
        check_func=check_funcs.is_not_null_and_not_empty,
        column="col5",
        user_metadata={"key1": "value1", "key2": "value2"},
    ),
    ...
]
# define the checks using yaml with user metadata for an individual check
checks = yaml.safe_load("""
# check with user metadata
- criticality: warn
  check:
    function: is_not_null_and_not_empty
    arguments:
      column: col5
  user_metadata:
    key1: value1
    key2: value2
""")
Customizing Reporting Columns
By default, DQX appends _error and _warning reporting columns to the output DataFrame to flag quality issues.
You can customize the names of these reporting columns by specifying extra parameters when creating the engine.
- Python
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.rule import ExtraParams
custom_column_names = {"errors": "dq_errors", "warnings": "dq_warnings"}
# use ExtraParams to configure one or more optional parameters
extra_parameters = ExtraParams(column_names=custom_column_names)
ws = WorkspaceClient()
dq_engine = DQEngine(ws, extra_params=extra_parameters)
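With this configuration, applying checks reports quality issues in the renamed columns instead of the defaults. A minimal sketch, assuming checks and input_df already exist:
- Python
# quality issues are now reported in "dq_errors" and "dq_warnings"
# instead of the default reporting columns
result_df = dq_engine.apply_checks_by_metadata(input_df, checks)
print([c for c in result_df.columns if c.startswith("dq_")])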