
DQX Engine

To perform data quality checking with DQX, you need to create a DQEngine object. The engine requires a Databricks workspace client for authentication and interaction with the Databricks workspace.

When running the code in a Databricks workspace, the workspace client is authenticated automatically, whether DQX is used in a notebook, a script, or as part of a job/workflow. In that case, the following code is all you need to create the workspace client and the engine:

from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine

# On Databricks, the workspace client picks up notebook/job credentials automatically.
ws = WorkspaceClient()
dq_engine = DQEngine(ws)

For external environments, such as CI servers or local machines, you can authenticate to Databricks using any method supported by the Databricks SDK. For detailed instructions, refer to the default authentication flow. If you're using Databricks configuration profiles or Databricks-specific environment variables for authentication, you can create the workspace client without providing additional arguments:

ws = WorkspaceClient()
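
If you prefer to select a configuration profile explicitly, you can pass it to the client. The sketch below assumes a profile named "my-profile" exists in your .databrickscfg file; the name is only an illustration:

from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine

# Use a named profile from ~/.databrickscfg (the profile name is illustrative).
ws = WorkspaceClient(profile="my-profile")

# Or rely on Databricks environment variables such as DATABRICKS_HOST and
# DATABRICKS_TOKEN, in which case no arguments are needed:
# ws = WorkspaceClient()

dq_engine = DQEngine(ws)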

Information on testing applications that use DQEngine is available in the DQX testing documentation.

DQX engine methods

The following table outlines the available methods of the DQEngine and their functionality; a short usage sketch follows the table:

Method | Description | Arguments
--- | --- | ---
apply_checks | Applies quality checks to the DataFrame and returns a DataFrame with reporting columns. | df: DataFrame to check; checks: List of checks to apply to the DataFrame. Each check is an instance of the DQRule class.
apply_checks_and_split | Applies quality checks to the DataFrame and returns valid and invalid (quarantine) DataFrames with reporting columns. | df: DataFrame to check; checks: List of checks to apply to the DataFrame. Each check is an instance of the DQRule class.
apply_checks_by_metadata | Applies quality checks defined as a dictionary to the DataFrame and returns a DataFrame with reporting columns. | df: DataFrame to check; checks: List of dictionaries describing checks; glbs: Optional dictionary with functions mapping (e.g., globals() of the calling module).
apply_checks_by_metadata_and_split | Applies quality checks defined as a dictionary and returns valid and invalid (quarantine) DataFrames. | df: DataFrame to check; checks: List of dictionaries describing checks; glbs: Optional dictionary with functions mapping (e.g., globals() of the calling module).
validate_checks | Validates the provided quality checks to ensure they conform to the expected structure and types. | checks: List of checks to validate; glbs: Optional dictionary of global functions that can be used.
get_invalid | Retrieves records from the DataFrame that violate data quality checks (records with warnings or errors). | df: Input DataFrame.
get_valid | Retrieves records from the DataFrame that pass all data quality checks. | df: Input DataFrame.
load_checks_from_local_file | Loads quality rules from a local file (supports YAML and JSON). | path: Path to the file containing the checks.
save_checks_in_local_file | Saves quality rules to a local file in YAML format. | checks: List of checks to save; path: Path to the file where the checks will be saved.
load_checks_from_workspace_file | Loads checks from a file (JSON or YAML) stored in the Databricks workspace. | workspace_path: Path to the file in the workspace.
load_checks_from_installation | Loads checks from the workspace installation configuration file (checks_file field). | run_config_name: Name of the run config to use; product_name: Name of the product/installation directory; assume_user: If True, assume user installation.
save_checks_in_workspace_file | Saves checks to a file (YAML) in the Databricks workspace. | checks: List of checks to save; workspace_path: Destination path for the checks file in the workspace.
save_checks_in_installation | Saves checks to the installation folder as a YAML file. | checks: List of checks to save; run_config_name: Name of the run config to use; assume_user: If True, assume user installation.
load_run_config | Loads run configuration from the installation folder. | run_config_name: Name of the run config to use; assume_user: If True, assume user installation.
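
As an end-to-end illustration, the sketch below loads checks from a YAML file in the workspace, validates them, and splits the input into valid and quarantined DataFrames. The file path and input table are placeholders, and it is assumed (based on the table above) that validate_checks returns a status object exposing has_errors and that apply_checks_by_metadata_and_split returns a (valid, quarantined) pair; check the DQX documentation for the exact signatures.

from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine

ws = WorkspaceClient()
dq_engine = DQEngine(ws)

# Load checks stored as YAML in the workspace (the path is a placeholder).
checks = dq_engine.load_checks_from_workspace_file(workspace_path="/Shared/dqx/checks.yml")

# Validate the checks before applying them (has_errors is assumed here).
status = dq_engine.validate_checks(checks)
assert not status.has_errors

# Apply the checks and split the input into valid and quarantined records.
# `spark` is available in Databricks notebooks; the table name is a placeholder.
input_df = spark.read.table("main.default.input_data")
valid_df, quarantine_df = dq_engine.apply_checks_by_metadata_and_split(input_df, checks)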