User Guide

This section explains how to install DQX, how to integrate and run it, and how to define quality checks.

Installation Options

DQX can be installed in the following ways:

  • As a library installed on a Databricks cluster.
  • As a workspace tool installed using the Databricks CLI.

For more details, see the Installation Guide.
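
For orientation, the two options typically correspond to the following commands; confirm the exact package and command names in the Installation Guide:

    # Install DQX as a library (for example, on a cluster or in a notebook):
    pip install databricks-labs-dqx

    # Install DQX as a workspace tool using the Databricks CLI:
    databricks labs install dqx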

Integration and execution options

Supported quality checking types:

Quality checking type | Integration with processing pipelines | Description
--------------------- | ------------------------------------- | -----------
In-flight | Code-level only | DQX allows data quality to be validated on the fly while the data is being processed, before it is written to storage. This requires DQX to be used as a library and integrated directly into user pipelines.
At-rest | Code-level or no-code (Workflows) | DQX enables data quality checking on existing data stored in tables. For no-code integration, DQX must first be installed in the workspace as a tool to deploy workflows.
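
To illustrate the in-flight pattern, the sketch below applies checks to a DataFrame and splits it before writing, assuming the metadata-based DQEngine API described in the Applying Checks Guide; the table names are hypothetical, and argument names in the check format may differ between DQX versions:

    from databricks.sdk import WorkspaceClient
    from databricks.labs.dqx.engine import DQEngine

    dq_engine = DQEngine(WorkspaceClient())

    # Checks defined as metadata; see "Defining quality rules (checks)" below.
    checks = [
        {
            "criticality": "error",
            "check": {
                "function": "is_not_null",
                # "column" may be named "col_name" in older DQX versions
                "arguments": {"column": "customer_id"},
            },
        },
    ]

    # "spark" is the session available in Databricks notebooks and jobs.
    input_df = spark.read.table("main.raw.customers")  # hypothetical input table

    # Split records that pass the checks from records to be quarantined.
    valid_df, quarantine_df = dq_engine.apply_checks_by_metadata_and_split(input_df, checks)

    valid_df.write.mode("append").saveAsTable("main.curated.customers")          # hypothetical
    quarantine_df.write.mode("append").saveAsTable("main.quarantine.customers")  # hypothetical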

Integration options:

Task | Integration with processing pipelines | Execution | Description
---- | ------------------------------------- | --------- | -----------
Profiling and quality checks generation | Programmatic approach (code-level integration) | Use methods of the DQProfiler and DQGenerator or DQDltGenerator classes. | Profile input data and generate quality rule candidates. DQEngine can be used to save the generated checks to storage. For more details, see the Profiling Guide.
Profiling and quality checks generation | No-code approach (workflow); requires installation as a workspace tool. | Use the profiler workflow (triggered from the Databricks CLI or the Databricks UI). | Profile input data and generate quality rule candidates. Input data and quality checks storage are configured in the configuration file. For more details, see the Profiling Guide.
Quality checking | Programmatic approach (code-level integration) | Use methods of the DQEngine class. | Load checks from various storage backends, apply quality checks, and save results; end-to-end methods run all the steps in a single call (load checks > apply checks > save results). For more details, see the Applying Checks Guide.
Quality checking | No-code approach (workflow); requires installation as a workspace tool. | Use the quality-checker and e2e workflows (triggered from the Databricks CLI or the Databricks UI). | The quality-checker workflow runs load checks > apply checks > save results; the e2e (end-to-end) workflow runs profile input data and generate quality checks > apply checks > save results. Input and output data and quality checks storage are configured in the configuration file. For more details, see the Applying Checks Guide.
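
As a sketch of the programmatic profiling task, the snippet below profiles input data and turns the profiles into candidate checks, assuming the DQProfiler and DQGenerator classes from the Profiling Guide (class paths and method names may vary between versions; the input table is hypothetical):

    from databricks.sdk import WorkspaceClient
    from databricks.labs.dqx.profiler.profiler import DQProfiler
    from databricks.labs.dqx.profiler.generator import DQGenerator

    ws = WorkspaceClient()

    # "spark" is the session available in Databricks notebooks and jobs.
    input_df = spark.read.table("main.raw.customers")  # hypothetical input table

    # Profile the input data: summary statistics plus per-column profiles.
    profiler = DQProfiler(ws)
    summary_stats, profiles = profiler.profile(input_df)

    # Generate candidate quality rules (checks) from the profiles.
    generator = DQGenerator(ws)
    checks = generator.generate_dq_rules(profiles)

    # Review the candidates, then save them to storage via DQEngine
    # (see the Profiling Guide).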

The no-code approach using Workflows is preferable for less technical users who want to run quality checks on existing data stored in tables. It provides a user-friendly way to execute quality checks without writing any code. The code-level approach is more flexible and supports more complex scenarios, such as integrating quality checks directly into data processing pipelines.

Defining quality rules (checks)

Quality rules can be defined in the following ways:

  • As YAML or JSON files (stored locally, in a Databricks workspace, or in a Unity Catalog volume), or as rows stored in a table. For more details, see the Quality Checks Storage Guide.
  • Programmatically as a list of dictionary objects (can also be loaded from YAML or JSON definitions).
  • Programmatically as a list of DQRule objects.
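
For illustration, here is a minimal sketch of checks defined as a list of dictionary objects; the same structure can be stored as YAML or JSON. It assumes the metadata format described in the Quality Checks Definition Guide, and the function and argument names are illustrative and may differ between DQX versions:

    # Two illustrative checks expressed as metadata.
    checks = [
        {
            "name": "customer_id_is_not_null",  # optional rule name
            "criticality": "error",             # failing rows are treated as errors
            "check": {
                "function": "is_not_null",
                "arguments": {"column": "customer_id"},
            },
        },
        {
            "criticality": "warn",              # failing rows are only flagged
            "check": {
                "function": "is_in_list",
                "arguments": {"column": "country", "allowed": ["US", "DE", "PL"]},
            },
        },
    ]

The equivalent rules can also be constructed programmatically as DQRule objects; see the Quality Checks Definition Guide for the class-based API.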

Additionally, quality rule candidates can be auto-generated using the DQX profiler.

For more details, see the Quality Checks Definition Guide.