User Guide

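This guide gives an overview of how to install DQX, integrate it with your data pipelines, define quality rules, and monitor the results.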

Installation Options

DQX can be installed in the following ways:

Install as a library on a Databricks cluster.
Install as a workspace tool using Databricks CLI.
Install DQX Studio, a web UI, as a Databricks App in your workspace — recommended for a no-code experience.

For more details, see the Installation Guide.

Integration and execution options

Supported quality checking types

Quality checking type	Integration with processing pipelines	Description
In-transit	Code-level only	DQX allows data quality to be validated on the fly while the data is being processed, before it is written to storage. This requires DQX to be used as a library and integrated directly into user pipelines.
At-rest	Code-level or No-code (DQX Studio or Workflows)	DQX enables data quality checking on existing data stored in tables. For a no-code experience, DQX Studio (a Databricks App) is recommended; alternatively, install DQX in the workspace as a tool to deploy the no-code workflows.

Integration options

Task	Integration with processing pipelines	Execution	Description
Profiling and quality checks generation	Programmatic approach (Code-level integration)	Use methods of `DQProfiler` and `DQGenerator` or `DQDltGenerator` classes.	Profile input data and generate quality rule candidates. `DQEngine` can be used to save the generated checks to storage. For more details, see the Profiling Guide.
Profiling and quality checks generation	No-code UI (DQX Studio) — recommended for no-code.	Use the DQX Studio browser UI.	Profile tables and review and save the generated quality rule candidates from a browser, without code or configuration files. For more details, see the DQX Studio Guide.
Profiling and quality checks generation	No-code approach (Workflow). Requires installation as a workspace tool.	Use the `profiler` workflow (triggered from Databricks CLI or Databricks UI).	Profile input data and generate quality rule candidates. The input data and the quality checks storage are configured in the configuration file. By default, the workflow runs for all defined run configs, but it can be parameterized to target a specific run config or wildcard patterns. For more details, see the Profiling Guide.
AI-assisted quality checks generation	Programmatic approach (Code-level integration)	Use methods of `DQGenerator` class with LLM integration.	Generate quality rules with AI/LLM assistance based on a user-provided business description and other input, such as a fully qualified table name. The AI analyzes the business description to suggest relevant data quality rules. For more details, see the AI-Assisted Generation Guide.
AI-assisted quality checks generation	No-code UI (DQX Studio) — recommended for no-code.	Use the DQX Studio browser UI.	Generate quality rule candidates with AI/LLM assistance from a browser, then review and save them — no code required. For more details, see the DQX Studio Guide.
AI-assisted quality checks generation	No-code approach (Workflow). Requires installation as a workspace tool.	Use `profiler` workflow with LLM integration (requires `serverless_clusters`).	Run AI-assisted rule generation as part of the profiler workflow (statistics-based and AI-assisted generation). Requires serverless clusters for execution. For more details, see the AI-Assisted Generation Guide.
Quality Checking	Programmatic approach (Code-level integration)	Use methods of `DQEngine` class.	Supports loading checks from various storage backends, applying quality checks, and saving results. Also provides end-to-end methods that run all the steps in a single call (load checks > apply checks > save results). For more details, see the Applying Checks Guide.
Quality Checking	No-code UI (DQX Studio) — recommended for no-code.	Use the DQX Studio browser UI.	Author rules, apply checks, review results, and track run history from a browser, without code or configuration files. For more details, see the DQX Studio Guide.
Quality Checking	No-code approach (Workflow). Requires installation as a workspace tool.	Use `quality-checker` and `e2e` workflows (triggered from Databricks CLI or Databricks UI).	Offers the quality checker workflow (load checks > apply checks > save results) and the e2e (end-to-end) workflow (profile input data and generate quality checks > apply checks > save results). Input and output data and the quality checks storage are configured in the configuration file. By default, each workflow runs for all defined run configs, but it can be parameterized to target a specific run config or wildcard patterns. For more details, see the Applying Checks Guide.

For a no-code experience, DQX Studio is the recommended option. It provides a full web UI to author, review, run, and monitor quality rules directly from a browser — ideal for less technical users — with no code or configuration files to maintain. See the DQX Studio Guide to get started.

Alternatively, the no-code Workflows (profiler, quality-checker, and end-to-end jobs triggered from the Databricks CLI or UI) run quality checks on existing tables driven by a configuration file, without deploying the studio app. This suits scheduled, config-driven execution and requires installing DQX as a workspace tool.

The code-level approach is the most flexible and is best for more complex scenarios, such as integrating quality checks directly into data processing pipelines.
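A minimal sketch of code-level, in-transit checking follows, assuming a Databricks environment with DQX installed as a library. It assumes that `DQEngine` exposes `apply_checks_by_metadata_and_split` and that this method returns a valid DataFrame and a quarantine DataFrame. The function name `write_checked_data` and its table-name parameters are hypothetical. Treat the Applying Checks Guide as the authoritative API reference.

```python
def write_checked_data(input_df, checks, valid_table, quarantine_table):
    """Check a Spark DataFrame in transit, then write valid and quarantined rows.

    Assumes a Databricks environment with DQX installed as a library.
    """
    # Imported lazily so this module can be loaded where DQX is not installed.
    from databricks.labs.dqx.engine import DQEngine
    from databricks.sdk import WorkspaceClient

    dq_engine = DQEngine(WorkspaceClient())

    # Assumed DQX method: returns a valid DataFrame and a quarantine DataFrame.
    valid_df, quarantine_df = dq_engine.apply_checks_by_metadata_and_split(
        input_df, checks
    )

    # Standard PySpark writes; the table names are supplied by the caller.
    valid_df.write.mode("append").saveAsTable(valid_table)
    quarantine_df.write.mode("append").saveAsTable(quarantine_table)
    return valid_df, quarantine_df
```

Because the checks run before anything is written, rows that fail never reach the target table, which is the point of in-transit validation.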

Defining quality rules (checks)

Quality rules can be defined in the following ways:

As YAML or JSON files stored locally, in a Databricks Workspace, or in a Unity Catalog Volume, or as rows stored in a table. For more details, see the Quality Checks Storage Guide.
Programmatically as a list of dictionaries (which can also be loaded from YAML or JSON definitions).
Programmatically as a list of `DQRule` objects.

Additionally, quality rule candidates can be auto-generated using the DQX profiler.

For more details, see the Quality Checks Definition Guide.
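As a rough illustration of the dictionary form listed above, the sketch below builds two checks as plain Python dictionaries and round-trips them through JSON using only the standard library. The key names (`criticality`, `check`, `function`, `arguments`) and the check function names `is_not_null` and `is_in_list` are assumed to follow the DQX metadata format, and the column names are hypothetical. Confirm the exact schema against the Quality Checks Definition Guide.

```python
import json

# Checks as a list of dictionaries. The key names are assumed to follow
# the DQX metadata format; the column names are hypothetical.
checks = [
    {
        "criticality": "error",
        "check": {
            "function": "is_not_null",
            "arguments": {"column": "customer_id"},
        },
    },
    {
        "criticality": "warn",
        "check": {
            "function": "is_in_list",
            "arguments": {"column": "status", "allowed": ["active", "inactive"]},
        },
    },
]

# Serialize to JSON (for example, to store in a file) and load it back.
serialized = json.dumps(checks, indent=2)
loaded_checks = json.loads(serialized)
```

Keeping checks as plain data like this is what allows the same definitions to live in a file, a volume, or a table and be loaded unchanged.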

Summary metrics and monitoring

DQX can capture and store summary metrics about data quality across multiple tables and runs. Metrics are computed lazily: they become available only after a checked dataset has been counted, displayed, or written to a table or files. Users can:

Capture quality metrics for each checked dataset
Track both default (e.g. input/error/warning/valid counts) and custom quality metrics
Store quality metrics in Delta tables for historical analysis and alerting
Centralize quality metrics across datasets, jobs, or job runs in a unified data quality history table

For more details, see the Summary Metrics Guide. For schema reference of output, quarantine, checks, and metrics tables, see Table Schemas and Relationships.
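The lazy behavior described above can be illustrated with a plain-Python analogy. This is not DQX's API: the function `check_rows`, the metric names, and the sample rows are all hypothetical. The sketch only shows why metrics stay empty until the checked data is actually consumed.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def check_rows(rows: Iterable[Dict], metrics: Counter) -> Iterator[Dict]:
    """Lazily check rows; `metrics` is updated only as rows are consumed."""
    for row in rows:
        metrics["input_count"] += 1
        if row.get("id") is None:
            metrics["error_count"] += 1
        else:
            metrics["valid_count"] += 1
        yield row


metrics = Counter()
checked = check_rows([{"id": 1}, {"id": None}, {"id": 3}], metrics)

metrics_before = dict(metrics)  # still empty: nothing has been consumed yet
results = list(checked)         # the "action" that triggers the computation
metrics_after = dict(metrics)   # now populated
```

Reading the metrics before the dataset is materialized returns nothing useful, so any reporting or alerting step belongs after the count, display, or write.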

Using DQX with AI agents

DQX ships Agent Skills that teach AI assistants how to use the library correctly — defining checks, applying them, profiling and generating rules, and managing checks storage. They follow the open Agent Skills format and work with tools such as Databricks Genie Code and Claude Code.

For the full list of skills and installation instructions, see the AI Tools & Skills Guide.
