User Guide
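This guide gives an overview of how to install DQX, integrate it with your data pipelines, define quality rules, and monitor the results.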
Installation Options
DQX can be installed in the following ways:
- Install as a library on a Databricks cluster.
- Install as a workspace tool using Databricks CLI.
- Install DQX Studio, a web UI, as a Databricks App in your workspace — recommended for a no-code experience.
For more details, see the Installation Guide.
Integration and execution options
Supported quality checking types
| Quality checking type | Integration with processing pipelines | Description |
|---|---|---|
| In-transit | Code-level only | DQX allows data quality to be validated on the fly while the data is being processed, before it is written to storage. This requires DQX to be used as a library and integrated directly into user pipelines. |
| At-rest | Code-level or No-code (DQX Studio or Workflows) | DQX enables data quality checking on existing data stored in tables. For a no-code experience, DQX Studio (a Databricks App) is recommended; alternatively, install DQX in the workspace as a tool to deploy the no-code workflows. |
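The in-transit pattern described in the table can be illustrated with a minimal, library-free sketch: each row of a batch is checked before anything is written, and failing rows are routed to a quarantine collection. This is not the DQX API. The helper names (`not_null`, `split_valid_and_quarantine`) and the `_errors` field are hypothetical and chosen only for illustration; in a real pipeline the same idea would be applied to a Spark DataFrame through the DQX library.

```python
from typing import Callable, Optional

Row = dict
# A check returns an error message for a failing row, or None if the row passes.
Check = Callable[[Row], Optional[str]]


def not_null(column: str) -> Check:
    """Build a hypothetical check that fails when `column` is missing or None."""
    def check(row: Row) -> Optional[str]:
        return f"{column} is null" if row.get(column) is None else None
    return check


def split_valid_and_quarantine(rows: list, checks: list) -> tuple:
    """Apply every check to every row before anything is written to storage."""
    valid, quarantine = [], []
    for row in rows:
        errors = [msg for c in checks if (msg := c(row)) is not None]
        if errors:
            # Keep the failing row together with the reasons it failed.
            quarantine.append({**row, "_errors": errors})
        else:
            valid.append(row)
    return valid, quarantine


batch = [{"id": 1, "name": "a"}, {"id": None, "name": "b"}]
valid, quarantine = split_valid_and_quarantine(batch, [not_null("id")])
# Only `valid` would then be written to the target table;
# `quarantine` would go to a separate location for review.
```

At-rest checking follows the same logic, except that the input is read from an existing table rather than intercepted inside the pipeline.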
Integration options
| Task | Integration with processing pipelines | Execution | Description |
|---|---|---|---|
| Profiling and quality checks generation | Programmatic approach (Code-level integration) | Use methods of DQProfiler and DQGenerator or DQDltGenerator classes. | Profile input data and generate quality rule candidates. DQEngine can be used to save the generated checks to storage. For more details, see the Profiling Guide. |
| Profiling and quality checks generation | No-code UI (DQX Studio) — recommended for no-code. | Use the DQX Studio browser UI. | Profile tables and review and save the generated quality rule candidates from a browser, without code or configuration files. For more details, see the DQX Studio Guide. |
| Profiling and quality checks generation | No-code approach (Workflow). Requires installation as a workspace tool. | Use profiler workflow (triggered from Databricks CLI or Databricks UI). | Profile input data and generate quality rule candidates. Input data and quality checks storage are configured in the configuration file. By default, the workflow runs for all defined run configs, but it can be parameterized to target a specific run config or wildcard patterns. For more details, see the Profiling Guide. |
| AI-assisted quality checks generation | Programmatic approach (Code-level integration) | Use methods of DQGenerator class with LLM integration. | Generate quality rules using AI/LLM assistance based on a user-provided business description and other input, such as a fully qualified table name. The AI analyzes the business description to suggest relevant data quality rules. For more details, see the AI-Assisted Generation Guide. |
| AI-assisted quality checks generation | No-code UI (DQX Studio) — recommended for no-code. | Use the DQX Studio browser UI. | Generate quality rule candidates with AI/LLM assistance from a browser, then review and save them — no code required. For more details, see the DQX Studio Guide. |
| AI-assisted quality checks generation | No-code approach (Workflow). Requires installation as a workspace tool. | Use profiler workflow with LLM integration (requires serverless_clusters). | Run AI-assisted rule generation as part of the profiler workflow (statistics-based and AI-assisted generation). Requires serverless clusters for execution. For more details, see the AI-Assisted Generation Guide. |
| Quality Checking | Programmatic approach (Code-level integration) | Use methods of DQEngine class. | Offers loading checks from various storage backends, applying quality checks, saving results, as well as end-to-end methods running all the steps in a single method call (load checks > apply checks > save results). For more details, see the Applying Checks Guide. |
| Quality Checking | No-code UI (DQX Studio) — recommended for no-code. | Use the DQX Studio browser UI. | Author rules, apply checks, review results, and track run history from a browser, without code or configuration files. For more details, see the DQX Studio Guide. |
| Quality Checking | No-code approach (Workflow). Requires installation as a workspace tool. | Use quality-checker and e2e workflows (triggered from Databricks CLI or Databricks UI). | Offers the quality-checker workflow (load checks > apply checks > save results) and the e2e (end-to-end) workflow (profile input data and generate quality checks > apply checks > save results). Input and output data and quality checks storage are configured in the configuration file. By default, the workflow runs for all defined run configs, but it can be parameterized to target a specific run config or wildcard patterns. For more details, see the Applying Checks Guide. |
For a no-code experience, DQX Studio is the recommended option. It provides a full web UI to author, review, run, and monitor quality rules directly from a browser — ideal for less technical users — with no code or configuration files to maintain. See the DQX Studio Guide to get started.
Alternatively, the no-code Workflows (profiler, quality-checker, and end-to-end jobs triggered from the Databricks CLI or UI) run quality checks on existing tables driven by a configuration file, without deploying the studio app. This suits scheduled, config-driven execution and requires installing DQX as a workspace tool.
The code-level approach is the most flexible and is best for more complex scenarios, such as integrating quality checks directly into data processing pipelines.
Defining quality rules (checks)
Quality rules can be defined in the following ways:
- As YAML or JSON files stored locally, in a Databricks Workspace, or in a Unity Catalog Volume, or as rows stored in a table. For more details, see the Quality Checks Storage Guide.
- Programmatically as a list of dictionary objects (can also be loaded from YAML or JSON definitions).
- Programmatically as a list of DQRule objects.
Additionally, quality rule candidates can be auto-generated using the DQX profiler.
For more details, see the Quality Checks Definition Guide.
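As a rough illustration of the dictionary form, the sketch below parses JSON check definitions into a list of dictionaries using only the standard library. The key layout (`criticality`, `check.function`, `check.arguments`), the check function names, the argument key `column`, the allowed criticality values, and the `load_checks` helper are all assumptions made for this example; the Quality Checks Definition Guide remains the authoritative reference for the supported fields.

```python
import json

# Check definitions as they might appear in a JSON file.
# Function names and argument keys are illustrative assumptions.
CHECKS_JSON = """
[
  {
    "criticality": "error",
    "check": {"function": "is_not_null", "arguments": {"column": "customer_id"}}
  },
  {
    "criticality": "warn",
    "check": {"function": "is_not_null_and_not_empty", "arguments": {"column": "email"}}
  }
]
"""


def load_checks(text: str) -> list:
    """Parse JSON check definitions and sanity-check their basic shape."""
    checks = json.loads(text)
    if not isinstance(checks, list):
        raise ValueError("expected a list of check definitions")
    for i, item in enumerate(checks):
        if "function" not in item.get("check", {}):
            raise ValueError(f"check #{i} is missing check.function")
        # Assumed convention: criticality is either "error" or "warn".
        if item.get("criticality", "error") not in ("error", "warn"):
            raise ValueError(f"check #{i} has an unknown criticality")
    return checks


checks = load_checks(CHECKS_JSON)
```

The resulting `checks` list is plain data, so the same definitions can be kept in a file, a table, or built directly in code.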
Summary metrics and monitoring
DQX can capture and store summary metrics about data quality across multiple tables and runs. Metrics are computed lazily and become accessible only after the checked datasets are counted, displayed, or written to a table or files. Users can:
- Capture quality metrics for each checked dataset
- Track both default (e.g. input/error/warning/valid counts) and custom quality metrics
- Store quality metrics in Delta tables for historical analysis and alerting
- Centralize quality metrics across datasets, jobs, or job runs in a unified data quality history table
For more details, see the Summary Metrics Guide. For schema reference of output, quarantine, checks, and metrics tables, see Table Schemas and Relationships.
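To make the default counts concrete, here is a small, library-free sketch that derives input, error, warning, and valid counts from rows that have already been checked. It is not how DQX computes metrics. The `_errors` and `_warnings` field names, the metric names, and the rule that a row is valid only when it has neither errors nor warnings are assumptions for this example.

```python
def summarize(rows: list) -> dict:
    """Count checked rows by outcome (assumed semantics, see the note above)."""
    metrics = {
        "input_row_count": 0,
        "error_row_count": 0,
        "warning_row_count": 0,
        "valid_row_count": 0,
    }
    for row in rows:
        has_error = bool(row.get("_errors"))
        has_warning = bool(row.get("_warnings"))
        metrics["input_row_count"] += 1
        metrics["error_row_count"] += has_error
        metrics["warning_row_count"] += has_warning
        # Assumption: a row counts as valid only with no errors and no warnings.
        metrics["valid_row_count"] += not (has_error or has_warning)
    return metrics


checked = [
    {"id": 1, "_errors": None, "_warnings": None},
    {"id": 2, "_errors": ["id is null"], "_warnings": None},
    {"id": 3, "_errors": None, "_warnings": ["email is empty"]},
    {"id": 4, "_errors": ["id is null"], "_warnings": ["email is empty"]},
]
metrics = summarize(checked)
```

Appending one such dictionary per run, together with a run identifier and timestamp, is one way to picture the historical table described above.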
Using DQX with AI agents
DQX ships Agent Skills that teach AI assistants how to use the library correctly — defining checks, applying them, profiling and generating rules, and managing checks storage. They follow the open Agent Skills format and work with tools such as Databricks Genie Code and Claude Code.
For the full list of skills and installation instructions, see the AI Tools & Skills Guide.