Table Schemas and Relationships
This page provides a reference for the table structures and relationships (EDR diagrams) used by DQX when applying quality checks, storing checks, and recording summary metrics. It describes the relationships between tables and the schema of result columns.
Overview
When you apply quality checks with DQX:
- Output table: Contains rows that passed all checks, plus rows with warnings (warnings do not exclude rows from the output). Each row includes
_errorsand_warningscolumns (the names can be customized). See Applying Quality Checks for details. - Quarantine table (optional): When configured, rows that fail error-level checks are written to the quarantine table instead of the output table. The quarantine table has the same structure as the output table, including
_errorsand_warnings. See Applying Quality Checks for configuration. - Checks table: (optional) Stores the rule definitions. Used for loading checks and for traceability via
rule_fingerprintandrule_set_fingerprint. For filteringrun_config_nameis used. See Loading and Storing Quality Checks for details. - Summary metrics table (optional): When enabled, stores aggregate metrics per run (e.g.
input_row_count,error_row_count). Links to output/quarantine viarun_idand to the checks table viachecks_locationandrule_set_fingerprint. See Summary Metrics for configuration.
When quarantine is configured: Rows with errors go to the quarantine table; rows with only warnings or no issues go to the output table. When quarantine is not configured: All rows (including those with errors) go to the output table. In both cases, warnings are always included in the output — rows with warnings appear in the output table, and if a row has both errors and warnings, it goes to quarantine (when configured) with both _errors and _warnings populated.
Table relationships
The following EDR diagram shows how tables relate:
- Summary metrics → Output/Quarantine: Join on
run_id(present in each_errorsand_warningsitem). - Summary metrics → Checks table: Use
checks_locationandrule_set_fingerprintto load the rule set. - Output/Quarantine → Checks table: Join exploded
_errors/_warningsonrule_fingerprintandrule_set_fingerprint.
Result columns: _errors and _warnings
The output and quarantine tables include two array columns appended by DQX (see Applying Quality Checks for usage details):
| Column | Type | Description |
|---|---|---|
_errors | ARRAY | Array of structs describing error-level check failures for this row. |
_warnings | ARRAY | Array of structs describing warning-level check failures for this row. |
Each element in _errors and _warnings has the following structure:
| Field | Type | Description |
|---|---|---|
name | STRING | Name of the check. |
message | STRING | Message describing the quality issue. |
columns | ARRAY | Column(s) where the issue was found. |
filter | STRING | Filter applied to the check, if any. |
function | STRING | Check function applied (e.g. is_not_null). |
run_time | TIMESTAMP | When the check was executed. |
run_id | STRING | Unique run ID; links to summary metrics. |
user_metadata | MAP | Optional user-defined metadata. |
rule_fingerprint | STRING | SHA-256 hash of the single rule; links to checks table. |
rule_set_fingerprint | STRING | SHA-256 hash of the rule set; links to checks table. |
Checks table schema
When storing checks in Delta or Lakebase tables (see Loading and Storing Quality Checks), the schema is:
| Column | Type | Description |
|---|---|---|
name | STRING | Name of the check. |
criticality | STRING | error or warn. |
check | STRUCT<function STRING, for_each_column ARRAY<STRING>, arguments MAP<STRING, STRING>> | Check definition: function name, optional for_each_column, and keyword arguments for the function. The arguments map must include every required parameter of that check function (parameters without a default in its signature); for_each_column can supply column or columns when applicable. |
filter | STRING | Optional filter expression. |
run_config_name | STRING | Run configuration name for filtering (e.g. input table or job name). |
user_metadata | MAP<STRING, STRING> | Optional user-defined metadata. |
created_at | TIMESTAMP | When the check was saved. |
rule_fingerprint | STRING | SHA-256 hash of the single rule. |
rule_set_fingerprint | STRING | SHA-256 hash of the complete rule set. |
Summary metrics table schema
Summary metrics are optional. When enabled (see Summary Metrics), the schema is:
| Column | Type | Description |
|---|---|---|
run_id | STRING | Unique run ID; links to output/quarantine _errors/_warnings. |
run_name | STRING | Name of the metrics observer. |
input_location | STRING | Input dataset location (table or file). |
output_location | STRING | Output table location. |
quarantine_location | STRING | Quarantine table location, if used. |
checks_location | STRING | Where checks were loaded from (table or file). |
rule_set_fingerprint | STRING | SHA-256 hash of the rule set; links to checks table. |
metric_name | STRING | Metric name (e.g. input_row_count, error_row_count, check_metrics). |
metric_value | STRING | All values are stored as strings. Numeric metrics are string-encoded integers (cast with CAST(metric_value AS INT)). For check_metrics, this is a JSON string: [{"check_name": "...", "error_count": N, "warning_count": N}, ...]. |
run_time | TIMESTAMP | When the run completed / check applied. |
error_column_name | STRING | Name of the error column (default: _errors). |
warning_column_name | STRING | Name of the warning column (default: _warnings). |
user_metadata | MAP<STRING, STRING> | Optional run-level metadata. |
The summary table stores one row per metric per run (long format). Use run_id to correlate with row-level results and rule_set_fingerprint with the checks table.