Table Schemas and Relationships

This page is a reference for the table structures and relationships (ERD) used by DQX when applying quality checks, storing checks, and recording summary metrics. It describes how the tables relate and the schema of the result columns.

Overview

When you apply quality checks with DQX:

  • Output table: Contains rows that passed all checks, plus rows with warnings (warnings do not exclude rows from the output). Each row includes _errors and _warnings columns (the names can be customized). See Applying Quality Checks for details.
  • Quarantine table (optional): When configured, rows that fail error-level checks are written to the quarantine table instead of the output table. The quarantine table has the same structure as the output table, including _errors and _warnings. See Applying Quality Checks for configuration.
  • Checks table (optional): Stores the rule definitions. Used for loading checks and for traceability via rule_fingerprint and rule_set_fingerprint; run_config_name is used for filtering. See Loading and Storing Quality Checks for details.
  • Summary metrics table (optional): When enabled, stores aggregate metrics per run (e.g. input_row_count, error_row_count). Links to output/quarantine via run_id and to the checks table via checks_location and rule_set_fingerprint. See Summary Metrics for configuration.
Output vs Quarantine

  • When quarantine is configured: Rows with errors go to the quarantine table; rows with only warnings or no issues go to the output table.
  • When quarantine is not configured: All rows, including those with errors, go to the output table.

In both cases, warnings never exclude a row from the output: rows with only warnings appear in the output table, and a row with both errors and warnings goes to quarantine (when configured) with both _errors and _warnings populated.
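The routing rule above can be sketched in plain Python. This is a hypothetical standalone helper for illustration, not the DQX API; DQX performs this split internally when you apply checks:

```python
# Sketch of the output/quarantine routing rule described above.
# Hypothetical helper for illustration -- DQX performs this split internally.

def split_rows(rows, quarantine_configured):
    """Route checked rows to output and quarantine lists.

    Each row is a dict carrying the result columns ``_errors`` and
    ``_warnings`` (lists of per-check result structs, empty if clean).
    """
    output, quarantine = [], []
    for row in rows:
        if row["_errors"] and quarantine_configured:
            quarantine.append(row)  # errors (and any warnings) go to quarantine
        else:
            output.append(row)      # clean rows, warning-only rows, or no quarantine
    return output, quarantine

rows = [
    {"id": 1, "_errors": [], "_warnings": []},               # clean
    {"id": 2, "_errors": [], "_warnings": [{"name": "w"}]},  # warning only
    {"id": 3, "_errors": [{"name": "e"}], "_warnings": []},  # error
]

out, quar = split_rows(rows, quarantine_configured=True)
# With quarantine configured: rows 1 and 2 land in output, row 3 in quarantine.
```

Note that the warning-only row stays in the output either way; only error-level failures are ever diverted.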

Table relationships

The tables relate as follows (entity-relationship view):

  • Summary metrics → Output/Quarantine: Join on run_id (present in each _errors and _warnings item).
  • Summary metrics → Checks table: Use checks_location and rule_set_fingerprint to load the rule set.
  • Output/Quarantine → Checks table: Join exploded _errors/_warnings on rule_fingerprint and rule_set_fingerprint.
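The join keys above can be illustrated with small in-memory stand-ins for the three tables. Field names follow the schemas on this page; the data values are made up:

```python
# In-memory stand-ins for one row of each table (values are illustrative).

metrics_row = {"run_id": "r-1", "checks_location": "main.dqx.checks",
               "rule_set_fingerprint": "abc123"}

# One exploded element of an output/quarantine row's _errors array:
error_item = {"run_id": "r-1", "rule_fingerprint": "def456",
              "rule_set_fingerprint": "abc123", "name": "id_is_not_null"}

checks_row = {"rule_fingerprint": "def456", "rule_set_fingerprint": "abc123",
              "name": "id_is_not_null"}

# Summary metrics -> output/quarantine: join on run_id.
assert metrics_row["run_id"] == error_item["run_id"]

# Output/quarantine -> checks table: join on both fingerprints.
assert (error_item["rule_fingerprint"], error_item["rule_set_fingerprint"]) == \
       (checks_row["rule_fingerprint"], checks_row["rule_set_fingerprint"])

# Summary metrics -> checks table: rule_set_fingerprint identifies the rule
# set stored at checks_location.
assert metrics_row["rule_set_fingerprint"] == checks_row["rule_set_fingerprint"]
```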

Result columns: _errors and _warnings

The output and quarantine tables include two array columns appended by DQX (see Applying Quality Checks for usage details):

| Column | Type | Description |
|---|---|---|
| _errors | ARRAY | Array of structs describing error-level check failures for this row. |
| _warnings | ARRAY | Array of structs describing warning-level check failures for this row. |

Each element in _errors and _warnings has the following structure:

| Field | Type | Description |
|---|---|---|
| name | STRING | Name of the check. |
| message | STRING | Message describing the quality issue. |
| columns | ARRAY | Column(s) where the issue was found. |
| filter | STRING | Filter applied to the check, if any. |
| function | STRING | Check function applied (e.g. is_not_null). |
| run_time | TIMESTAMP | When the check was executed. |
| run_id | STRING | Unique run ID; links to summary metrics. |
| user_metadata | MAP | Optional user-defined metadata. |
| rule_fingerprint | STRING | SHA-256 hash of the single rule; links to the checks table. |
| rule_set_fingerprint | STRING | SHA-256 hash of the rule set; links to the checks table. |
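A populated _errors element can be pictured as a dict with the fields above. The values here are illustrative; "exploding" the array (as Spark's explode() would) yields one record per failed check, ready to join on rule_fingerprint:

```python
from datetime import datetime, timezone

# One output-table row with a populated _errors array, shaped like the
# struct fields documented above (all values are illustrative).
row = {
    "id": 42,
    "_errors": [
        {
            "name": "id_is_not_null",
            "message": "Column 'id' is null",
            "columns": ["id"],
            "filter": None,
            "function": "is_not_null",
            "run_time": datetime(2024, 1, 1, tzinfo=timezone.utc),
            "run_id": "r-1",
            "user_metadata": {},
            "rule_fingerprint": "def456",
            "rule_set_fingerprint": "abc123",
        }
    ],
    "_warnings": [],
}

# Flatten to one record per failed check, as an explode() would produce.
exploded = [(row["id"], e["function"], e["rule_fingerprint"])
            for e in row["_errors"]]
# -> [(42, "is_not_null", "def456")]
```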

Checks table schema

When storing checks in Delta or Lakebase tables (see Loading and Storing Quality Checks), the schema is:

| Column | Type | Description |
|---|---|---|
| name | STRING | Name of the check. |
| criticality | STRING | error or warn. |
| check | STRUCT<function STRING, for_each_column ARRAY<STRING>, arguments MAP<STRING, STRING>> | Check definition: function name, optional for_each_column, and keyword arguments for the function. The arguments map must include every required parameter of that check function (parameters without a default in its signature); for_each_column can supply column or columns when applicable. |
| filter | STRING | Optional filter expression. |
| run_config_name | STRING | Run configuration name for filtering (e.g. input table or job name). |
| user_metadata | MAP<STRING, STRING> | Optional user-defined metadata. |
| created_at | TIMESTAMP | When the check was saved. |
| rule_fingerprint | STRING | SHA-256 hash of the single rule. |
| rule_set_fingerprint | STRING | SHA-256 hash of the complete rule set. |
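An illustrative checks-table row, shaped like the schema above, might look as follows. The fingerprint computation shown is a stand-in to demonstrate the idea of a SHA-256 hash over a canonical rule representation; it is an assumption, not necessarily how DQX derives rule_fingerprint:

```python
import hashlib
import json

# Illustrative checks-table row (values are made up).
check = {
    "name": "id_is_not_null",
    "criticality": "error",
    "check": {
        "function": "is_not_null",
        "for_each_column": ["id"],
        "arguments": {},  # must cover every required parameter of the function
    },
    "filter": None,
    "run_config_name": "default",
    "user_metadata": {"owner": "data-team"},
}

# Stand-in fingerprint: SHA-256 over canonical JSON of the rule definition.
# This is an assumption for illustration -- DQX computes fingerprints internally.
fingerprint = hashlib.sha256(
    json.dumps(check["check"], sort_keys=True).encode()
).hexdigest()
assert len(fingerprint) == 64  # SHA-256 hex digest length
```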

Summary metrics table schema

Summary metrics are optional. When enabled (see Summary Metrics), the schema is:

| Column | Type | Description |
|---|---|---|
| run_id | STRING | Unique run ID; links to output/quarantine _errors/_warnings. |
| run_name | STRING | Name of the metrics observer. |
| input_location | STRING | Input dataset location (table or file). |
| output_location | STRING | Output table location. |
| quarantine_location | STRING | Quarantine table location, if used. |
| checks_location | STRING | Where checks were loaded from (table or file). |
| rule_set_fingerprint | STRING | SHA-256 hash of the rule set; links to the checks table. |
| metric_name | STRING | Metric name (e.g. input_row_count, error_row_count, check_metrics). |
| metric_value | STRING | All values are stored as strings. Numeric metrics are string-encoded integers (cast with CAST(metric_value AS INT)). For check_metrics, this is a JSON string: [{"check_name": "...", "error_count": N, "warning_count": N}, ...]. |
| run_time | TIMESTAMP | When the run completed / checks were applied. |
| error_column_name | STRING | Name of the error column (default: _errors). |
| warning_column_name | STRING | Name of the warning column (default: _warnings). |
| user_metadata | MAP<STRING, STRING> | Optional run-level metadata. |

The summary table stores one row per metric per run (long format). Use run_id to correlate with row-level results, and rule_set_fingerprint to correlate with the checks table.
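Because metric_value is always a string, consumers must decode it per metric. A minimal sketch of the decoding described above, using a hypothetical helper over illustrative long-format rows:

```python
import json

# Illustrative summary-metrics rows in long format (one row per metric).
metrics = [
    {"run_id": "r-1", "metric_name": "input_row_count", "metric_value": "1000"},
    {"run_id": "r-1", "metric_name": "error_row_count", "metric_value": "7"},
    {"run_id": "r-1", "metric_name": "check_metrics",
     "metric_value": json.dumps([
         {"check_name": "id_is_not_null", "error_count": 7, "warning_count": 0},
     ])},
]

def metric(rows, name):
    """Decode a metric_value: integers for counts, JSON for check_metrics.

    Hypothetical helper -- mirrors the CAST(metric_value AS INT) and
    JSON parsing described in the schema above.
    """
    raw = next(r["metric_value"] for r in rows if r["metric_name"] == name)
    return json.loads(raw) if name == "check_metrics" else int(raw)

assert metric(metrics, "input_row_count") == 1000
assert metric(metrics, "check_metrics")[0]["error_count"] == 7
```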