Reconcile Guide

Lakebridge Reconcile validates data fidelity after migration by comparing your source system against the Databricks target. It identifies discrepancies at the row, column, and schema level.

What it does

Report type	What is compared	When to use
`schema`	Column names and data types	Verify DDL migration is correct
`row`	Hash of each row (no join key needed)	Quick row-level check when there is no primary key
`data`	Row and column values via join columns	Full fidelity check with per-column mismatch detail
`all`	Both `data` + `schema`	Complete validation
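
To make the difference between the `row` and `data` report types concrete, here is a minimal Python sketch. It is an illustration only, not Lakebridge's implementation: the helper names (`row_hash`, `row_report`, `data_report`), the SHA-256 hash, and the in-memory sample rows are all assumptions made for this example.

```python
import hashlib

def row_hash(row: dict, columns: list) -> str:
    """Hash a row's values in a fixed column order (illustrative only)."""
    joined = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def row_report(source: list, target: list, columns: list) -> dict:
    """`row`-style check: compare sets of row hashes, no join key required."""
    src = {row_hash(r, columns) for r in source}
    tgt = {row_hash(r, columns) for r in target}
    return {"missing_in_target": len(src - tgt), "missing_in_source": len(tgt - src)}

def data_report(source: list, target: list, join_columns: list, columns: list) -> dict:
    """`data`-style check: align rows on join columns, then list mismatched columns per key."""
    def key(r: dict) -> tuple:
        return tuple(r[c] for c in join_columns)

    target_by_key = {key(r): r for r in target}
    mismatches = {}
    for r in source:
        other = target_by_key.get(key(r))
        if other is None:
            continue  # row absent from target; not a column-level mismatch
        diff = [c for c in columns if r[c] != other[c]]
        if diff:
            mismatches[key(r)] = diff
    return mismatches

# Hypothetical sample data: one row differs in the `qty` column.
source = [{"id": 1, "name": "a", "qty": 10}, {"id": 2, "name": "b", "qty": 20}]
target = [{"id": 1, "name": "a", "qty": 10}, {"id": 2, "name": "b", "qty": 25}]
columns = ["id", "name", "qty"]

row_result = row_report(source, target, columns)
data_result = data_report(source, target, ["id"], columns)
```

In this sketch the row-style check only reports that one hashed row on each side has no counterpart, while the data-style check, given `id` as the join column, pinpoints `qty` as the mismatched column.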

Supported Source Systems

Source	Schema	Row	Data	All
Oracle	Yes	Yes	Yes	Yes
Snowflake	Yes	Yes	Yes	Yes
SQL Server	Yes	Yes	Yes	Yes
Redshift	Yes	Yes	Yes	Yes
Teradata	Yes	Yes ¹	Yes ¹	Yes ¹
Databricks	Yes	Yes	Yes	Yes

Setup

Step 1: Set up the source connection

Follow the official Databricks docs to:

Create a connection
Grant connection access
Enable the Databricks preview of the remote_query feature

note

You do not have to create a foreign catalog.

Step 2: Run `configure-reconcile`

If you haven't already, complete the initial setup:

databricks labs lakebridge configure-reconcile

This sets up Lakebridge workspace resources. See Installation → Configure Reconcile for details.

Config file

A reconcile config file is created under the path:

<USER_WORKSPACE_HOME>/.lakebridge/recon_config_<SOURCE>_<UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG>_<REPORT_TYPE>.json

note

For UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG: when the source is Databricks, the source catalog name is used; for any other source, the connection name is used.

Examples:

source_type	connection_name_or_catalog	report_type	file_name
databricks	tpch	all	recon_config_databricks_tpch_all.json
source1	conn1	row	recon_config_source1_conn1_row.json
source2	conn2	schema	recon_config_source2_conn2_schema.json
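
The naming pattern can be sketched as a small Python helper. This is a hypothetical illustration rather than part of Lakebridge: the function name `recon_config_file_name` and the report-type validation are assumptions, and only the file name pattern and the report types come from this guide.

```python
# Report types listed in this guide.
REPORT_TYPES = {"schema", "row", "data", "all"}

def recon_config_file_name(source: str, connection_or_catalog: str, report_type: str) -> str:
    """Build the reconcile config file name from its three parts.

    `connection_or_catalog` is the source catalog name when the source is
    Databricks, and the connection name otherwise.
    """
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type: {report_type!r}")
    return f"recon_config_{source}_{connection_or_catalog}_{report_type}.json"

# Reproduces the first example row of the table above.
example = recon_config_file_name("databricks", "tpch", "all")
```

The resulting file lives in the `.lakebridge` folder under the user's workspace home, as shown in the path above.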

See Configuration Reference for the full schema and examples.

Required permissions

The user configuring reconcile must have permission to:

Create Data Warehouses
Create Compute Clusters
USE CONNECTION on the source connection
USE CATALOG and CREATE SCHEMA on the target catalog
CREATE VOLUME if using a pre-existing schema on a serverless cluster

Serverless cluster support

Reconcile automatically detects the cluster type and optimizes intermediate data persistence accordingly:

On Serverless clusters: Reconcile uses Unity Catalog volumes for intermediate data persistence
On Standard clusters: Reconcile uses DataFrame caching for better performance

note

On serverless clusters, the configured volume (from metadata_config.volume) is automatically used
The volume must be created in the metadata catalog and schema specified in your ReconcileMetadataConfig
Ensure you have the necessary permissions to write to the volume on serverless clusters

Run

See Running Reconcile for CLI execution, notebook usage, and automation.

¹ Teradata has no portable cryptographic hash in pure SQL, so the report types that rely on row hashing (row, data, all) require a user-installed hash UDF on the source and an explicit hash_expression_overrides.source entry in the recon config. See Hash Expression for wiring.
