Reconcile Guide
Lakebridge Reconcile validates data fidelity after migration by comparing your source system against the Databricks target. It identifies discrepancies at the row, column, and schema level.
What it does
| Report type | What is compared | When to use |
|---|---|---|
| schema | Column names and data types | Verify DDL migration is correct |
| row | Hash of each row (no join key needed) | Quick row-level check when there is no primary key |
| data | Row and column values via join columns | Full fidelity check with per-column mismatch detail |
| all | Both data and schema | Complete validation |
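The sketch below illustrates, on small in-memory rows, how the comparison strategies in this table differ. It is a minimal pure-Python illustration, not Lakebridge's implementation: the function names are hypothetical, and hashing a sorted `column=value` string with SHA-256 is an assumed way of fingerprinting a row.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def compare_schema(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """schema: report columns missing on one side or whose data type differs."""
    return {
        col: (source.get(col), target.get(col))
        for col in sorted(source.keys() | target.keys())
        if source.get(col) != target.get(col)
    }


def row_hash(row: Row) -> str:
    """Fingerprint a whole row; columns are sorted by name so column order does not matter."""
    payload = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_rows(source: List[Row], target: List[Row]) -> Tuple[int, int]:
    """row: count rows with no identical hash on the other side (no join key needed)."""
    src = Counter(row_hash(r) for r in source)
    tgt = Counter(row_hash(r) for r in target)
    # Counter subtraction keeps only positive counts, so duplicate rows are respected.
    return sum((src - tgt).values()), sum((tgt - src).values())


def compare_data(source: List[Row], target: List[Row], join_cols: Sequence[str]):
    """data: match rows on join columns, then compare every remaining column."""
    def key(r: Row) -> tuple:
        return tuple(r[c] for c in join_cols)

    target_by_key = {key(r): r for r in target}
    source_keys = {key(r) for r in source}
    mismatches = []  # (key, column, source_value, target_value)
    missing_in_target = []
    for s in source:
        t = target_by_key.get(key(s))
        if t is None:
            missing_in_target.append(key(s))
            continue
        for col, value in s.items():
            if col not in join_cols and value != t.get(col):
                mismatches.append((key(s), col, value, t.get(col)))
    missing_in_source = [k for k in target_by_key if k not in source_keys]
    return mismatches, missing_in_target, missing_in_source
```

For example, with `src = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]` and `tgt = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}]`, `compare_rows(src, tgt)` returns `(1, 1)`: it can only say that one row on each side has no match. `compare_data(src, tgt, ["id"])` instead pinpoints the `name` column of key `(2,)`, which is why the data report needs join columns and the row report does not.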
Supported Source Systems
| Source | Schema | Row | Data | All |
|---|---|---|---|---|
| Oracle | Yes | Yes | Yes | Yes |
| Snowflake | Yes | Yes | Yes | Yes |
| SQL Server | Yes | Yes | Yes | Yes |
| Redshift | Yes | Yes | Yes | Yes |
| Databricks | Yes | Yes | Yes | Yes |
Setup
Step 1: Set up the source connection
Follow the official Databricks docs to:
- Create a connection
- Grant connection access
- Enable the Databricks preview of the `remote_query` feature
You do not have to create a foreign catalog.
Step 2: Run configure-reconcile
If you haven't already, complete the initial setup:
```bash
databricks labs lakebridge configure-reconcile
```
This sets up Lakebridge workspace resources, deploys the reconciliation dashboards and creates the config file. See Installation → Configure Reconcile for details.
Config file
A reconcile config file is created at the following path:
`<USER_WORKSPACE_HOME>/.lakebridge/recon_config_<SOURCE>_<UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG>_<REPORT_TYPE>.json`
For `<UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG>`: if the source is Databricks, the source catalog name is used; otherwise, the connection name is used.
Examples:
| source_type | connection_name_or_catalog | report_type | file_name |
|---|---|---|---|
| databricks | tpch | all | recon_config_databricks_tpch_all.json |
| source1 | conn1 | row | recon_config_source1_conn1_row.json |
| source2 | conn2 | schema | recon_config_source2_conn2_schema.json |
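The naming rule can be expressed as a small function, for instance to locate the file from a script. This is a sketch, not part of the Lakebridge API: `recon_config_path` and its parameters are hypothetical, and the user home passed in is whatever your workspace home path is.

```python
from typing import Optional

REPORT_TYPES = {"schema", "row", "data", "all"}


def recon_config_path(user_home: str, source: str, report_type: str,
                      connection_name: Optional[str] = None,
                      source_catalog: Optional[str] = None) -> str:
    """Hypothetical helper: build <home>/.lakebridge/recon_config_<source>_<conn_or_catalog>_<report_type>.json."""
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type: {report_type!r}")
    # Databricks sources are identified by their catalog; all other sources by the connection name.
    name = source_catalog if source == "databricks" else connection_name
    if not name:
        raise ValueError("need source_catalog for databricks, connection_name otherwise")
    return f"{user_home.rstrip('/')}/.lakebridge/recon_config_{source}_{name}_{report_type}.json"
```

Called as `recon_config_path("/home", "source1", "row", connection_name="conn1")`, it returns `/home/.lakebridge/recon_config_source1_conn1_row.json`, matching the second row of the table.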
See Configuration Reference for the full schema and examples.
Required permissions
The user configuring Reconcile must have permission to:
- Create Data Warehouses
- Create Compute Clusters
- `USE CONNECTION` on the source connection
- `USE CATALOG` and `CREATE SCHEMA` on the target catalog
- `CREATE VOLUME` if using a pre-existing schema on a serverless cluster
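The Unity Catalog privileges in this list can be granted with standard `GRANT` statements. The helper below only builds the SQL strings, which you could then run with `spark.sql(...)` or in a SQL editor; `reconcile_grants` is hypothetical, and the principal, connection, catalog, and schema names are placeholders. Creating warehouses and clusters is not a Unity Catalog privilege, so it is not covered here.

```python
from typing import List, Optional


def quote(identifier: str) -> str:
    """Backtick-quote an identifier or principal, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def reconcile_grants(principal: str, connection: str, target_catalog: str,
                     existing_schema: Optional[str] = None) -> List[str]:
    """Hypothetical helper: GRANT statements for the privileges listed above."""
    p = quote(principal)
    statements = [
        f"GRANT USE CONNECTION ON CONNECTION {quote(connection)} TO {p}",
        f"GRANT USE CATALOG ON CATALOG {quote(target_catalog)} TO {p}",
        f"GRANT CREATE SCHEMA ON CATALOG {quote(target_catalog)} TO {p}",
    ]
    if existing_schema:
        # Only needed when reusing a pre-existing schema on a serverless cluster.
        statements.append(
            f"GRANT CREATE VOLUME ON SCHEMA {quote(target_catalog)}.{quote(existing_schema)} TO {p}"
        )
    return statements
```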
Serverless cluster support
Reconcile automatically detects the cluster type and optimizes intermediate data persistence accordingly:
- On serverless clusters: Reconcile uses Unity Catalog volumes for intermediate data persistence.
- On standard clusters: Reconcile uses DataFrame caching for better performance.
Requirements on serverless clusters:
- The configured volume (from `metadata_config.volume`) is used automatically.
- The volume must be created in the metadata catalog and schema specified in your `ReconcileMetadataConfig`.
- You must have permission to write to that volume.
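The selection between volume persistence and caching can be sketched as below. This is an assumption-laden illustration, not Lakebridge's code: `MetadataConfig` is a hypothetical stand-in for the catalog, schema, and volume fields of `ReconcileMetadataConfig`, `run_id` is an assumed subdirectory name, and the cluster type is passed in as a flag rather than detected. The one fixed convention used is the Unity Catalog volume path `/Volumes/<catalog>/<schema>/<volume>/`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataConfig:
    """Hypothetical stand-in for the catalog/schema/volume fields of ReconcileMetadataConfig."""
    catalog: str
    schema: str
    volume: str


def persistence_target(is_serverless: bool, metadata: MetadataConfig, run_id: str) -> Optional[str]:
    """Return a volume path on serverless compute, or None to signal DataFrame caching."""
    if not is_serverless:
        return None  # standard cluster: the caller would use df.cache()
    return f"/Volumes/{metadata.catalog}/{metadata.schema}/{metadata.volume}/{run_id}"


# How a PySpark job might use it (not executed here):
# path = persistence_target(is_serverless, cfg, run_id)
# if path is None:
#     df = df.cache()
# else:
#     df.write.mode("overwrite").parquet(path)
#     df = spark.read.parquet(path)
```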
Run
See Running Reconcile for CLI execution, notebook usage, and automation.