
Reconcile Guide

Lakebridge Reconcile validates data fidelity after migration by comparing your source system against the Databricks target. It identifies discrepancies at the row, column, and schema level.

What it does

Report type | What is compared | When to use
schema | Column names and data types | Verify DDL migration is correct
row | Hash of each row (no join key needed) | Quick row-level check when there is no primary key
data | Row and column values via join columns | Full fidelity check with per-column mismatch detail
all | Both data and schema | Complete validation
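
To make the difference between the row and data report types concrete, here is a minimal Python sketch. It is not Lakebridge's implementation: the helper names (row_hash, row_report, data_report), the sample rows, and the choice of SHA-256 over a delimited string are all assumptions made for illustration.

```python
import hashlib


def row_hash(row: dict) -> str:
    """Hash a row's values in a fixed column order (assumed scheme, illustration only)."""
    joined = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def row_report(source: list, target: list) -> dict:
    """Row-style check: compare whole-row hashes, so no join key is required."""
    src = {row_hash(r) for r in source}
    tgt = {row_hash(r) for r in target}
    return {"missing_in_target": len(src - tgt), "missing_in_source": len(tgt - src)}


def data_report(source: list, target: list, join_col: str) -> dict:
    """Data-style check: match rows on a join column and list the columns that differ."""
    tgt_by_key = {r[join_col]: r for r in target}
    mismatches = {}
    for s in source:
        t = tgt_by_key.get(s[join_col])
        if t is None:
            continue  # key absent in target; a fuller report would track this separately
        diff = sorted(c for c in s if s[c] != t.get(c))
        if diff:
            mismatches[s[join_col]] = diff
    return mismatches


# Hypothetical sample data: one row differs in the "amount" column.
source_rows = [{"id": 1, "name": "a", "amount": 10}, {"id": 2, "name": "b", "amount": 20}]
target_rows = [{"id": 1, "name": "a", "amount": 10}, {"id": 2, "name": "b", "amount": 25}]

print(row_report(source_rows, target_rows))
print(data_report(source_rows, target_rows, "id"))
```

The hash-based check only reports that a row differs, while the join-based check can name the mismatching column. That is the trade-off the table describes.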

Supported Source Systems

Source | Schema | Row | Data | All
Oracle | Yes | Yes | Yes | Yes
Snowflake | Yes | Yes | Yes | Yes
SQL Server | Yes | Yes | Yes | Yes
Redshift | Yes | Yes | Yes | Yes
Databricks | Yes | Yes | Yes | Yes

Setup

Step 1: Set up the source connection

Follow the official Databricks docs to:

Note: You do not have to create a foreign catalog.

Step 2: Run configure-reconcile

If you haven't already, complete the initial setup:

databricks labs lakebridge configure-reconcile

This sets up Lakebridge workspace resources, deploys the reconciliation dashboards and creates the config file. See Installation → Configure Reconcile for details.

Config file

A reconcile config file is created under the path:

<USER_WORKSPACE_HOME>/.lakebridge/recon_config_<SOURCE>_<UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG>_<REPORT_TYPE>.json

Note: For UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG, the source catalog name is used when the source is databricks; otherwise, the connection name is used.

Examples:

source_type | connection_name_or_catalog | report_type | file_name
databricks | tpch | all | recon_config_databricks_tpch_all.json
source1 | conn1 | row | recon_config_source1_conn1_row.json
source2 | conn2 | schema | recon_config_source2_conn2_schema.json
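
The naming pattern above can be sketched as a small Python helper. The function names and the sample workspace home are hypothetical and not part of Lakebridge. The sketch only reproduces the documented pattern, which is useful if a script needs to locate the file.

```python
from pathlib import PurePosixPath

# Report types listed earlier in this guide.
REPORT_TYPES = {"schema", "row", "data", "all"}


def recon_config_filename(source_type: str, connection_or_catalog: str, report_type: str) -> str:
    """Build the config file name from the documented pattern (hypothetical helper)."""
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type: {report_type!r}")
    return f"recon_config_{source_type}_{connection_or_catalog}_{report_type}.json"


def recon_config_path(workspace_home: str, source_type: str,
                      connection_or_catalog: str, report_type: str) -> str:
    """Join the file name onto <USER_WORKSPACE_HOME>/.lakebridge/."""
    name = recon_config_filename(source_type, connection_or_catalog, report_type)
    return str(PurePosixPath(workspace_home) / ".lakebridge" / name)


# For a databricks source the second argument is the source catalog name;
# for any other source it is the connection name.
print(recon_config_filename("databricks", "tpch", "all"))
print(recon_config_path("/Users/someone@example.com", "source1", "conn1", "row"))
```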

See Configuration Reference for the full schema and examples.

Required permissions

The user configuring reconcile must have permission to:

  • Create Data Warehouses
  • Create Compute Clusters
  • USE CONNECTION on the source connection
  • USE CATALOG and CREATE SCHEMA on the target catalog
  • CREATE VOLUME if using a pre-existing schema on a serverless cluster
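
As a rough illustration, the Unity Catalog privileges in the list above could be granted with statements like the ones this Python sketch generates. The helper name, principal, and object names are hypothetical. The sketch assumes standard Unity Catalog GRANT syntax and leaves out the warehouse and cluster creation permissions, which are assumed here to be managed outside of SQL grants.

```python
from typing import List, Optional


def reconcile_grants(principal: str, connection: str, catalog: str,
                     schema: Optional[str] = None) -> List[str]:
    """Return GRANT statements for the Unity Catalog privileges listed above.

    Hypothetical helper: it only builds SQL strings and does not execute them.
    """
    statements = [
        f"GRANT USE CONNECTION ON CONNECTION `{connection}` TO `{principal}`",
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{principal}`",
        f"GRANT CREATE SCHEMA ON CATALOG `{catalog}` TO `{principal}`",
    ]
    if schema is not None:
        # Only needed when a pre-existing schema is used on a serverless cluster.
        statements.append(
            f"GRANT CREATE VOLUME ON SCHEMA `{catalog}`.`{schema}` TO `{principal}`"
        )
    return statements


# Hypothetical names, for illustration only.
for stmt in reconcile_grants("someone@example.com", "conn1", "main", schema="reconcile"):
    print(stmt)
```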

Serverless cluster support

Reconcile automatically detects the cluster type and optimizes intermediate data persistence accordingly:

  • On Serverless clusters: Reconcile uses Unity Catalog volumes for intermediate data persistence
  • On Standard clusters: Reconcile uses DataFrame caching for better performance
Note:
  • On serverless clusters, the configured volume (from metadata_config.volume) is automatically used
  • The volume must be created in the metadata catalog and schema specified in your ReconcileMetadataConfig
  • Ensure you have the necessary permissions to write to the volume on serverless clusters
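
The cluster-dependent behavior described above can be pictured with the following Python sketch. It is not Reconcile's code: the MetadataSettings class, its field names, and the choose_persistence function are hypothetical stand-ins. The /Volumes/<catalog>/<schema>/<volume> layout is the usual Unity Catalog volume path convention, assumed here rather than taken from this guide.

```python
from dataclasses import dataclass


@dataclass
class MetadataSettings:
    """Hypothetical stand-in for the metadata settings (catalog, schema, volume)."""
    catalog: str
    schema: str
    volume: str


def choose_persistence(is_serverless: bool, metadata: MetadataSettings) -> dict:
    """Pick where intermediate data is kept, depending on the cluster type."""
    if is_serverless:
        # Serverless: write intermediate data to the configured Unity Catalog volume.
        path = f"/Volumes/{metadata.catalog}/{metadata.schema}/{metadata.volume}"
        return {"strategy": "volume", "location": path}
    # Standard cluster: keep intermediate DataFrames cached instead.
    return {"strategy": "dataframe_cache", "location": None}


# Hypothetical values, for illustration only.
settings = MetadataSettings(catalog="main", schema="reconcile", volume="reconcile_volume")
print(choose_persistence(True, settings))
print(choose_persistence(False, settings))
```

The volume branch is the one that needs write permission on the volume, which is why the note above calls that permission out for serverless clusters only.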


Run

See Running Reconcile for CLI execution, notebook usage, and automation.