Running Reconcile
This page covers how to run Lakebridge Reconcile via the CLI and via Databricks notebooks. For configuration options, see Configuration Reference.
CLI Execution
After completing setup, run reconciliation with:
databricks labs lakebridge reconcile
Results are written to the reconciliation dashboard deployed in your workspace.
Notebook Execution
Use the notebook approach when you need fine-grained control over reconcile configuration, or when running from within a Databricks notebook workflow.
Step 1: Install
- PyPI
%pip install databricks-labs-lakebridge
dbutils.library.restartPython()
Step 2: Import
from databricks.sdk import WorkspaceClient
from databricks.labs.lakebridge.config import (
    ReconcileConfig,
    ReconcileMetadataConfig,
    TableRecon,
    SourceConnectionConfig,
    TargetConnectionConfig,
)
from databricks.labs.lakebridge.reconcile.recon_config import (
    Table,
    ColumnMapping,
    ColumnThresholds,
    Transformation,
    JdbcReaderOptions,
    Aggregate,
    Filters,
)
from databricks.labs.lakebridge.reconcile.trigger_recon_service import TriggerReconService
from databricks.labs.lakebridge.reconcile.trigger_recon_aggregate_service import TriggerReconAggregateService
from databricks.labs.lakebridge.reconcile.exception import ReconciliationException
Step 3: Configure ReconcileConfig
Before the example, here is how ReconcileConfig is structured:
@dataclass
class ReconcileConfig:
    report_type: str
    source: SourceConnectionConfig
    target: TargetConnectionConfig
    metadata_config: ReconcileMetadataConfig
Parameters:
- report_type: The type of report to generate. Available report types are schema, row, data, or all. For details check here.
- source: The configuration for connecting to the source database to be reconciled.
  - dialect: The dialect of the source. Supported values: snowflake, oracle, mssql, synapse, databricks, redshift, teradata.
  - catalog: The source database or catalog name. The field is named catalog for consistency in naming.
  - schema: The source schema name.
  - uc_connection_name: The connection name for the source, as configured in workspace Connections. Not allowed for databricks.
@dataclass
class SourceConnectionConfig:
    dialect: str
    catalog: str
    schema: str
    uc_connection_name: str | None = None
- target: The target Databricks catalog and schema to be reconciled.
  - catalog: The target catalog name.
  - schema: The target schema name.
@dataclass
class TargetConnectionConfig:
    catalog: str
    schema: str
- metadata_config: The metadata configuration, given as a ReconcileMetadataConfig object. Reconcile uses this catalog and schema on Databricks to store all backend metadata for reconciliation.
  - catalog: The catalog name to store the metadata.
  - schema: The schema name to store the metadata.
@dataclass
class ReconcileMetadataConfig:
    catalog: str = "lakebridge"
    schema: str = "reconcile"
    volume: str = "reconcile_volume"
If metadata_config is not set, the default values are used to store the metadata. The default resources are created during the installation of Lakebridge.
The following example configures ReconcileConfig; you can copy it into your notebook:
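The constraints listed above (the allowed report types, the supported dialects, and the rule that uc_connection_name is not allowed for databricks) can be checked before a run. The following is a minimal sketch, not part of Lakebridge: the dataclass is a local stand-in that mirrors the fields shown above, and check_source is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional


# Local stand-in mirroring the SourceConnectionConfig fields shown above,
# so the sketch runs without Lakebridge installed.
@dataclass
class SourceConnectionConfig:
    dialect: str
    catalog: str
    schema: str
    uc_connection_name: Optional[str] = None


# Values taken from the parameter descriptions on this page.
REPORT_TYPES = {"schema", "row", "data", "all"}
DIALECTS = {"snowflake", "oracle", "mssql", "synapse", "databricks", "redshift", "teradata"}


def check_source(report_type: str, source: SourceConnectionConfig) -> list:
    """Hypothetical helper: return a list of problems (empty means none found)."""
    problems = []
    if report_type not in REPORT_TYPES:
        problems.append(f"unknown report_type: {report_type!r}")
    if source.dialect not in DIALECTS:
        problems.append(f"unsupported dialect: {source.dialect!r}")
    if source.dialect == "databricks" and source.uc_connection_name is not None:
        problems.append("uc_connection_name is not allowed for databricks")
    return problems


ok = check_source("all", SourceConnectionConfig("snowflake", "cat", "sch", "conn"))
bad = check_source("rows", SourceConnectionConfig("databricks", "cat", "sch", "conn"))
print(ok)
print(bad)
```

Collecting all problems into a list, rather than raising on the first one, lets you fix a configuration in one pass.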
reconcile_config = ReconcileConfig(
    report_type="all",
    source=SourceConnectionConfig(
        dialect="snowflake",
        catalog="source_sf_catalog",
        schema="source_sf_schema",
        uc_connection_name="source_connection_name",
    ),
    target=TargetConnectionConfig(
        catalog="target_databricks_catalog",
        schema="target_databricks_schema",
    ),
    metadata_config=ReconcileMetadataConfig(
        catalog="lakebridge_metadata",
        schema="reconcile",
    ),
)
Step 4: Configure TableRecon
from databricks.labs.lakebridge.config import TableRecon
from databricks.labs.lakebridge.reconcile.recon_config import (
    Table,
    ColumnMapping,
    ColumnThresholds,
    TableThresholds,
    Transformation,
    JdbcReaderOptions,
    Aggregate,
    Filters,
)
table_recon = TableRecon(
    tables=[
        Table(
            source_name="source_table_name",
            target_name="target_table_name",
            join_columns=["store_id", "account_id"],
            column_mapping=[
                ColumnMapping(source_name="dept_id", target_name="department_id"),
            ],
            column_thresholds=[
                ColumnThresholds(column_name="unit_price", lower_bound="-5", upper_bound="5", type="float")
            ],
            table_thresholds=[
                TableThresholds(lower_bound="0%", upper_bound="5%", model="mismatch")
            ],
            transformations=[
                Transformation(
                    column_name="inventory_units",
                    source="coalesce(cast(cast(inventory_units as decimal(38,10)) as string), '_null_recon_')",
                    target='coalesce(replace(cast(format_number(cast(inventory_units as decimal(38, 10)), 10) as string), ",", ""), "_null_recon_")'
                )
            ],
            jdbc_reader_options=JdbcReaderOptions(
                num_partitions=50,
                partition_column="lct_nbr",
                lower_bound="1",
                upper_bound="50000"
            ),
            filters=Filters(
                source="lower(dept_name)='finance'",
                target="lower(department_name)='finance'"
            )
        )
    ]
)
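Threshold bounds are passed as strings, so a swapped or mixed pair is easy to miss. The following is a small sanity check you could run on your own values before building ColumnThresholds or TableThresholds. It is a sketch only: parse_bound and bounds_are_ordered are hypothetical helpers, and the rule that both bounds must use the same unit (both plain numbers or both percentages) is an assumption, not documented Lakebridge behavior.

```python
def parse_bound(value: str):
    """Parse a bound such as "-5" or "5%" into (number, is_percent)."""
    text = value.strip()
    is_percent = text.endswith("%")
    return float(text.rstrip("%")), is_percent


def bounds_are_ordered(lower_bound: str, upper_bound: str) -> bool:
    """Return True when both bounds share a unit and lower <= upper."""
    low, low_is_percent = parse_bound(lower_bound)
    high, high_is_percent = parse_bound(upper_bound)
    # Assumption: mixing a percentage with an absolute number is a mistake.
    return low_is_percent == high_is_percent and low <= high


print(bounds_are_ordered("-5", "5"))
print(bounds_are_ordered("5", "-5"))
print(bounds_are_ordered("0%", "5%"))
```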
Step 5: Run
from databricks.labs.lakebridge import __version__
from databricks.sdk import WorkspaceClient
from databricks.labs.lakebridge.reconcile.trigger_recon_service import TriggerReconService
from databricks.labs.lakebridge.reconcile.exception import ReconciliationException
ws = WorkspaceClient(product="lakebridge", product_version=__version__)
try:
    result = TriggerReconService.trigger_recon(
        ws=ws,
        spark=spark,
        table_recon=table_recon,
        reconcile_config=reconcile_config,
    )
    print(result.recon_id)
    print(result)
except ReconciliationException as e:
    print(f"Failed: {e.reconcile_output.recon_id}")
    print(e)
Visualization
After running, use the recon_id to drill into the results on the AI/BI Dashboard deployed in your workspace during installation.
Auto-configure Table Mappings
When you have many tables to reconcile, you can skip writing the TableRecon by hand. The auto-configure-recon-tables command discovers source/target table pairs by matching their names, fills in column_mapping where source and target column names differ, drops unmatched columns, and uploads the resulting reconcile config file to your install folder.
This is the automated alternative to Step 4: Configure TableRecon above.
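To make the column step concrete, the following sketch shows one way a mapping could be drafted from two column lists: pair columns whose names agree after light normalization, record a mapping only where the spelling differs, and collect whatever is left unmatched. This is an illustration under stated assumptions, not the Lakebridge implementation; draft_column_mapping and its normalization (case and hyphen vs. underscore only) are hypothetical.

```python
def normalize(name: str) -> str:
    """Assumed normalization: ignore case and treat '-' like '_'."""
    return name.lower().replace("-", "_")


def draft_column_mapping(source_columns, target_columns):
    """Return (mapping, unmatched_source, unmatched_target)."""
    targets = {normalize(col): col for col in target_columns}
    mapping, unmatched_source, matched_targets = [], [], set()
    for src in source_columns:
        tgt = targets.get(normalize(src))
        if tgt is None:
            unmatched_source.append(src)
            continue
        matched_targets.add(tgt)
        if src != tgt:
            # Only names that actually differ need an explicit mapping entry.
            mapping.append((src, tgt))
    unmatched_target = [col for col in target_columns if col not in matched_targets]
    return mapping, unmatched_source, unmatched_target


mapping, only_in_source, only_in_target = draft_column_mapping(
    ["Store_ID", "amount", "legacy"],
    ["store-id", "amount", "loaded_at"],
)
print(mapping)
print(only_in_source)
print(only_in_target)
```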
Prerequisites
- CLI users: configure-reconcile has been run (the reconcile config file must exist in your install folder).
- Python users: complete Step 3: Configure ReconcileConfig.
Run
databricks labs lakebridge auto-configure-recon-tables
The matcher normalizes table and column names (case, underscores vs. hyphens, simple plural/singular), so Customers matches customers, Order_Items matches order-items, and orders matches order.
Source tables the matcher cannot pair with a target table are omitted from the draft and logged so you can add them manually.
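The matching behavior described above can be approximated in a few lines. The sketch below is not the Lakebridge matcher: normalize and pair_tables are hypothetical names, and the singularization rule (strip one trailing "s") is a deliberately naive assumption that covers only the simple cases mentioned here.

```python
import logging

logger = logging.getLogger("recon_sketch")


def normalize(name: str) -> str:
    """Fold case, treat '-' like '_', and strip a simple plural 's'."""
    key = name.lower().replace("-", "_")
    # Naive singularization (assumption): drop one trailing 's', but keep 'ss'.
    if key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def pair_tables(source_tables, target_tables):
    """Return (pairs, unmatched_source); unmatched tables are also logged."""
    targets = {normalize(t): t for t in target_tables}
    pairs, unmatched = [], []
    for src in source_tables:
        tgt = targets.get(normalize(src))
        if tgt is None:
            unmatched.append(src)
            logger.warning("No target match for source table %s", src)
        else:
            pairs.append((src, tgt))
    return pairs, unmatched


pairs, unmatched = pair_tables(
    ["Customers", "Order_Items", "orders", "audit_log"],
    ["customers", "order-items", "order"],
)
print(pairs)
print(unmatched)
```

Returning the unmatched names alongside the pairs, in addition to logging them, makes it easy to add those tables manually afterwards.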
When the workspace job finishes, the resulting recon_config_<...>.json is uploaded to your install folder.
Running auto-configure overwrites the existing config file if you approve the prompt, so apply manual edits after running auto-configure.
Python
The auto-configure module exposes two automation entry points:
- discover_tables(...): matches table pairs from the configured catalogs and schemas.
- auto_configure_tables(table_recon, ...): applies all registered configurers (column mapping today; join keys, transformations, ... as they're added) to each Table in table_recon and returns the result.
Both take the reconcile_config from Step 3 and return a TableRecon that can be passed to TriggerReconService.trigger_recon (Step 5):
from databricks.labs.lakebridge.reconcile.config_generator.execute import (
    auto_configure_tables,
    discover_tables,
)
# `reconcile_config` from Step 3 above.
discovered = discover_tables(reconcile_config=reconcile_config, spark=spark)
table_recon = auto_configure_tables(discovered, reconcile_config=reconcile_config, spark=spark)
for t in table_recon.tables:
    print(t.source_name, "→", t.target_name)
Recommended flow: discover, review, then auto-configure
The recommended pattern is to split discovery from auto-configuration so you can review the discovered pairs before any column mappings are filled in:
- Discover: call discover_tables(...) and inspect the returned TableRecon.
- Review: drop pairs you don't want to reconcile, fix names the matcher got wrong, and add tables it couldn't auto-match (look for the warning in the logs).
- Auto-configure: pass the curated TableRecon to auto_configure_tables(curated, ...).
discovered = discover_tables(reconcile_config=reconcile_config, spark=spark)
# Review/curate in memory — or save the file, edit it in the workspace, and load it back.
curated = TableRecon(tables=[t for t in discovered.tables if t.source_name != "audit_log"])
table_recon = auto_configure_tables(curated, reconcile_config=reconcile_config, spark=spark)
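If you prefer to review the draft as a file rather than in memory, a save/load round trip might look like the sketch below. It uses local stand-in dataclasses with only two fields and a JSON layout chosen for illustration; the real recon_config_<...>.json schema is not reproduced here, and to_json / from_json are hypothetical helper names.

```python
import json
from dataclasses import asdict, dataclass, field


# Local stand-ins with a reduced field set, so the sketch runs without Lakebridge.
@dataclass
class Table:
    source_name: str
    target_name: str


@dataclass
class TableRecon:
    tables: list = field(default_factory=list)


def to_json(recon: TableRecon) -> str:
    """Serialize the draft so it can be edited by hand."""
    return json.dumps(asdict(recon), indent=2)


def from_json(text: str) -> TableRecon:
    """Rebuild the draft from the (possibly edited) JSON text."""
    data = json.loads(text)
    return TableRecon(tables=[Table(**entry) for entry in data["tables"]])


draft = TableRecon(tables=[Table("orders", "orders"), Table("audit_log", "audit_log")])
text = to_json(draft)

# Simulate a manual edit: remove the pair you do not want to reconcile.
edited = json.loads(text)
edited["tables"] = [t for t in edited["tables"] if t["source_name"] != "audit_log"]
curated = from_json(json.dumps(edited))
print([t.source_name for t in curated.tables])
```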
From the CLI this is two runs of auto-configure-recon-tables:
- First run — no existing file. Answer yes to "Discover tables now?", then no to "Also run auto-configure in the same job (skips the recommended review step)?". This emits a discover-only draft for you to review.
- Second run — existing file. Answer yes to "Auto-configure and use existing table mappings (no discovery)?" — auto-configure applies its configurers to your curated draft.
If you'd rather discover and auto-configure in one job (and skip the review step), accept the "Also run auto-configure in the same job" prompt during the first run.
Plugging in a custom configurer
auto_configure_tables and auto_configure_table both accept an auto_configurers parameter (defaults to SUPPORTED_AUTO_CONFIGURERS). Pass your own list of TableAutoConfigurer implementations to extend or replace the defaults — e.g. an LLM-driven mapper, a join-key inferrer, or a transformation suggester:
from databricks.labs.lakebridge.reconcile.config_generator.configure import TableAutoConfigurer
from databricks.labs.lakebridge.reconcile.config_generator.execute import (
    SUPPORTED_AUTO_CONFIGURERS,
    auto_configure_tables,
)
class MyConfigurer(TableAutoConfigurer):
    def configure(self, table, ctx):
        ...  # inspect ctx.source_columns / ctx.target_columns and return an updated Table
        return table
auto_configure_tables(
    table_recon,
    reconcile_config=reconcile_config,
    spark=spark,
    auto_configurers=[*SUPPORTED_AUTO_CONFIGURERS, MyConfigurer()],
)
Each configurer in the list runs in order on each table; later configurers see the table as updated by earlier ones via the ctx they share.
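The ordering behavior can be illustrated without Lakebridge. In the sketch below, Table, Context, both configurer classes, and run_configurers are local stand-ins invented for the example; they only mimic the configure(table, ctx) shape shown above. Here the updated table is handed from one configurer to the next directly, which is an assumption about how the chaining works, not a description of the library's internals.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Table:
    source_name: str
    target_name: str
    join_columns: tuple = ()
    select_columns: tuple = ()


@dataclass(frozen=True)
class Context:
    source_columns: tuple
    target_columns: tuple


class JoinKeyConfigurer:
    """Hypothetical: treat every source column ending in '_id' as a join key."""

    def configure(self, table, ctx):
        keys = tuple(c for c in ctx.source_columns if c.endswith("_id"))
        return replace(table, join_columns=keys)


class NonKeyColumnsConfigurer:
    """Hypothetical: select the columns that are not join keys so far."""

    def configure(self, table, ctx):
        rest = tuple(c for c in ctx.source_columns if c not in table.join_columns)
        return replace(table, select_columns=rest)


def run_configurers(table, ctx, configurers):
    # Each configurer receives the table as left by the previous one.
    for configurer in configurers:
        table = configurer.configure(table, ctx)
    return table


ctx = Context(source_columns=("store_id", "amount"), target_columns=("store_id", "amount"))
start = Table(source_name="orders", target_name="orders")
in_order = run_configurers(start, ctx, [JoinKeyConfigurer(), NonKeyColumnsConfigurer()])
swapped = run_configurers(start, ctx, [NonKeyColumnsConfigurer(), JoinKeyConfigurer()])
print(in_order.select_columns)
print(swapped.select_columns)
```

Swapping the two configurers changes select_columns, because the second configurer depends on what the first one wrote. Keep that in mind when deciding where your own configurer goes in the list.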
Auto-configure a single table
To run all supported configurers against just one table — e.g. after a column was added on the source — call auto_configure_table:
from databricks.labs.lakebridge.reconcile.config_generator.execute import auto_configure_table
from databricks.labs.lakebridge.reconcile.recon_config import Table
configured = auto_configure_table(
    table=Table(source_name="orders", target_name="orders"),
    reconcile_config=reconcile_config,
    spark=spark,
)
print(configured)
Review the output
Auto-discovery fills in table pairs, column_mapping, select_columns (matched source columns when any source column couldn't be paired), and drop_columns (unmatched target columns). The following fields are not auto-discovered and must be added manually for the tables that need them:
- join_columns: required for data and all report types.
- column_thresholds, table_thresholds: numeric tolerance bounds.
- transformations: column-level SQL transforms.
- filters: source/target WHERE clauses.
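Because join_columns is not filled in automatically but is required for the data and all report types, it can help to list the tables that still lack it before running reconcile. The sketch below uses a local stand-in for Table with a reduced field set; tables_missing_join_columns is a hypothetical helper, not a Lakebridge function.

```python
from dataclasses import dataclass
from typing import Optional


# Local stand-in with only the fields this check needs.
@dataclass
class Table:
    source_name: str
    target_name: str
    join_columns: Optional[list] = None


def tables_missing_join_columns(report_type: str, tables) -> list:
    """Return source names of tables that need join_columns but have none."""
    if report_type not in {"data", "all"}:
        # Per the list above, only 'data' and 'all' require join_columns.
        return []
    return [t.source_name for t in tables if not t.join_columns]


tables = [
    Table("orders", "orders", join_columns=["order_id"]),
    Table("customers", "customers"),
]
print(tables_missing_join_columns("all", tables))
print(tables_missing_join_columns("schema", tables))
```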
See Configuration Reference for the full schema.
This command is experimental. Review the output before running reconcile.