
Running Reconcile

This page covers how to run Lakebridge Reconcile via the CLI and via Databricks notebooks. For configuration options, see Configuration Reference.


CLI Execution

After completing setup, run reconciliation with:

databricks labs lakebridge reconcile

Results are written to the reconciliation dashboard deployed in your workspace.


Notebook Execution

Use the notebook approach when you need fine-grained control over reconcile configuration, or when running from within a Databricks notebook workflow.

Step 1: Install

%pip install databricks-labs-lakebridge
dbutils.library.restartPython()

Step 2: Import

from databricks.sdk import WorkspaceClient

from databricks.labs.lakebridge.config import (
    ReconcileConfig,
    ReconcileMetadataConfig,
    TableRecon,
    SourceConnectionConfig,
    TargetConnectionConfig,
)
from databricks.labs.lakebridge.reconcile.recon_config import (
    Table,
    ColumnMapping,
    ColumnThresholds,
    TableThresholds,
    Transformation,
    JdbcReaderOptions,
    Aggregate,
    Filters,
)
from databricks.labs.lakebridge.reconcile.trigger_recon_service import TriggerReconService
from databricks.labs.lakebridge.reconcile.trigger_recon_aggregate_service import TriggerReconAggregateService
from databricks.labs.lakebridge.reconcile.exception import ReconciliationException

Step 3: Configure ReconcileConfig

ReconcileConfig has the following structure:

class ReconcileConfig:
    report_type: str
    source: SourceConnectionConfig
    target: TargetConnectionConfig
    metadata_config: ReconcileMetadataConfig

Parameters:

  • report_type: The type of report to generate. Available report types are schema, row, data, and all. For details on each type, see the report types documentation.
  • source: The configuration for connecting to the source database to be reconciled.
    • dialect: The dialect of the source. Supported values: snowflake, oracle, mssql, synapse, databricks, redshift, teradata.
    • catalog: The source database or catalog name. The field is called catalog for naming consistency across dialects.
    • schema: The source schema name.
    • uc_connection_name: The name of the source connection, as configured in the workspace Connections. Not allowed for the databricks dialect.
@dataclass
class SourceConnectionConfig:
    dialect: str
    catalog: str
    schema: str
    uc_connection_name: str | None = None
  • target: The target Databricks catalog and schema to reconcile against.
    • catalog: The target catalog name.
    • schema: The target schema name.
@dataclass
class TargetConnectionConfig:
    catalog: str
    schema: str
  • metadata_config: The metadata configuration, given as a ReconcileMetadataConfig object. Reconcile uses this catalog and schema on Databricks to store the backend metadata for reconciliation.
    • catalog: The catalog name to store the metadata.
    • schema: The schema name to store the metadata.
    • volume: The volume name (defaults to reconcile_volume).
@dataclass
class ReconcileMetadataConfig:
    catalog: str = "lakebridge"
    schema: str = "reconcile"
    volume: str = "reconcile_volume"

If these fields are not set, the default values are used to store the metadata. The default resources are created when Lakebridge is installed.
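The following sketch shows how the defaults combine with a partial override. It uses a local stand-in dataclass that has the same fields and defaults as ReconcileMetadataConfig, so it runs without Lakebridge installed. It is not the Lakebridge class. The qualified_schema and volume_path helpers are hypothetical. volume_path follows the standard Unity Catalog /Volumes/<catalog>/<schema>/<volume> layout and assumes that the volume lives inside the metadata schema.

```python
from dataclasses import dataclass


@dataclass
class MetadataConfigSketch:
    """Stand-in mirroring ReconcileMetadataConfig's fields and defaults (not the real class)."""

    catalog: str = "lakebridge"
    schema: str = "reconcile"
    volume: str = "reconcile_volume"

    def qualified_schema(self) -> str:
        # Hypothetical helper: two-level name used to address metadata tables.
        return f"{self.catalog}.{self.schema}"

    def volume_path(self) -> str:
        # Hypothetical helper: Unity Catalog volume path, assuming the volume
        # is created inside the metadata catalog and schema.
        return f"/Volumes/{self.catalog}/{self.schema}/{self.volume}"


defaults = MetadataConfigSketch()
custom = MetadataConfigSketch(catalog="lakebridge_metadata")  # schema and volume keep their defaults

print(defaults.qualified_schema())  # lakebridge.reconcile
print(custom.volume_path())  # /Volumes/lakebridge_metadata/reconcile/reconcile_volume
```

Overriding only catalog, as the example below does, leaves schema and volume at their default values.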

The following example configures ReconcileConfig. You can copy it into your notebook:

reconcile_config = ReconcileConfig(
    report_type="all",
    source=SourceConnectionConfig(
        dialect="snowflake",
        catalog="source_sf_catalog",
        schema="source_sf_schema",
        uc_connection_name="source_connection_name",
    ),
    target=TargetConnectionConfig(
        catalog="target_databricks_catalog",
        schema="target_databricks_schema",
    ),
    metadata_config=ReconcileMetadataConfig(
        catalog="lakebridge_metadata",
        schema="reconcile",
    ),
)

Step 4: Configure TableRecon

from databricks.labs.lakebridge.config import TableRecon
from databricks.labs.lakebridge.reconcile.recon_config import (
    Table,
    ColumnMapping,
    ColumnThresholds,
    TableThresholds,
    Transformation,
    JdbcReaderOptions,
    Aggregate,
    Filters,
)

table_recon = TableRecon(
    tables=[
        Table(
            source_name="source_table_name",
            target_name="target_table_name",
            join_columns=["store_id", "account_id"],
            # Source column dept_id is named department_id in the target.
            column_mapping=[
                ColumnMapping(source_name="dept_id", target_name="department_id"),
            ],
            # lower_bound must not exceed upper_bound.
            column_thresholds=[
                ColumnThresholds(column_name="unit_price", lower_bound="-5", upper_bound="5", type="float")
            ],
            table_thresholds=[
                TableThresholds(lower_bound="0%", upper_bound="5%", model="mismatch")
            ],
            # Normalize inventory_units to the same string form on both sides.
            transformations=[
                Transformation(
                    column_name="inventory_units",
                    source="coalesce(cast(cast(inventory_units as decimal(38,10)) as string), '_null_recon_')",
                    target='coalesce(replace(cast(format_number(cast(inventory_units as decimal(38, 10)), 10) as string), ",", ""), "_null_recon_")',
                )
            ],
            # Partitioned JDBC read of the source, split on lct_nbr.
            jdbc_reader_options=JdbcReaderOptions(
                num_partitions=50,
                partition_column="lct_nbr",
                lower_bound="1",
                upper_bound="50000",
            ),
            # Row filters applied to each side.
            filters=Filters(
                source="lower(dept_name)='finance'",
                target="lower(department_name)='finance'",
            ),
        )
    ]
)
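The threshold settings above express tolerances. The following plain-Python sketch shows one plausible reading of them, under these assumptions:

- A column difference is source minus target, compared against the inclusive range [lower_bound, upper_bound].
- A trailing % on a column bound means a percentage of the source value.
- In the mismatch model, the table threshold compares the percentage of mismatched rows against its bounds.

These semantics and all helper names are hypothetical. See Configuration Reference for the exact behavior of Lakebridge.

```python
def parse_bound(bound: str, base: float) -> float:
    """Hypothetical convention: '5' -> 5.0, '5%' -> 5% of abs(base)."""
    bound = bound.strip()
    if bound.endswith("%"):
        return float(bound[:-1]) / 100.0 * abs(base)
    return float(bound)


def within_column_threshold(source: float, target: float, lower_bound: str, upper_bound: str) -> bool:
    # Assumed definition: difference = source - target, inclusive bounds.
    diff = source - target
    return parse_bound(lower_bound, source) <= diff <= parse_bound(upper_bound, source)


def within_table_threshold(mismatched_rows: int, total_rows: int, lower_bound: str, upper_bound: str) -> bool:
    # Assumed 'mismatch' model: mismatched rows as a percentage of all rows.
    pct = 100.0 * mismatched_rows / total_rows if total_rows else 0.0
    return float(lower_bound.rstrip("%")) <= pct <= float(upper_bound.rstrip("%"))


# Values mirror the example: unit_price within [-5, 5], mismatches within [0%, 5%].
print(within_column_threshold(100.0, 97.5, "-5", "5"))  # True  (diff = 2.5)
print(within_column_threshold(100.0, 90.0, "-5", "5"))  # False (diff = 10.0)
print(within_table_threshold(3, 100, "0%", "5%"))  # True  (3%)
print(within_table_threshold(8, 100, "0%", "5%"))  # False (8%)
```

The sketch shows why the lower bound must be the smaller value. With the bounds swapped, no difference could satisfy lower_bound <= diff <= upper_bound.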

Step 5: Run

from databricks.labs.lakebridge import __version__
from databricks.sdk import WorkspaceClient
from databricks.labs.lakebridge.reconcile.trigger_recon_service import TriggerReconService
from databricks.labs.lakebridge.reconcile.exception import ReconciliationException

# Identify the client as Lakebridge (product name and version) in workspace API calls.
ws = WorkspaceClient(product="lakebridge", product_version=__version__)

try:
    result = TriggerReconService.trigger_recon(
        ws=ws,
        spark=spark,  # the notebook's active SparkSession
        table_recon=table_recon,  # from Step 4
        reconcile_config=reconcile_config,  # from Step 3
    )
    print(result.recon_id)
    print(result)
except ReconciliationException as e:
    # The exception carries the output of the failed run, including its recon_id.
    print(f"Failed: {e.reconcile_output.recon_id}")
    print(e)

Visualization

After running, use the recon_id to drill into the results on the AI/BI Dashboard deployed in your workspace during installation.


Auto-configure Table Mappings

When you have many tables to reconcile, you can skip writing the TableRecon by hand. The auto-configure-recon-tables command does the following:

  • Discovers source/target table pairs by matching their names.
  • Fills in column_mapping where source and target column names differ.
  • Drops unmatched columns.
  • Uploads the resulting reconcile config file to your install folder.

This is the automated alternative to Step 4: Configure TableRecon above.

Prerequisites

Run

databricks labs lakebridge auto-configure-recon-tables

The matcher normalizes table and column names (case, underscores vs. hyphens, simple plural/singular), so Customers matches customers, Order_Items matches order-items, and orders matches order. Source tables that the matcher cannot pair with a target table are omitted from the draft and logged so that you can add them manually.
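The following sketch illustrates the kind of normalization described above. It works in three steps:

1. Lowercase each name.
2. Treat hyphens as underscores.
3. Strip a single trailing s as a crude singular form.

It then pairs source and target tables whose normalized names agree and logs the source tables it cannot pair. This is a simplified approximation for illustration, not the Lakebridge matcher. normalize_name and pair_tables are hypothetical names, and real plural handling (for example, categories vs. category) needs more rules.

```python
import logging

logger = logging.getLogger("recon_matcher_sketch")


def normalize_name(name: str) -> str:
    """Crude normalization: case-fold, unify separators, drop one trailing 's'."""
    key = name.strip().lower().replace("-", "_")
    if key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def pair_tables(source_tables: list[str], target_tables: list[str]) -> list[tuple[str, str]]:
    """Pair tables whose normalized names match; log source tables left unpaired."""
    targets_by_key = {normalize_name(t): t for t in target_tables}
    pairs = []
    for source in source_tables:
        target = targets_by_key.get(normalize_name(source))
        if target is None:
            logger.warning("No target match for source table %s; add it manually", source)
            continue
        pairs.append((source, target))
    return pairs


print(pair_tables(["Customers", "Order_Items", "orders", "audit_log"],
                  ["customers", "order-items", "order"]))
# [('Customers', 'customers'), ('Order_Items', 'order-items'), ('orders', 'order')]
```

In this example, audit_log has no counterpart, so it is logged and left out of the pairs. This mirrors how unmatched source tables are omitted from the draft.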

When the workspace job finishes, the resulting recon_config_<...>.json is uploaded to your install folder.

danger

If you approve the overwrite prompt, auto-configure replaces the existing config file. Make manual edits after running auto-configure, not before, or they are overwritten.

Python

The auto-configure module exposes two automation entry points:

  • discover_tables(...): matches table pairs from the configured catalogs and schemas.
  • auto_configure_tables(table_recon, ...): applies all registered configurers (column mapping today; join keys, transformations, and others as they are added) to each Table in table_recon and returns the result.

Both take the reconcile_config from Step 3 and return a TableRecon that can be passed to TriggerReconService.trigger_recon (Step 5):

from databricks.labs.lakebridge.reconcile.config_generator.execute import (
    auto_configure_tables,
    discover_tables,
)

# `reconcile_config` from Step 3 above.
discovered = discover_tables(reconcile_config=reconcile_config, spark=spark)
table_recon = auto_configure_tables(discovered, reconcile_config=reconcile_config, spark=spark)

for t in table_recon.tables:
    print(t.source_name, "→", t.target_name)

The recommended pattern is to split discovery from auto-configuration so you can review the discovered pairs before any column mappings are filled in:

  1. Discover — call discover_tables(...) and inspect the returned TableRecon.
  2. Review — drop pairs you don't want to reconcile, fix names the matcher got wrong, and add tables it couldn't auto-match (look for the warning in the logs).
  3. Auto-configure — pass the curated TableRecon to auto_configure_tables(curated, ...).

discovered = discover_tables(reconcile_config=reconcile_config, spark=spark)

# Review/curate in memory, or save the file, edit it in the workspace, and load it back.
curated = TableRecon(tables=[t for t in discovered.tables if t.source_name != "audit_log"])

table_recon = auto_configure_tables(curated, reconcile_config=reconcile_config, spark=spark)

note

From the CLI, this workflow takes two runs of auto-configure-recon-tables:

  1. First run — no existing file. Answer yes to "Discover tables now?", then no to "Also run auto-configure in the same job (skips the recommended review step)?". This emits a discover-only draft for you to review.
  2. Second run — existing file. Answer yes to "Auto-configure and use existing table mappings (no discovery)?" — auto-configure applies its configurers to your curated draft.

If you'd rather discover and auto-configure in one job (and skip the review step), accept the "Also run auto-configure in the same job" prompt during the first run.

Plugging in a custom configurer

auto_configure_tables and auto_configure_table both accept an auto_configurers parameter (defaults to SUPPORTED_AUTO_CONFIGURERS). Pass your own list of TableAutoConfigurer implementations to extend or replace the defaults — e.g. an LLM-driven mapper, a join-key inferrer, or a transformation suggester:

from databricks.labs.lakebridge.reconcile.config_generator.configure import TableAutoConfigurer
from databricks.labs.lakebridge.reconcile.config_generator.execute import (
    SUPPORTED_AUTO_CONFIGURERS,
    auto_configure_tables,
)


class MyConfigurer(TableAutoConfigurer):
    def configure(self, table, ctx):
        ...  # inspect ctx.source_columns / ctx.target_columns and return an updated Table
        return table


# Run the default configurers first, then the custom one.
table_recon = auto_configure_tables(
    table_recon,
    reconcile_config=reconcile_config,
    spark=spark,
    auto_configurers=[*SUPPORTED_AUTO_CONFIGURERS, MyConfigurer()],
)

Each configurer in the list runs in order on each table; later configurers see the table as updated by earlier ones via the ctx they share.
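You can model this ordering rule as a left fold: each configurer receives the Table returned by the previous one, and all configurers see the same context. The following sketch uses local stand-ins (TableSketch, ContextSketch, and two toy configurers) instead of the Lakebridge Table and TableAutoConfigurer classes. It assumes the context exposes only source and target column lists; the real classes likely carry more fields.

```python
from dataclasses import dataclass, field, replace
from functools import reduce


@dataclass(frozen=True)
class TableSketch:
    source_name: str
    target_name: str
    column_mapping: dict[str, str] = field(default_factory=dict)
    drop_columns: tuple[str, ...] = ()


@dataclass(frozen=True)
class ContextSketch:
    source_columns: tuple[str, ...]
    target_columns: tuple[str, ...]


class LowercaseMapper:
    """Toy configurer: map source columns to target columns that differ only by case."""

    def configure(self, table: TableSketch, ctx: ContextSketch) -> TableSketch:
        by_lower = {c.lower(): c for c in ctx.target_columns}
        mapping = dict(table.column_mapping)
        for col in ctx.source_columns:
            target = by_lower.get(col.lower())
            if target is not None and target != col:
                mapping[col] = target
        return replace(table, column_mapping=mapping)


class DropUnmapped:
    """Toy configurer: drop target columns that no source column or mapping reaches."""

    def configure(self, table: TableSketch, ctx: ContextSketch) -> TableSketch:
        reached = set(ctx.source_columns) | set(table.column_mapping.values())
        drops = tuple(c for c in ctx.target_columns if c not in reached)
        return replace(table, drop_columns=drops)


def run_configurers(table, ctx, configurers):
    # Left fold: each configurer sees the table as updated by the ones before it.
    return reduce(lambda t, c: c.configure(t, ctx), configurers, table)


ctx = ContextSketch(source_columns=("ID", "name"), target_columns=("id", "name", "etl_ts"))
result = run_configurers(TableSketch("orders", "orders"), ctx, [LowercaseMapper(), DropUnmapped()])
print(result.column_mapping, result.drop_columns)  # {'ID': 'id'} ('etl_ts',)
```

Order matters. If DropUnmapped ran first, the ID to id mapping would not exist yet, so id would also be dropped.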

Auto-configure a single table

To run all supported configurers against just one table — e.g. after a column was added on the source — call auto_configure_table:

from databricks.labs.lakebridge.reconcile.config_generator.execute import auto_configure_table
from databricks.labs.lakebridge.reconcile.recon_config import Table

configured = auto_configure_table(
    table=Table(source_name="orders", target_name="orders"),
    reconcile_config=reconcile_config,
    spark=spark,
)

print(configured)

Review the output

Auto-discovery fills in table pairs, column_mapping, select_columns (matched source columns when any source column couldn't be paired), and drop_columns (unmatched target columns). The following fields are not auto-discovered and must be added manually for the tables that need them:

  • join_columns — required for data and all report types.
  • column_thresholds, table_thresholds — numeric tolerance bounds.
  • transformations — column-level SQL transforms.
  • filters — source/target WHERE clauses.
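The following sketch shows the column-level fields that auto-discovery fills in:

- column_mapping for matched columns whose names differ.
- select_columns, set only when some source column has no partner.
- drop_columns for target columns that have no partner.

It uses a simplified normalization (case and hyphen/underscore only) and returns a plain dict. It illustrates the behavior under those assumptions and is not the Lakebridge implementation; derive_columns is a hypothetical name.

```python
def normalize_column(name: str) -> str:
    # Simplified: case-insensitive, hyphens and underscores treated alike.
    return name.strip().lower().replace("-", "_")


def derive_columns(source_columns: list[str], target_columns: list[str]) -> dict:
    targets_by_key = {normalize_column(c): c for c in target_columns}
    column_mapping = {}
    matched_source, matched_target = [], set()
    for col in source_columns:
        target = targets_by_key.get(normalize_column(col))
        if target is None:
            continue  # unmatched source column: excluded via select_columns below
        matched_source.append(col)
        matched_target.add(target)
        if target != col:
            column_mapping[col] = target  # names differ, so record a mapping
    return {
        "column_mapping": column_mapping,
        # Only restrict the selection when some source column had no partner.
        "select_columns": matched_source if len(matched_source) < len(source_columns) else None,
        "drop_columns": [c for c in target_columns if c not in matched_target],
    }


print(derive_columns(["Dept-Id", "amount", "legacy_flag"], ["dept_id", "amount", "load_ts"]))
# {'column_mapping': {'Dept-Id': 'dept_id'}, 'select_columns': ['Dept-Id', 'amount'], 'drop_columns': ['load_ts']}
```

None of the manual fields listed above (join_columns, thresholds, transformations, filters) appear in the result. They still have to be added by hand.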

See Configuration Reference for the full schema.

note

This command is experimental. Review the output before running reconcile.