Reconcile Guide

Reconcile is an automated tool that streamlines the reconciliation process between source data and target data residing on Databricks. Currently, it supports only Snowflake, Oracle, and other Databricks tables as data sources. The tool helps users efficiently identify discrepancies and variations when comparing the source with the Databricks target.

Execution Pre-Setup

  1. Set up the configuration file

Once the installation is complete, a folder named .lakebridge is created in the home folder of the user's workspace. To run reconciliation for a specific set of tables, create a config file that supplies the required table-specific configuration. The file name must follow the format below, and the file must be created inside the .lakebridge folder.

recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json

Note: For CATALOG_OR_SCHEMA, use the catalog name if the source has a catalog; otherwise use the schema name.

For example:

source_type   catalog_or_schema   report_type   file_name
databricks    tpch                all           recon_config_databricks_tpch_all.json
source1       tpch                row           recon_config_source1_tpch_row.json
source2       tpch                schema        recon_config_source2_tpch_schema.json

Refer to the Reconcile Configuration Guide for detailed instructions and example configurations.
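
As an illustration of this naming convention, here is a minimal Python sketch that builds the expected config file path. The helper recon_config_path is hypothetical and not part of Lakebridge; it only demonstrates how the pieces combine.

from pathlib import Path

# Hypothetical helper (not part of Lakebridge) that applies the
# recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json convention.
def recon_config_path(source: str, catalog_or_schema: str, report_type: str) -> Path:
    file_name = f"recon_config_{source}_{catalog_or_schema}_{report_type}.json"
    return Path.home() / ".lakebridge" / file_name

print(recon_config_path("databricks", "tpch", "all"))
# e.g. /Users/me/.lakebridge/recon_config_databricks_tpch_all.json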

  2. Set up the connection properties

Lakebridge-Reconcile manages connection properties by utilizing secrets stored in the Databricks workspace. Below is the default secret naming convention for managing connection properties.

Note: When both the source and target are Databricks, a secret scope is not required.

Default Secret Scope: lakebridge_<data_source>

source       scope
snowflake    lakebridge_snowflake
oracle       lakebridge_oracle
databricks   lakebridge_databricks

For example, below are the connection properties required for Snowflake:

sfUrl = https://[account_name].snowflakecomputing.com
account = [account_name]
sfUser = [user]
sfPassword = [password]
sfDatabase = [database]
sfSchema = [schema]
sfWarehouse = [warehouse_name]
sfRole = [role_name]
pem_private_key = [pkcs8_pem_private_key]
Note: For Snowflake authentication, either sfPassword or pem_private_key is required. pem_private_key takes priority; if it is not found, sfPassword is used. If neither is available, an exception is raised.
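
As a minimal sketch of how these properties could be stored, the snippet below uses the Databricks SDK to create the lakebridge_snowflake scope and write each Snowflake property as a secret. All values shown are placeholders for your environment.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Scope name from the table above; every value below is a placeholder.
scope = "lakebridge_snowflake"
w.secrets.create_scope(scope=scope)

snowflake_properties = {
    "sfUrl": "https://myaccount.snowflakecomputing.com",
    "account": "myaccount",
    "sfUser": "recon_user",
    "sfPassword": "<password>",
    "sfDatabase": "tpch",
    "sfSchema": "public",
    "sfWarehouse": "recon_wh",
    "sfRole": "recon_role",
}
for key, value in snowflake_properties.items():
    w.secrets.put_secret(scope=scope, key=key, string_value=value)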

  3. Databricks permissions required
  • The user configuring Reconcile must have permission to create SQL warehouses.
  • Additionally, the user must have USE CATALOG and CREATE SCHEMA permissions in order to deploy the metadata tables and dashboards created as part of the Reconcile output. If a pre-existing schema is used, the CREATE VOLUME permission is also required.
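
For illustration, the corresponding Unity Catalog grants might look like the sketch below, run by an administrator in a notebook. The catalog, schema, and principal names are placeholders, not names Lakebridge requires.

# Placeholder catalog, schema, and user; adjust for your environment.
principal = "`recon_user@example.com`"
spark.sql(f"GRANT USE CATALOG ON CATALOG lakebridge_demo TO {principal}")
spark.sql(f"GRANT CREATE SCHEMA ON CATALOG lakebridge_demo TO {principal}")
# Only needed when deploying into a pre-existing schema:
spark.sql(f"GRANT CREATE VOLUME ON SCHEMA lakebridge_demo.reconcile TO {principal}")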

Execution

You can execute the reconciliation process by running the code below in a notebook cell.

from databricks.labs.lakebridge import __version__
from databricks.sdk import WorkspaceClient

from databricks.labs.lakebridge.reconcile.execute import recon
from databricks.labs.lakebridge.reconcile.exception import ReconciliationException

ws = WorkspaceClient(product="lakebridge", product_version=__version__)

try:
    result = recon(
        ws=ws,
        spark=spark,  # notebook spark session
        table_recon=table_recon,  # previously created
        reconcile_config=reconcile_config,  # previously created
    )
    print(result.recon_id)
    print(result)
except ReconciliationException as e:
    # A run that completes but finds mismatches still carries a recon_id.
    recon_id = e.reconcile_output.recon_id
    print(f"Reconciliation failed: {recon_id}")
    print(e)
except Exception as e:
    # Unexpected failure: log it, then re-raise with the original traceback.
    print(f"Exception: {e}")
    raise

For more details, refer to the Reconcile Notebook documentation.