Reconcile Guide

This tool helps users efficiently identify discrepancies and variations in data when comparing a source system with the Databricks target.

Execution Pre-Setup

  1. Set up the configuration file:

Once the installation is done, a folder named .lakebridge is created in the home folder of the user workspace. To run reconciliation for a specific source, create a config file that provides the detailed configuration required for the tables involved. The file name should follow the format below, and the file should be created inside the .lakebridge folder.

recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json

Note: For CATALOG_OR_SCHEMA, use the catalog name if the source has a catalog; otherwise, use the schema name.

For example:

| source_type | catalog_or_schema | report_type | file_name |
| --- | --- | --- | --- |
| databricks | tpch | all | recon_config_databricks_tpch_all.json |
| source1 | tpch | row | recon_config_source1_tpch_row.json |
| source2 | tpch | schema | recon_config_source2_tpch_schema.json |

Refer to the Reconcile Configuration Guide for detailed instructions and example configurations.
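
To make the convention concrete, the snippet below builds the expected config file path. This is a minimal sketch: recon_config_path is a hypothetical helper for illustration, not part of Lakebridge, and the base folder may differ depending on where your home folder resolves.

from pathlib import Path

# Hypothetical helper (not part of Lakebridge): builds the expected config
# file path from the naming convention above. Adjust the base path if your
# .lakebridge folder lives elsewhere (e.g. the workspace home folder).
def recon_config_path(source: str, catalog_or_schema: str, report_type: str) -> Path:
    file_name = f"recon_config_{source}_{catalog_or_schema}_{report_type}.json"
    return Path.home() / ".lakebridge" / file_name

print(recon_config_path("snowflake", "tpch", "all"))
# e.g. /Users/jane/.lakebridge/recon_config_snowflake_tpch_all.json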

  2. Set up the connection properties

Lakebridge-Reconcile manages connection properties by utilizing secrets stored in the Databricks workspace. Below is the default secret naming convention for managing connection properties.

Note: When both the source and target are Databricks, a secret scope is not required.

Default Secret Scope: lakebridge_<data_source>

| source | scope |
| --- | --- |
| snowflake | lakebridge_snowflake |
| oracle | lakebridge_oracle |
| databricks | lakebridge_databricks |
| mssql | lakebridge_mssql |
| synapse | lakebridge_synapse |
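
For example, the scope for a Snowflake source can be created with the Databricks SDK. This is a minimal sketch, assuming you are authenticated to the workspace and allowed to create secret scopes:

from databricks.sdk import WorkspaceClient

ws = WorkspaceClient()

# Create the secret scope for Snowflake connection properties
# (scope name taken from the table above).
ws.secrets.create_scope(scope="lakebridge_snowflake")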

Below are the connection properties required for each source; for example, for Snowflake:

sfUrl = https://[account_name].snowflakecomputing.com
account = [account_name]
sfUser = [user]
sfPassword = [password]
sfDatabase = [database]
sfSchema = [schema]
sfWarehouse = [warehouse_name]
sfRole = [role_name]
pem_private_key = [pkcs8_pem_private_key]
pem_private_key_password = [pkcs8_pem_private_key_password]
Note:

For Snowflake authentication, either sfPassword or pem_private_key is required. Priority is given to pem_private_key, and if it is not found, sfPassword will be used. If neither is available, an exception will be raised.

When using an encrypted pem_private_key, you'll need to provide the pem_private_key_password. This password is used to decrypt the private key for authentication.
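
As a sketch of how the properties above might be stored, assuming the lakebridge_snowflake scope from the table and password-based authentication (all placeholder values are illustrative):

from databricks.sdk import WorkspaceClient

ws = WorkspaceClient()

# Illustrative placeholder values; replace with your real connection settings.
snowflake_properties = {
    "sfUrl": "https://<account_name>.snowflakecomputing.com",
    "account": "<account_name>",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse_name>",
    "sfRole": "<role_name>",
}

# Store each connection property as a secret in the scope.
for key, value in snowflake_properties.items():
    ws.secrets.put_secret(scope="lakebridge_snowflake", key=key, string_value=value)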

  3. Databricks permissions required
  • The user configuring reconcile must have permission to create SQL Warehouses.
  • Additionally, the user must have the USE CATALOG and CREATE SCHEMA permissions in order to deploy the metadata tables and dashboards that are created as part of the Reconcile output. If a pre-existing schema is used, the CREATE VOLUME permission is also required (see the sketch after this list).
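
As a sketch, the catalog and schema permissions can be granted with Unity Catalog GRANT statements. The catalog (lakebridge_catalog), schema (reconcile), and user principal below are hypothetical names, and the statements must be run by a catalog owner or admin:

# Hypothetical catalog, schema, and principal names; adjust to your environment.
grants = [
    "GRANT USE CATALOG ON CATALOG lakebridge_catalog TO `user@example.com`",
    "GRANT CREATE SCHEMA ON CATALOG lakebridge_catalog TO `user@example.com`",
    "GRANT CREATE VOLUME ON SCHEMA lakebridge_catalog.reconcile TO `user@example.com`",
]
for stmt in grants:
    spark.sql(stmt)  # run in a notebook attached to a Unity Catalog-enabled cluster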

Execution

You can execute the reconciliation process by running the command below in a notebook cell.

from databricks.labs.lakebridge import __version__
from databricks.sdk import WorkspaceClient

from databricks.labs.lakebridge.reconcile.trigger_recon_service import TriggerReconService
from databricks.labs.lakebridge.reconcile.exception import ReconciliationException

ws = WorkspaceClient(product="lakebridge", product_version=__version__)

try:
    result = TriggerReconService.trigger_recon(
        ws=ws,
        spark=spark,  # notebook Spark session
        table_recon=table_recon,  # previously created
        reconcile_config=reconcile_config,  # previously created
    )
    print(result.recon_id)
    print(result)
except ReconciliationException as e:
    # ReconciliationException carries the recon output for the failed run
    recon_id = e.reconcile_output.recon_id
    print(f"Failed: {recon_id}")
    print(e)
except Exception as e:
    print(f"Exception: {str(e)}")
    raise

For more details, refer to the Reconcile Notebook documentation.