Reconcile Guide
Reconcile is an automated tool that streamlines the reconciliation of source data against target data residing on Databricks. Currently, only Snowflake, Oracle, and other Databricks tables are supported as data sources. The tool helps users efficiently identify discrepancies and variations when comparing the source with the Databricks target.
Execution Pre-Set Up
- Set up the configuration file:
Once the installation is complete, a folder named .lakebridge is created in the home folder of the user's workspace. To run reconciliation for a specific table source, create a config file inside the .lakebridge folder that holds the table-specific configuration details. The file name must follow the format below.
recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json
Note: For CATALOG_OR_SCHEMA, use the CATALOG name if the source has a catalog; otherwise use the SCHEMA name.
For example:

| source_type | catalog_or_schema | report_type | file_name |
|---|---|---|---|
| databricks | tpch | all | recon_config_databricks_tpch_all.json |
| source1 | tpch | row | recon_config_source1_tpch_row.json |
| source2 | tpch | schema | recon_config_source2_tpch_schema.json |
Refer to the Reconcile Configuration Guide for detailed instructions and example configurations; an illustrative sketch is shown below.
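The sketch below shows what a row-level config for a Snowflake source could look like. The field names here are assumptions based on a typical table-level reconcile configuration and are not authoritative; always follow the Reconcile Configuration Guide for the exact schema.

```json
{
  "source_catalog": "sample_db",
  "source_schema": "tpch",
  "target_catalog": "main",
  "target_schema": "tpch",
  "tables": [
    {
      "source_name": "orders",
      "target_name": "orders",
      "join_columns": ["o_orderkey"]
    }
  ]
}
```

Following the naming convention above, this file would be saved as .lakebridge/recon_config_snowflake_sample_db_row.json.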
- Set up the connection properties
Lakebridge-Reconcile manages connection properties using secrets stored in the Databricks workspace. The default secret scope naming convention is shown below.
Note: When both the source and target are Databricks, a secret scope is not required.
Default Secret Scope: lakebridge_<data_source>
| source | scope |
|---|---|
| snowflake | lakebridge_snowflake |
| oracle | lakebridge_oracle |
| databricks | lakebridge_databricks |
Below are the connection properties required for each source.

Snowflake:

    sfUrl = https://[account_name].snowflakecomputing.com
    account = [account_name]
    sfUser = [user]
    sfPassword = [password]
    sfDatabase = [database]
    sfSchema = [schema]
    sfWarehouse = [warehouse_name]
    sfRole = [role_name]
    pem_private_key = [pkcs8_pem_private_key]

For Snowflake authentication, either sfPassword or pem_private_key is required. Priority is given to pem_private_key; if it is not found, sfPassword is used. If neither is available, an exception is raised.

Oracle:

    user = [user]
    password = [password]
    host = [host]
    port = [port]
    database = [database/SID]
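For example, the connection properties can be stored as workspace secrets with the Databricks SDK. The sketch below assumes each property is kept as its own secret key inside the source-specific scope from the table above; the scope name and property values are placeholders.

```python
from databricks.sdk import WorkspaceClient

ws = WorkspaceClient()

scope = "lakebridge_snowflake"  # scope name per the convention above

# Create the scope if it does not already exist.
if scope not in [s.name for s in ws.secrets.list_scopes()]:
    ws.secrets.create_scope(scope)

# Store each connection property under its own key (values are placeholders).
snowflake_props = {
    "sfUrl": "https://my_account.snowflakecomputing.com",
    "account": "my_account",
    "sfUser": "recon_user",
    "sfPassword": "********",
    "sfDatabase": "sample_db",
    "sfSchema": "tpch",
    "sfWarehouse": "recon_wh",
    "sfRole": "recon_role",
}
for key, value in snowflake_props.items():
    ws.secrets.put_secret(scope=scope, key=key, string_value=value)
```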
- Databricks permissions required
  - The user configuring Reconcile must have permission to create SQL warehouses.
  - Additionally, the user must have the USE CATALOG and CREATE SCHEMA privileges in order to deploy the metadata tables and dashboards that are created as part of the Reconcile output. If there is a pre-existing schema, the CREATE VOLUME permission is also required (example grants are sketched below).
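For reference, the sketch below shows the corresponding Unity Catalog grants issued from a notebook; the catalog, schema, and principal names are placeholders, and the same statements can be run directly as SQL.

```python
# Placeholders: replace the catalog, schema, and principal with your own values.
principal = "recon_user@example.com"

spark.sql(f"GRANT USE CATALOG ON CATALOG main TO `{principal}`")
spark.sql(f"GRANT CREATE SCHEMA ON CATALOG main TO `{principal}`")

# Only needed when the target schema already exists:
spark.sql(f"GRANT CREATE VOLUME ON SCHEMA main.reconcile_metadata TO `{principal}`")
```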
Execution
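The execution snippet below expects table_recon and reconcile_config objects that were created beforehand. A rough sketch of that construction is given here; the import paths, class names, and fields are assumptions and may not match your installed version, so treat the Reconcile Notebook documentation as the source of truth.

```python
# Assumed import paths and dataclass names -- verify against your installed version.
from databricks.labs.lakebridge.config import (
    DatabaseConfig,
    ReconcileConfig,
    ReconcileMetadataConfig,
    TableRecon,
)
from databricks.labs.lakebridge.reconcile.recon_config import Table

# Tables to compare and the columns to join on (names are placeholders).
table_recon = TableRecon(
    source_catalog="sample_db",
    source_schema="tpch",
    target_catalog="main",
    target_schema="tpch",
    tables=[Table(source_name="orders", target_name="orders", join_columns=["o_orderkey"])],
)

# Source type, report type, secret scope, and source/target locations (placeholders).
reconcile_config = ReconcileConfig(
    data_source="snowflake",
    report_type="row",
    secret_scope="lakebridge_snowflake",
    database_config=DatabaseConfig(
        source_catalog="sample_db",
        source_schema="tpch",
        target_catalog="main",
        target_schema="tpch",
    ),
    metadata_config=ReconcileMetadataConfig(),  # package defaults for the metadata catalog/schema/volume
)
```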
You can run the reconciliation process by executing the following code in a notebook cell.

    from databricks.labs.lakebridge import __version__
    from databricks.sdk import WorkspaceClient
    from databricks.labs.lakebridge.reconcile.execute import recon
    from databricks.labs.lakebridge.reconcile.exception import ReconciliationException

    ws = WorkspaceClient(product="lakebridge", product_version=__version__)

    try:
        result = recon(
            ws=ws,
            spark=spark,  # notebook spark session
            table_recon=table_recon,  # previously created
            reconcile_config=reconcile_config,  # previously created
        )
        print(result.recon_id)
        print(result)
    except ReconciliationException as e:
        # Raised when the reconciliation run reports a failure; recon_id identifies the run.
        recon_id = e.reconcile_output.recon_id
        print(f"Failed: {recon_id}")
        print(e)
    except Exception as e:
        # Unexpected error: log it and re-raise.
        print(f"Exception: {e}")
        raise
For more details, refer to the Reconcile Notebook documentation.