Reconcile Guide
Reconcile is an automated tool that streamlines the reconciliation of source data against target data residing on Databricks. Currently, only Snowflake, Oracle, and other Databricks tables are supported as data sources. The tool helps users efficiently identify discrepancies and variations when comparing the source with the Databricks target.
Execution Pre-Set Up
- Set up the configuration file:
Once the installation is complete, a folder named .lakebridge is created in the home folder of the user's workspace. To run reconciliation for a specific table source, create a config file inside the .lakebridge folder that holds the table-specific configuration details. The file name must follow the format below.
recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json
Note: For CATALOG_OR_SCHEMA, use the CATALOG name if the source has a catalog; otherwise use the SCHEMA name.
For example:

| source_type | catalog_or_schema | report_type | file_name |
|---|---|---|---|
| databricks | tpch | all | recon_config_databricks_tpch_all.json |
| source1 | tpch | row | recon_config_source1_tpch_row.json |
| source2 | tpch | schema | recon_config_source2_tpch_schema.json |
Refer to the Reconcile Configuration Guide for detailed instructions and example configurations; an illustrative sketch is shown below.
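The sketch below shows what a row-level config for a Snowflake source could look like. The field names here are assumptions based on a typical table-level reconcile configuration and are not authoritative; always follow the Reconcile Configuration Guide for the exact schema.

```json
{
  "source_catalog": "sample_db",
  "source_schema": "tpch",
  "target_catalog": "main",
  "target_schema": "tpch",
  "tables": [
    {
      "source_name": "orders",
      "target_name": "orders",
      "join_columns": ["o_orderkey"]
    }
  ]
}
```

Following the naming convention above, this file would be saved as .lakebridge/recon_config_snowflake_sample_db_row.json.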
- Set up the connection properties
Lakebridge-Reconcile manages connection properties using secrets stored in the Databricks workspace. The default secret scope naming convention is shown below.
Note: When both the source and target are Databricks, a secret scope is not required.
Default Secret Scope: lakebridge_<data_source>
| source | scope |
|---|---|
| snowflake | lakebridge_snowflake |
| oracle | lakebridge_oracle |
| databricks | lakebridge_databricks |
Below are the connection properties required for each source.

Snowflake:

    sfUrl = https://[account_name].snowflakecomputing.com
    account = [account_name]
    sfUser = [user]
    sfPassword = [password]
    sfDatabase = [database]
    sfSchema = [schema]
    sfWarehouse = [warehouse_name]
    sfRole = [role_name]
    pem_private_key = [pkcs8_pem_private_key]

For Snowflake authentication, either sfPassword or pem_private_key is required. Priority is given to pem_private_key; if it is not found, sfPassword is used. If neither is available, an exception is raised.

Oracle:

    user = [user]
    password = [password]
    host = [host]
    port = [port]
    database = [database/SID]
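For example, the connection properties can be stored as workspace secrets with the Databricks SDK. The sketch below assumes each property is kept as its own secret key inside the source-specific scope from the table above; the scope name and property values are placeholders.

```python
from databricks.sdk import WorkspaceClient

ws = WorkspaceClient()

scope = "lakebridge_snowflake"  # scope name per the convention above

# Create the scope if it does not already exist.
if scope not in [s.name for s in ws.secrets.list_scopes()]:
    ws.secrets.create_scope(scope)

# Store each connection property under its own key (values are placeholders).
snowflake_props = {
    "sfUrl": "https://my_account.snowflakecomputing.com",
    "account": "my_account",
    "sfUser": "recon_user",
    "sfPassword": "********",
    "sfDatabase": "sample_db",
    "sfSchema": "tpch",
    "sfWarehouse": "recon_wh",
    "sfRole": "recon_role",
}
for key, value in snowflake_props.items():
    ws.secrets.put_secret(scope=scope, key=key, string_value=value)
```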
- Databricks permissions required
  - The user configuring Reconcile must have permission to create SQL warehouses.
  - Additionally, the user must have the USE CATALOG and CREATE SCHEMA privileges in order to deploy the metadata tables and dashboards that are created as part of the Reconcile output. If there is a pre-existing schema, the CREATE VOLUME permission is also required (example grants are sketched below).
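For reference, the sketch below shows the corresponding Unity Catalog grants issued from a notebook; the catalog, schema, and principal names are placeholders, and the same statements can be run directly as SQL.

```python
# Placeholders: replace the catalog, schema, and principal with your own values.
principal = "recon_user@example.com"

spark.sql(f"GRANT USE CATALOG ON CATALOG main TO `{principal}`")
spark.sql(f"GRANT CREATE SCHEMA ON CATALOG main TO `{principal}`")

# Only needed when the target schema already exists:
spark.sql(f"GRANT CREATE VOLUME ON SCHEMA main.reconcile_metadata TO `{principal}`")
```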
Execution
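The execution snippet below expects table_recon and reconcile_config objects that were created beforehand. A rough sketch of that construction is given here; the import paths, class names, and fields are assumptions and may not match your installed version, so treat the Reconcile Notebook documentation as the source of truth.

```python
# Assumed import paths and dataclass names -- verify against your installed version.
from databricks.labs.lakebridge.config import (
    DatabaseConfig,
    ReconcileConfig,
    ReconcileMetadataConfig,
    TableRecon,
)
from databricks.labs.lakebridge.reconcile.recon_config import Table

# Tables to compare and the columns to join on (names are placeholders).
table_recon = TableRecon(
    source_catalog="sample_db",
    source_schema="tpch",
    target_catalog="main",
    target_schema="tpch",
    tables=[Table(source_name="orders", target_name="orders", join_columns=["o_orderkey"])],
)

# Source type, report type, secret scope, and source/target locations (placeholders).
reconcile_config = ReconcileConfig(
    data_source="snowflake",
    report_type="row",
    secret_scope="lakebridge_snowflake",
    database_config=DatabaseConfig(
        source_catalog="sample_db",
        source_schema="tpch",
        target_catalog="main",
        target_schema="tpch",
    ),
    metadata_config=ReconcileMetadataConfig(),  # package defaults for the metadata catalog/schema/volume
)
```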
You can run the reconciliation process by executing the following code in a notebook cell.

    from databricks.labs.lakebridge import __version__
    from databricks.sdk import WorkspaceClient
    from databricks.labs.lakebridge.reconcile.execute import recon
    from databricks.labs.lakebridge.reconcile.exception import ReconciliationException

    ws = WorkspaceClient(product="lakebridge", product_version=__version__)

    try:
        result = recon(
            ws=ws,
            spark=spark,  # notebook spark session
            table_recon=table_recon,  # previously created
            reconcile_config=reconcile_config,  # previously created
        )
        print(result.recon_id)
        print(result)
    except ReconciliationException as e:
        # Raised when the reconciliation run reports a failure; recon_id identifies the run.
        recon_id = e.reconcile_output.recon_id
        print(f"Failed: {recon_id}")
        print(e)
    except Exception as e:
        # Unexpected error: log it and re-raise.
        print(f"Exception: {e}")
        raise
For more details, refer to the Reconcile Notebook documentation.