# Reconcile Guide
This tool helps users efficiently identify discrepancies and variations in data when comparing a source system with the Databricks target.
## Execution Pre-Set Up
- Set up the configuration file:

Once the installation is done, a folder named `.lakebridge` is created in the user's workspace home folder. To run reconciliation for a specific table source, create a config file that provides the required table-specific configuration. The file name should follow the format below, and the file should be created inside the `.lakebridge` folder.

`recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json`

Note: For `CATALOG_OR_SCHEMA`, use the catalog name if a catalog exists; otherwise use the schema name.

For example:
| source_type | catalog_or_schema | report_type | file_name |
|---|---|---|---|
| databricks | tpch | all | recon_config_databricks_tpch_all.json |
| source1 | tpch | row | recon_config_source1_tpch_row.json |
| source2 | tpch | schema | recon_config_source2_tpch_schema.json |
Refer to the Reconcile Configuration Guide for detailed instructions and example configurations. A short sketch of the naming convention follows below.
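To make the naming convention concrete, the snippet below builds the expected config-file path. This is a minimal sketch: the workspace home location and the `source`, `catalog_or_schema`, and `report_type` values are illustrative placeholders; only the file-name format itself comes from this guide.

```python
from pathlib import Path

# Placeholder inputs -- substitute your own source, catalog/schema, and report type.
source = "snowflake"
catalog_or_schema = "tpch"   # catalog if one exists, otherwise schema
report_type = "all"          # e.g. all, row, schema

# The config file lives in the .lakebridge folder under the workspace home folder
# (assumed here to be /Workspace/Users/<user> in a Databricks workspace).
home = Path("/Workspace/Users/first.last@example.com")
config_path = home / ".lakebridge" / f"recon_config_{source}_{catalog_or_schema}_{report_type}.json"

print(config_path)
# /Workspace/Users/first.last@example.com/.lakebridge/recon_config_snowflake_tpch_all.json
```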
- Set up the connection properties

Lakebridge-Reconcile manages connection properties using secrets stored in the Databricks workspace. Below is the default secret naming convention for managing connection properties.
Note: When both the source and target are Databricks, a secret scope is not required.
Default Secret Scope: `lakebridge_data_source`
| source | scope |
|---|---|
| snowflake | lakebridge_snowflake |
| oracle | lakebridge_oracle |
| databricks | lakebridge_databricks |
| mssql | lakebridge_mssql |
| synapse | lakebridge_synapse |
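The scope for a source must exist before its secrets can be stored. A minimal sketch using the Databricks Python SDK, with the Snowflake scope name taken from the table above (the same can be done with the Databricks CLI):

```python
from databricks.sdk import WorkspaceClient

ws = WorkspaceClient()

# Create the secret scope for the Snowflake source (scope name from the table above).
ws.secrets.create_scope(scope="lakebridge_snowflake")
```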
Below are the connection properties required for each source:

- Snowflake

  ```
  sfUrl = https://[account_name].snowflakecomputing.com
  account = [account_name]
  sfUser = [user]
  sfPassword = [password]
  sfDatabase = [database]
  sfSchema = [schema]
  sfWarehouse = [warehouse_name]
  sfRole = [role_name]
  pem_private_key = [pkcs8_pem_private_key]
  pem_private_key_password = [pkcs8_pem_private_key_password]
  ```

  For Snowflake authentication, either `sfPassword` or `pem_private_key` is required. Priority is given to `pem_private_key`; if it is not found, `sfPassword` will be used. If neither is available, an exception is raised.

  When using an encrypted `pem_private_key`, you must also provide `pem_private_key_password`, which is used to decrypt the private key for authentication.

- Oracle

  ```
  user = [user]
  password = [password]
  host = [host]
  port = [port]
  database = [database/SID]
  ```

- MS SQL Server (incl. Synapse)

  ```
  user = [user]
  password = [password]
  host = [host]
  port = [port]
  database = [database]
  encrypt = [true/false]
  trustServerCertificate = [true/false]
  ```
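As one way to load these properties, the sketch below stores each Snowflake property as an individual secret key in the `lakebridge_snowflake` scope via the Databricks Python SDK. The key names match the property list above; the values are placeholders, and storing one property per secret key is an assumption about the expected layout.

```python
from databricks.sdk import WorkspaceClient

ws = WorkspaceClient()

# Placeholder values -- each connection property is assumed to be stored
# as its own secret key inside the source's scope.
snowflake_props = {
    "sfUrl": "https://my_account.snowflakecomputing.com",
    "account": "my_account",
    "sfUser": "recon_user",
    "sfPassword": "********",
    "sfDatabase": "tpch_db",
    "sfSchema": "tpch",
    "sfWarehouse": "recon_wh",
    "sfRole": "recon_role",
}

for key, value in snowflake_props.items():
    ws.secrets.put_secret(scope="lakebridge_snowflake", key=key, string_value=value)
```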
- Databricks permissions required
  - The user configuring Reconcile must have permission to create SQL warehouses.
  - Additionally, the user must have `USE CATALOG` and `CREATE SCHEMA` permissions in order to deploy the metadata tables and dashboards created as part of the Reconcile output. If there is a pre-existing schema, the `CREATE VOLUME` permission is also required (a grant sketch follows this list).
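For reference, these privileges map to standard Unity Catalog `GRANT` statements. A sketch runnable from a notebook, assuming a hypothetical catalog `lakebridge_cat`, schema `reconcile_meta`, and principal `user@example.com`:

```python
# Hypothetical catalog, schema, and principal -- substitute your own.
catalog = "lakebridge_cat"
schema = "reconcile_meta"
principal = "`user@example.com`"

# Privileges needed to deploy the Reconcile metadata tables and dashboards.
spark.sql(f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}")
spark.sql(f"GRANT CREATE SCHEMA ON CATALOG {catalog} TO {principal}")

# Only needed when the schema already exists.
spark.sql(f"GRANT CREATE VOLUME ON SCHEMA {catalog}.{schema} TO {principal}")
```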
## Execution

You can run the reconciliation process by executing the code below in a notebook cell.
```python
from databricks.labs.lakebridge import __version__
from databricks.sdk import WorkspaceClient
from databricks.labs.lakebridge.reconcile.trigger_recon_service import TriggerReconService
from databricks.labs.lakebridge.reconcile.exception import ReconciliationException

ws = WorkspaceClient(product="lakebridge", product_version=__version__)

try:
    result = TriggerReconService.trigger_recon(
        ws=ws,
        spark=spark,  # notebook Spark session
        table_recon=table_recon,  # previously created table configuration
        reconcile_config=reconcile_config,  # previously created reconcile configuration
    )
    print(result.recon_id)
    print(result)
except ReconciliationException as e:
    # The failed run can still be identified via the recon_id carried on the exception.
    recon_id = e.reconcile_output.recon_id
    print(f"Reconciliation failed: {recon_id}")
    print(e)
except Exception as e:
    print(f"Exception: {e}")
    raise
```
For more details, refer to the Reconcile Notebook documentation.