
Remorph Reconciliation

Reconcile is an automated tool that streamlines reconciliation between source data and target data residing on Databricks. It currently supports Snowflake, Oracle, and Databricks tables as data sources. It helps users efficiently identify discrepancies between the source and the Databricks target.

Execution Pre-Set Up

  1. Set up the configuration file:

Once installation is complete, a folder named .remorph is created in the home folder of the user's workspace. To run reconciliation for specific source tables, create a config file inside the .remorph folder that provides the table-specific configuration. The file name must follow the format below.

recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json

Note: For CATALOG_OR_SCHEMA, use CATALOG if it exists; otherwise use SCHEMA.

Examples:

| source_type | catalog_or_schema | report_type | file_name |
|---|---|---|---|
| databricks | tpch | all | recon_config_databricks_tpch_all.json |
| source1 | tpch | row | recon_config_source1_tpch_row.json |
| source2 | tpch | schema | recon_config_source2_tpch_schema.json |
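
The naming rule can be sketched as a small Python helper. This is only an illustration, not part of Remorph: the function name `recon_config_filename` and its parameters are hypothetical, and it assumes the three parts are joined verbatim with underscores, as the examples suggest.

```python
from typing import Optional


def recon_config_filename(
    source: str,
    report_type: str,
    schema: str,
    catalog: Optional[str] = None,
) -> str:
    """Build recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json.

    Hypothetical helper: uses the catalog when one is given,
    otherwise falls back to the schema.
    """
    catalog_or_schema = catalog if catalog else schema
    parts = [source, catalog_or_schema, report_type]
    # Reject empty parts so a malformed file name is never produced.
    if not all(part and part.strip() for part in parts):
        raise ValueError("source, catalog/schema and report_type must be non-empty")
    return "recon_config_{}_{}_{}.json".format(*(part.strip() for part in parts))


# Usage: catalog present, so it wins over the schema.
print(recon_config_filename("databricks", "all", schema="default", catalog="tpch"))
# Usage: no catalog, so the schema is used.
print(recon_config_filename("source1", "row", schema="tpch"))
```

The resulting file would then be placed inside the .remorph folder described above.
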
Refer to the [Reconcile Configuration Guide][def] for detailed instructions and [example configurations][config].

[def]: docs/static/recon_configurations/README.md
[config]: docs/static/recon_configurations/reconcile_config_samples.md
  2. Set up the connection properties:

Remorph Reconcile reads connection properties from secrets stored in the Databricks workspace. The default secret naming convention is shown below.

Note: When both the source and target are Databricks, a secret scope is not required.

Default Secret Scope: remorph_data_source

| source | scope |
|---|---|
| snowflake | remorph_snowflake |
| oracle | remorph_oracle |
| databricks | remorph_databricks |

Below are the connection properties required for each source. The properties for Snowflake are:

sfUrl = https://[account_name].snowflakecomputing.com
account = [account_name]
sfUser = [user]
sfPassword = [password]
sfDatabase = [database]
sfSchema = [schema]
sfWarehouse = [warehouse_name]
sfRole = [role_name]
pem_private_key = [pkcs8_pem_private_key]
Note: For Snowflake authentication, either sfPassword or pem_private_key is required. pem_private_key takes priority; if it is not found, sfPassword is used. If neither is available, an exception is raised.
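
The priority rule above can be illustrated with a minimal Python sketch. It is not Remorph's implementation: the function name `resolve_snowflake_credential` is hypothetical, a plain dictionary stands in for the Databricks secret scope, and `ValueError` is an assumed exception type, since the text only says that an exception is raised.

```python
from typing import Mapping, Tuple


def resolve_snowflake_credential(secrets: Mapping[str, str]) -> Tuple[str, str]:
    """Pick the Snowflake credential following the documented priority.

    `secrets` is a stand-in for the values read from the secret scope
    (an assumption for illustration). Returns (property_name, value).
    """
    # pem_private_key has priority when it is present and non-empty.
    pem_key = secrets.get("pem_private_key")
    if pem_key:
        return ("pem_private_key", pem_key)

    # Otherwise fall back to sfPassword.
    password = secrets.get("sfPassword")
    if password:
        return ("sfPassword", password)

    # Neither is available: fail loudly (exception type is an assumption).
    raise ValueError("Either pem_private_key or sfPassword must be provided")


# Usage: both are set, so the private key is chosen.
name, _ = resolve_snowflake_credential(
    {"sfPassword": "example-password", "pem_private_key": "example-key"}
)
print(name)
```
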

Execution

Run the following command to start the reconcile process.

 databricks labs remorph reconcile