# Remorph Reconciliation
Reconcile is an automated tool that streamlines reconciliation between source data and target data residing on Databricks. Currently, the supported sources are Snowflake, Oracle, and Databricks tables. The tool helps users efficiently identify discrepancies and variations in data when comparing the source with the Databricks target.
## Execution Pre-Setup
- Set up the configuration file:

  Once installation is complete, a folder named `.remorph` is created in the user's home folder in the Databricks workspace. To reconcile tables from a specific source, create a configuration file inside the `.remorph` folder that specifies the required table-level settings. Name the file using the following format:

  `recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json`

  Note: For `CATALOG_OR_SCHEMA`, use the catalog name if the source has a catalog; otherwise, use the schema name.
For example:
source_type | catalog_or_schema | report_type | file_name |
---|---|---|---|
databricks | tpch | all | recon_config_databricks_tpch_all.json |
source1 | tpch | row | recon_config_source1_tpch_row.json |
source2 | tpch | schema | recon_config_source2_tpch_schema.json |
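The naming rule above can be expressed as a small helper. This is a minimal sketch, not part of Remorph. The function name `recon_config_filename` and its parameters are hypothetical. The only logic it encodes is the format string and the catalog-over-schema rule described above. It does not validate report types, because the table shows only examples.

```python
from typing import Optional


def recon_config_filename(source: str, report_type: str,
                          catalog: Optional[str] = None,
                          schema: Optional[str] = None) -> str:
    """Build recon_config_<SOURCE>_<CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json.

    Hypothetical helper: prefers the catalog when one is given and
    falls back to the schema otherwise.
    """
    catalog_or_schema = catalog if catalog else schema
    if not catalog_or_schema:
        raise ValueError("either a catalog or a schema name is required")
    return f"recon_config_{source}_{catalog_or_schema}_{report_type}.json"
```

For instance, `recon_config_filename("databricks", "all", catalog="tpch")` yields the first file name in the table. `recon_config_filename("source2", "schema", schema="tpch")` yields the last one.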
#### Refer to [Reconcile Configuration Guide][def] for detailed instructions and [example configurations][config]
[def]: docs/static/recon_configurations/README.md
[config]: docs/static/recon_configurations/reconcile_config_samples.md
- Set up the connection properties:

  Remorph-Reconcile manages connection properties using secrets stored in the Databricks workspace. The default secret naming convention for connection properties is shown below.
Note: When both the source and target are Databricks, a secret scope is not required.
Default secret scope: `remorph_<data_source>`, as listed for each source below:
source | scope |
---|---|
snowflake | remorph_snowflake |
oracle | remorph_oracle |
databricks | remorph_databricks |
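The following is a minimal sketch of storing connection properties under the default scope naming shown in the table. It assumes that each connection property is stored as a separate secret key inside the source's scope, which this README does not spell out. It also assumes that `secrets` exposes `create_scope(scope=...)` and `put_secret(scope=..., key=..., string_value=...)`, as the `secrets` attribute of a databricks-sdk `WorkspaceClient` does. The helper names `secret_scope_for` and `store_connection_properties` are hypothetical.

```python
from typing import Any, Mapping


def secret_scope_for(source: str) -> str:
    """Default scope name, following the remorph_<source> pattern in the table."""
    return f"remorph_{source}"


def store_connection_properties(secrets: Any, source: str,
                                properties: Mapping[str, str],
                                create_scope: bool = True) -> str:
    """Store each connection property as a secret key in the source's scope.

    `secrets` is assumed to provide create_scope(scope=...) and
    put_secret(scope=..., key=..., string_value=...).
    """
    scope = secret_scope_for(source)
    if create_scope:
        # Creating a scope that already exists is expected to fail,
        # so pass create_scope=False when the scope is already there.
        secrets.create_scope(scope=scope)
    for key, value in properties.items():
        secrets.put_secret(scope=scope, key=key, string_value=value)
    return scope
```

With the SDK installed, a call might look like `store_connection_properties(WorkspaceClient().secrets, "oracle", {"user": "...", "password": "..."})`. No scope is needed when both the source and the target are Databricks.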
Below are the connection properties required for each source:
- Snowflake
  - `sfUrl = https://[account_name].snowflakecomputing.com`
  - `account = [account_name]`
  - `sfUser = [user]`
  - `sfPassword = [password]`
  - `sfDatabase = [database]`
  - `sfSchema = [schema]`
  - `sfWarehouse = [warehouse_name]`
  - `sfRole = [role_name]`
  - `pem_private_key = [pkcs8_pem_private_key]`

  For Snowflake authentication, either `sfPassword` or `pem_private_key` is required. `pem_private_key` takes priority. If it is not found, `sfPassword` is used. If neither is available, an exception is raised.
- Oracle
  - `user = [user]`
  - `password = [password]`
  - `host = [host]`
  - `port = [port]`
  - `database = [database/SID]`
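The Snowflake authentication priority and the per-source key lists above can be illustrated with a short sketch. This is not Remorph's implementation. The names `snowflake_auth`, `missing_keys`, `SNOWFLAKE_KEYS`, and `ORACLE_KEYS` are hypothetical. The sketch treats an empty value as "not found", which is an assumption.

```python
from typing import List, Mapping, Tuple

# Keys listed above; the two Snowflake auth keys are handled by snowflake_auth.
SNOWFLAKE_KEYS = ("sfUrl", "account", "sfUser", "sfDatabase",
                  "sfSchema", "sfWarehouse", "sfRole")
ORACLE_KEYS = ("user", "password", "host", "port", "database")


def snowflake_auth(props: Mapping[str, str]) -> Tuple[str, str]:
    """Return (option_name, value), preferring pem_private_key over sfPassword."""
    # Missing or empty values are both treated as "not found".
    if props.get("pem_private_key"):
        return "pem_private_key", props["pem_private_key"]
    if props.get("sfPassword"):
        return "sfPassword", props["sfPassword"]
    raise ValueError("Snowflake requires either pem_private_key or sfPassword")


def missing_keys(props: Mapping[str, str], required: Tuple[str, ...]) -> List[str]:
    """List required keys absent from props, in the order they are listed."""
    return [key for key in required if key not in props]
```

Raising early, rather than falling through to a connection attempt, mirrors the documented behavior: the user gets a clear error when neither credential is configured.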
## Execution
Run the following command to start the reconciliation process:

`databricks labs remorph reconcile`
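The following sketch shows how to trigger the same command from a Python script, for example as part of a scheduled job. It assumes that the Databricks CLI is on `PATH` and that the Remorph labs project is already installed. The helper names `reconcile_command` and `run_reconcile` are hypothetical, and the sketch does not pass any extra flags to the CLI.

```python
import shutil
import subprocess
from typing import List


def reconcile_command() -> List[str]:
    """The documented CLI invocation, split into argv form."""
    return ["databricks", "labs", "remorph", "reconcile"]


def run_reconcile() -> int:
    """Run the reconcile command and return its exit code."""
    if shutil.which("databricks") is None:
        raise RuntimeError("Databricks CLI not found on PATH")
    # Output is not captured, so the CLI's own messages stream to the console.
    completed = subprocess.run(reconcile_command(), check=False)
    return completed.returncode
```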
