Installation

Databricks Labs Remorph


Remorph

Remorph is a toolkit for migrating to Databricks. It simplifies the migration process through two functions: Transpile, for translating SQL code, and Reconcile, for comparing source data with the migrated data on Databricks.

Transpile

Transpile is a self-contained SQL parser, transpiler, and validator that interprets a range of SQL inputs and generates syntactically and semantically correct SQL in the Databricks SQL dialect. It automates the migration of SQL scripts from a source platform to the Databricks SQL format. Currently, Snowflake is the only supported source platform, and the translation builds on the open-source SQLGlot library.

Transpile is developed entirely in Python and is backed by a robust test suite. It highlights syntax errors and, depending on configuration, either issues warnings or raises errors for dialect incompatibilities.

[Figure: Transpiler design flow]

Reconcile

Reconcile is an automated tool that compares source data with target data residing on Databricks. Currently, the supported data sources are Snowflake, Oracle, and other Databricks tables. It helps users identify discrepancies between the source and the Databricks target.


Environment Setup

Pre-requisites

  1. Databricks CLI - Ensure that the Databricks Command-Line Interface (CLI) is installed on your machine. Refer to the Databricks CLI installation instructions for Linux, macOS, and Windows. For example, on a Debian-based Linux system:
#!/usr/bin/env bash

# Install dependencies
apt update && apt install -y curl sudo unzip

# Install the Databricks CLI
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/v0.242.0/install.sh | sudo sh
  2. Databricks CLI - Configure the Databricks CLI by running the following command with the appropriate host and cluster details. profile_name is optional; if it is not provided, the DEFAULT profile is used.
databricks configure --host <host> --configure-cluster --profile <profile_name>

The --configure-cluster flag prompts you to select a cluster_id from the clusters available in your workspace. Alternatively, set the DATABRICKS_CLUSTER_ID environment variable to the cluster ID you want to use for your profile before running the databricks configure command:

export DATABRICKS_CLUSTER_ID=<cluster_id>
databricks configure --host <host> --profile <profile_name>
  3. Python - Verify that Python 3.10 or later is installed on your machine.
  • Windows - Install Python. You will also need a shell environment (Git Bash or WSL).
  • macOS/Unix - Use brew to install Python.
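The two configuration variants above differ only in whether the cluster is chosen interactively or through the environment. As a minimal sketch, the helper below prints (it does not run) the variant that matches the current shell. The function name `configure_command`, the host URL, and the profile name are hypothetical and used only for illustration; the flags are the ones shown above.

```shell
#!/usr/bin/env sh

# Print (do not run) the configure command that matches the current environment.
# Arguments: $1 = workspace host, $2 = profile name (optional, falls back to DEFAULT).
configure_command() {
  host="$1"
  profile="${2:-DEFAULT}"
  if [ -n "${DATABRICKS_CLUSTER_ID:-}" ]; then
    # Cluster already chosen through the environment: no interactive prompt needed.
    echo "databricks configure --host $host --profile $profile"
  else
    # No cluster chosen yet: ask the CLI to prompt for one.
    echo "databricks configure --host $host --configure-cluster --profile $profile"
  fi
}

# Hypothetical host and profile, for illustration only.
configure_command "https://example.cloud.databricks.com" "migration"
```

Printing the command first makes it easy to review before pasting it into a terminal, which is useful when several profiles are in play.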

[Figure: Installing Databricks CLI on macOS]

[Figure: Install Databricks CLI via curl on Windows]

[Figure: Check Python version on Windows, macOS, and Unix]
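The Python 3.10 requirement can also be checked from a script. The sketch below assumes the interpreter is available as `python3` on the PATH, which may not hold on every system (on Windows it is often `python`); the helper name `python_version_ok` is hypothetical.

```shell
#!/usr/bin/env sh

# Return success when a "major.minor[.patch]" version string is at least 3.10.
python_version_ok() {
  version="$1"
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"      # text before the next dot, if any
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# Assumption: the interpreter is called python3.
if command -v python3 >/dev/null 2>&1; then
  found="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  if python_version_ok "$found"; then
    echo "Python $found satisfies the 3.10+ requirement"
  else
    echo "Python $found is too old; 3.10 or later is required"
  fi
else
  echo "python3 not found on PATH"
fi
```

The comparison is done on the numeric major and minor parts rather than on the raw string, because a plain string comparison would rank 3.9 above 3.10.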



Install Transpile

Installation

After completing the environment setup, install Remorph by running the following command:

databricks labs install remorph

Verify Installation

Verify the installation by running the following command. If the installation succeeded, the command prints the usage help for transpile:

databricks labs remorph transpile --help



Install Reconcile

Installation

Install Reconcile using the Databricks Labs CLI:

databricks labs install remorph

Verify Installation

Verify the installation by running the following command. If the installation succeeded, the command prints the usage help for reconcile:

databricks labs remorph reconcile --help
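Both verification steps can be combined into one script. This is a minimal sketch that assumes a successful `--help` call exits with status 0, which is the usual CLI convention but is not stated in this guide; the helper name `check_help` is hypothetical.

```shell
#!/usr/bin/env sh

# check_help CMD...: run "CMD... --help" quietly and report whether it succeeded.
check_help() {
  if "$@" --help >/dev/null 2>&1; then
    echo "ok: $*"
  else
    echo "FAILED: $*"
    return 1
  fi
}

# Only attempt the checks when the Databricks CLI is installed.
if command -v databricks >/dev/null 2>&1; then
  check_help databricks labs remorph transpile || true
  check_help databricks labs remorph reconcile || true
else
  echo "databricks CLI not found on PATH"
fi
```

A FAILED line for either subcommand suggests rerunning the install command and checking its output.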
