Pluggable Transpiler

Remorph pluggable transpilers

Remorph transpiles source code using pluggable transpilers. They are pluggable in the sense that:

  • their code sits outside of the remorph code base
  • more than one can be installed, although at the time of writing remorph can only use one at a time
  • remorph knows nothing about them until they are discovered at runtime

Remorph and a transpiler communicate using LSP (Language Server Protocol); see, for example, this starter to learn more about how this works.

This document describes how remorph discovers and runs transpilers.

Although one could in theory run a transpiler without access to the Databricks platform, remorph requires a valid Databricks install. Remorph leverages this by expecting transpilers to reside in the .databricks folder hierarchy, as follows:

.databricks/
├── labs/
│ ├── remorph-transpilers/
│ │ ├── morpheus/
│ │ ├── remorph-community-transpiler/
│ │ ├── some-3rd-party-transpiler/

Each transpiler resides in its own dedicated sub-directory, whose name can be anything (although avoiding spaces is recommended). That sub-directory contains 2 folders, lib and state:

.
├── lib/
│ ├── config.yml
│ ├── <transpiler code>
├── state/
│ ├── version.json
A transpiler's lib sub-directory must contain a config.yml file with the following structure:

remorph:
  version: 1 # mandatory, _must_ equal 1
  name: <name of the transpiler> # mandatory, can be different from the folder name
  dialects: # this section is mandatory and cannot be empty
    - <sql dialect 1> # such as 'oracle' - it is recommended to leverage existing dialect names
    - <sql dialect 2>
    - ...
    - <sql dialect _n_>
  environment: # this section is optional, variables are set prior to launching the transpiler
    - <name 1>: <value 1>
    - <name 2>: <value 2>
    - ...
    - <name _n_>: <value _n_>
  command_line: # this section is mandatory and cannot be empty, it is used to launch the transpiler
    - <executable> # such as 'java', or 'python'
    - <argument 1> # such as '-jar'
    - ...
    - <argument _n_>
custom: # this section is optional, it is passed to the transpiler at startup
  <key 1>: <value 1> # can be pretty much anything
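
The mandatory rules stated in the comments above can be expressed as a small check. This is a sketch only, not remorph's own validation: it works on an already-parsed config (a plain dict), it assumes that version, name, dialects and command_line are nested under the remorph key, and validate_remorph_section is a hypothetical name.

```python
from typing import Any


def validate_remorph_section(config: dict[str, Any]) -> list[str]:
    """Return the problems found in the 'remorph' section (empty if none)."""
    section = config.get("remorph")
    if not isinstance(section, dict):
        return ["missing 'remorph' section"]
    problems = []
    if section.get("version") != 1:
        problems.append("'version' must equal 1")
    name = section.get("name")
    if not isinstance(name, str) or not name:
        problems.append("'name' is mandatory")
    # Both sections are mandatory and must hold at least one entry.
    for key in ("dialects", "command_line"):
        value = section.get(key)
        if not isinstance(value, list) or not value:
            problems.append(f"'{key}' is mandatory and cannot be empty")
    return problems
```

The optional environment and custom sections are deliberately left unchecked here.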

Databricks provides 2 transpilers: Morpheus, its advanced transpiler, and RCT (Remorph Community Transpiler). These transpilers are installed by remorph itself as part of running the install-transpile command, as follows:

  • the latest Morpheus is fetched from Maven Central, and installed at .databricks/labs/remorph-transpilers/morpheus/.
  • the latest RCT is fetched from PyPI, and installed at .databricks/labs/remorph-transpilers/remorph-community-transpiler/.

Installing 3rd party transpilers is the responsibility of their provider.

When remorph is configured, it scans the remorph-transpilers directory and collects the available source dialects and their corresponding transpilers, so that the user can choose which ones to use.
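
The scan described above could look roughly like the following. This is an illustration under assumptions, not remorph's actual code: collect_dialects and its load_config parameter are hypothetical names, config.yml is assumed to sit in each transpiler's lib folder, and falling back to the folder name when name is absent is a choice made for this sketch.

```python
from pathlib import Path
from typing import Any, Callable


def collect_dialects(
    transpilers_root: Path,
    load_config: Callable[[Path], dict[str, Any]],
) -> dict[str, list[str]]:
    """Map each source dialect to the names of the transpilers declaring it."""
    by_dialect: dict[str, list[str]] = {}
    if not transpilers_root.is_dir():
        return by_dialect
    for transpiler_dir in sorted(transpilers_root.iterdir()):
        config_path = transpiler_dir / "lib" / "config.yml"
        if not config_path.is_file():
            continue  # not a transpiler folder
        section = load_config(config_path).get("remorph", {})
        # Assumption for this sketch: fall back to the folder name.
        name = section.get("name", transpiler_dir.name)
        for dialect in section.get("dialects", []):
            by_dialect.setdefault(dialect, []).append(name)
    return by_dialect
```

The YAML parser is injected so that the sketch stays free of third-party imports; with PyYAML installed, load_config could be `lambda path: yaml.safe_load(path.read_text())`.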

When a user runs the transpile command, remorph sets the working directory to the configured transpiler's directory, adds the configured environment variables to the environment, and runs the configured command line.
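
As an illustration of that launch sequence, here is a minimal sketch using Python's subprocess module. It is not remorph's implementation: build_environment and launch_transpiler are hypothetical names, the environment entries are assumed to arrive as a list of single-key mappings (as in the config.yml structure), and connecting stdin/stdout as pipes for LSP traffic is an assumption.

```python
import os
import subprocess
from pathlib import Path
from typing import Any


def build_environment(entries: list[dict[str, Any]]) -> dict[str, str]:
    """Merge the configured variables into a copy of the current environment."""
    env = dict(os.environ)
    for entry in entries:  # each entry is one '<name>: <value>' mapping
        for name, value in entry.items():
            env[name] = str(value)
    return env


def launch_transpiler(
    working_dir: Path, command_line: list[str], entries: list[dict[str, Any]]
) -> subprocess.Popen:
    """Start the transpiler process from working_dir with the merged environment."""
    return subprocess.Popen(
        command_line,
        cwd=working_dir,
        env=build_environment(entries),
        stdin=subprocess.PIPE,   # assumption: LSP messages travel over stdio
        stdout=subprocess.PIPE,
    )
```

Because the variables are merged into a copy of os.environ, the launching process's own environment is left untouched.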

The transpiler is an LSP server, i.e. it listens for commands from remorph until it is instructed to exit.
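
To make "instructed to exit" more concrete: in the LSP base protocol, every message is a JSON-RPC payload preceded by a Content-Length header, and a server is normally stopped with a shutdown request followed by an exit notification. The sketch below only builds those messages; the function names are hypothetical, and whether remorph sends exactly this sequence is an assumption.

```python
import json
from typing import Any


def frame_message(payload: dict[str, Any]) -> bytes:
    """Encode a JSON-RPC payload with the Content-Length header LSP requires."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def shutdown_request(request_id: int) -> bytes:
    """'shutdown' is a request: it carries an id and the server replies to it."""
    return frame_message({"jsonrpc": "2.0", "id": request_id, "method": "shutdown"})


def exit_notification() -> bytes:
    """'exit' is a notification: it has no id and gets no reply."""
    return frame_message({"jsonrpc": "2.0", "method": "exit"})
```

The header counts bytes, not characters, which is why the length is taken after UTF-8 encoding.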


Manually installing a transpiler

An installer may fail in some situations, for example when security rules prevent downloads or when installing a pre-release. Given the layout described above, a transpiler can be installed manually by:

  • creating the transpiler folder in the .databricks/labs/remorph-transpilers/ directory
  • creating the lib and state sub-folders
  • creating a config.yml file in the lib folder (see details above)
  • creating a version.json file in the state folder with content like: {"version": "v1.3.7", "date": "2025-03-17-15:02:31Z"}
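
The steps above can be scripted. The following is a sketch under assumptions rather than an official installer: install_manually is a hypothetical helper, the config.yml content is supplied by the caller, and the date is written in the same format as the example (UTC with a trailing Z).

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def install_manually(
    transpilers_root: Path, folder_name: str, config_yml: str, version: str
) -> Path:
    """Create the transpiler folder, its lib and state sub-folders, and both files."""
    transpiler_dir = transpilers_root / folder_name
    lib_dir = transpiler_dir / "lib"
    state_dir = transpiler_dir / "state"
    lib_dir.mkdir(parents=True, exist_ok=True)
    state_dir.mkdir(parents=True, exist_ok=True)
    (lib_dir / "config.yml").write_text(config_yml, encoding="utf-8")
    # Timestamp format mirrors the version.json example in this document.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H:%M:%SZ")
    state = {"version": version, "date": stamp}
    (state_dir / "version.json").write_text(json.dumps(state), encoding="utf-8")
    return transpiler_dir
```

Copying the transpiler's own code into the lib folder is not covered by this sketch and remains a separate step.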