Pluggable Transpiler
Remorph pluggable transpilers
Remorph transpiles source code using pluggable transpilers.
They are pluggable in the sense that:
- their code sits outside of the remorph code base
- there can be more than 1 installed, although as of writing, remorph can only use 1 at a given point in time
- remorph knows nothing about them until they are discovered at runtime
Communication between remorph and a transpiler is achieved using LSP (the Language Server Protocol); see for example this starter to learn more about how this works.
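To make the LSP exchange more concrete, here is a minimal sketch of the LSP base-protocol framing: each message is a JSON-RPC body preceded by a `Content-Length` header. The helper names `encode_message` and `decode_message` are hypothetical, and the exact requests remorph sends are not described in this document; the `initialize` request below is only the standard first request of any LSP session.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with the LSP Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> dict:
    """Parse one framed LSP message back into a JSON-RPC payload."""
    header, _, body = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(body[:length].decode("utf-8"))


# Standard LSP 'initialize' request with the minimal required params.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"processId": None, "rootUri": None, "capabilities": {}},
}
framed = encode_message(initialize_request)
```

A transpiler acting as an LSP server would read such framed messages, handle them, and write framed responses back.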
This document describes how remorph discovers and runs transpilers.
Although one could in theory run a transpiler without access to the Databricks platform, remorph requires a valid Databricks install.
Remorph leverages this by expecting transpilers to reside in the .databricks folder hierarchy, as follows:
.databricks/
├── labs/
│   ├── remorph-transpilers/
│   │   ├── morpheus/
│   │   ├── remorph-community-transpiler/
│   │   ├── some-3rd-party-transpiler/
Each transpiler resides in its own dedicated sub-directory, whose name can be anything (although avoiding spaces is recommended). That sub-directory comprises 2 folders:
.
├── lib/
│   ├── config.yml
│   ├── <transpiler code>
├── state/
│   ├── version.json
A transpiler's lib subdirectory must contain a config.yml file with the following structure:
remorph:
  version: 1 # mandatory, _must_ equal 1
  name: <name of the transpiler> # mandatory, can be different from the folder name
  dialects: # this section is mandatory and cannot be empty
    - <sql dialect 1> # such as 'oracle' - it is recommended to leverage existing dialect names
    - <sql dialect 2>
    - ...
    - <sql dialect _n_>
  environment: # this section is optional, variables are set prior to launching the transpiler
    - <name 1>: <value 1>
    - <name 2>: <value 2>
    - ...
    - <name _n_>: <value _n_>
  command_line: # this section is mandatory and cannot be empty, it is used to launch the transpiler
    - <executable> # such as 'java', or 'python'
    - <argument 1> # such as '-jar'
    - ...
    - <argument _n_>
custom: # this section is optional, it is passed to the transpiler at startup
  <key 1>: <value 1> # can be pretty much anything
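The mandatory/optional rules stated in the comments above can be checked mechanically. The following is a minimal sketch, not remorph's actual validation code: `validate_config` is a hypothetical helper, it works on an already-parsed configuration (loading the YAML file would need a third-party parser), and it assumes that version, name, dialects and command_line are nested under the remorph key.

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed config.yml (empty if none)."""
    remorph = config.get("remorph")
    if not isinstance(remorph, dict):
        return ["missing 'remorph' section"]
    errors = []
    # 'version' is mandatory and must equal 1.
    if remorph.get("version") != 1:
        errors.append("'version' must equal 1")
    # 'name' is mandatory.
    name = remorph.get("name")
    if not isinstance(name, str) or not name:
        errors.append("'name' is mandatory")
    # 'dialects' and 'command_line' are mandatory and cannot be empty.
    for key in ("dialects", "command_line"):
        value = remorph.get(key)
        if not isinstance(value, list) or not value:
            errors.append(f"'{key}' is mandatory and cannot be empty")
    return errors


# Hypothetical configuration; the jar name is made up for illustration.
example = {
    "remorph": {
        "version": 1,
        "name": "my-transpiler",
        "dialects": ["oracle"],
        "command_line": ["java", "-jar", "my-transpiler.jar"],
    }
}
problems = validate_config(example)
```

The optional environment and custom sections are deliberately not checked here, since the structure above does not constrain their content.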
Databricks provides 2 transpilers: Morpheus, its advanced transpiler, and RCT (Remorph Community Transpiler).
These transpilers are installed by remorph itself as part of running the install-transpile command, as follows:
- the latest Morpheus is fetched from Maven Central, and installed at .databricks/labs/remorph-transpilers/morpheus/
- the latest RCT is fetched from PyPI, and installed at .databricks/labs/remorph-transpilers/remorph-community-transpiler/
Installing 3rd party transpilers is the responsibility of their provider.
When remorph is configured, it scans the remorph-transpilers directory and collects the available source dialects and their corresponding transpilers, so that the user can configure them as desired.
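The scan described above could look roughly like the sketch below. It is an illustration under assumptions, not remorph's implementation: `find_transpiler_configs` and `index_by_dialect` are hypothetical names, the location of the remorph-transpilers directory is passed in by the caller, and parsing each config.yml into a dictionary is left out because it needs a third-party YAML parser.

```python
from pathlib import Path


def find_transpiler_configs(transpilers_root: Path) -> dict[str, Path]:
    """Map each transpiler folder name to its lib/config.yml, if present."""
    found: dict[str, Path] = {}
    if not transpilers_root.is_dir():
        return found
    for child in sorted(transpilers_root.iterdir()):
        config_path = child / "lib" / "config.yml"
        # Folders without a lib/config.yml are not usable transpilers.
        if child.is_dir() and config_path.is_file():
            found[child.name] = config_path
    return found


def index_by_dialect(configs: dict[str, dict]) -> dict[str, list[str]]:
    """Group transpiler names by the source dialects they declare.

    'configs' maps folder names to already-parsed config.yml contents.
    """
    by_dialect: dict[str, list[str]] = {}
    for config in configs.values():
        remorph = config.get("remorph", {})
        for dialect in remorph.get("dialects", []):
            by_dialect.setdefault(dialect, []).append(remorph.get("name"))
    return by_dialect
```

The dialect index is what lets a user pick first a source dialect and then one of the transpilers that support it.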
When a user runs the transpile command, remorph sets the working directory to that of the configured transpiler, appends the configured environment variables, and runs the configured command line.
The transpiler is an LSP server, i.e. it listens to commands from remorph until it is instructed to exit.
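The launch sequence above maps naturally onto Python's subprocess module. The sketch below is hedged: `build_environment` and `launch_transpiler` are hypothetical helpers, the working directory is supplied by the caller, and wiring stdin/stdout as pipes assumes the LSP traffic goes over standard I/O, which this document does not state.

```python
import os
import subprocess
from pathlib import Path


def build_environment(base_env: dict[str, str], entries: list[dict]) -> dict[str, str]:
    """Append the configured variables to a copy of the base environment.

    'entries' mirrors the config.yml 'environment' section: a list of
    single-key mappings such as [{"NAME": "value"}].
    """
    env = dict(base_env)
    for entry in entries or []:
        for name, value in entry.items():
            env[str(name)] = str(value)
    return env


def launch_transpiler(working_dir: Path, remorph_section: dict) -> subprocess.Popen:
    """Start the transpiler process from the parsed 'remorph' section."""
    env = build_environment(dict(os.environ), remorph_section.get("environment") or [])
    return subprocess.Popen(
        remorph_section["command_line"],  # executable followed by its arguments
        cwd=working_dir,                  # run from the transpiler's directory
        env=env,
        stdin=subprocess.PIPE,            # assumption: LSP over standard I/O
        stdout=subprocess.PIPE,
    )
```

The returned process stays alive as an LSP server; the caller is responsible for sending it requests and, eventually, the instruction to exit.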
Manually installing a transpiler
There are situations where an installer may fail or not apply, for example when security rules prevent downloads, or when working with pre-releases. Based on the layout described above, it is straightforward to manually install a transpiler, by:
- creating the transpiler folder in the .databricks/labs/remorph-transpilers/ directory
- creating the lib and state sub-folders
- creating a config.yml file in the lib folder (see details above)
- creating a version.json file in the state folder with content like: {"version": "v1.3.7", "date": "2025-03-17-15:02:31Z"}
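The manual steps above can also be scripted. The following is a minimal sketch under assumptions: `install_manually` is a hypothetical helper, the remorph-transpilers directory is passed in by the caller, and the version and date strings are written exactly as provided. Copying the transpiler code itself into the lib folder is not covered.

```python
import json
from pathlib import Path


def install_manually(
    transpilers_root: Path,
    folder_name: str,
    config_text: str,
    version: str,
    date: str,
) -> Path:
    """Create the folder layout, config.yml and version.json for a transpiler."""
    transpiler_dir = transpilers_root / folder_name
    lib_dir = transpiler_dir / "lib"
    state_dir = transpiler_dir / "state"
    # Create the transpiler folder with its lib and state sub-folders.
    lib_dir.mkdir(parents=True, exist_ok=True)
    state_dir.mkdir(parents=True, exist_ok=True)
    # Write the configuration that remorph reads at discovery time.
    (lib_dir / "config.yml").write_text(config_text, encoding="utf-8")
    # Record the installed version and date as valid JSON.
    version_info = {"version": version, "date": date}
    (state_dir / "version.json").write_text(json.dumps(version_info), encoding="utf-8")
    return transpiler_dir
```

Using `json.dumps` rather than hand-written text guarantees that version.json is well-formed, with every string properly quoted.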