Lakebridge pluggable transpilers
Lakebridge transpiles source code using pluggable transpilers. They are pluggable in the sense that:
- their code sits outside of the lakebridge code base
- there can be more than one installed, although as of writing, lakebridge can only use one at a given point in time
- lakebridge knows nothing about them until they are discovered at runtime

Communication between lakebridge and a transpiler is achieved using LSP (the Language Server Protocol); see for example this starter to learn more about how this works.
This document describes how lakebridge discovers and runs transpilers.
Although one could in theory run a transpiler without access to the Databricks platform, lakebridge requires a valid Databricks installation. Lakebridge leverages this by expecting transpilers to reside in the .databricks folder hierarchy, as follows:
.databricks/
├── labs/
│   ├── lakebridge-transpilers/
│   │   ├── morpheus/
│   │   ├── lakebridge-community-transpiler/
│   │   ├── some-3rd-party-transpiler/
Each transpiler resides in its own dedicated sub-directory, whose name can be anything (although avoiding spaces is recommended). The sub-directory itself comprises two folders:
.
├── lib/
│   ├── config.yml
│   ├── <transpiler code>
├── state/
│   ├── version.json
A transpiler's lib subdirectory must contain a config.yml file that follows this structure (a filled-in example follows the template):
lakebridge:
  version: 1                        # mandatory, _must_ equal 1
  name: <name of the transpiler>    # mandatory, can be different from the folder name
  dialects:                         # this section is mandatory and cannot be empty
    - <sql dialect 1>               # such as 'oracle' - it is recommended to leverage existing dialect names
    - <sql dialect 2>
    - ...
    - <sql dialect n>
  environment:                      # this section is optional, variables are set prior to launching the transpiler
    <name 1>: <value 1>
    <name 2>: <value 2>
    ...
    <name n>: <value n>
  command_line:                     # this section is mandatory and cannot be empty, it is used to launch the transpiler
    - <executable>                  # such as 'java', or 'python'
    - <argument 1>                  # such as '-jar'
    - ...
    - <argument n>
  custom:                           # this section is optional, it is passed to the transpiler at startup
    <key 1>: <value 1>              # can be pretty much anything
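As an illustration, a config.yml for a hypothetical JVM-based transpiler could look like the following; the name, dialects, environment variable, jar name and custom key are all made up for the example:
lakebridge:
  version: 1
  name: some-3rd-party-transpiler
  dialects:
    - oracle
    - snowflake
  environment:
    SOME_VAR: some-value
  command_line:
    - java
    - -jar
    - transpiler.jar
  custom:
    produce-lineage: true
The command_line entries are used verbatim to launch the transpiler process, with the working directory and environment variables set as described further below.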
Databricks provides two transpilers: Morpheus, its advanced transpiler, and RCT, the Lakebridge Community Transpiler. These transpilers are installed by lakebridge itself as part of running the install-transpile command, as follows:
- the latest Morpheus is fetched from Maven Central and installed at .databricks/labs/lakebridge-transpilers/morpheus/
- the latest RCT is fetched from PyPI and installed at .databricks/labs/lakebridge-transpilers/lakebridge-community-transpiler/
Installing 3rd party transpilers is the responsibility of their provider.
When lakebridge is configured, it scans the lakebridge-transpilers directory and collects the available source dialects and their corresponding transpilers, so that the user can configure them as desired.
When a user runs the transpile command, lakebridge sets the working directory to the configured transpiler's directory, appends the configured environment variables, and runs the configured command line. The transpiler is an LSP server, i.e. it listens to commands from lakebridge until it is instructed to exit.
Manually installing a transpiler
There are situations where an installer may fail: security rules preventing downloads, pre-releases, and so on. Following the above conventions, it is straightforward to manually install a transpiler (the expected end result is sketched after this list), by:
- creating the transpiler folder in the .databricks/labs/lakebridge-transpilers/ directory
- creating the lib and state sub-folders
- creating a config.yml file in the lib folder (see details above)
- creating a version.json file in the state folder, with content like: {"version": "v1.3.7", "date": "2025-03-17-15:02:31Z"}
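For a hypothetical transpiler installed manually under the name some-3rd-party-transpiler (the folder and file names are illustrative), the end result would look like:
.databricks/
├── labs/
│   ├── lakebridge-transpilers/
│   │   ├── some-3rd-party-transpiler/
│   │   │   ├── lib/
│   │   │   │   ├── config.yml
│   │   │   │   ├── <transpiler code>
│   │   │   ├── state/
│   │   │   │   ├── version.json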