Skip to main content

Pluggable Transpiler

Lakebridge pluggable transpilers

Lakebridge transpiles source code using pluggable transpilers. They are pluggable in the sense that:

  • their code sits outside of the lakebridge code base
  • there can be more than 1 installed, although as of writing, lakebridge can only use 1 at a given point in time
  • lakebridge knows nothing about them until they are discovered at runtime

Communication between lakebridge and a transpiler is achieved using LSP, see for example this starter to learn more about how this works.

This document describes how lakebridge discovers and runs transpilers.

Although one could in theory run a transpiler without access to the Databricks platform, lakebridge requires a valid Databricks install. Lakebridge leverages this by expecting transpilers to reside in the .databricks folder hierarchy, as follows:

.databricks/
├── labs/
│ ├── lakebridge-transpilers/
│ │ ├── morpheus/
│ │ ├── lakebridge-community-transpiler/
│ │ ├── some-3rd-party-transpiler/

Each transpiler resides in its own dedicated sub-directory, whose name can be anything (although avoiding spaces is recommended). It itself comprises 2 folders:

.
├── lib/
│ ├── config.yml
│ ├── <transpiler code>
├── state/
│ ├── version.json

A transpiler lib subdirectory must comprise a config.yml file that follows the following structure:

lakebridge:
version: 1 # mandatory, _must_ equal 1
name: <name of the transpiler> # mandatory, can be different from the folder name
dialects: # this section is mandatory and cannot be empty
- <sql dialect 1> # such as 'oracle' - it is recommended to leverage existing dialect names
- <sql dialect 2>
- ...
- <sql dialect _n_>
environment: # this section is optional, variables are set prior to launching the transpiler
<name 1>: <value 1>
<name 2>: <value 2>
...
<name _n_>: <value _n_>
command_line: # this section is mandatory and cannot be empty, it is used to launch the transpiler
- <executable> # such as 'java', or 'python'
- <argument 1> # such as '-jar'
- ...
- <argument _n_>
custom: # this section is optional, it is passed to the transpiler at startup
<key 1>: <value 1> # can be pretty much anything

Databricks provides 2 transpilers: Morpheus, its advanced transpiler, and RCT (Lakebridge Community Transpiler). These transpilers are installed by lakebridge itself as part of running the install-transpile command, as follows:

  • the latest Morpheus is fetched from Maven Central, and installed at .databricks/labs/lakebridge-transpilers/morpheus/.
  • the latest RCT is fetched from PyPi, and installed at .databricks/labs/lakebridge-transpilers/lakebridge-community-transpiler/.

Installing 3rd party transpilers is the responsibility of their provider.

When lakebridge is configured, it scans the lakebridge-transpilers directory, and collects available source dialects and corresponding transpilers, such that the user can configure them as wished.

When a user runs the transpile command, lakebridge sets the working directory to the configured transpiler, appends the configured environment variables, and runs the configured command line.

The transpiler is an LSP Server i.e. it listens to commands from lakebridge until it is instructed to exit.


Manually installing a transpiler

There are situations where an installer may fail: security rules preventing downloads, pre-releases... Following the above steps, it is straightforward to manually install a transpiler, by:

  • creating the transpiler folder in the .databricks/labs/lakebridge-transpilers/ directory
  • creating the lib and state sub-folders
  • creating a config.yml file in the lib folder (see details above)
  • creating a version.json file in the state folder with content like:
    {"version": "v1.3.7", "date": "2025-03-17-15:02:31Z"}