Choosing an Execution Tier

GeoBrix raster functions come in two interchangeable Execution Tiers — Lightweight (pyrx) and Heavyweight (rasterx). Both tiers use the same rst_* Python function names and gbx_rst_* SQL names, so switching between them is a one-line import change.

Choose the tier that fits your environment and requirements; the rest of your code stays the same.

The one-line swap

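# Pick ONE of the two imports; the alias is the same either way.
from databricks.labs.gbx.rasterx import functions as rx   # Heavyweight tier
from databricks.labs.gbx.pyrx    import functions as rx   # Lightweight tier, same alias

# Everything below is identical across tiers
# (df is assumed to be a DataFrame with a raster tile column named "tile"):
df.select(rx.rst_slope("tile", unit="degrees"))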

After an explicit rx.register(spark), the SQL names are identical too (gbx_rst_*), so SQL is portable across tiers.
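When the tier has to be chosen at run time rather than by editing the import, the swap can be driven by a setting. The following is a minimal sketch and not part of GeoBrix: the GBX_TIER environment variable, the "heavy"/"light" labels, and the load_functions helper are hypothetical names introduced here. Only the two package paths come from the imports above.

```python
import os

# Package paths taken from the imports above; the "heavy"/"light" keys are
# hypothetical labels used only by this sketch.
TIER_PACKAGES = {
    "heavy": "databricks.labs.gbx.rasterx",
    "light": "databricks.labs.gbx.pyrx",
}


def _import_functions(package):
    """Equivalent to `from <package> import functions`."""
    module = __import__(package, fromlist=["functions"])
    return module.functions


def load_functions(tier=None, importer=_import_functions):
    """Return the `functions` object for the requested tier.

    `tier` falls back to the hypothetical GBX_TIER environment variable and
    then to "light". `importer` is injectable so the helper can be exercised
    without GeoBrix installed.
    """
    chosen = (tier or os.environ.get("GBX_TIER", "light")).strip().lower()
    if chosen not in TIER_PACKAGES:
        raise ValueError(
            f"unknown tier {chosen!r}; expected one of {sorted(TIER_PACKAGES)}"
        )
    return importer(TIER_PACKAGES[chosen])
```

With this helper, rx = load_functions() takes the place of the import line, and the rest of the code (for example rx.rst_slope) is unchanged. Choosing "heavy" still requires the JAR and init script to be present on the cluster.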

Heavyweight needs more than the wheel

The one-line import swap is symmetric, but the install is not. The lightweight tier is just the [light] wheel (%pip, no JAR, no init script). The heavyweight tier additionally requires the GeoBrix JAR as a cluster library and the GDAL init script on a classic x86 cluster — the wheel alone will not resolve the import or the JVM expressions. See Installation for the heavyweight setup.

Registering a subset (`only=`)

register() installs every gbx_* SQL name for the tier. To register just the functions a session uses, pass only=, which is currently supported by the lightweight tiers (pyrx, pygx, pyvx):

from databricks.labs.gbx.pyrx import functions as rx

rx.register(spark, only=["rst_slope", "rst_clip"])   # just these two
rx.register(spark)                                   # all (default)

Names are case-insensitive and accept either the SQL name (gbx_rst_slope) or the short form (rst_slope). An unrecognized name raises ValueError, which guards against typos. Passing only=[] registers nothing.
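The naming rules above can be modeled in a few lines. This is an illustrative sketch of the documented behavior, not GeoBrix's implementation: resolve_only and known_sql_names are hypothetical names, and the list of known functions is a stand-in for the tier's full gbx_* set.

```python
def resolve_only(only, known_sql_names):
    """Map user-supplied names to SQL names using the rules described above.

    `known_sql_names` stands in for the tier's full set of gbx_* names.
    """
    known = {name.lower() for name in known_sql_names}
    if only is None:
        return sorted(known)              # default: register everything
    resolved = []
    for raw in only:
        name = raw.lower()                # names are case-insensitive
        sql_name = name if name.startswith("gbx_") else "gbx_" + name
        if sql_name not in known:
            raise ValueError(f"unrecognized function name: {raw!r}")
        resolved.append(sql_name)
    return resolved                       # only=[] yields [], i.e. nothing
```

For example, with known = ["gbx_rst_slope", "gbx_rst_clip"], the call resolve_only(["RST_Slope", "gbx_rst_clip"], known) returns ["gbx_rst_slope", "gbx_rst_clip"], while a misspelling such as "rst_slpoe" raises ValueError.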

Readers and writers register through a separate entry point and take only= too — selected by format name (with or without the _gbx suffix):

from databricks.labs.gbx.ds import register as ds_register

ds_register.register(spark, only=["raster_gbx", "gtiff_gbx"])  # just these formats
ds_register.register(spark, only=["shapefile"])               # 'shapefile' -> 'shapefile_gbx'
ds_register.register(spark)                                    # all readers/writers (default)

Mixing tiers per function. Because both tiers share the gbx_* names (last registration wins), you can register the heavyweight set and then override individual functions with the lightweight implementation:

from databricks.labs.gbx.rasterx import functions as heavy
from databricks.labs.gbx.pyrx    import functions as light

heavy.register(spark)                       # all heavy gbx_rst_*
light.register(spark, only=["rst_slope"])   # gbx_rst_slope now lightweight

The reverse — re-registering a few heavy functions over a lightweight session — is not yet available; only= is currently a lightweight-tier feature (heavy registers its full set). Mixing works because both tiers use the same tile struct and GTiff payload, so a tile produced by one tier flows into a function from the other.
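The "last registration wins" rule can be pictured with a toy model in which a plain dict stands in for the session's SQL function registry. Nothing here calls GeoBrix or Spark; the register function below is a hypothetical stand-in that only records which tier backs each name.

```python
def register(registry, tier, sql_names, only=None):
    """Record `tier` as the implementation behind each selected SQL name."""
    selected = sql_names if only is None else [n for n in sql_names if n in only]
    for name in selected:
        registry[name] = tier    # overwrites any earlier registration
    return registry


names = ["gbx_rst_slope", "gbx_rst_clip", "gbx_rst_convolve"]
registry = {}
register(registry, "heavy", names)                           # all heavy
register(registry, "light", names, only=["gbx_rst_slope"])   # override one name
```

After both calls, registry maps gbx_rst_slope to "light" and the other two names to "heavy", which mirrors the heavy-then-light sequence shown above.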

Tradeoffs

Aspect	Heavyweight (rasterx)	Lightweight (pyrx)
Install	Init script + JAR	Volume-staged wheel (`%pip` or cluster library)
Native GDAL	System/PPA install	Bundled with rasterio (nothing to install)
ARM support	x86 only	x86 and ARM
Serverless / shared clusters / Lakeflow SDP	Not supported	Supported
Execution model	JVM-native (Scala + GDAL JNI)	Python-worker UDFs (rasterio + NumPy)
Driver coverage	Full custom GDAL build	rasterio's bundled build (narrower)
SQL default arguments	Supported	Pass all arguments explicitly
Function coverage	Full raster set	Full raster set — every `rst_*` function
Readers / Writers	`gtiff_gdal`, `gdal`, OGR readers	`raster_gbx` / `gtiff_gbx` native Python DataSource V2 reader + writer (no JAR); vector OGR readers still heavy-only

How to choose

Start with Lightweight (pyrx). It installs as a single wheel (%pip or cluster library) with no separate GDAL install and no JAR, and it runs on serverless compute, standard/shared clusters, ARM, and Lakeflow declarative pipelines. It covers the full raster set: every rst_* function. For most raster work it is the recommended default.

Choose Heavyweight (rasterx) in three cases:

Your environment is already JVM/GDAL-based — you have the init script and JAR in place, or you specifically want JVM-native execution on a dedicated cluster.
You need vector OGR readers or format-specific GDAL options — the OGR vector readers (shapefile_ogr, geojson_ogr, gpkg_ogr, …) and the generic gdal reader with exotic driver options are heavy-only. For raster I/O, the lightweight tier ships native raster_gbx / gtiff_gbx readers and a writer (no JAR; see Lightweight Raster Readers).
You need the heavy-only conforming triangulation mode: the Steiner-point conforming triangulation mode is heavyweight-only, while the default constrained mode runs in both tiers. The rest of the VectorX function set is available in the lightweight pyvx tier, including vector-tile encoding (gbx_st_asmvt, gbx_st_asmvt_pyramid), TIN surface modeling (gbx_st_triangulate, gbx_st_interpolateelevation*), and legacy-geometry migration (gbx_st_legacyaswkb). GridX is fully lightweight: the quadbin (gbx_quadbin_*), BNG (gbx_bng_*), and custom-grid (gbx_custom_*) functions all run in both tiers via the lightweight pygx package. See GridX and VectorX Function Reference.

The vector OGR readers, the conforming triangulation mode, and the heavy pmtiles DataSource writer are the remaining heavyweight-only surfaces. Raster I/O, the full VectorX function set, all of GridX (quadbin, BNG, and custom grids), and the gbx_pmtiles_agg aggregate are now available in both tiers; heavyweight's unique surface is expected to keep narrowing.
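The guidance in this section can be condensed into a small decision helper. This is a sketch under stated assumptions, not a GeoBrix API: the Workload fields and the recommend_tier function are hypothetical names, and the rules simply restate the tradeoffs table and the cases listed above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Environment constraints (heavyweight is not supported on these).
    serverless_or_shared: bool = False   # serverless, shared clusters, Lakeflow SDP
    arm: bool = False
    # Heavyweight-only surfaces named in this section.
    needs_ogr_vector_readers: bool = False
    needs_gdal_driver_options: bool = False
    needs_conforming_triangulation: bool = False
    needs_pmtiles_datasource_writer: bool = False
    # Init script and JAR already in place, or JVM-native execution wanted.
    jvm_gdal_already_set_up: bool = False


def recommend_tier(w):
    """Return "light" or "heavy"; raise if the requirements cannot be met."""
    heavy_only_needs = [
        label
        for needed, label in [
            (w.needs_ogr_vector_readers, "vector OGR readers"),
            (w.needs_gdal_driver_options, "generic gdal reader with driver options"),
            (w.needs_conforming_triangulation, "conforming triangulation mode"),
            (w.needs_pmtiles_datasource_writer, "pmtiles DataSource writer"),
        ]
        if needed
    ]
    heavy_unavailable = w.serverless_or_shared or w.arm
    if heavy_only_needs and heavy_unavailable:
        raise ValueError(
            "heavyweight-only features requested on an environment the "
            "heavyweight tier does not support: " + ", ".join(heavy_only_needs)
        )
    if heavy_only_needs or (w.jvm_gdal_already_set_up and not heavy_unavailable):
        return "heavy"
    return "light"   # recommended default
```

A default Workload() resolves to "light". Setting needs_ogr_vector_readers=True on a classic x86 cluster resolves to "heavy", and combining a heavyweight-only need with arm=True raises an error, because no single tier satisfies both.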

Performance

The one-line swap keeps your code identical across tiers. The lightweight tier is functionally complete for raster, implementing every rst_* function (all 107), and for VectorX, implementing every gbx_st_* function (MVT, TIN, legacy migration).

On per-operation timing, the lightweight tier is as fast as or faster than the heavyweight tier for the large majority of functions. Band math is dramatically faster, because the heavyweight tier shells out to a subprocess there. Terrain, focal filters, and discrete-grid (H3/quadbin) aggregation are a few times faster following recent vectorization. A small number of algorithm-bound operations (for example viewshed) are slower on the lightweight tier but remain sub-second in absolute terms. Metadata accessors are sub-millisecond on both.

Across the benchmark suite the two tiers agree within tolerance on 106 of 107 functions; the sole exception is rst_convolve, which differs slightly at raster edges — the heavyweight tier applies a GDAL block-halo convolution that no single lightweight boundary mode reproduces exactly. Interior values match. See Benchmarking for the full per-function heavy-vs-light results and how to run the benchmark on a cluster or locally.
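When comparing rst_convolve output from the two tiers yourself, one approach is to check the interior strictly and count edge differences separately. The sketch below is not the GeoBrix benchmark. It works on plain nested lists, the grids are synthetic, and treating the affected border as a halo of a fixed pixel width is an assumption (for a k-by-k kernel, a width of k // 2 is a natural guess).

```python
import math


def compare_ignoring_edges(a, b, halo, rel_tol=1e-6, abs_tol=1e-9):
    """Compare two equally sized 2-D grids.

    Returns (interior_ok, edge_mismatches): whether every pixel outside the
    `halo`-wide border agrees within tolerance, and how many border pixels
    differ.
    """
    rows, cols = len(a), len(a[0])
    interior_ok, edge_mismatches = True, 0
    for r in range(rows):
        for c in range(cols):
            if math.isclose(a[r][c], b[r][c], rel_tol=rel_tol, abs_tol=abs_tol):
                continue
            on_edge = r < halo or r >= rows - halo or c < halo or c >= cols - halo
            if on_edge:
                edge_mismatches += 1
            else:
                interior_ok = False
    return interior_ok, edge_mismatches


# Synthetic 4x4 grids that differ only at two border pixels.
heavy_out = [[1.0] * 4 for _ in range(4)]
light_out = [row[:] for row in heavy_out]
light_out[0][0] = 1.5
light_out[3][2] = 0.5
result = compare_ignoring_edges(heavy_out, light_out, halo=1)
```

Here result is (True, 2): the interior agrees and two border pixels differ. In practice the grids would be the pixel arrays read from tiles produced by each tier, which this sketch does not cover.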

Function availability

See the Raster Functions availability section for what each tier provides.
