Choosing an Execution Tier
GeoBrix raster functions come in two interchangeable Execution Tiers — Lightweight (pyrx) and Heavyweight (rasterx). Both tiers use the same rst_* Python function names and gbx_rst_* SQL names, so switching between them is a one-line import change.
Choose the tier that fits your environment and requirements; the rest of your code stays the same.
The one-line swap
from databricks.labs.gbx.rasterx import functions as rx # Heavyweight tier
from databricks.labs.gbx.pyrx import functions as rx # Lightweight tier — same alias
# everything below is identical across tiers:
df.select(rx.rst_slope("tile", unit="degrees"))
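After an explicit rx.register(spark), the SQL names are identical too (gbx_rst_*), so SQL is portable across tiers. One caveat applies: the lightweight tier does not support SQL default arguments (see the Tradeoffs table), so SQL that must run on both tiers should pass every argument explicitly.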
The one-line import swap is symmetric, but the install is not. The lightweight tier is just the [light] wheel (%pip, no JAR, no init script). The heavyweight tier additionally requires the GeoBrix JAR as a cluster library and the GDAL init script on a classic x86 cluster — the wheel alone will not resolve the import or the JVM expressions. See Installation for the heavyweight setup.
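Because only the import line differs, the tier can also be picked at runtime by trying the heavyweight module first and falling back to the lightweight one. The following is a minimal sketch, not part of GeoBrix: the helper name load_first_available is hypothetical, and it assumes that a missing heavyweight setup surfaces as an ImportError at import time, which may not hold on every cluster configuration.

```python
import importlib


def load_first_available(candidates):
    """Return (module, name) for the first importable module in `candidates`.

    Hypothetical helper for illustration; GeoBrix does not ship it.
    """
    for name in candidates:
        try:
            return importlib.import_module(name), name
        except ImportError:
            # Covers ModuleNotFoundError too; move on to the next tier.
            continue
    raise ImportError(f"none of these modules could be imported: {candidates}")


# Intended use on a cluster (module paths taken from the imports above):
#
#   rx, tier = load_first_available([
#       "databricks.labs.gbx.rasterx.functions",  # Heavyweight tier
#       "databricks.labs.gbx.pyrx.functions",     # Lightweight tier
#   ])
#   df.select(rx.rst_slope("tile", unit="degrees"))
```

An explicit import is usually easier to reason about; a fallback like this is mainly useful for notebooks that are shared across cluster types.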
Tradeoffs
| Aspect | Heavyweight (rasterx) | Lightweight (pyrx) |
|---|---|---|
| Install | Init script + JAR | Volume-staged wheel (%pip or cluster library) |
| Native GDAL | System/PPA install | Bundled with rasterio (nothing to install) |
| ARM support | x86 only | x86 and ARM |
| Serverless / shared clusters / Lakeflow SDP | Not supported | Supported |
| Execution model | JVM-native (Scala + GDAL JNI) | Python-worker UDFs (rasterio + NumPy) |
| Driver coverage | Full custom GDAL build | rasterio's bundled build (narrower) |
| SQL default arguments | Supported | Pass all arguments explicitly |
| Function coverage | Full raster set | Full raster set — every rst_* function |
| Readers / Writers | gtiff_gdal, gdal, OGR readers | raster_gbx / gtiff_gbx native Python DataSource V2 reader + writer (no JAR); vector OGR readers still heavy-only |
How to choose
Start with Lightweight (pyrx). It installs as a single wheel (%pip or cluster library) with no system GDAL install and no JAR, because GDAL comes bundled with rasterio. It runs on serverless, standard/shared clusters, ARM, and Lakeflow declarative pipelines, and it covers every rst_* function. For most raster work it is the recommended default.
Choose Heavyweight (rasterx) in three cases:
- Your environment is already JVM/GDAL-based: you have the init script and JAR in place, or you specifically want JVM-native execution on a dedicated cluster.
- You need vector OGR readers or format-specific GDAL options: the OGR vector readers (shapefile_ogr, geojson_ogr, gpkg_ogr, …) and the generic gdal reader with exotic driver options are heavy-only. For raster I/O, the lightweight tier ships native raster_gbx / gtiff_gbx readers and a writer (no JAR; see Lightweight Raster Readers).
- You need the heavy-only conforming triangulation mode: only the Steiner-point conforming triangulation mode is heavyweight-only (the default constrained mode runs in both tiers). The rest of the VectorX function set is available in the lightweight pyvx tier: vector-tile encoding (gbx_st_asmvt, gbx_st_asmvt_pyramid), TIN surface modeling (gbx_st_triangulate, gbx_st_interpolateelevation*), and legacy-geometry migration (gbx_st_legacyaswkb). GridX is fully lightweight as well: the quadbin (gbx_quadbin_*), BNG (gbx_bng_*), and custom-grid (gbx_custom_*) functions all run in both tiers via the lightweight pygx package. See GridX and VectorX Function Reference.
The vector OGR readers, the conforming triangulation mode, and the heavy pmtiles DataSource writer are the remaining heavyweight-only surfaces. Raster I/O, the full VectorX function set, all of GridX (quadbin, BNG, and custom grids), and the gbx_pmtiles_agg aggregate are now available in both tiers; heavyweight's unique surface is expected to keep narrowing.
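The guidance above can be written down as a small decision rule. This is only a sketch of the reasoning in this section: the Requirements fields and the choose_tier function are hypothetical names introduced for illustration and are not a GeoBrix API.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Heavyweight-only surfaces listed in this section
    needs_ogr_vector_readers: bool = False
    needs_conforming_triangulation: bool = False
    needs_pmtiles_datasource_writer: bool = False
    # Environments where the heavyweight tier is not supported
    runs_on_serverless_shared_or_sdp: bool = False
    runs_on_arm: bool = False
    # Init script + JAR already in place, or JVM-native execution wanted
    prefers_jvm_native: bool = False


def choose_tier(req):
    """Return "pyrx" (Lightweight) or "rasterx" (Heavyweight)."""
    heavy_only = [
        label
        for label, needed in [
            ("vector OGR readers", req.needs_ogr_vector_readers),
            ("conforming triangulation", req.needs_conforming_triangulation),
            ("pmtiles DataSource writer", req.needs_pmtiles_datasource_writer),
        ]
        if needed
    ]
    heavy_blocked = req.runs_on_serverless_shared_or_sdp or req.runs_on_arm

    if heavy_only and heavy_blocked:
        # Heavy-only feature requested where the heavy tier cannot run
        raise ValueError(
            "heavyweight-only features need a classic x86 cluster: "
            + ", ".join(heavy_only)
        )
    if heavy_only or (req.prefers_jvm_native and not heavy_blocked):
        return "rasterx"
    return "pyrx"  # recommended default


print(choose_tier(Requirements()))
print(choose_tier(Requirements(needs_ogr_vector_readers=True)))
```

The default case falls through to pyrx, which mirrors the recommendation to start with the lightweight tier and move to rasterx only when one of the listed needs applies.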
Performance
The one-line swap keeps your code identical across tiers. The lightweight tier is functionally complete for raster: it implements every rst_* function (all 107). It also implements every gbx_st_* VectorX function (MVT, TIN, legacy migration); the only gap is the conforming triangulation mode, which remains heavyweight-only.
On per-operation timing, the lightweight tier is comparable to or faster than the heavyweight tier for the large majority of functions. Band math is dramatically faster, because the heavyweight tier shells out to a subprocess there. Terrain, focal filters, and discrete-grid (H3/quadbin) aggregation are a few times faster following recent vectorization. A small number of algorithm-bound operations (for example viewshed) are slower on the lightweight tier but remain sub-second in absolute terms. Metadata accessors are sub-millisecond on both.
Across the benchmark suite the two tiers agree within tolerance on 106 of 107 functions; the sole exception is rst_convolve, which differs slightly at raster edges — the heavyweight tier applies a GDAL block-halo convolution that no single lightweight boundary mode reproduces exactly. Interior values match. See Benchmarking for the full per-function heavy-vs-light results and how to run the benchmark on a cluster or locally.
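To see why a boundary-handling difference affects only edge cells, consider a toy 3x3 mean filter run with two padding rules. This pure-Python sketch is not GeoBrix code and does not reproduce either tier's rst_convolve; the function names and the two boundary modes ("zero" and "nearest") are assumptions chosen for illustration.

```python
import math


def mean_filter_3x3(grid, boundary):
    """3x3 mean filter; `boundary` is "zero" (pad with 0) or "nearest" (repeat edge cell)."""
    rows, cols = len(grid), len(grid[0])

    def cell(r, c):
        if 0 <= r < rows and 0 <= c < cols:
            return grid[r][c]
        if boundary == "zero":
            return 0.0
        # "nearest": clamp the index back onto the raster
        return grid[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    return [
        [
            sum(cell(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def agree(a, b, tol=1e-9, skip_border=0):
    """True if a and b match within tol, ignoring `skip_border` cells on each side."""
    rows, cols = len(a), len(a[0])
    return all(
        math.isclose(a[r][c], b[r][c], abs_tol=tol)
        for r in range(skip_border, rows - skip_border)
        for c in range(skip_border, cols - skip_border)
    )


raster = [[float(r * 5 + c + 1) for c in range(5)] for r in range(5)]
zero_padded = mean_filter_3x3(raster, "zero")
edge_repeated = mean_filter_3x3(raster, "nearest")

print(agree(zero_padded, edge_repeated))                 # False: edge cells differ
print(agree(zero_padded, edge_repeated, skip_border=1))  # True: interior matches
```

Skipping a border as wide as the kernel's half-width (one cell for a 3x3 kernel) is the same idea as comparing only interior values when checking the two tiers against each other.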
Function availability
See the Raster Functions availability section for what each tier provides.