Choosing an Execution Tier

GeoBrix raster functions come in two interchangeable Execution Tiers: Lightweight (pyrx) and Heavyweight (rasterx). Both tiers use the same rst_* Python function names and gbx_rst_* SQL names, so switching between them is a one-line import change.

Choose the tier that fits your environment and requirements; the rest of your code stays the same.

The one-line swap

# Use ONE of these two imports; both bind the same alias, rx.
from databricks.labs.gbx.rasterx import functions as rx  # Heavyweight tier
from databricks.labs.gbx.pyrx import functions as rx     # Lightweight tier (same alias)

# everything below is identical across tiers:
df.select(rx.rst_slope("tile", unit="degrees"))

After an explicit rx.register(spark), the SQL names are identical too (gbx_rst_*), so SQL is portable across tiers.
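
Where the tier has to come from configuration rather than a hard-coded import, the swap can be wrapped in a small loader. The sketch below assumes only the two import paths shown above; the helper names tier_module_path and load_tier are hypothetical and not part of GeoBrix.

```python
import importlib

# Module paths copied from the one-line swap above.
_TIER_MODULES = {
    "heavyweight": "databricks.labs.gbx.rasterx.functions",
    "lightweight": "databricks.labs.gbx.pyrx.functions",
}


def tier_module_path(tier: str) -> str:
    """Return the dotted module path for a tier name (hypothetical helper)."""
    try:
        return _TIER_MODULES[tier.lower()]
    except KeyError:
        raise ValueError(
            f"unknown tier {tier!r}; expected one of {sorted(_TIER_MODULES)}"
        ) from None


def load_tier(tier: str):
    """Import and return the functions module for the chosen tier.

    Raises ImportError if that tier is not installed in the environment.
    """
    return importlib.import_module(tier_module_path(tier))
```

With this in place, rx = load_tier("lightweight") yields the same rx alias as the direct import, so df.select(rx.rst_slope("tile", unit="degrees")) is unchanged.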

Heavyweight needs more than the wheel

The one-line import swap is symmetric, but the install is not. The lightweight tier is just the [light] wheel (%pip, no JAR, no init script). The heavyweight tier additionally requires the GeoBrix JAR as a cluster library and the GDAL init script on a classic x86 cluster — the wheel alone will not resolve the import or the JVM expressions. See Installation for the heavyweight setup.

Tradeoffs

| Aspect | Heavyweight (rasterx) | Lightweight (pyrx) |
| --- | --- | --- |
| Install | Init script + JAR | Volume-staged wheel (%pip or cluster library) |
| Native GDAL | System/PPA install | Bundled with rasterio (nothing to install) |
| ARM support | x86 only | x86 and ARM |
| Serverless / shared clusters / Lakeflow SDP | Not supported | Supported |
| Execution model | JVM-native (Scala + GDAL JNI) | Python-worker UDFs (rasterio + NumPy) |
| Driver coverage | Full custom GDAL build | rasterio's bundled build (narrower) |
| SQL default arguments | Supported | Pass all arguments explicitly |
| Function coverage | Full raster set | Full raster set (every rst_* function) |
| Readers / Writers | gtiff_gdal, gdal, OGR readers | raster_gbx / gtiff_gbx native Python DataSource V2 reader + writer (no JAR); vector OGR readers still heavy-only |

How to choose

Start with Lightweight (pyrx). It installs as a single wheel (%pip or cluster library) with no separate GDAL install and no JAR. It runs on serverless, standard/shared clusters, ARM, and Lakeflow declarative pipelines, and it covers the full raster set: every rst_* function. For most raster work it is the recommended default.

Choose Heavyweight (rasterx) in three cases:

  1. Your environment is already JVM/GDAL-based — you have the init script and JAR in place, or you specifically want JVM-native execution on a dedicated cluster.
  2. You need vector OGR readers or format-specific GDAL options — the OGR vector readers (shapefile_ogr, geojson_ogr, gpkg_ogr, …) and the generic gdal reader with exotic driver options are heavy-only. For raster I/O, the lightweight tier ships native raster_gbx / gtiff_gbx readers and a writer (no JAR; see Lightweight Raster Readers).
  3. You need the heavy-only conforming triangulation mode: only the Steiner-point conforming triangulation mode is heavyweight-only; the default constrained mode runs in both tiers. The rest of the VectorX function set is available in the lightweight pyvx tier: vector-tile encoding (gbx_st_asmvt, gbx_st_asmvt_pyramid), TIN surface modeling (gbx_st_triangulate, gbx_st_interpolateelevation*), and legacy-geometry migration (gbx_st_legacyaswkb). GridX is fully lightweight as well: the quadbin (gbx_quadbin_*), BNG (gbx_bng_*), and custom-grid (gbx_custom_*) functions all run in both tiers via the lightweight pygx package. See GridX and VectorX Function Reference.

The vector OGR readers, the conforming triangulation mode, and the heavy pmtiles DataSource writer are the remaining heavyweight-only surfaces. Raster I/O, the full VectorX function set, all of GridX (quadbin, BNG, and custom grids), and the gbx_pmtiles_agg aggregate are now available in both tiers; heavyweight's unique surface is expected to keep narrowing.
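
The guidance above can be condensed into a small decision helper. This is a sketch of the reasoning on this page only, not a GeoBrix API: the Needs dataclass, its field names, and choose_tier are all hypothetical, and treating a conflict as an error is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    """Hypothetical description of a workload and its environment."""
    vector_ogr_readers: bool = False          # shapefile_ogr, geojson_ogr, gpkg_ogr, ...
    exotic_gdal_driver_options: bool = False  # generic gdal reader, format-specific options
    conforming_triangulation: bool = False    # Steiner-point conforming mode
    pmtiles_datasource_writer: bool = False   # heavy pmtiles DataSource writer
    existing_jvm_gdal_setup: bool = False     # init script + JAR already in place
    serverless_or_shared: bool = False        # serverless, shared clusters, Lakeflow SDP
    arm: bool = False                         # ARM workers


_HEAVY_ONLY = (
    "vector_ogr_readers",
    "exotic_gdal_driver_options",
    "conforming_triangulation",
    "pmtiles_datasource_writer",
)


def choose_tier(needs: Needs) -> str:
    """Return "lightweight" or "heavyweight" following the rules on this page."""
    heavy_reasons = [name for name in _HEAVY_ONLY if getattr(needs, name)]
    # Per the tradeoffs table, heavyweight is x86 only and not supported
    # on serverless / shared clusters / Lakeflow SDP.
    heavy_blocked = needs.serverless_or_shared or needs.arm

    if heavy_reasons and heavy_blocked:
        raise ValueError(
            f"heavyweight-only features {heavy_reasons} are required, "
            "but the environment cannot run the heavyweight tier"
        )
    if heavy_reasons:
        return "heavyweight"
    if needs.existing_jvm_gdal_setup and not heavy_blocked:
        return "heavyweight"
    return "lightweight"  # the recommended default
```

For example, choose_tier(Needs()) returns "lightweight", while choose_tier(Needs(vector_ogr_readers=True)) returns "heavyweight".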

Performance

The one-line swap keeps your code identical across tiers. The lightweight tier is functionally complete for raster, implementing all 107 rst_* functions, and for VectorX, implementing every gbx_st_* function (MVT, TIN, legacy migration).

On per-operation timing the lightweight tier is comparable to or faster than the heavyweight tier for the large majority of functions. Band math is dramatically faster, because the heavyweight tier shells out to a subprocess there. Terrain, focal filters, and discrete-grid (H3/quadbin) aggregation are a few times faster following recent vectorization. A small number of algorithm-bound operations (for example viewshed) are slower on the lightweight tier but remain sub-second in absolute terms. Metadata accessors are sub-millisecond on both.

Across the benchmark suite the two tiers agree within tolerance on 106 of 107 functions; the sole exception is rst_convolve, which differs slightly at raster edges — the heavyweight tier applies a GDAL block-halo convolution that no single lightweight boundary mode reproduces exactly. Interior values match. See Benchmarking for the full per-function heavy-vs-light results and how to run the benchmark on a cluster or locally.
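
To see why a convolution can agree in the interior yet differ at the edges, consider a toy 3x3 filter run with two different boundary treatments. This is a pure-Python illustration and not the GeoBrix or GDAL implementation; the mode names "zero" and "nearest" and the helper names are assumptions chosen for the example.

```python
def convolve3x3(grid, kernel, mode):
    """Toy 3x3 filter over a list-of-lists grid.

    mode="zero" pads outside the grid with 0.0; mode="nearest" repeats the
    closest edge cell. The kernel is not flipped, which makes no difference
    for the symmetric kernel used below.
    """
    rows, cols = len(grid), len(grid[0])

    def sample(r, c):
        if 0 <= r < rows and 0 <= c < cols:
            return grid[r][c]
        if mode == "zero":
            return 0.0
        if mode == "nearest":
            return grid[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]
        raise ValueError(f"unknown mode {mode!r}")

    return [
        [
            sum(
                kernel[dr + 1][dc + 1] * sample(r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def mismatches(a, b, tol=1e-9):
    """Return (row, col) positions where two grids differ by more than tol."""
    return [
        (r, c)
        for r, row in enumerate(a)
        for c, _ in enumerate(row)
        if abs(a[r][c] - b[r][c]) > tol
    ]


# 5x5 grid of positive values and a 3x3 mean kernel.
grid = [[float(r * 5 + c + 1) for c in range(5)] for r in range(5)]
mean = [[1 / 9] * 3 for _ in range(3)]

diff = mismatches(convolve3x3(grid, mean, "zero"), convolve3x3(grid, mean, "nearest"))
# Keep only mismatches that are NOT on the outer ring of the grid.
interior = [(r, c) for r, c in diff if 0 < r < 4 and 0 < c < 4]
```

Here every mismatch lies on the outer ring of cells and interior is empty: the two results disagree only where the kernel window reaches past the raster edge, which is the pattern described for rst_convolve.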

Function availability

See the Raster Functions availability section for what each tier provides.