
Functions Overview

GeoBrix provides four specialized spatial-processing packages (RasterX, GridX, VectorX, and PMTiles) plus VizX, a Python-only visualization layer. The four processing packages expose Scala, Python, and SQL APIs backed by the same Spark columnar expressions.

Available Packages

RasterX

Full-spectrum raster processing for Databricks — successor to Mosaic raster, plus terrain analysis, spectral indices, tile publishing, and vector↔raster bridging.

  • Process GeoTIFF and other GDAL-supported raster formats
  • Raster algebra, transformations, clipping, reprojection
  • Metadata extraction, band operations, NoData handling
  • Resample, IDW interpolation, build overviews
  • Terrain analysis (slope, aspect, hillshade, TRI, TPI, roughness, color-relief)
  • Spectral indices (EVI, SAVI, NDWI, NBR, NDVI, plus a generic dispatcher)
  • Vector↔raster bridge (rasterize / polygonize)
  • Web-mercator XYZ tile output (to_webmercator, tilexyz, xyzpyramid)
  • Grid aggregations to H3 or CARTO quadbin v0 cells
  • COG output, proximity, contour, viewshed
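
To make the spectral-index bullet concrete, here is a minimal plain-Python sketch of the NDVI formula, (NIR - Red) / (NIR + Red), applied per pixel. It is independent of GeoBrix and does not show how RasterX computes the index; the function name `ndvi` and the sample reflectance values are illustrative assumptions.

```python
from typing import Optional


def ndvi(nir: float, red: float) -> Optional[float]:
    """Normalized Difference Vegetation Index for a single pixel.

    Returns None when NIR + Red is zero, where the ratio is undefined
    (a raster engine would typically write NoData there).
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical (NIR, Red) reflectance pairs, not taken from a real scene.
pixels = [(0.5, 0.1), (0.3, 0.3), (0.0, 0.0)]
values = [ndvi(nir, red) for nir, red in pixels]
```

Values near 1 suggest dense vegetation, values near 0 suggest bare surfaces, and the undefined case shows why NoData handling matters for band math.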

Raster Function Reference →


GridX

Spatial indexing across multiple discrete-global-grid systems: BNG (British National Grid) for Great Britain workloads and CARTO quadbin v0 for web-mercator-aligned analytics.

  • British National Grid (BNG) — 21 functions covering cell math, kring/kloop, polyfill, tessellation, aggregators, and generators
  • CARTO Quadbin v0 — 9 functions (pointascell, aswkb, centroid, resolution, polyfill, kring, tessellate, cellunion, distance); cell IDs are 64-bit Long, aligned with the web-mercator XYZ tile grid
  • Cell area calculations, k-ring / k-loop neighborhoods, geometry-to-cell tessellation
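
Because quadbin cells are aligned with the web-mercator XYZ tile grid, it may help to see how a longitude/latitude pair maps to a tile column and row at a given zoom. The sketch below uses the standard slippy-map tiling formula in plain Python; the helper name `lon_lat_to_tile` is hypothetical, and the packing of (z, x, y) into a 64-bit quadbin cell ID is deliberately not shown.

```python
import math

MAX_LAT = 85.05112878  # web-mercator latitude limit, in degrees


def lon_lat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Return the (x, y) XYZ tile containing a WGS84 point at `zoom`."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # clamp to the mercator range
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 and the latitude limits stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


tile = lon_lat_to_tile(0.0, 0.0, 1)  # the point (0, 0) at zoom 1
```

Each axis has 2^z tiles at zoom z and y grows southward, so the point (0, 0) at zoom 1 falls in tile (1, 1), the south-east quadrant.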

GridX Function Reference →


VectorX

Augments Databricks built-in ST_* functions with vector-tile encoding, TIN surface modeling, and legacy-Mosaic migration helpers.

  • Mapbox Vector Tile (MVT) encoding via st_asmvt aggregator
  • Vector tile pyramid via st_asmvt_pyramid generator — composes with pmtiles_agg for end-to-end publishing
  • TIN surface modeling — st_triangulate plus st_interpolateelevationbbox / st_interpolateelevationgeom (constrained-Delaunay triangulation and grid elevation interpolation from Z-valued points)
  • Legacy Mosaic geometry conversion (migrate without installing Mosaic)
  • OGR-based reader data sources (Shapefile, GeoJSON, GeoPackage, FileGDB)
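
As background for the TIN bullet, elevation inside a single triangle of a TIN is commonly estimated by linear (barycentric) interpolation of the three vertex Z values. The following plain-Python sketch shows that idea for one triangle; it is an assumption about the general technique, not a description of how st_interpolateelevationbbox or st_interpolateelevationgeom are implemented, and the name `interpolate_z` is hypothetical.

```python
from typing import Optional, Tuple

Point3 = Tuple[float, float, float]


def interpolate_z(p: Tuple[float, float], a: Point3, b: Point3, c: Point3,
                  eps: float = 1e-12) -> Optional[float]:
    """Linearly interpolate Z at 2D point p inside triangle (a, b, c).

    Returns None if the triangle is degenerate or p lies outside it.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        return None  # collinear vertices: no triangle to interpolate over
    # Barycentric weights of p with respect to vertices a, b, and c.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -eps:
        return None  # p is outside the triangle
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical Z-valued vertices.
a, b, c = (0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)
mid_edge = interpolate_z((5.0, 0.0), a, b, c)
```

Here `mid_edge` is the elevation halfway along the edge from `a` to `b`, which is the average of their Z values.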

VectorX Function Reference →


VizX

Tier-agnostic visualization helpers for inspecting GeoBrix outputs in a notebook — Python-only, no SQL functions.

  • Raster plotting (plot_raster / plot_file) with auto-decimation and percentile stretch, including coverage-depth and mask-layer composites for multi-band presence stacks
  • Spark DataFrame → GeoPandas adapters (as_gdf / cells_as_gdf / grid_as_gdf) for .plot() / .explore() maps
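
For readers unfamiliar with a percentile stretch, the idea is to clip pixel values to a low and a high percentile and rescale the remainder to the 0 to 1 display range, so a few outliers do not wash out the image. The sketch below is plain Python and is not VizX code; the 2nd/98th percentile cutoffs and the function names are assumptions chosen for illustration.

```python
def percentile(ordered, q):
    """Linearly interpolated percentile of a sorted list, q in [0, 100]."""
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def percentile_stretch(values, low=2.0, high=98.0):
    """Rescale values to [0, 1], clipping below `low` and above `high`."""
    ordered = sorted(values)
    lo_v = percentile(ordered, low)
    hi_v = percentile(ordered, high)
    if hi_v == lo_v:
        return [0.0 for _ in values]  # flat input: nothing to stretch
    return [min(1.0, max(0.0, (v - lo_v) / (hi_v - lo_v))) for v in values]


# Hypothetical pixel values 0..100.
stretched = percentile_stretch(list(range(101)))
```

With these inputs the cutoffs are 2 and 98, so values at or below 2 map to 0.0, values at or above 98 map to 1.0, and 50 maps to 0.5.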

VizX Reference →


PMTiles

Container format for serving raster (PNG / JPEG / WebP) or vector (MVT) tile pyramids from a single static file via HTTP range requests. Native Scala v3 encoder — no GDAL/OGR dependency.

  • gbx_pmtiles_agg UDAF — aggregator returning a BINARY PMTile blob; fits tilesets up to ~100 MiB tile payload / 2 GiB cell limit
  • .write.format("pmtiles").save(path) DataSource — streams larger pyramids via a partitioned commit protocol
  • Auto-detects tile_type from magic bytes (PNG / JPEG / WebP / otherwise MVT)
  • Composes with gbx_rst_xyzpyramid (raster) and gbx_st_asmvt_pyramid (vector) upstream
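
The detection order in the list above can be sketched as a few prefix checks. This is an illustration in plain Python using the published PNG, JPEG, and WebP file signatures, not the GeoBrix Scala encoder's code; the function name `detect_tile_type` and the returned labels are assumptions.

```python
def detect_tile_type(data: bytes) -> str:
    """Guess a tile's type from its leading bytes, falling back to MVT."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # MVT is a protobuf payload with no fixed signature, hence the fallback.
    return "mvt"


# Hypothetical tile payloads, truncated to their headers.
png_tile = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
webp_tile = b"RIFF" + b"\x10\x00\x00\x00" + b"WEBP"
kinds = [detect_tile_type(t) for t in (png_tile, webp_tile, b"\x1a\x05hello")]
```

The fallback branch means any payload without a recognized image signature is treated as vector data, which matches the "otherwise MVT" wording above.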

Package Comparison

| Feature | RasterX | GridX | VectorX | PMTiles | VizX |
|---|---|---|---|---|---|
| Primary Use | Raster processing | Discrete global grids | Vector encoding + legacy | Tile pyramid packaging | Notebook visualization |
| Product Gap | Full gap-filling | Specialized grids (BNG, quadbin) | Vector-tile encoding, legacy migration | Net-new | Supporting layer |
| GDAL Required | Yes | No | Yes (readers + MVT) | No | No |
| Output Format | Tile (struct) + arrays | Cell IDs (Long / String) + WKB | BINARY (MVT bytes), WKB | BINARY (PMTile blob) or file | Matplotlib figure / GeoDataFrame |
| Spark Surface | 65+ SQL functions | 30+ SQL functions | 6+ SQL functions + DataSources | 1 UDAF + 1 DataSource | Python only (no SQL) |

Choosing the Right Package

Use RasterX when: working with satellite imagery, DEMs, or aerial photography; performing terrain analysis, spectral indices, or per-pixel transforms; aggregating raster pixels to H3 or quadbin cells; bridging vector geometries to/from rasters; generating web-mercator XYZ tiles.

Use GridX when: working with British National Grid data; indexing global data into web-mercator-aligned quadbin cells; needing cell math (area, k-ring, polyfill, tessellation); building grid-aware aggregations or join keys.

Use VectorX when: encoding features as Mapbox Vector Tiles; generating per-tile MVT layers; building TIN surfaces / interpolating elevation from Z-valued points; reading vector formats (Shapefile, GeoJSON, GeoPackage, FileGDB); migrating from DBLabs Mosaic.

Use PMTiles when: publishing a tile pyramid (raster or vector) as a single static file; serving from S3/ABFS/GCS without a tile server; aggregating (z, x, y, bytes) rows into a deployable map.

Use VizX when: inspecting GeoBrix raster tiles or files in a notebook (single-band, RGB, coverage-depth, or mask-layer composites); turning Spark geometry / H3-cell / gridspec DataFrames into GeoPandas for .plot() / .explore() maps. VizX is a Python-only supporting layer for visualization, not a spatial-processing package.

Function Naming Convention

All GeoBrix SQL functions use the gbx_ prefix:

| Package | Prefix | Example |
|---|---|---|
| RasterX | gbx_rst_ | gbx_rst_boundingbox |
| GridX/BNG | gbx_bng_ | gbx_bng_cellarea |
| GridX/Quadbin | gbx_quadbin_ | gbx_quadbin_pointascell |
| VectorX | gbx_st_ | gbx_st_asmvt |
| PMTiles | gbx_pmtiles_ | gbx_pmtiles_agg |

The gbx_ prefix distinguishes GeoBrix functions from Databricks built-in st_* functions.
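
One practical use of the naming convention is grouping function names by package, for example when scanning a function listing. The sketch below is plain Python built only from the prefixes in this section; the helper name `package_for` is hypothetical and is not part of GeoBrix.

```python
# Prefix-to-package mapping taken from the naming convention above.
PREFIXES = {
    "gbx_rst_": "RasterX",
    "gbx_bng_": "GridX/BNG",
    "gbx_quadbin_": "GridX/Quadbin",
    "gbx_st_": "VectorX",
    "gbx_pmtiles_": "PMTiles",
}


def package_for(function_name):
    """Return the GeoBrix package for a SQL function name, or None."""
    name = function_name.lower()  # treat names case-insensitively
    for prefix, package in PREFIXES.items():
        if name.startswith(prefix):
            return package
    return None  # not a GeoBrix function, e.g. a built-in st_* function


owners = [package_for(n) for n in ("gbx_rst_boundingbox", "gbx_st_asmvt", "st_area")]
```

A built-in name such as `st_area` returns None because it lacks the gbx_ prefix.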

Registration

Before using GeoBrix functions in Python or SQL, register them with the Spark session.

Register all packages

# Register all packages
from databricks.labs.gbx.rasterx import functions as rx
from databricks.labs.gbx.gridx.bng import functions as bx
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx

rx.register(spark)
bx.register(spark)
vx.register(spark)

Register selectively

GeoBrix raster functions come in two execution tiers; this example uses the heavyweight tier. See Choosing an Execution Tier.

# Only register RasterX
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)

Scala

Register all packages

import com.databricks.labs.gbx.rasterx.{functions => rx}
import com.databricks.labs.gbx.gridx.bng.{functions => bx}
import com.databricks.labs.gbx.vectorx.jts.legacy.{functions => vx}

// Register each package
rx.register(spark)
bx.register(spark)
vx.register(spark)

Example output

RasterX, GridX, and VectorX functions registered (gbx_rst_*, gbx_bng_*, gbx_st_*).

SQL

SQL functions are registered via Python or Scala. Once registered, they are available in any SQL context:

-- No separate registration step is needed in SQL:
-- functions become available once registered from Python or Scala

SHOW FUNCTIONS LIKE 'gbx_*';

Example output

+--------------------+
|function            |
+--------------------+
|gbx_rst_asformat    |
|gbx_rst_avg         |
|gbx_rst_bandmetadata|
+--------------------+

API Languages

GeoBrix provides the same functionality across three languages — SQL, Python, and Scala. See Language Bindings for the registration and import patterns each surface uses, and the one-line lightweight-vs-heavyweight raster swap.

Scalar values vs lit(...) wrapping

In 0.3.0, plain scalars are accepted across Python, Scala, and SQL bindings — no f.lit(...) wrapping required for non-string values.

Python — wrappers accept Column or scalar (bool/int/float/bytes); non-string scalars are auto-wrapped with f.lit(...). Strings still follow pyspark's column-reference convention (bare string ≈ f.col(name)); wrap in f.lit("...") to pass a string literal.

# ✅ Before 0.3.0 — required f.lit for every value
rx.rst_clip("tile", "geom", f.lit(True))
rx.rst_transform("tile", f.lit(4326))
bx.bng_pointascell("pt", f.lit(1))
bx.bng_pointascell("pt", f.lit("1km"))

# ✅ 0.3.0 — scalars accepted directly
rx.rst_clip("tile", "geom", True)
rx.rst_transform("tile", 4326)
bx.bng_pointascell("pt", 1)
bx.bng_pointascell("pt", f.lit("1km")) # string literal — still wrap in f.lit

Scala — typed overloads added for Boolean / Int / Double / String value parameters. Column args (e.g. geometry, tile) still take Column.

// ✅ 0.3.0 — scalar overloads resolve without lit(...)
rst_clip(col("tile"), col("geom"), cutlineAllTouched = true)
rst_transform(col("tile"), 4326)
bng_pointascell(col("pt"), 1)
bng_pointascell(col("pt"), "1km")

SQL — values are already natively accepted by Spark SQL; no change needed:

SELECT gbx_rst_clip(tile, geom, true) FROM ...;
SELECT gbx_bng_pointascell(pt, 1) FROM ...;
SELECT gbx_bng_pointascell(pt, '1km') FROM ...;

When you still need f.lit(...) in Python:

  • String literals: rx.rst_fromfile(f.lit("/path/to.tif"), f.lit("GTiff")) — a bare string is treated as a column reference.
  • Nulls / explicit typing: e.g. f.lit(None).cast("double").
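
The Python rule above (bare strings are column references, other scalars become literals) can be summarized with a toy model. This sketch uses no pyspark and does not reproduce the GeoBrix wrappers, which work with real Column objects; `classify_arg` and its tagged-tuple return values are assumptions made purely to illustrate the decision.

```python
def classify_arg(value):
    """Model how a wrapper argument is interpreted under the documented rule.

    Returns ("col", name) for a bare string and ("lit", value) for a
    non-string scalar. Anything else is rejected in this toy model.
    """
    if isinstance(value, str):
        return ("col", value)  # bare string is read as a column name
    if isinstance(value, (bool, int, float, bytes)):
        return ("lit", value)  # scalar is auto-wrapped as a literal
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


# Arguments in the style of rx.rst_transform("tile", 4326).
interpreted = [classify_arg(arg) for arg in ("tile", 4326)]
```

Under this model `"1km"` would be read as a column named `1km`, which is why string literals still need explicit f.lit(...) wrapping.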

Next Steps