RasterX

Full API reference

For the complete list of RasterX functions with parameters and examples, see the RasterX Function Reference.

RasterX is GeoBrix's raster data processing package, providing comprehensive tools for working with raster datasets such as satellite imagery, elevation models, and other gridded spatial data.

Overview

RasterX is a refactored and improved version of the Mosaic raster functions. Because the Databricks platform does not (yet) include built-in raster processing, RasterX fills that gap for raster operations on Databricks.

Key Features

  • GDAL-Powered: Leverages GDAL for robust raster format support
  • Distributed Processing: Built on Spark for scalable raster operations
  • Multiple Format Support: GeoTIFF, NetCDF, and other GDAL-supported formats
  • Metadata Extraction: Comprehensive raster metadata access
  • Raster Operations: Clipping, resampling, transformations
  • Band Operations: Multi-band raster support

Function Categories

RasterX exposes 65 SQL functions (registered as gbx_rst_*; available in Python and Scala as rst_*) across six categories — overview below, full reference on the RasterX Function Reference page.

[Figure: RasterX function categories: Constructors, Accessors, Aggregators, Generators, Operations, H3 Grid]

Accessors

Functions to access raster properties and metadata:

  • gbx_rst_boundingbox - Bounding box of the raster
  • gbx_rst_width - Raster width in pixels
  • gbx_rst_height - Raster height in pixels
  • gbx_rst_numbands - Number of bands
  • gbx_rst_metadata - Raster metadata map
  • gbx_rst_srid - Spatial reference identifier
  • gbx_rst_georeference - Georeference parameters
  • gbx_rst_pixelwidth, gbx_rst_pixelheight - Pixel size
  • gbx_rst_upperleftx, gbx_rst_upperlefty - Upper-left corner
  • gbx_rst_scalex, gbx_rst_scaley, gbx_rst_rotation, gbx_rst_skewx, gbx_rst_skewy - Geotransform components
  • gbx_rst_format - Raster format (e.g. GTiff)
  • gbx_rst_getnodata - NoData value
  • gbx_rst_bandmetadata - Band metadata
  • gbx_rst_avg, gbx_rst_min, gbx_rst_max, gbx_rst_median - Pixel statistics
  • gbx_rst_pixelcount - Number of pixels
  • gbx_rst_memsize - Approximate memory size
  • gbx_rst_type - Raster data type
  • gbx_rst_summary - Summary statistics
  • gbx_rst_subdatasets - Subdataset names (e.g. NetCDF/GRIB)
  • gbx_rst_getsubdataset - Open a subdataset by name

Constructors

  • gbx_rst_fromfile - Load raster from file path
  • gbx_rst_fromcontent - Create raster from binary content
  • gbx_rst_frombands - Build raster from band expressions

Transformations and operations

  • gbx_rst_clip - Clip raster by geometry
  • gbx_rst_transform - Reproject to a target CRS
  • gbx_rst_merge - Merge multiple rasters
  • gbx_rst_combineavg - Average multiple rasters (same extent)
  • gbx_rst_asformat - Write to a different format (e.g. COG)
  • gbx_rst_convolve - Convolution filter
  • gbx_rst_filter - Custom filter expression
  • gbx_rst_mapalgebra - Map algebra expression
  • gbx_rst_derivedband - Derive band via Python UDF
  • gbx_rst_ndvi - NDVI from red/NIR bands
  • gbx_rst_dtmfromgeoms - Rasterize geometries to DTM
  • gbx_rst_initnodata - Initialize NoData
  • gbx_rst_updatetype - Change raster data type
  • gbx_rst_isempty - Test if raster is empty
  • gbx_rst_tryopen - Open raster or return NULL on failure
  • gbx_rst_rastertoworldcoord, gbx_rst_rastertoworldcoordx, gbx_rst_rastertoworldcoordy - Pixel to world coordinates
  • gbx_rst_worldtorastercoord, gbx_rst_worldtorastercoordx, gbx_rst_worldtorastercoordy - World to pixel coordinates
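The pixel/world conversions listed above rest on the standard six-coefficient GDAL affine geotransform. The sketch below shows that arithmetic in plain Python so the relationship is concrete. It is an illustration only: the helper names `pixel_to_world` and `world_to_pixel` are hypothetical, the sample geotransform is made up, and how RasterX maps these coefficients onto its individual accessors (or whether it reports pixel corners or centers) is not covered here.

```python
# GDAL geotransform layout (gt is a 6-tuple):
#   x = gt[0] + col * gt[1] + row * gt[2]
#   y = gt[3] + col * gt[4] + row * gt[5]
# gt[0], gt[3] are the upper-left corner; gt[1], gt[5] the pixel sizes;
# gt[2], gt[4] the rotation/skew terms (0 for north-up rasters).

def pixel_to_world(gt, col, row):
    """Map a pixel (col, row) to world (x, y) using the affine geotransform."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def world_to_pixel(gt, x, y):
    """Invert the affine geotransform; returns fractional (col, row)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row


# Hypothetical north-up raster: 10 m pixels, upper-left at (500000, 4600000).
gt = (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)
world = pixel_to_world(gt, 100, 200)
pixel = world_to_pixel(gt, *world)
```

Note the negative `gt[5]`: in a north-up raster, row numbers increase downward while y decreases.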

Generators

  • gbx_rst_separatebands - Explode multi-band raster into rows per band
  • gbx_rst_retile - Retile rasters to a given tile size
  • gbx_rst_maketiles - Build tiles from grid spec
  • gbx_rst_tooverlappingtiles - Overlapping tile grid
  • gbx_rst_h3_tessellate - Tessellate raster into H3 cells

H3 grid aggregation

  • gbx_rst_h3_rastertogridavg - Average raster values per H3 cell
  • gbx_rst_h3_rastertogridcount - Pixel count per H3 cell
  • gbx_rst_h3_rastertogridmax, gbx_rst_h3_rastertogridmin, gbx_rst_h3_rastertogridmedian - Max/min/median per H3 cell

Aggregations

  • gbx_rst_combineavg_agg - Average multiple rasters (aggregate)
  • gbx_rst_merge_agg - Merge rasters with aggregation
  • gbx_rst_derivedband_agg - Derived band aggregate

Tile payload

Every RasterX function that returns a tile produces one whose raster field is a self-contained, in-memory raster (GTiff by default). It is safe to serialize between Spark stages and executors, persist to Delta, hand off to rasterio / gdal, or write back out via the gdal writer. The bytes are never an XML reference to a per-executor /vsimem/ tempfile or to a path that only exists on the producing node.

Functions that internally build via an intermediate VRT — gbx_rst_merge, gbx_rst_merge_agg, gbx_rst_frombands, gbx_rst_combineavg, gbx_rst_combineavg_agg, gbx_rst_derivedband, gbx_rst_derivedband_agg — materialize the result to GTiff before returning, so downstream stages on different executors see real raster bytes. Inspect a tile's payload format from tile.metadata.driver; for any of the functions above, it will read GTiff (not VRT). See Beta Release Notes for the v0.3.0 correctness fix that introduced this invariant.
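As a complement to reading tile.metadata.driver, the raw bytes themselves can be sanity-checked. The sketch below assumes you already hold a tile's raster field as a Python `bytes` object (for example after collecting a row). The helper name `sniff_raster_payload` is hypothetical and not part of RasterX. It relies only on the well-known TIFF/BigTIFF header signatures and on VRT files being XML rooted at `<VRTDataset>`.

```python
# TIFF and BigTIFF files start with a byte-order mark plus a magic number.
_TIFF_MAGICS = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def sniff_raster_payload(payload: bytes) -> str:
    """Classify raw raster bytes as 'tiff', 'vrt', or 'unknown'."""
    if payload[:4] in _TIFF_MAGICS:
        return "tiff"
    # A VRT is XML; allow leading whitespace and an optional XML declaration.
    head = payload[:512].lstrip()
    if head.startswith(b"<VRTDataset"):
        return "vrt"
    if head.startswith(b"<?xml") and b"<VRTDataset" in head:
        return "vrt"
    return "unknown"


# Synthetic payloads standing in for real tile bytes.
tiff_kind = sniff_raster_payload(b"II*\x00" + b"\x00" * 8)
vrt_kind = sniff_raster_payload(b'<VRTDataset rasterXSize="1" rasterYSize="1"/>')
```

A result of "vrt" for a tile produced by one of the functions listed above would contradict the invariant described here and is worth reporting.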

VRT Python pixel functions

gbx_rst_combineavg, gbx_rst_combineavg_agg, gbx_rst_derivedband, and gbx_rst_derivedband_agg evaluate a Python expression on each pixel via GDAL's VRT Python pixel-function API. That API is gated behind the GDAL config option GDAL_VRT_ENABLE_PYTHON, which GeoBrix sets to NO at executor startup (see Security § Restrict GDAL drivers). When you call one of the four functions above, GeoBrix flips the option to YES for the duration of that call only — via the internal GDALManager.withVrtPython bracket — and restores NO immediately on return. You don't need to set anything on the cluster or in your notebook to use the built-in functions.

When you need to enable it yourself

If you're invoking the GDAL Python bindings (from osgeo import gdal) directly — outside the built-in RasterX functions — and you read a VRT that declares a <PixelFunctionLanguage>Python</...> band, you'll get an empty/null read unless you enable the option in the same process. Pick one of:

Python — programmatic, scoped to your read. Recommended in all cases. Mirrors what GeoBrix does internally, works for both driver-side pyspark.sql calls and inside mapPartitions / mapInPandas UDFs that load VRT-with-pyfunc via osgeo.gdal, and survives interleaving with GeoBrix built-in calls (each GeoBrix call resets the option to NO on exit, so re-set it on every read):

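from osgeo import gdal

# Enable Python pixel functions only for the duration of this read.
gdal.SetConfigOption("GDAL_VRT_ENABLE_PYTHON", "YES")
try:
    ds = gdal.Open("/path/to/your/vrt-with-pixel-function.vrt")
    if ds is None:  # gdal.Open returns None on failure unless exceptions are enabled
        raise RuntimeError("could not open VRT")
    arr = ds.GetRasterBand(1).ReadAsArray()
    ds = None  # release the dataset handle
finally:
    # Restore the locked-down default even if the read fails.
    gdal.SetConfigOption("GDAL_VRT_ENABLE_PYTHON", "NO")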

Cluster env var — for Python-worker processes only. Setting spark.executorEnv.GDAL_VRT_ENABLE_PYTHON YES on the cluster works for Python UDF workers (a separate process from the JVM, where GDAL initializes from env vars). It does not help JVM-side reads — GeoBrix calls gdal.SetConfigOption("GDAL_VRT_ENABLE_PYTHON", "NO") at executor JVM startup, and SetConfigOption takes precedence over the env var. Prefer the programmatic form above unless you have a strong reason to globally enable.

Scala / JVM code. If you're writing custom Spark expressions that consume Python-pixel VRTs, wrap the read/translate in the same helper GeoBrix uses internally — it refcounts the option so concurrent tasks on the same executor JVM compose safely:

import com.databricks.labs.gbx.rasterx.gdal.GDALManager

val result = GDALManager.withVrtPython {
  val ds = org.gdal.gdal.gdal.Open(vrtPath)
  // ... GDAL reads / translates here see the Python pixel function ...
  ds
}
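To see why refcounting matters for such a bracket, here is a small pure-Python model. It is not GeoBrix's implementation: the class `RefCountedOption` is hypothetical, and a plain dict stands in for GDAL's process-wide config store. The point is that the option flips back to NO only when the last concurrent user exits, so one task finishing cannot disable the option under another task that is still reading.

```python
import threading
from contextlib import contextmanager


class RefCountedOption:
    """Toy refcounted on/off switch (hypothetical model, not GeoBrix code)."""

    def __init__(self, config, key, on="YES", off="NO"):
        self._config = config          # dict standing in for the GDAL config store
        self._key = key
        self._on, self._off = on, off
        self._count = 0
        self._lock = threading.Lock()
        self._config[key] = off        # locked-down default

    @contextmanager
    def enabled(self):
        with self._lock:
            self._count += 1
            self._config[self._key] = self._on
        try:
            yield
        finally:
            with self._lock:
                self._count -= 1
                if self._count == 0:   # only the last user restores the default
                    self._config[self._key] = self._off


config = {}
vrt_python = RefCountedOption(config, "GDAL_VRT_ENABLE_PYTHON")

with vrt_python.enabled():             # task A starts reading
    with vrt_python.enabled():         # task B overlaps with A
        pass                           # task B finishes first
    during_a = config["GDAL_VRT_ENABLE_PYTHON"]   # A is still inside its bracket
after_all = config["GDAL_VRT_ENABLE_PYTHON"]
```

Without the counter, task B's exit would set the option to NO while task A is still inside its bracket.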

Trusted-modules variant

GDAL also accepts GDAL_VRT_ENABLE_PYTHON=TRUSTED_MODULES plus a GDAL_VRT_PYTHON_TRUSTED_MODULES allowlist if you want pixel-function code restricted to specific Python module prefixes. GeoBrix uses the plain YES form because the pixel-function source is constructed in-process from trusted (geobrix-generated) strings, never from user-supplied VRT XML on disk. If your custom code path reads VRTs whose <PixelFunctionCode> originates from less-trusted sources, switch to the TRUSTED_MODULES form and allowlist only what you intend to load.
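A minimal sketch of switching to the restricted form follows. The two option names are the GDAL config options named above, and GDAL documents the allowlist as a comma-separated list. The helper `restrict_vrt_python` and the module name `my_pixel_functions` are hypothetical. The setter is passed in so the snippet runs without GDAL installed; in real code you would pass `gdal.SetConfigOption`.

```python
def restrict_vrt_python(set_option, trusted_modules):
    """Enable Python pixel functions only for an explicit module allowlist.

    set_option: a callable taking (key, value), e.g. osgeo.gdal.SetConfigOption.
    trusted_modules: iterable of module names to allowlist.
    """
    modules = [m.strip() for m in trusted_modules if m.strip()]
    if not modules:
        raise ValueError("refusing to enable TRUSTED_MODULES with an empty allowlist")
    set_option("GDAL_VRT_ENABLE_PYTHON", "TRUSTED_MODULES")
    set_option("GDAL_VRT_PYTHON_TRUSTED_MODULES", ",".join(modules))


# Demonstration with a dict-backed setter instead of GDAL.
applied = {}
restrict_vrt_python(applied.__setitem__, ["my_pixel_functions"])

# With GDAL available, the equivalent call would be:
#   from osgeo import gdal
#   restrict_vrt_python(gdal.SetConfigOption, ["my_pixel_functions"])
```

As with the YES form, remember that GeoBrix built-in calls reset GDAL_VRT_ENABLE_PYTHON to NO on exit, so apply the setting again before each of your own reads.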

Usage Examples

Python/PySpark

from databricks.labs.gbx.rasterx import functions as rx

# Sample data path (see Sample Data guide; use your Volume path if different)
raster_path = SAMPLE_RASTER_PATH

rx.register(spark)

raster_df = spark.read.format("gdal").load(raster_path)

metadata_df = raster_df.select(
    "source",
    rx.rst_width("tile").alias("width"),
    rx.rst_height("tile").alias("height"),
    rx.rst_numbands("tile").alias("bands"),
    rx.rst_srid("tile").alias("srid"),
)
metadata_df.show()

Example output
+--------------------+-----+------+-----+----+
|source              |width|height|bands|srid|
+--------------------+-----+------+-----+----+
|.../nyc_sentinel2...|10980|10980 |1    |4326|
+--------------------+-----+------+-----+----+

Scala

import com.databricks.labs.gbx.rasterx.{functions => rx}
import org.apache.spark.sql.functions._

// Register functions
rx.register(spark)

// Read raster files (sample data path; see Sample Data guide)
val rasterPath = "/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif"
val rasterDf = spark.read.format("gdal").load(rasterPath)

// Get metadata
val metadataDf = rasterDf.select(
  col("path"),
  rx.rst_width(col("tile")).alias("width"),
  rx.rst_height(col("tile")).alias("height"),
  rx.rst_numbands(col("tile")).alias("num_bands")
)

metadataDf.show()

Example output
+--------------------+-----+------+----------+
|path                |width|height|num_bands |
+--------------------+-----+------+----------+
|.../nyc_sentinel2...|10980|10980 |1         |
+--------------------+-----+------+----------+

SQL

-- Register functions first in Python/Scala notebook
-- Then use in SQL

-- Read raster data (sample data path; see Sample Data guide)
CREATE OR REPLACE TEMP VIEW rasters AS
SELECT * FROM gdal.`{SAMPLE_RASTER_PATH}`;

-- Extract metadata
SELECT
  path,
  gbx_rst_width(tile) as width,
  gbx_rst_height(tile) as height,
  gbx_rst_numbands(tile) as num_bands,
  gbx_rst_srid(tile) as srid
FROM rasters;

Example output
+--------------------+-----+------+----------+----+
|path                |width|height|num_bands |srid|
+--------------------+-----+------+----------+----+
|.../nyc_sentinel2...|10980|10980 |1         |4326|
+--------------------+-----+------+----------+----+

Next Steps