RasterX

Full API reference
For the complete list of RasterX functions with parameters and examples, see the RasterX Function Reference.
RasterX is GeoBrix's raster data processing package, providing comprehensive tools for working with raster datasets such as satellite imagery, elevation models, and other gridded spatial data.
Overview
RasterX is a refactored and improved version of the Mosaic raster functions. Because the Databricks platform does not (yet) offer built-in support for raster processing, RasterX aims to fill that gap for raster operations on Databricks.
Key Features
- GDAL-Powered: Leverages GDAL for robust raster format support
- Distributed Processing: Built on Spark for scalable raster operations
- Multiple Format Support: GeoTIFF, NetCDF, and other GDAL-supported formats
- Metadata Extraction: Comprehensive raster metadata access
- Raster Operations: Clipping, resampling, transformations
- Band Operations: Multi-band raster support
Function Categories
Accessors
Functions to access raster properties and metadata:
- gbx_rst_boundingbox - Bounding box of the raster
- gbx_rst_width - Raster width in pixels
- gbx_rst_height - Raster height in pixels
- gbx_rst_numbands - Number of bands
- gbx_rst_metadata - Raster metadata map
- gbx_rst_srid - Spatial reference identifier
- gbx_rst_georeference - Georeference parameters
- gbx_rst_pixelwidth, gbx_rst_pixelheight - Pixel size
- gbx_rst_upperleftx, gbx_rst_upperlefty - Upper-left corner
- gbx_rst_scalex, gbx_rst_scaley, gbx_rst_rotation, gbx_rst_skewx, gbx_rst_skewy - Geotransform components
- gbx_rst_format - Raster format (e.g. GTiff)
- gbx_rst_getnodata - NoData value
- gbx_rst_bandmetadata - Band metadata
- gbx_rst_avg, gbx_rst_min, gbx_rst_max, gbx_rst_median - Pixel statistics
- gbx_rst_pixelcount - Number of pixels
- gbx_rst_memsize - Approximate memory size
- gbx_rst_type - Raster data type
- gbx_rst_summary - Summary statistics
- gbx_rst_subdatasets - Subdataset names (e.g. NetCDF/GRIB)
- gbx_rst_getsubdataset - Open a subdataset by name
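Several of these accessors (upper-left corner, scale, and skew) correspond to the terms of a GDAL-style affine geotransform. To illustrate how those terms relate to a raster's bounding box, here is a minimal pure-Python sketch of the standard affine model. It is not the RasterX implementation: the GeoTransform class and bounding_box helper are hypothetical names, the mapping of skew_x and skew_y to GDAL's rotation terms is an assumption, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransform:
    """Hypothetical container for the six terms of a GDAL-style geotransform."""
    upper_left_x: float  # x of the upper-left corner
    scale_x: float       # pixel width
    skew_x: float        # row rotation term (assumed mapping)
    upper_left_y: float  # y of the upper-left corner
    skew_y: float        # column rotation term (assumed mapping)
    scale_y: float       # pixel height, usually negative for north-up rasters

    def pixel_to_world(self, col: float, row: float) -> tuple[float, float]:
        """Apply the affine transform to a (col, row) pixel position."""
        x = self.upper_left_x + col * self.scale_x + row * self.skew_x
        y = self.upper_left_y + col * self.skew_y + row * self.scale_y
        return x, y


def bounding_box(gt: GeoTransform, width: int, height: int):
    """Return (min_x, min_y, max_x, max_y) from the four raster corners."""
    corners = [gt.pixel_to_world(c, r) for c in (0, width) for r in (0, height)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up north-up raster: 100 x 50 pixels at 10 map units per pixel.
gt = GeoTransform(500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)
bbox = bounding_box(gt, width=100, height=50)
print(bbox)
```

Evaluating all four corners, not only the upper-left and lower-right, keeps the result correct when the skew terms are non-zero.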
Constructors
- gbx_rst_fromfile - Load raster from file path
- gbx_rst_fromcontent - Create raster from binary content
- gbx_rst_frombands - Build raster from band expressions
Transformations and operations
- gbx_rst_clip - Clip raster by geometry
- gbx_rst_transform - Reproject to a target CRS
- gbx_rst_merge - Merge multiple rasters
- gbx_rst_combineavg - Average multiple rasters (same extent)
- gbx_rst_asformat - Write to a different format (e.g. COG)
- gbx_rst_convolve - Convolution filter
- gbx_rst_filter - Custom filter expression
- gbx_rst_mapalgebra - Map algebra expression
- gbx_rst_derivedband - Derive band via Python UDF
- gbx_rst_ndvi - NDVI from red/NIR bands
- gbx_rst_dtmfromgeoms - Rasterize geometries to DTM
- gbx_rst_initnodata - Initialize NoData
- gbx_rst_updatetype - Change raster data type
- gbx_rst_isempty - Test if raster is empty
- gbx_rst_tryopen - Open raster or return NULL on failure
- gbx_rst_rastertoworldcoord, gbx_rst_rastertoworldcoordx, gbx_rst_rastertoworldcoordy - Pixel to world coordinates
- gbx_rst_worldtorastercoord, gbx_rst_worldtorastercoordx, gbx_rst_worldtorastercoordy - World to pixel coordinates
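For reference, NDVI is conventionally defined per pixel as (NIR - Red) / (NIR + Red). The sketch below shows that formula on plain Python lists. It only illustrates the arithmetic and is not how gbx_rst_ndvi is implemented; the ndvi helper, its nodata parameter, and the choice to return None for undefined pixels are assumptions made for this example.

```python
def ndvi(red, nir, nodata=None):
    """Compute per-pixel NDVI = (NIR - Red) / (NIR + Red).

    Returns None where either input equals `nodata` or where NIR + Red is zero
    (an assumed convention for this sketch).
    """
    if len(red) != len(nir):
        raise ValueError("red and nir bands must have the same number of pixels")
    result = []
    for r, n in zip(red, nir):
        if r == nodata or n == nodata or (n + r) == 0:
            result.append(None)
        else:
            result.append((n - r) / (n + r))
    return result


# Made-up reflectance values; -9999 marks NoData in this example.
red_band = [50, 100, 0, -9999]
nir_band = [150, 100, 0, 200]
values = ndvi(red_band, nir_band, nodata=-9999)
print(values)
```

Values near 1 indicate a strong NIR response relative to red, which is typical of dense vegetation, while values near 0 or below are typical of bare ground or water.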
Generators
- gbx_rst_separatebands - Explode multi-band raster into rows per band
- gbx_rst_retile - Retile rasters to a given tile size
- gbx_rst_maketiles - Build tiles from grid spec
- gbx_rst_tooverlappingtiles - Overlapping tile grid
- gbx_rst_h3_tessellate - Tessellate raster into H3 cells
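Conceptually, retiling splits a raster into pixel windows of a fixed size, optionally with overlap between neighbouring windows. The sketch below computes such windows in pure Python. It is an illustration only: the tile_windows helper is hypothetical, and expressing overlap in pixels is an assumption (the actual RasterX functions may parameterise tiling differently, so check the function reference).

```python
def _offsets(size, tile, step):
    """Start offsets along one axis; stops once a window reaches the edge."""
    offsets = []
    off = 0
    while True:
        offsets.append(off)
        if off + tile >= size:
            break
        off += step
    return offsets


def tile_windows(width, height, tile_width, tile_height, overlap=0):
    """Return (col_off, row_off, win_width, win_height) windows covering the raster.

    `overlap` is the number of pixels shared by adjacent windows (assumed unit).
    Edge windows are truncated to the raster extent.
    """
    if overlap < 0 or overlap >= tile_width or overlap >= tile_height:
        raise ValueError("overlap must be >= 0 and smaller than the tile size")
    windows = []
    for row_off in _offsets(height, tile_height, tile_height - overlap):
        for col_off in _offsets(width, tile_width, tile_width - overlap):
            windows.append((
                col_off,
                row_off,
                min(tile_width, width - col_off),
                min(tile_height, height - row_off),
            ))
    return windows


plain = tile_windows(10, 6, 4, 4)                  # no overlap
overlapping = tile_windows(10, 4, 4, 4, overlap=2)  # 2-pixel overlap
print(plain)
print(overlapping)
```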
H3 grid aggregation
- gbx_rst_h3_rastertogridavg - Average raster values per H3 cell
- gbx_rst_h3_rastertogridcount - Pixel count per H3 cell
- gbx_rst_h3_rastertogridmax, gbx_rst_h3_rastertogridmin, gbx_rst_h3_rastertogridmedian - Max/min/median per H3 cell
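The idea behind these aggregations is to group pixel values by the grid cell that contains each pixel and then reduce each group to a single statistic. Below is a rough pure-Python sketch of that grouping step. It assumes the pixel-to-cell assignment has already been done; the aggregate_by_cell helper is hypothetical, and the cell IDs are placeholders, not real H3 indexes.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_cell(samples):
    """Group (cell_id, pixel_value) pairs and compute per-cell statistics."""
    grouped = defaultdict(list)
    for cell_id, value in samples:
        grouped[cell_id].append(value)
    return {
        cell_id: {
            "avg": sum(values) / len(values),
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "median": median(values),
        }
        for cell_id, values in grouped.items()
    }


# Placeholder cell IDs and made-up pixel values.
samples = [("cell_a", 1), ("cell_a", 2), ("cell_b", 4), ("cell_a", 6), ("cell_b", 10)]
stats = aggregate_by_cell(samples)
print(stats["cell_a"])
print(stats["cell_b"])
```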
Aggregations
- gbx_rst_combineavg_agg - Average multiple rasters (aggregate)
- gbx_rst_merge_agg - Merge rasters with aggregation
- gbx_rst_derivedband_agg - Derived band aggregate
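As a conceptual illustration of averaging several rasters that share the same extent, the sketch below computes a pixel-wise mean over small 2D lists. It is not the RasterX implementation: the combine_avg helper is hypothetical, and skipping NoData pixels when averaging is an assumption made for this example.

```python
def combine_avg(rasters, nodata=None):
    """Pixel-wise mean of equally sized 2D grids.

    Pixels equal to `nodata` are ignored (assumed behaviour); if every input
    is NoData at a position, the output is NoData there as well.
    """
    if not rasters:
        raise ValueError("at least one raster is required")
    rows, cols = len(rasters[0]), len(rasters[0][0])
    for grid in rasters:
        if len(grid) != rows or any(len(row) != cols for row in grid):
            raise ValueError("all rasters must have the same dimensions")
    output = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            values = [grid[i][j] for grid in rasters if grid[i][j] != nodata]
            out_row.append(sum(values) / len(values) if values else nodata)
        output.append(out_row)
    return output


# Two made-up 2 x 2 rasters; -1 marks NoData in this example.
a = [[1, 2], [3, -1]]
b = [[3, 4], [-1, -1]]
averaged = combine_avg([a, b], nodata=-1)
print(averaged)
```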
Usage Examples
Python/PySpark
from databricks.labs.gbx.rasterx import functions as rx
# Sample data path (see Sample Data guide; use your Volume path if different)
raster_path = SAMPLE_RASTER_PATH
rx.register(spark)
raster_df = spark.read.format("gdal").load(raster_path)
metadata_df = raster_df.select(
    "source",
    rx.rst_width("tile").alias("width"),
    rx.rst_height("tile").alias("height"),
    rx.rst_numbands("tile").alias("bands"),
    rx.rst_srid("tile").alias("srid"),
)
metadata_df.show()
Example output
+--------------------+-----+------+-----+----+
|source              |width|height|bands|srid|
+--------------------+-----+------+-----+----+
|.../nyc_sentinel2...|10980|10980 |1    |4326|
+--------------------+-----+------+-----+----+
Scala
import com.databricks.labs.gbx.rasterx.{functions => rx}
import org.apache.spark.sql.functions._
// Register functions
rx.register(spark)
// Read raster files (sample data path; see Sample Data guide)
val rasterPath = "/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif"
val rasterDf = spark.read.format("gdal").load(rasterPath)
// Get metadata
val metadataDf = rasterDf.select(
  col("path"),
  rx.rst_width(col("tile")).alias("width"),
  rx.rst_height(col("tile")).alias("height"),
  rx.rst_numbands(col("tile")).alias("num_bands")
)
metadataDf.show()
Example output
+--------------------+-----+------+----------+
|path                |width|height|num_bands |
+--------------------+-----+------+----------+
|.../nyc_sentinel2...|10980|10980 |1         |
+--------------------+-----+------+----------+
SQL
SQL_RASTERX_USAGE = f"""-- Register functions first in Python/Scala notebook
-- Then use in SQL
-- Read raster data (sample data path; see Sample Data guide)
CREATE OR REPLACE TEMP VIEW rasters AS
SELECT * FROM gdal.`{SAMPLE_RASTER_PATH}`;
-- Extract metadata
SELECT
    path,
    gbx_rst_width(tile) AS width,
    gbx_rst_height(tile) AS height,
    gbx_rst_numbands(tile) AS num_bands,
    gbx_rst_srid(tile) AS srid
FROM rasters;"""
Example output
+--------------------+-----+------+----------+----+
|path                |width|height|num_bands |srid|
+--------------------+-----+------+----------+----+
|.../nyc_sentinel2...|10980|10980 |1         |4326|
+--------------------+-----+------+----------+----+