Beta Release Notes
The changes on this page are relative to 0.1.0 (and earlier).
This page tracks API and naming changes since the GeoBrix project started. After the project is approved, formal release notes will take over; until then, use this as the single place to look up what changed and why.
What's new in v0.4.0
In-flight beta release. Per-version highlights; full migration tables are in the per-component sections below.
- Lightweight execution tier (pyrx, pygx, pyvx). A pure-Python implementation of the GeoBrix API that needs no JAR and no init script, and runs on serverless compute, standard (shared) clusters, Lakeflow declarative pipelines, and ARM. It keeps the same function names and the same `gbx_*` SQL after `register`, so switching tiers is a one-line import change. RasterX (pyrx, on rasterio) implements every `rst_*` function; GridX (pygx) covers quadbin, BNG, and custom grids; VectorX (pyvx) covers MVT, TIN surface modeling, and legacy-geometry migration. With this release GridX and VectorX are fully both-tier: the lightweight tier reaches 1:1 parity with the heavyweight one across all three packages. See Choosing an Execution Tier.
- Serverless support is verified and documented. `geobrix[light]` installs and runs on Databricks Serverless (environment v5), standard (shared) clusters, and ARM. Install with the quoted PEP 508 named form, `%pip install "geobrix[light] @ file:///Volumes/.../geobrix-0.4.0-py3-none-any.whl"`, not the path-with-extra form (`'…whl[light]'`), which fails on Serverless because `%pip` writes the surrounding quotes into the requirement and pip reads `[light]` as part of the filename. `mapbox-vector-tile` is pinned to 2.1.x so its `protobuf` dependency stays `<6` (Spark Connect compatibility on Serverless), and `idna` is pinned `<3.8` to avoid a core-package-change notice. See Installation.
- Geometry inputs accept WKB, EWKB, WKT, and EWKT consistently. Every geometry-accepting function in both tiers now decodes all four encodings through a single shared decoder. Previously some lightweight functions accepted only WKB.
- Geometry×raster operations align to the raster CRS and handle non-overlap gracefully.
`gbx_rst_clip`, `gbx_rst_sample`, and `gbx_rst_viewshed` reproject the input geometry from its SRID to the raster's CRS (matching the heavyweight GDAL behavior), so a geometry in a different CRS clips/samples the correct region. A geometry that does not overlap the raster now returns null / empty instead of raising an error.
- Raster reader default changed to no-split (
`sizeInMB = -1`, behavior change since v0.3.0). The `gdal`/`gtiff_gdal` (heavyweight) and `raster_gbx`/`gtiff_gbx` (lightweight) readers now default `sizeInMB` to `-1` (one whole-image tile per file) instead of auto-splitting large rasters at 16 MB. Set a positive `sizeInMB` to opt back into tiling for parallel processing of large files. See Raster Readers.
- Lightweight raster writer and source-column parity. The lightweight
`gtiff_gbx` writer accepts the `nameCol` option for deterministic output filenames, matching the heavyweight GDAL writer. The lightweight raster reader's `source` column is now `dbfs:`-scheme-qualified to match `binaryFile` and the heavyweight reader, so DataFrames join cleanly across tiers; lightweight file operations strip the scheme internally.
- `gbx_rst_fromfile` is lightweight-tier only, registered into SQL on the heavyweight tier when `geobrix[light]` is present. On Databricks the executor JVM cannot read a Unity Catalog Volume (`/Volumes/...`) FUSE path (the UC credential is held only by Spark's user-scoped Python worker), so `rst_fromfile` is implemented solely in the lightweight tier (a `pyrx` Python loader) and has no heavyweight Scala expression. With `geobrix[light]` installed it is callable from Python (`rx.rst_fromfile`) and from SQL (`gbx_rst_fromfile`) regardless of tier: the heavyweight package's `register(spark)` specially registers the SQL name as the Python UDF. Without `[light]` it is not registered and the Python binding raises with guidance. For a tier-agnostic path on any compute, read the bytes with `spark.read.format("binaryFile")` and build the tile with `gbx_rst_fromcontent`. See Raster Functions § Constructors.
- Vector tile encoding (
`gbx_st_asmvt`). First VectorX expression-level function: aggregates features into MVT protobuf bytes for slippy-map publishing. See VectorX § Vector tile output.
- Vector tile pyramid (
`gbx_st_asmvt_pyramid`). Generator function: emits one row per `(z, x, y)` tile that input geometries intersect, encoded as MVT bytes. Composes with `gbx_pmtiles_agg` for end-to-end vector publishing pipelines. Builds on `gbx_st_asmvt` and shares the same web-mercator tile math as `gbx_rst_xyzpyramid`. See VectorX § Vector tile output.
- Quadbin grid math (10 functions). New
`gridx/quadbin` subpackage adds CARTO quadbin v0 support: `gbx_quadbin_pointascell`, `gbx_quadbin_aswkb`, `gbx_quadbin_centroid`, `gbx_quadbin_resolution`, `gbx_quadbin_polyfill`, `gbx_quadbin_kring`, `gbx_quadbin_tessellate`, `gbx_quadbin_cellunion`, `gbx_quadbin_cellunion_agg`, `gbx_quadbin_distance`. Cell IDs are 64-bit Long; coordinates are EPSG:4326 lon/lat; output geometry is EWKB SRID=4326. Cell encoding matches the CARTO quadbin-py reference implementation (cross-checked at 5 reference points). See GridX § Quadbin.
- PMTiles output (
`gbx_pmtiles_agg` UDAF + `.write.format("pmtiles")` DataSource). Native Scala PMTiles v3 encoder packages raster (PNG/JPG/WebP) or vector (MVT) tile pyramids into a single deployable blob. Aggregator path for tilesets that fit in a Spark cell (~100 MiB tile payload / 2 GiB cell limit); DataSource for larger pyramids streamed to a file via a partitioned commit protocol. Container is content-agnostic: tile bytes pass through verbatim, no GDAL/OGR dependency. Auto-detects tile type from magic bytes (PNG / JPEG / WebP / otherwise MVT). Read is not yet supported; `spark.read.format("pmtiles")` raises a friendly error pointing at the JS / Python pmtiles clients. The `gbx_pmtiles_agg` aggregate is available in both the heavyweight and lightweight tiers; the `.write.format("pmtiles")` DataSource (for larger streamed pyramids) remains heavyweight-only. See PMTiles.
- Concurrent-safe lightweight writers. The lightweight vector and PMTiles writers now isolate their two-phase staging per write: concurrent jobs (or multiple users) writing to the same output location can no longer see or overwrite one another's in-progress data, and scratch left behind by an interrupted job is reclaimed automatically on a later write to the same location. The PMTiles writer previously staged into a fixed shared directory (a concurrency hazard) and now uses a unique hidden namespace per write. See Writers.
- Raster→quadbin aggregators (5 functions).
`gbx_rst_quadbin_rastertogrid{avg,count,max,min,median}` extend the H3 aggregation pattern to CARTO quadbin v0 cells. Natural fit for raster heatmaps that render in slippy-map viewers: cells align with the same XYZ pyramid that PMTiles / MVT readers consume. Resolution capped at z=20. See Raster Functions.
- Web-mercator XYZ tile output (3 functions).
`gbx_rst_to_webmercator` reprojects a raster to EPSG:3857 (default `bilinear`); `gbx_rst_tilexyz(tile, z, x, y, [format, size, resampling])` renders a single XYZ tile to PNG / JPEG / WEBP bytes (returns `BinaryType`; out-of-extent tiles get a transparent PNG, not null); `gbx_rst_xyzpyramid(tile, min_z, max_z, ...)` is a generator that explodes one raster into one row per intersecting `(z, x, y)` tile across a zoom range. `max_z` capped at 20; total tile-count across zoom range capped at 10^6. Foundation for the PMTiles publishing pipeline. See Raster Functions.
- Vector↔raster bridge (
`gbx_rst_rasterize`, `gbx_rst_polygonize`). Two reciprocal RasterX functions that span GeoBrix's vector and raster worlds. `gbx_rst_rasterize(geom_wkb, value, xmin, ymin, xmax, ymax, width_px, height_px, srid)` burns a vector geometry into a fresh GTiff-backed raster tile at the given extent / resolution (pixels inside the geometry carry `value`, pixels outside are NoData = `-9999.0`). `gbx_rst_polygonize(tile, [band, [connectedness]])` extracts `ARRAY<struct(geom_wkb BINARY, value DOUBLE)>` from `tile`: one feature per contiguous value region, NoData pixels excluded. The pair composes: `polygonize(rasterize(geom, v, ...))` returns at least one feature with value `v` covering approximately the same area as the input `geom`, with edges quantized to the pixel grid. See Raster Functions § Vector bridge.
- Terrain analysis (7 functions).
`gbx_rst_slope`, `gbx_rst_aspect`, `gbx_rst_hillshade`, `gbx_rst_tri`, `gbx_rst_tpi`, `gbx_rst_roughness`, `gbx_rst_color_relief`: all thin wrappers over `gdal.DEMProcessing`. Each takes a single-band DEM tile and returns a derived tile (Float32 for slope/aspect/TRI/TPI/roughness, Byte for hillshade, RGB(A) Byte for color_relief). Defaults mirror the gdaldem CLI (hillshade NW sun at 315° azimuth, 45° altitude; slope in degrees). Foundation for terrain-derived workflows: solar exposure, viewshed pre-processing, watershed and runoff analysis, road grading. See Raster Functions § Terrain.
- Slope and hillshade auto-scale from the raster CRS (breaking default on geographic rasters).
`gbx_rst_slope` and `gbx_rst_hillshade` (and the lightweight `prx.rst_slope` / `prx.rst_hillshade`) now derive the horizontal scale from the raster's coordinate reference system by default, matching GDAL `gdaldem`. On geographic (lat/long, e.g. EPSG:4326) rasters the scale is computed from latitude (degree→metre), so a global or geographic DEM produces correct, non-saturated slope and shading without any extra argument; on projected (metre) rasters output is unchanged. Previously these two ran unscaled on geographic input, which over-steepened and saturated the result. This changes the default output for geographic rasters to the GDAL-consistent value. To pin a specific scale, pass it explicitly: `gbx_rst_slope(tile, 'degrees', 111120)` for a degree grid, or `prx.rst_slope(tile, xscale=..., yscale=...)` / `prx.rst_hillshade(tile, xscale=..., yscale=...)`. `gbx_rst_aspect` is a direction and is unaffected. See Raster Functions § Terrain.
- Spectral indices (5 functions).
`gbx_rst_evi`, `gbx_rst_savi`, `gbx_rst_ndwi`, `gbx_rst_nbr`, plus a generic `gbx_rst_index(tile, formula_name, band_map)`: all compositions over `gbx_rst_mapalgebra`. Each takes user-supplied 1-based band indices, builds a per-pixel formula string, and dispatches to gdal_calc; output is a single-band Float32 GTiff sized to the input extent. The generic dispatcher ships built-in NDVI, GNDVI, MSAVI, red-edge NDVI, NDMI, and NDSI formulae and is the entry point users should reach for first for any named multi-band index; the four specialized expressions surface EVI / SAVI / NDWI / NBR with their canonical coefficient defaults (EVI: `L=1.0, C1=6.0, C2=7.5, G=2.5` per MODIS; SAVI: `L=0.5`) so vegetation, water and burn-severity workflows compose without a hand-written formula string. See Raster Functions § Spectral indices.
- Resample and IDW interpolation (5 functions). Three resample wrappers (
`gbx_rst_resample` by multiplicative factor, `gbx_rst_resample_to_size` to explicit pixel dims, `gbx_rst_resample_to_res` to explicit ground resolution) all delegate to `gdal.Warp` with `-tr` / `-ts` plus `-r <algorithm>`. Two IDW functions, `gbx_rst_gridfrompoints` (arrays in one row) and its UDAF counterpart `gbx_rst_gridfrompoints_agg` (one point per row), both delegate to `gdal.Grid` with the `invdist:power=<p>:max_points=<m>` algorithm and produce a single-band Float64 GTiff tile of the requested extent / size / SRID. Algorithm names match the `gdalwarp -r` set (`near`, `bilinear`, `cubic`, `cubicspline`, `lanczos`, `average`, `mode`, `max`, `min`, `med`, `q1`, `q3`); IDW defaults are `power=2.0`, `max_pts=12`, NoData `-9999.0`. See Raster Functions.
- Pixel ops + extraction (7 functions).
`gbx_rst_fillnodata` (fill NoData holes via inverse-distance from valid neighbors), `gbx_rst_sample(tile, geom)` (per-band pixel values at a geometry), `gbx_rst_setsrid` (stamp an EPSG code without reprojecting), `gbx_rst_histogram` (per-band bucket counts via `band.GetHistogram`), `gbx_rst_threshold(tile, op, value)` (binarize 0/1 via map-algebra), `gbx_rst_buildoverviews(tile, levels, [resampling])` (add pyramid overview levels), and `gbx_rst_band(tile, bandIndex)` (extract a single band). Common per-pixel and per-tile operations missing from v0.3.0; each is a thin wrapper over the matching GDAL primitive. See Raster Functions.
- Analysis (4 functions).
`gbx_rst_cog_convert(tile, [compression, [blocksize, [overview_resampling]]])` re-layouts a tile as a Cloud Optimized GeoTIFF via `gdal.Translate -of COG` (HTTP-range-friendly serving from object storage). `gbx_rst_proximity(tile, [target_values, [distunits, [max_distance]]])` computes a Float32 distance raster via `gdal.ComputeProximity`: distance to the nearest non-NoData (or matching `target_values`) source pixel, in CRS units or pixels. `gbx_rst_contour(tile, levels, [interval, [base, [attr_field]]])` extracts contour LineStrings via `gdal.ContourGenerateEx`, returning `ARRAY<struct(geom_wkb BINARY, value DOUBLE)>`; pass non-empty `levels` for fixed values or `array()` plus positive `interval` for equal-step contours. `gbx_rst_viewshed(tile, observer_geom, observer_height, [target_height, [max_distance]])` computes a binary visibility mask (Byte raster, `255` visible / `0` invisible) from a DEM and an observer POINT via `gdal.ViewshedGenerate`. See Raster Functions.
- TIN DTM rasters (2 functions).
`gbx_rst_dtmfromgeoms` (array of Z-valued points and optional breaklines in one row) and `gbx_rst_dtmfromgeoms_agg` (streaming: one point per row, grouped by extent). Both build a constrained-Delaunay TIN and rasterize it to a Float64 GTiff DTM over a bbox at a pixel grid; cells outside the triangulated hull get NoData. Useful for deriving a continuous elevation surface from scattered survey points or LiDAR mass points. See Raster Functions § Constructors.
- VectorX TIN surface modeling (3 functions).
`gbx_st_triangulate` (emit one triangle polygon per row from a constrained-Delaunay TIN), `gbx_st_interpolateelevationbbox` (sample the TIN on a pixel grid over an explicit bounding box), and `gbx_st_interpolateelevationgeom` (sample on a grid anchored to a geometry's bounding box with explicit cell sizes): all generators returning WKB geometries. Useful for exposing the raw triangulation and interpolated elevation points for vector-side workflows. See VectorX § Triangulation and elevation.
- Streaming aggregators (3 functions).
`gbx_rst_rasterize_agg` (burn geom/value pairs into one tile per group), `gbx_rst_frombands_agg` (collect ordered per-band tiles into one multi-band tile per group), and `gbx_quadbin_cellunion_agg` (dissolve a column of quadbin cell IDs into one MultiPolygon per group). Group-by / UDAF forms that stream rows instead of requiring a pre-collected array, suited for large partitions. See Raster Functions § Aggregators and GridX § Quadbin.
- H3 cell rasterizer (
`gbx_rst_h3_rasterize_agg`, `gbx_h3_cell_bbox`). `gbx_rst_h3_rasterize_agg` is a grouped aggregator (both tiers) that burns a set of H3 cells (one row per cell with an optional value) into a single GTiff-encoded raster tile per group, using pixel-centroid assignment. It is the inverse of the `gbx_rst_h3_rastertogrid*` family: where those extract per-cell statistics from an existing raster, this one synthesizes a raster from H3-indexed values. Extent and grid dimensions are either supplied explicitly or derived automatically from the cell set. `gbx_h3_cell_bbox` is a scalar function that returns a `STRUCT<xmin DOUBLE, ymin DOUBLE, xmax DOUBLE, ymax DOUBLE>` bounding box for a single H3 cell in the requested EPSG, optionally expanded by a k-ring pad. The lightweight Python API also ships `rst_h3_gridspec`, a helper that derives the canonical raster extent and pixel grid from a collection of H3 cells at a given resolution, which is useful for computing consistent grid parameters before calling the aggregator. See Raster Functions § H3 grid.
- Custom grids (7 functions).
`gbx_custom_grid` (define a user-specified regular grid from extent + resolution + SRID), `gbx_custom_pointascell`, `gbx_custom_cellaswkb`, `gbx_custom_cellaswkt`, `gbx_custom_centroid`, `gbx_custom_polyfill`, `gbx_custom_kring`. Index and tessellate against an arbitrary projected grid (for example a national or project-specific tiling) when H3, BNG, or quadbin cells do not match the required cell geometry. Available in both the heavyweight and lightweight (pygx) tiers, with exact cross-tier cell-ID and cell-set parity. See GridX § Custom Grid Functions.
- `gbx_rst_initnodata` now works on multi-band rasters (behavior change since v0.3.0). Initializing NoData on a raster with more than one band previously raised an error; only single-band rasters were supported. `gbx_rst_initnodata` now initializes the NoData value correctly across all bands of a multi-band raster. Output for single-band rasters is unchanged. See Raster Functions.
- `gbx_rst_derivedband` / `gbx_rst_derivedband_agg` return a single derived band for multi-band inputs (behavior change since v0.3.0). On a multi-band input, these functions previously returned one derived band per input band (an N-band output). They now apply the pixel function across all bands and return a single-band Float64 result, matching the documented single-band contract. Output for single-band inputs is unchanged. See Raster Functions.
- `gbx_bng_geomkring` / `gbx_bng_geomkloop` accept string resolutions (consistency fix). These two functions now accept BNG string resolution keys (for example `'1km'`, `'100m'`) in addition to integer indices, matching `gbx_bng_pointascell` and the lightweight tier. Integer-index behavior is unchanged. See GridX § BNG.
- Lightweight grouped aggregators return
`BINARY` where the heavyweight tier returns a struct. For grouped aggregators whose heavyweight form returns a tile or chip struct (the `rst_*_agg` family, `gbx_bng_cellunion_agg` / `gbx_bng_cellintersection_agg`, and `gbx_quadbin_cellunion_agg`), the lightweight SQL form returns the serialized `BINARY` payload instead. This is a PySpark limitation (a grouped `pandas_udf` cannot return a struct type). Re-wrap the result with the matching scalar constructor to recover the struct. The Python DataFrame and Scala APIs are unaffected. See the per-function notes in Raster Functions and GridX.
- `gbx_custom_pointascell` rejects a non-finite Y coordinate (fix). A NaN northing was previously not validated (a duplicate easting check), so it was only incidentally rejected with a misleading out-of-bounds message. Both tiers now reject a NaN Y with a clear error, matching the X-coordinate guard.
- Lightweight STAC client (
`databricks.labs.gbx.stac.StacClient`). A Serverless-safe client for distributed SpatioTemporal Asset Catalog (STAC) workflows: `search` (fan an area-of-interest DataFrame out across a catalog, one row per item/asset), `download` (resilient, validated asset fetch: re-signs each attempt, read-validates the bytes, retries with backoff, and skips already-valid files), and `repair` (re-download only the invalid rows via a Delta MERGE). Catalog-agnostic with pluggable signing, defaulting to Microsoft Planetary Computer. Ships behind the opt-in `geobrix[light,stac]` extra (adds `pystac-client`, `planetary-computer`) and imports cleanly on Serverless environment v5. See STAC Client.
- AOI-driven sample downloaders (
`databricks.labs.gbx.sample`). Three helpers stage open geospatial data to a Unity Catalog Volume with a shared discover → download → read shape, distributed and Serverless-safe: `OvertureClient` (Overture Maps buildings / places, via the Overture STAC catalog + `overturemaps` CLI), `NaipDownloader` (NAIP aerial imagery), and `DemDownloader` (USGS 3DEP elevation). `NaipDownloader` and `DemDownloader` wrap `StacClient` on Microsoft Planetary Computer and window each asset to the AOI on read. See Overture, NAIP, and 3DEP.
- Visualization helpers (
`databricks.labs.gbx.vizx`). A tier-agnostic, opt-in (`geobrix[vizx]`) module for inspecting GeoBrix outputs in a notebook. `plot_raster` / `plot_file` render a tile or file (auto-decimate, percentile-stretch, single-band viridis or multi-band RGB), and accept `composite="depth"` to render a multi-band presence stack as a per-pixel coverage-depth gradient (bright where many bands cover a pixel) instead of a mostly-black RGB. `plot_mask_layers` overlays several single-band mask tiles on one axes (each a solid colour with a legend) for multi-threshold coverage views. `as_gdf` / `cells_as_gdf` / `grid_as_gdf` adapt Spark DataFrames (geometry rows, H3 cell ids with an optional `dissolve_by`, or a `rst_h3_gridspec` grid struct) to GeoPandas for `.plot()` / `.explore()` maps. `plot_static` renders Spark- or GeoPandas-derived geometries (or DGGS cells) as a GitHub-renderable matplotlib figure over a basemap, and `plot_interactive` is its interactive twin: a folium pan/zoom map that automatically falls back to a raster image overlay at scale (where a bare `.explore()` would hang) and renders inline in Databricks via `displayHTML`. Single-band presence masks (constant value) now render as a solid footprint over a light background rather than a blank plot. See Visualization.
- Inline PMTiles + COG viewers (
`plot_pmtiles`, `plot_cog`, `pmtiles_info`). `plot_pmtiles` renders a PMTiles archive (raster or vector, auto-detected from the header) directly in a notebook: a self-contained MapLibre GL JS + pmtiles.js page with the archive base64-embedded as an in-browser `FileSource`, so there is no tile server; it falls back to a static image when the archive exceeds the notebook cell-output ceiling (~4–5 MB after `displayHTML` inflation), or drops the densest zooms with `interactive_fit="downzoom"`. `plot_cog` renders a Cloud-Optimized GeoTIFF over a contextily basemap; `pmtiles_info` reports an archive's header (tile type, zoom range, bounds). See PMTiles viewers.
- Example notebooks default to the lightweight tier. The EO Series and xView walkthroughs now run on the lightweight API (
`pyrx`/`pygx`/`pyvx` plus the `gbx_*` DataSource readers and writers) by default, so they execute on Databricks Serverless (environment v5) with no JAR and no init script; each notebook calls out the one-line import to switch back to the heavyweight tier. The EO Series uses the new `StacClient` for its Planetary Computer search, download, and repair steps. See EO Series and xView.
- H3 cell rasterize example notebook. A complete polygon → H3 polyfill → per-band rasterize → multi-band stack walkthrough on a San Francisco Bay Area DEM, treating elevation isobands as a stand-in for signal-strength coverage tiers (a telco coverage-analysis pattern). Exercises
`rst_h3_gridspec`, `rst_h3_rasterize_agg`, and `rst_frombands_agg`, materializes the per-band tiles into a session-scoped temp table, and uses the `gbx.vizx` helpers (`plot_mask_layers`, `plot_raster(composite="depth")`) to inspect the result. See H3 Rasterize.
- Helios distributed-tiling notebook series. A four-notebook solar site-selection walkthrough over one San Francisco AOI: building footprints → vector PMTiles (NB01), a NAIP aerial basemap → raster PMTiles (NB02), 3DEP terrain → COG catalog + hillshade PMTiles + a per-H3-cell solar score (NB03), and a distributed sharded PMTiles mosaic with a
`mosaic.json` manifest for client-side assembly (NB04). Runs on the lightweight tier / Serverless with no JAR, dogfooding `gbx_st_asmvt_pyramid`, `gbx_rst_xyzpyramid`, `gbx_pmtiles_agg`, the sample downloaders, and the `gbx.vizx` PMTiles viewers. See Helios.
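
The tile-pyramid generators listed above (`gbx_st_asmvt_pyramid`, `gbx_rst_xyzpyramid`) emit one row per intersecting `(z, x, y)` tile. As a rough illustration of that enumeration, the following is a minimal pure-Python sketch of the standard slippy-map tile math applied to a lon/lat bounding box. It is not GeoBrix's implementation: the names `lonlat_to_tile` and `tiles_for_bbox` are hypothetical, and only the two caps (`max_z` of 20, 10^6 total tiles) are taken from the notes above.

```python
import math

MAX_ZOOM = 20        # zoom cap quoted in the notes above
MAX_TILES = 10**6    # total tile-count cap quoted in the notes above
MERCATOR_LAT_LIMIT = 85.05112878  # latitude limit of the web-mercator square


def lonlat_to_tile(lon, lat, z):
    """Return the (x, y) XYZ tile index containing a lon/lat point at zoom z."""
    n = 1 << z  # tiles per axis at this zoom
    lat = max(min(lat, MERCATOR_LAT_LIMIT), -MERCATOR_LAT_LIMIT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon=180 or a latitude at the limit stays inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(xmin, ymin, xmax, ymax, min_z, max_z):
    """Yield (z, x, y) for every tile touching a lon/lat bounding box."""
    if not 0 <= min_z <= max_z <= MAX_ZOOM:
        raise ValueError(f"zoom range must satisfy 0 <= min_z <= max_z <= {MAX_ZOOM}")
    emitted = 0
    for z in range(min_z, max_z + 1):
        x0, y0 = lonlat_to_tile(xmin, ymax, z)  # top-left corner (tile y grows southward)
        x1, y1 = lonlat_to_tile(xmax, ymin, z)  # bottom-right corner
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                emitted += 1
                if emitted > MAX_TILES:
                    raise ValueError(f"tile count exceeds {MAX_TILES}")
                yield z, x, y


print(lonlat_to_tile(-122.4194, 37.7749, 10))                  # (163, 395)
print(len(list(tiles_for_bbox(-180, -85, 180, 85, 0, 1))))     # 5
```

A bounding box whose edge falls exactly on a tile boundary also picks up the neighbouring tile in this sketch; a production generator would test true geometry intersection instead.
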
What's new in v0.3.0
Released 2026-05-26. Per-version highlights; full migration tables are in the per-component sections below.
- `rst_clip` CRS axis-order fix (all-black clips). GDAL 3+ defaults EPSG-imported `SpatialReference`s to authority-compliant axis order (lat/lon for EPSG:4326), which silently swapped axes against JTS/Databricks WKT/WKB cutlines so the clip missed the raster entirely. The reprojection now clones the source/destination `SpatialReference`s and forces `OAMS_TRADITIONAL_GIS_ORDER` before the OGR transform; caller-owned `SpatialReference`s are not mutated.
- EWKT / EWKB support for
`rst_clip`. `JTS.fromWKT` / `JTS.fromWKB` auto-detect EWKT/EWKB; new `JTS.toEWKT` / `JTS.toEWKB` helpers emit SRID-preserving forms. `rst_clip` reprojects the cutline when its SRID differs from the raster CRS, and falls back to the raster's CRS (Mosaic-compatible) when the SRID is `0` / unresolvable.
- `rst_transform` rejects invalid SRIDs. `targetSrid <= 0` and unresolvable EPSG codes now surface a clear error via tile metadata `error_message` instead of returning a raster with an uninitialized CRS.
- `/vsimem/` path-handling hardening. `rst_memsize` / `rst_unlink` / GDAL writer in-memory byte fetch now use `startsWith("/vsimem/")` (not `contains`) and null-check `GetMemFileBuffer`, so datasets whose description embeds the substring (e.g. NetCDF subdataset selectors) aren't mis-routed through the in-memory branch.
- `tile.raster` bytes are always self-contained (no VRT payloads). Three RasterX operations, `MergeRasters` (`gbx_rst_merge`, `gbx_rst_merge_agg`), `MergeBands` (`gbx_rst_frombands`), and `PixelCombineRasters` (`gbx_rst_derivedband`, `gbx_rst_derivedband_agg`, `gbx_rst_combineavg`, `gbx_rst_combineavg_agg`), used to return tiles whose `metadata("driver")` claimed `VRT` even though the on-disk file was a materialized GTiff. That mis-tag propagated through `RasterDriver.writeToBytes` (which keys both the tempfile extension AND the `-of` flag in the inner `gdal_translate` call off `metadata.driver`), causing the serialized `tile.raster` payload to be VRT XML referencing a `/vsimem/` tempfile only reachable on the producing executor. Single-node testing passed by accident; multi-executor clusters hit `file not found` when the VRT was opened elsewhere. Fix: `GDALTranslate.executeTranslate` now records the output dataset's driver in its returned metadata (not the input's), and `RasterDriver.writeToBytes` defensively coerces VRT to GTiff on serialization + sniffs the result to refuse shipping VRT bytes.
Regression coverage in `RST_NoVrtPayloadTest`.
- `PixelCombineRasters` pixel function now actually fires (`combineavg` / `derivedband` were silently returning one of the inputs). `gbx_rst_combineavg`, `gbx_rst_combineavg_agg`, `gbx_rst_derivedband`, and `gbx_rst_derivedband_agg` build a multi-source VRT, inject a `<PixelFunctionLanguage>Python</...>` band, and re-open it for `gdal_translate`. The previous implementation re-opened the VRT before mutating the XML file, so the in-memory `Dataset` handle never saw the pixel function; `gdal.Translate` then fell back to a default multi-source mosaic (last-source-wins per pixel). On co-extensive inputs (e.g. a monthly EO time-series), the output silently equaled one of the inputs. The choice was non-deterministic per partition in a distributed setting, producing visible tile-of-different-years patchwork on multi-executor clusters. Fix: `PixelCombineRasters.combine` now injects the pixel function before the VRT is re-opened, and pre-creates the per-JVM `NodeFilePathUtil.rootPath` staging dir itself (previously only `ClipToGeom` did, so `combineavg` would hit `file not found` if it was the first op to run on a fresh JVM). Regression coverage: `RST_AggregationsTest` "CombineAvg actually averages pixel values" (two constant rasters 50 + 100 → output 75).
- `gbx_rst_merge_agg` overlap winner is now deterministic. When merging tiles whose extents overlap, the mosaic is last-wins, so the result depends on the order tiles are folded. The aggregator previously ordered tiles by their GDAL dataset description to make that order stable, but for the in-memory (`BinaryType`) tiles a `groupBy().agg()` produces, the description is a per-open `/vsimem/<uuid>` path, so the fold order, and therefore the overlap winner, varied from run to run.
The aggregator now orders tiles by their raw serialized content (the GTiff bytes each tile carries), which is a total order intrinsic to the tile with no ties for distinct content and no random per-open component. One tile therefore reliably wins the overlap regardless of fold order, and the result is identical across the heavyweight and lightweight tiers (both sort on the identical bytes). This also fixes overlapping tiles that share the same geotransform origin, which an origin-based key could not separate. Non-overlapping mosaics are unaffected. Regression coverage: `RST_AggEvalTest` deterministic same-origin and offset merge cases.
- Friendly error on
`ARRAY<tile>`-function misuse. Calling `gbx_rst_combineavg`, `gbx_rst_merge`, `gbx_rst_frombands`, or `gbx_rst_mapalgebra` on a single tile column (instead of an `ARRAY<tile>` like `collect_list(tile)`) used to surface as a raw `ClassCastException: StructType cannot be cast to ArrayType` from inside Catalyst analysis, untraceable from a notebook. The four expressions now route through `RST_ExpressionUtil.arrayOfTileRasterType`, which raises a clean `IllegalArgumentException` naming the function, the actual type received, and (where applicable) the aggregator companion the user likely wanted, e.g. `gbx_rst_combineavg expects ARRAY<tile> (e.g. collect_list(tile) or array(t1, t2, ...)), but received STRUCT<...>. To aggregate the column across rows, use gbx_rst_combineavg_agg(tile).`
- Docs:
`GDAL_VRT_ENABLE_PYTHON` for custom GDAL code paths. Built-in `combineavg` / `derivedband` calls auto-enable VRT Python via the in-process `GDALManager.withVrtPython` bracket, so no cluster config is needed. The RasterX Function Reference § VRT Python pixel functions section documents how to enable the same evaluation in your own GDAL calls (Python `gdal.SetConfigOption`, cluster `spark.executorEnv`, or the JVM `withVrtPython` helper) and points to the `TRUSTED_MODULES` variant for less-trusted VRT sources. A cross-reference is added in Security § 6 explaining why GeoBrix ships the option `NO` by default.
- `gbx_rst_derivedband` / `gbx_rst_derivedband_agg` numerical-correctness regression coverage. These functions share the `PixelCombineRasters` code path with `combineavg`, so they were silently no-opping in the same way (returning one of the inputs unchanged on co-extensive stacks). The ordering fix above repairs both call sites, but the existing tests only checked that the result wasn't null, so they would have passed either way. This release adds explicit pixel-value assertions: `RST_AggregationsTest` covers the in-process `RST_DerivedBand` path with a doubling pyfunc and a 3-input numpy-mean pyfunc, and `RST_AggEvalTest` covers the Spark-aggregation `rst_derivedband_agg` path end-to-end (three constant-Byte tiles 10/20/30 with a "mean × 2" pyfunc must yield 40 across the result tile). Two previously-passing tests used `def myfunc(x): return x * 2`, an invalid VRT pixel-function signature, and were updated to the canonical `(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs)` shape; they only "passed" before because the pyfunc never actually ran.
- `gbx_rst_combineavg` / `gbx_rst_combineavg_agg` math corrected (NoData, valid zeros, rounding). With the pixel function now firing (previous bullet), several latent bugs in the average kernel surface and are fixed in this release. The pyfunc used to sum every source value blindly, including each band's NoData sentinel (e.g.
255 on Byte EO products), and counted only strictly-positive cells in the divisor (`np.sum(stacked > 0, axis=0)`), which (a) inflated the numerator with NoData and (b) wrongly excluded valid `0` measurements from the divisor. It also used `np.divide(..., casting='unsafe')`, which truncates rather than rounds when casting back to an integer output dtype (Byte / UInt16), producing systematic underbias on integer EO stacks. Now the kernel reads each source band's declared NoData (via `BandAccessors.getNoDataValue`, baked into the pyfunc source as a literal list at VRT-write time), masks NoData cells out of both sum and divisor, includes valid `0`s, uses float64 internally, and rounds-to-nearest-even before the unsafe cast when the output dtype is integer. The bogus `np.clip(out_ar, stacked.min(), stacked.max(), ...)` (the bounds were contaminated by NoData sentinels) is removed. When at least one input declares NoData, that value is also stamped on the output band so downstream `GetNoDataValue` reports all-NoData pixels. Regression coverage in `RST_AggregationsTest`: "excludes declared NoData from both sum and divisor", "counts valid 0 cells in the divisor", "rounds (not truncates) when casting to integer output".
- Scalar args without
`f.lit(...)`. Python wrappers auto-wrap `bool`/`int`/`float`/`bytes`; Scala adds typed overloads. SQL was already natively-typed. String literals still wrap in `f.lit(...)` per pyspark's column-ref convention. Details and migration examples in Scalar values vs `lit(...)` wrapping.
- Example notebooks: EO Series, xView, and enablement diagrams. New end-to-end walkthroughs under
docs/examples/covering EO time-series, xView object-detection rasters, and RasterX architecture diagrams. - Supply-chain hardening (lockdown). Jobs pinned to the Databricks-hardened runner group (org-level allowlist, ephemeral VMs, constrained secret access); every Maven dependency, transitive dep, plugin, and plugin dependency is PGP-verified against
.maven-keys.listbefore any compile or test execution; pip and Maven routed through JFrog with OIDC; init script + pinned package versions vetted; new Security page in the docs. - Pre-built, hash-verified GDAL bundle. The GDAL native install path is now a CI-built tarball (
geobrix-gdal-artifacts-v<version>-noble.tar.gz+ matching.sha256sidecar, attached to each release alongside a versionedgeobrix-gdal-init.sh). Cluster start drops from ~15 minutes (legacy PPA dance per boot) to ~30–90 seconds (verify sidecar → extract →dpkg -i). Trust chain is now four layers: CI-side GPG fingerprint pin → per-fileSHA256SUMSinside the tarball → outer.sha256sidecar in the staging Volume → the Volume's write ACL. The legacy on-cluster path is preserved asscripts/geobrix-gdal-init-ppa.shfor bundle bootstrapping. Bundle isamd64/x86_64only (Intel or AMD CPUs); ARM-based instance types — AWS Graviton, Ampere, Apple Silicon — are not supported. See Installation and the rationale on the Security page.
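The corrected averaging rule for gbx_rst_combineavg can be illustrated one pixel at a time. The following is a minimal, dependency-free sketch of the arithmetic only, not the shipped kernel, which per the notes above is a NumPy pyfunc generated at VRT-write time. The function name masked_mean, its parameters, and the choice of fill value for all-NoData pixels are assumptions made for illustration.

```python
def masked_mean(values, nodata, integer_output=True):
    """Average one pixel across source bands, skipping declared NoData.

    values: one sample per source band (hypothetical input layout).
    nodata: the declared NoData value per source band, or None if undeclared.
    """
    # Keep a sample unless it equals that band's declared NoData.
    # Valid zeros are kept, so they count toward the divisor.
    valid = [float(v) for v, nd in zip(values, nodata) if nd is None or v != nd]
    if not valid:
        # Every input was NoData: return the first declared NoData value
        # (an assumption about how the fill value is chosen).
        return next((nd for nd in nodata if nd is not None), None)
    mean = sum(valid) / len(valid)
    # Python's round() rounds half to even, i.e. round-to-nearest-even.
    return round(mean) if integer_output else mean


print(masked_mean([10, 20, 255], [255, 255, 255]))  # 15: the 255 sentinel is ignored
print(masked_mean([0, 0, 6], [255, 255, 255]))      # 2: valid zeros count in the divisor
print(masked_mean([1, 2, 2], [255, 255, 255]))      # 2: 1.67 rounds up, truncation would give 1
```

Keeping the sum in floating point and rounding once at the end is what removes the low bias that truncation introduced on integer stacks.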
Conventions:
- Baseline: Name or behavior before the change (what to search for in old code or docs).
- Current: Name or behavior after the change (what to use now).
- Notes: Short reason (e.g. standardize across languages, underscore standardization, _geometry → _geom).
General
| Baseline | Current | Notes |
|---|---|---|
| Python import geobrix.* | databricks.labs.gbx.* | Match Scala package and published artifact; avoid namespace clashes. |
| Extra underscores in function names (multi-word parts spelled with _) | Single underscore between prefix and compound (e.g. rst_pixelwidth, gbx_bng_cellarea) | Underscore standardization: one leading prefix, then one compound word; no _ inside the operation name. |
| Non-Column value args required f.lit(...) / lit(...) wrapping (e.g. rst_clip(tile, geom, f.lit(True)), bng_pointascell(pt, f.lit(1))) | Plain Python/Scala non-string scalars accepted directly (e.g. rst_clip(tile, geom, True), rst_transform(tile, 4326), bng_pointascell(pt, 1)) | Matches Mosaic/DBR built-in ergonomics for booleans/numerics. Python wrappers auto-wrap bool/int/float/bytes via f.lit; Scala adds typed overloads. Strings still follow pyspark's column-ref convention: rx.rst_width("tile") is still f.col("tile"); wrap in f.lit(...) for string literals (e.g. driver=f.lit("GTiff")). |
All specific function renames from that standardization are listed in the component tables below.
RasterX
| Baseline | Current | Notes |
|---|---|---|
| (GDAL reader output column) path | source | Docs/tests aligned to GDAL reader output column name. |
| rst_band_metadata / gbx_rst_band_metadata | rst_bandmetadata / gbx_rst_bandmetadata | Underscore standardization. |
| rst_bounding_box / gbx_rst_bounding_box | rst_boundingbox / gbx_rst_boundingbox | Underscore standardization. |
| rst_pixel_width / gbx_rst_pixel_width | rst_pixelwidth / gbx_rst_pixelwidth | Underscore standardization. |
| rst_pixel_height / gbx_rst_pixel_height | rst_pixelheight / gbx_rst_pixelheight | Underscore standardization. |
| rst_num_bands / gbx_rst_num_bands | rst_numbands / gbx_rst_numbands | Underscore standardization. |
| rst_pixel_count / gbx_rst_pixel_count | rst_pixelcount / gbx_rst_pixelcount | Underscore standardization. |
| rst_scale_x / gbx_rst_scale_x | rst_scalex / gbx_rst_scalex | Underscore standardization. |
| rst_scale_y / gbx_rst_scale_y | rst_scaley / gbx_rst_scaley | Underscore standardization. |
| rst_upper_left_x / gbx_rst_upper_left_x | rst_upperleftx / gbx_rst_upperleftx | Underscore standardization. |
| rst_upper_left_y / gbx_rst_upper_left_y | rst_upperlefty / gbx_rst_upperlefty | Underscore standardization. |
| rst_geo_reference / gbx_rst_geo_reference | rst_georeference / gbx_rst_georeference | Underscore standardization. |
| rst_get_nodata / gbx_rst_get_nodata | rst_getnodata / gbx_rst_getnodata | Underscore standardization. |
| rst_get_subdataset / gbx_rst_get_subdataset | rst_getsubdataset / gbx_rst_getsubdataset | Underscore standardization. |
| rst_mem_size / gbx_rst_mem_size | rst_memsize / gbx_rst_memsize | Underscore standardization. |
| rst_sub_datasets / gbx_rst_sub_datasets | rst_subdatasets / gbx_rst_subdatasets | Underscore standardization. |
| rst_combine_avg_agg / gbx_rst_combine_avg_agg | rst_combineavg_agg / gbx_rst_combineavg_agg | Underscore standardization. |
| rst_derived_band_agg / gbx_rst_derived_band_agg | rst_derivedband_agg / gbx_rst_derivedband_agg | Underscore standardization. |
| rst_from_content / gbx_rst_from_content | rst_fromcontent / gbx_rst_fromcontent | Underscore standardization. |
| rst_from_file / gbx_rst_from_file | rst_fromfile / gbx_rst_fromfile | Underscore standardization. |
| rst_from_bands / gbx_rst_from_bands | rst_frombands / gbx_rst_frombands | Underscore standardization. |
| rst_make_tiles / gbx_rst_make_tiles | rst_maketiles / gbx_rst_maketiles | Underscore standardization. |
| rst_re_tile / gbx_rst_re_tile | rst_retile / gbx_rst_retile | Underscore standardization. |
| rst_separate_bands / gbx_rst_separate_bands | rst_separatebands / gbx_rst_separatebands | Underscore standardization. |
| rst_to_overlapping_tiles / gbx_rst_to_overlapping_tiles | rst_tooverlappingtiles / gbx_rst_tooverlappingtiles | Underscore standardization. |
| rst_init_nodata / gbx_rst_init_nodata | rst_initnodata / gbx_rst_initnodata | Underscore standardization. |
| rst_is_empty / gbx_rst_is_empty | rst_isempty / gbx_rst_isempty | Underscore standardization. |
| rst_map_algebra / gbx_rst_map_algebra | rst_mapalgebra / gbx_rst_mapalgebra | Underscore standardization. |
| rst_raster_to_world_coord / gbx_rst_raster_to_world_coord (and X/Y variants) | rst_rastertoworldcoord / gbx_rst_rastertoworldcoord (and X/Y) | Underscore standardization. |
| rst_world_to_raster_coord / gbx_rst_world_to_raster_coord (and X/Y variants) | rst_worldtorastercoord / gbx_rst_worldtorastercoord (and X/Y) | Underscore standardization. |
| rst_as_format / gbx_rst_as_format | rst_asformat / gbx_rst_asformat | Underscore standardization. |
| rst_combine_avg / gbx_rst_combine_avg | rst_combineavg / gbx_rst_combineavg | Underscore standardization. |
| rst_h3_raster_to_grid_avg (and Count/Max/Min/Median) | rst_h3_rastertogridavg (and Count/Max/Min/Median) | Underscore standardization. |
| rst_bandmetadata(tile) (single arg) | rst_bandmetadata(tile, band) | Required band parameter added; use e.g. rst_bandmetadata("tile", f.lit(1)). |
| rst_fromfile raster field was StringType (path) with metadata.size = -1 | rst_fromfile raster field is BinaryType (file bytes) with real metadata.size | rst_fromfile now reads the file into the tile, so tiles are self-contained and downstream ops (e.g. rst_clip) no longer produce orphan temp paths. Matches rst_fromcontent and the GDAL reader. |
| Default output compression was ZSTD (TIFF tag 50000) | Default output compression is DEFLATE (baseline TIFF) | ZSTD output was not decodable by Java ImageIO and broke the Databricks image preview after operators like rst_clip. DEFLATE is universally previewable and (with PREDICTOR=2/3) still compresses well. Override per-call via tile metadata compression key. |
GridX (BNG)
| Baseline | Current | Notes |
|---|---|---|
| bng_eastnortasbng (Python) / gbx_bng_eastnortasbng (SQL) | bng_eastnorthasbng / gbx_bng_eastnorthasbng | Standardize across languages (Python had typo; Scala already eastnorth). |
| bng_cell_area / gbx_bng_cell_area | bng_cellarea / gbx_bng_cellarea | Underscore standardization. |
| bng_cell_intersection / gbx_bng_cell_intersection | bng_cellintersection / gbx_bng_cellintersection | Underscore standardization. |
| bng_cell_union / gbx_bng_cell_union | bng_cellunion / gbx_bng_cellunion | Underscore standardization. |
| bng_euclidean_distance / gbx_bng_euclidean_distance | bng_euclideandistance / gbx_bng_euclideandistance | Underscore standardization. |
| bng_point_as_bng / gbx_bng_point_as_bng | bng_pointascell / gbx_bng_pointascell | Underscore standardization; renamed for clarity: point → cell (not "point as BNG"). |
| bng_cell_intersection_agg / gbx_bng_cell_intersection_agg | bng_cellintersection_agg / gbx_bng_cellintersection_agg | Underscore standardization. |
| bng_cell_union_agg / gbx_bng_cell_union_agg | bng_cellunion_agg / gbx_bng_cellunion_agg | Underscore standardization. |
| bng_geometry_kring / gbx_bng_geometry_kring | bng_geomkring / gbx_bng_geomkring | _geometry → _geom in name. |
| bng_geometry_kloop / gbx_bng_geometry_kloop | bng_geomkloop / gbx_bng_geomkloop | _geometry → _geom in name. |
| bng_geometry_kring_explode / gbx_bng_geometry_kring_explode | bng_geomkringexplode / gbx_bng_geomkringexplode | _geometry → _geom + underscore standardization. |
| bng_geometry_kloop_explode / gbx_bng_geometry_kloop_explode | bng_geomkloopexplode / gbx_bng_geomkloopexplode | _geometry → _geom + underscore standardization. |
| bng_k_ring / gbx_bng_k_ring | bng_kring / gbx_bng_kring | Underscore standardization. |
| bng_k_loop / gbx_bng_k_loop | bng_kloop / gbx_bng_kloop | Underscore standardization. |
| bng_k_ring_explode / gbx_bng_k_ring_explode | bng_kringexplode / gbx_bng_kringexplode | Underscore standardization. |
| bng_k_loop_explode / gbx_bng_k_loop_explode | bng_kloopexplode / gbx_bng_kloopexplode | Underscore standardization. |
| bng_tessellate_explode / gbx_bng_tessellate_explode | bng_tessellateexplode / gbx_bng_tessellateexplode | Underscore standardization. |
VectorX
| Baseline | Current | Notes |
|---|---|---|
| (Schema/column) _geometry | _geom | Standardize geometry column suffix across readers and examples. |
| st_legacy_as_wkb / gbx_st_legacy_as_wkb | st_legacyaswkb / gbx_st_legacyaswkb | Underscore standardization. |
Readers
| Baseline | Current | Notes |
|---|---|---|
| shapefile | shapefile_ogr | Reader namespace: format + engine to avoid conflicts with other Spark extensions. |
| geojson | geojson_ogr | Same. |
| ogr_gpkg | gpkg_ogr | Same; consistent format_engine order. |
| file_gdb | file_gdb_ogr | Same. |
| (none) | gtiff_gdal | New reader: named GDAL reader for GeoTIFF; use instead of gdal with option("driver", "GTiff"). |
Reader renames above are planned for 0.2.0. Beta (0.1.x) may still expose the baseline names in some contexts.
Scalar values vs lit(...) wrapping
Previously, every non-Column argument had to be wrapped in f.lit(...) (Python) or lit(...) (Scala). That was a regression from Mosaic/DBR built-ins, where booleans and numerics can be passed as plain values. In 0.3.0, plain scalars are accepted across Python, Scala, and SQL bindings.
Python — wrappers accept Column or scalar (bool/int/float/bytes); non-string scalars are auto-wrapped with f.lit(...). Strings still follow pyspark's column-reference convention (bare string ≈ f.col(name)); wrap in f.lit("...") to pass a string literal.
# ✅ Before 0.3.0 — required f.lit for every value
rx.rst_clip("tile", "geom", f.lit(True))
rx.rst_transform("tile", f.lit(4326))
bx.bng_pointascell("pt", f.lit(1))
bx.bng_pointascell("pt", f.lit("1km"))
# ✅ 0.3.0 — scalars accepted directly
rx.rst_clip("tile", "geom", True)
rx.rst_transform("tile", 4326)
bx.bng_pointascell("pt", 1)
bx.bng_pointascell("pt", f.lit("1km")) # string literal — still wrap in f.lit
Scala — typed overloads added for Boolean / Int / Double / String value parameters. Column args (e.g. geometry, tile) still take Column.
// ✅ 0.3.0 — scalar overloads resolve without lit(...)
rst_clip(col("tile"), col("geom"), cutlineAllTouched = true)
rst_transform(col("tile"), 4326)
bng_pointascell(col("pt"), 1)
bng_pointascell(col("pt"), "1km")
SQL — values are already natively accepted by Spark SQL; no change needed:
SELECT gbx_rst_clip(tile, geom, true) FROM ...;
SELECT gbx_bng_pointascell(pt, 1) FROM ...;
SELECT gbx_bng_pointascell(pt, '1km') FROM ...;
When you still need f.lit(...) in Python:
- String literals: rx.rst_fromfile(f.lit("/path/to.tif"), f.lit("GTiff")); a bare string is treated as a column reference.
- Nulls / explicit typing: e.g. f.lit(None).cast("double").
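The Python dispatch rule described in this section (bare strings are column references, non-string scalars become literals, existing column expressions pass through) can be modeled without Spark. This is a minimal sketch under stated assumptions: Lit and Col are stand-ins for what f.lit(...) and f.col(...) would build, and to_arg is a hypothetical helper name, not the actual GeoBrix wrapper code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Lit:
    """Stand-in for a literal column, i.e. what f.lit(value) would build."""
    value: Any


@dataclass(frozen=True)
class Col:
    """Stand-in for a column reference, i.e. what f.col(name) would build."""
    name: str


def to_arg(value):
    """Hypothetical helper: classify one wrapper argument."""
    if isinstance(value, (Lit, Col)):
        return value                      # already a column expression
    if isinstance(value, str):
        return Col(value)                 # bare string = column reference
    if isinstance(value, (bool, int, float, bytes)):
        return Lit(value)                 # non-string scalar = literal
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


print(to_arg("tile"))        # Col(name='tile')
print(to_arg(4326))          # Lit(value=4326)
print(to_arg(Lit("1km")))    # Lit(value='1km')
```

The last call shows why string literals still need explicit wrapping: without it, "1km" would be classified as a column name.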
How to use this page
- Migrating code: Search for the baseline name in your code or config; replace with Current and apply any behavior notes.
- Docs or tests: After a change, add one row here so future readers know what changed and why.
- After approval: Move content into formal release notes (e.g. per-version sections) and keep this page for historical beta-only changes, or retire it.
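For the "Migrating code" step, a scripted search-and-replace can be less error-prone than editing by hand. The sketch below is one possible approach, not a tool shipped with GeoBrix: the RENAMES mapping holds only a handful of rows copied from the tables above and would need to be extended, and the migrate function name is hypothetical. Whole-identifier matching keeps rst_combine_avg from being rewritten inside rst_combine_avg_agg, and an optional gbx_ prefix covers the SQL names.

```python
import re

# A small subset of Baseline -> Current names taken from the tables above.
RENAMES = {
    "rst_pixel_width": "rst_pixelwidth",
    "rst_combine_avg": "rst_combineavg",
    "rst_combine_avg_agg": "rst_combineavg_agg",
    "bng_point_as_bng": "bng_pointascell",
    "bng_geometry_kring": "bng_geomkring",
    "st_legacy_as_wkb": "st_legacyaswkb",
}

# Match whole identifiers only, with an optional gbx_ SQL prefix.
# Longest names are listed first so the most specific rename is tried first.
_NAMES = sorted(map(re.escape, RENAMES), key=len, reverse=True)
_PATTERN = re.compile(r"\b(gbx_)?(" + "|".join(_NAMES) + r")\b")


def migrate(source: str) -> str:
    """Rewrite baseline function names in a block of source text."""
    return _PATTERN.sub(
        lambda m: (m.group(1) or "") + RENAMES[m.group(2)], source
    )


print(migrate('rx.rst_combine_avg_agg("tile")'))
print(migrate("SELECT gbx_bng_point_as_bng(pt, 1) FROM t"))
```

Behavior changes listed in the Notes column (for example the new band argument of rst_bandmetadata) still need a manual review; a rename pass only covers the names.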
Notable improvements and fixes
- Python package rename: Imports changed from geobrix.* to databricks.labs.gbx.* to align with Scala and the published artifact; update all import statements and environment references.
- Init script / NumPy: Init script updated to install NumPy 2.x so GDAL Python array operations execute correctly; fixes runtime failures in gbx_rst_mapalgebra and gbx_rst_ndvi when used with array-based paths.
- Error handling: Functions that previously threw exceptions during execution now surface errors more clearly (e.g. return null or a controlled default with error messages captured) instead of failing with opaque stack traces.
- RasterX rst_bandmetadata: A required band argument was added; call as rst_bandmetadata(tile, band) (e.g. rst_bandmetadata("tile", f.lit(1))) in Python/SQL/Scala.
- GDAL reader column: Raster DataFrames from the GDAL reader use the column name source (not path) for the file path; update any code or docs that assumed path.
- BNG aggregators (bng_cellunion_agg, bng_cellintersection_agg): Fixed a bug where aggregation buffers were shared across partitions (and across tests in the same JVM), causing incorrect core flags when running full test suites or with multiple partitions. Each partition now gets a fresh buffer. Chip fields are resolved by type/name in the union aggregator for robustness to struct field order. Test expectation corrected for "all core chips" intersection: result is now correctly documented as core=true (whole cell).
- rst_clip axis-order fix for EPSG-imported CRS (fixes all-black clips): When the clip geometry's CRS was set via an EPSG code (plain rst_transform-style input, EWKT SRID=4326;..., or EWKB with SRID), GDAL 3+ defaults that SpatialReference to authority-compliant axis order; for EPSG:4326 that means (latitude, longitude). JTS / Databricks / most GIS tooling emit WKT/WKB coordinates in traditional (x, y) = (lon, lat) order, so the reprojection inside rst_clip was silently swapping the axes (e.g. -80 14 interpreted as lat=-80, near the south pole) and the cutline missed the raster entirely, producing all-black output. OSRTransformGeometry.transform now clones both source and destination SpatialReferences and forces OAMS_TRADITIONAL_GIS_ORDER on the clones before running the OGR transform, so JTS-origin WKB is interpreted correctly. Caller-owned SpatialReferences are not mutated.
- EWKT / EWKB support for raster clip (CRS mismatch handling): rst_clip now accepts EWKT (SRID=<epsg>;<WKT>) and EWKB (PostGIS extended WKB) in addition to plain WKT/WKB. Semantics:
  - Plain WKT / WKB (no SRID): the geometry is assumed to already be in the raster's CRS; no reprojection is performed.
  - EWKT / EWKB (SRID set and resolvable via EPSG): the geometry's CRS is used and, if it differs from the raster's CRS, the cutline is reprojected before clipping.
  - If the SRID is 0 or not a valid EPSG code, the code falls back to the raster's CRS (same as the plain case); this restores Mosaic-compatible behavior but no longer silently produces an empty/black clip when a caller forgets to set the SRID.

  JTS.fromWKT / JTS.fromWKB now auto-detect EWKT/EWKB; new JTS.toEWKT / JTS.toEWKB helpers emit SRID-preserving forms. Plain toWKT / toWKB output is unchanged (OGC, no SRID).
- rst_transform invalid SRID: rst_transform(tile, targetSrid) now rejects targetSrid <= 0 and EPSG codes that GDAL cannot resolve with a clear error (surfaced in tile metadata error_message) instead of returning a raster with an uninitialized CRS.
- /vsimem/ path handling hardening: rst_memsize / rst_unlink and the GDAL writer's in-memory byte fetch now use startsWith("/vsimem/") (not contains) and null-check GetMemFileBuffer, so datasets whose description happens to embed the substring (e.g. NetCDF subdataset selectors) are no longer mis-routed through the in-memory branch.
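The three rst_clip cutline cases listed above (plain WKT, EWKT with a usable SRID, EWKT with SRID 0 or an unknown code) reduce to a small decision rule. The sketch below models that rule for EWKT text only and is not the GeoBrix implementation: split_ewkt and cutline_srid are hypothetical names, and is_known_epsg is a placeholder predicate standing in for a real EPSG lookup.

```python
import re

# EWKT prefix as described above: SRID=<epsg>;<WKT>
_EWKT = re.compile(r"^\s*SRID=(-?\d+);(.*)$", re.IGNORECASE | re.DOTALL)


def split_ewkt(text):
    """Return (srid, wkt); srid is None for plain WKT."""
    m = _EWKT.match(text)
    if m is None:
        return None, text.strip()
    return int(m.group(1)), m.group(2).strip()


def cutline_srid(text, raster_srid, is_known_epsg=lambda srid: srid > 0):
    """Decide which CRS the cutline is in and whether to reproject it.

    Returns (srid_to_use, wkt, needs_reprojection).
    is_known_epsg is a placeholder; a real check would consult an EPSG registry.
    """
    srid, wkt = split_ewkt(text)
    if srid is None or srid <= 0 or not is_known_epsg(srid):
        # Plain WKT, SRID=0, or unresolvable code: assume the raster's CRS.
        return raster_srid, wkt, False
    # Usable SRID: reproject only when it differs from the raster's CRS.
    return srid, wkt, srid != raster_srid


print(cutline_srid("POINT(-80 14)", 32617))            # (32617, 'POINT(-80 14)', False)
print(cutline_srid("SRID=4326;POINT(-80 14)", 32617))  # (4326, 'POINT(-80 14)', True)
print(cutline_srid("SRID=0;POINT(-80 14)", 32617))     # (32617, 'POINT(-80 14)', False)
```

When reprojection is needed, the axis-order fix described earlier in this list still applies, since the coordinates arrive in (lon, lat) order.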