Beta Release Notes
The changes on this page are relative to 0.1.0 (and earlier).
This page tracks API and naming changes since the GeoBrix project started. After the project is approved, formal release notes will take over; until then, use this as the single place to look up what changed and why.
What's new in v0.3.0
Released 2026-05-19. Per-version highlights; full migration tables are in the per-component sections below.
- rst_clip CRS axis-order fix (all-black clips). GDAL 3+ defaults EPSG-imported SpatialReferences to authority-compliant axis order (lat/lon for EPSG:4326), which silently swapped axes against JTS/Databricks WKT/WKB cutlines so the clip missed the raster entirely. The reprojection now clones the source/destination SpatialReferences and forces OAMS_TRADITIONAL_GIS_ORDER before the OGR transform; caller-owned SpatialReferences are not mutated.
- EWKT / EWKB support for rst_clip. JTS.fromWKT / JTS.fromWKB auto-detect EWKT/EWKB; new JTS.toEWKT / JTS.toEWKB helpers emit SRID-preserving forms. rst_clip reprojects the cutline when its SRID differs from the raster CRS, and falls back to the raster's CRS (Mosaic-compatible) when the SRID is 0 / unresolvable.
- rst_transform rejects invalid SRIDs. targetSrid <= 0 and unresolvable EPSG codes now surface a clear error via tile metadata error_message instead of returning a raster with an uninitialized CRS.
- /vsimem/ path-handling hardening. rst_memsize / rst_unlink / GDAL writer in-memory byte fetch now use startsWith("/vsimem/") (not contains) and null-check GetMemFileBuffer, so datasets whose description embeds the substring (e.g. NetCDF subdataset selectors) aren't mis-routed through the in-memory branch.
- Scalar args without f.lit(...). Python wrappers auto-wrap bool/int/float/bytes; Scala adds typed overloads. SQL was already natively typed. String literals still wrap in f.lit(...) per pyspark's column-ref convention. Details and migration examples in Scalar values vs lit(...) wrapping.
- Example notebooks: EO Series, xView, and enablement diagrams. New end-to-end walkthroughs under docs/examples/ covering EO time-series, xView object-detection rasters, and RasterX architecture diagrams.
- Supply-chain hardening (lockdown). Jobs pinned to the Databricks-hardened runner group (org-level allowlist, ephemeral VMs, constrained secret access); every Maven dependency, transitive dep, plugin, and plugin dependency is PGP-verified against .maven-keys.list before any compile or test execution; pip and Maven routed through JFrog with OIDC; init script + pinned package versions vetted; new Security page in the docs.
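To make the rst_clip axis-order item above concrete, here is a minimal pure-Python sketch (no GDAL involved) of how the same coordinate pair lands in a different place depending on whether it is read in traditional GIS order or in EPSG:4326 authority order. The function names and the raster extent are hypothetical and chosen only for illustration; the sample point -80 14 is the one used in the detailed notes later on this page.

```python
from typing import Tuple

# Hypothetical raster extent in degrees, for illustration only.
RASTER_BOUNDS = {"min_lon": -81.0, "max_lon": -79.0, "min_lat": 13.0, "max_lat": 15.0}


def as_lon_lat(pair: Tuple[float, float], axis_order: str) -> Tuple[float, float]:
    """Interpret a coordinate pair and return it as (lon, lat).

    "traditional" reads the pair as (x, y) = (lon, lat), the order JTS-style WKT uses.
    "authority" reads it as (lat, lon), the order EPSG:4326 defines.
    """
    first, second = pair
    if axis_order == "traditional":
        return (first, second)
    if axis_order == "authority":
        return (second, first)
    raise ValueError(f"unknown axis order: {axis_order!r}")


def inside_raster(lon_lat: Tuple[float, float]) -> bool:
    """True when the (lon, lat) point falls inside the illustrative raster extent."""
    lon, lat = lon_lat
    return (RASTER_BOUNDS["min_lon"] <= lon <= RASTER_BOUNDS["max_lon"]
            and RASTER_BOUNDS["min_lat"] <= lat <= RASTER_BOUNDS["max_lat"])


wkt_pair = (-80.0, 14.0)  # POINT(-80 14) as written by JTS: lon=-80, lat=14

correct = as_lon_lat(wkt_pair, "traditional")  # lon=-80, lat=14
swapped = as_lon_lat(wkt_pair, "authority")    # lon=14, lat=-80 (near the south pole)
```

In this sketch only the traditional reading falls inside the raster extent; the authority-order reading misses it, which is the same failure mode that produced all-black clips.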
Conventions:
- Baseline: Name or behavior before the change (what to search for in old code or docs).
- Current: Name or behavior after the change (what to use now).
- Notes: Short reason (e.g. standardize across languages, underscore standardization, _geometry → _geom).
General
| Baseline | Current | Notes |
|---|---|---|
Python import geobrix.* | databricks.labs.gbx.* | Match Scala package and published artifact; avoid namespace clashes. |
Extra underscores in function names (multi-word parts spelled with _) | Single underscore between prefix and compound (e.g. rst_pixelwidth, gbx_bng_cellarea) | Underscore standardization: one leading prefix, then one compound word; no _ inside the operation name. |
Non-Column value args required f.lit(...) / lit(...) wrapping (e.g. rst_clip(tile, geom, f.lit(True)), bng_pointascell(pt, f.lit(1))) | Plain Python/Scala non-string scalars accepted directly (e.g. rst_clip(tile, geom, True), rst_transform(tile, 4326), bng_pointascell(pt, 1)) | Matches Mosaic/DBR built-in ergonomics for booleans/numerics. Python wrappers auto-wrap bool/int/float/bytes via f.lit; Scala adds typed overloads. Strings still follow pyspark's column-ref convention — rx.rst_width("tile") is still f.col("tile"); wrap in f.lit(...) for string literals (e.g. driver=f.lit("GTiff")). |
All specific function renames from that standardization are listed in the component tables below.
RasterX
| Baseline | Current | Notes |
|---|---|---|
(GDAL reader output column) path | source | Docs/tests aligned to GDAL reader output column name. |
rst_band_metadata / gbx_rst_band_metadata | rst_bandmetadata / gbx_rst_bandmetadata | Underscore standardization. |
rst_bounding_box / gbx_rst_bounding_box | rst_boundingbox / gbx_rst_boundingbox | Underscore standardization. |
rst_pixel_width / gbx_rst_pixel_width | rst_pixelwidth / gbx_rst_pixelwidth | Underscore standardization. |
rst_pixel_height / gbx_rst_pixel_height | rst_pixelheight / gbx_rst_pixelheight | Underscore standardization. |
rst_num_bands / gbx_rst_num_bands | rst_numbands / gbx_rst_numbands | Underscore standardization. |
rst_pixel_count / gbx_rst_pixel_count | rst_pixelcount / gbx_rst_pixelcount | Underscore standardization. |
rst_scale_x / gbx_rst_scale_x | rst_scalex / gbx_rst_scalex | Underscore standardization. |
rst_scale_y / gbx_rst_scale_y | rst_scaley / gbx_rst_scaley | Underscore standardization. |
rst_upper_left_x / gbx_rst_upper_left_x | rst_upperleftx / gbx_rst_upperleftx | Underscore standardization. |
rst_upper_left_y / gbx_rst_upper_left_y | rst_upperlefty / gbx_rst_upperlefty | Underscore standardization. |
rst_geo_reference / gbx_rst_geo_reference | rst_georeference / gbx_rst_georeference | Underscore standardization. |
rst_get_nodata / gbx_rst_get_nodata | rst_getnodata / gbx_rst_getnodata | Underscore standardization. |
rst_get_subdataset / gbx_rst_get_subdataset | rst_getsubdataset / gbx_rst_getsubdataset | Underscore standardization. |
rst_mem_size / gbx_rst_mem_size | rst_memsize / gbx_rst_memsize | Underscore standardization. |
rst_sub_datasets / gbx_rst_sub_datasets | rst_subdatasets / gbx_rst_subdatasets | Underscore standardization. |
rst_combine_avg_agg / gbx_rst_combine_avg_agg | rst_combineavg_agg / gbx_rst_combineavg_agg | Underscore standardization. |
rst_derived_band_agg / gbx_rst_derived_band_agg | rst_derivedband_agg / gbx_rst_derivedband_agg | Underscore standardization. |
rst_from_content / gbx_rst_from_content | rst_fromcontent / gbx_rst_fromcontent | Underscore standardization. |
rst_from_file / gbx_rst_from_file | rst_fromfile / gbx_rst_fromfile | Underscore standardization. |
rst_from_bands / gbx_rst_from_bands | rst_frombands / gbx_rst_frombands | Underscore standardization. |
rst_make_tiles / gbx_rst_make_tiles | rst_maketiles / gbx_rst_maketiles | Underscore standardization. |
rst_re_tile / gbx_rst_re_tile | rst_retile / gbx_rst_retile | Underscore standardization. |
rst_separate_bands / gbx_rst_separate_bands | rst_separatebands / gbx_rst_separatebands | Underscore standardization. |
rst_to_overlapping_tiles / gbx_rst_to_overlapping_tiles | rst_tooverlappingtiles / gbx_rst_tooverlappingtiles | Underscore standardization. |
rst_init_nodata / gbx_rst_init_nodata | rst_initnodata / gbx_rst_initnodata | Underscore standardization. |
rst_is_empty / gbx_rst_is_empty | rst_isempty / gbx_rst_isempty | Underscore standardization. |
rst_map_algebra / gbx_rst_map_algebra | rst_mapalgebra / gbx_rst_mapalgebra | Underscore standardization. |
rst_raster_to_world_coord / gbx_rst_raster_to_world_coord (and X/Y variants) | rst_rastertoworldcoord / gbx_rst_rastertoworldcoord (and X/Y) | Underscore standardization. |
rst_world_to_raster_coord / gbx_rst_world_to_raster_coord (and X/Y variants) | rst_worldtorastercoord / gbx_rst_worldtorastercoord (and X/Y) | Underscore standardization. |
rst_as_format / gbx_rst_as_format | rst_asformat / gbx_rst_asformat | Underscore standardization. |
rst_combine_avg / gbx_rst_combine_avg | rst_combineavg / gbx_rst_combineavg | Underscore standardization. |
rst_h3_raster_to_grid_avg (and Count/Max/Min/Median) | rst_h3_rastertogridavg (and Count/Max/Min/Median) | Underscore standardization. |
rst_bandmetadata(tile) (single arg) | rst_bandmetadata(tile, band) | Required band parameter added; use e.g. rst_bandmetadata("tile", f.lit(1)). |
rst_fromfile raster field was StringType (path) with metadata.size = -1 | rst_fromfile raster field is BinaryType (file bytes) with real metadata.size | rst_fromfile now reads the file into the tile, so tiles are self-contained and downstream ops (e.g. rst_clip) no longer produce orphan temp paths. Matches rst_fromcontent and the GDAL reader. |
Default output compression was ZSTD (TIFF tag 50000) | Default output compression is DEFLATE (baseline TIFF) | ZSTD output was not decodable by Java ImageIO and broke the Databricks image preview after operators like rst_clip. DEFLATE is universally previewable and (with PREDICTOR=2/3) still compresses well. Override per-call via tile metadata compression key. |
GridX (BNG)
| Baseline | Current | Notes |
|---|---|---|
bng_eastnortasbng (Python) / gbx_bng_eastnortasbng (SQL) | bng_eastnorthasbng / gbx_bng_eastnorthasbng | Standardize across languages (Python had typo; Scala already eastnorth). |
bng_cell_area / gbx_bng_cell_area | bng_cellarea / gbx_bng_cellarea | Underscore standardization. |
bng_cell_intersection / gbx_bng_cell_intersection | bng_cellintersection / gbx_bng_cellintersection | Underscore standardization. |
bng_cell_union / gbx_bng_cell_union | bng_cellunion / gbx_bng_cellunion | Underscore standardization. |
bng_euclidean_distance / gbx_bng_euclidean_distance | bng_euclideandistance / gbx_bng_euclideandistance | Underscore standardization. |
bng_point_as_bng / gbx_bng_point_as_bng | bng_pointascell / gbx_bng_pointascell | Underscore standardization; also renamed for clarity: point → cell (not "point as BNG"). |
bng_cell_intersection_agg / gbx_bng_cell_intersection_agg | bng_cellintersection_agg / gbx_bng_cellintersection_agg | Underscore standardization. |
bng_cell_union_agg / gbx_bng_cell_union_agg | bng_cellunion_agg / gbx_bng_cellunion_agg | Underscore standardization. |
bng_geometry_kring / gbx_bng_geometry_kring | bng_geomkring / gbx_bng_geomkring | _geometry → _geom in name. |
bng_geometry_kloop / gbx_bng_geometry_kloop | bng_geomkloop / gbx_bng_geomkloop | _geometry → _geom in name. |
bng_geometry_kring_explode / gbx_bng_geometry_kring_explode | bng_geomkringexplode / gbx_bng_geomkringexplode | _geometry → _geom + underscore standardization. |
bng_geometry_kloop_explode / gbx_bng_geometry_kloop_explode | bng_geomkloopexplode / gbx_bng_geomkloopexplode | _geometry → _geom + underscore standardization. |
bng_k_ring / gbx_bng_k_ring | bng_kring / gbx_bng_kring | Underscore standardization. |
bng_k_loop / gbx_bng_k_loop | bng_kloop / gbx_bng_kloop | Underscore standardization. |
bng_k_ring_explode / gbx_bng_k_ring_explode | bng_kringexplode / gbx_bng_kringexplode | Underscore standardization. |
bng_k_loop_explode / gbx_bng_k_loop_explode | bng_kloopexplode / gbx_bng_kloopexplode | Underscore standardization. |
bng_tessellate_explode / gbx_bng_tessellate_explode | bng_tessellateexplode / gbx_bng_tessellateexplode | Underscore standardization. |
VectorX
| Baseline | Current | Notes |
|---|---|---|
(Schema/column) _geometry | _geom | Standardize geometry column suffix across readers and examples. |
st_legacy_as_wkb / gbx_st_legacy_as_wkb | st_legacyaswkb / gbx_st_legacyaswkb | Underscore standardization. |
Readers
| Baseline | Current | Notes |
|---|---|---|
shapefile | shapefile_ogr | Reader namespace: format + engine to avoid conflicts with other Spark extensions. |
geojson | geojson_ogr | Same. |
ogr_gpkg | gpkg_ogr | Same; consistent format_engine order. |
file_gdb | file_gdb_ogr | Same. |
(none) | gtiff_gdal | New reader: named GDAL reader for GeoTIFF; use instead of gdal with option("driver", "GTiff"). |
Reader renames above are planned for 0.2.0. Beta (0.1.x) may still expose the baseline names in some contexts.
Scalar values vs lit(...) wrapping
Previously, every non-Column argument had to be wrapped in f.lit(...) (Python) or lit(...) (Scala). That was a regression from Mosaic/DBR built-ins, where booleans and numerics can be passed as plain values. In 0.3.0, plain scalars are accepted across Python, Scala, and SQL bindings.
Python — wrappers accept Column or scalar (bool/int/float/bytes); non-string scalars are auto-wrapped with f.lit(...). Strings still follow pyspark's column-reference convention (bare string ≈ f.col(name)); wrap in f.lit("...") to pass a string literal.
# ✅ Before 0.3.0 — required f.lit for every value
rx.rst_clip("tile", "geom", f.lit(True))
rx.rst_transform("tile", f.lit(4326))
bx.bng_pointascell("pt", f.lit(1))
bx.bng_pointascell("pt", f.lit("1km"))
# ✅ 0.3.0 — scalars accepted directly
rx.rst_clip("tile", "geom", True)
rx.rst_transform("tile", 4326)
bx.bng_pointascell("pt", 1)
bx.bng_pointascell("pt", f.lit("1km")) # string literal — still wrap in f.lit
Scala — typed overloads added for Boolean / Int / Double / String value parameters. Column args (e.g. geometry, tile) still take Column.
// ✅ 0.3.0 — scalar overloads resolve without lit(...)
rst_clip(col("tile"), col("geom"), cutlineAllTouched = true)
rst_transform(col("tile"), 4326)
bng_pointascell(col("pt"), 1)
bng_pointascell(col("pt"), "1km")
SQL — values are already natively accepted by Spark SQL; no change needed:
SELECT gbx_rst_clip(tile, geom, true) FROM ...;
SELECT gbx_bng_pointascell(pt, 1) FROM ...;
SELECT gbx_bng_pointascell(pt, '1km') FROM ...;
When you still need f.lit(...) in Python:
- String literals: rx.rst_fromfile(f.lit("/path/to.tif"), f.lit("GTiff")). A bare string is treated as a column reference.
- Nulls / explicit typing: e.g. f.lit(None).cast("double").
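The Python convention described in this section (non-string scalars become literals, bare strings stay column references) can be sketched without pyspark. The Lit and Col classes and the to_column helper below are hypothetical stand-ins for f.lit, f.col, and whatever the real wrappers do internally; this is an illustration of the rule, not the GeoBrix implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Lit:
    """Stand-in for a literal column, i.e. what f.lit(value) would produce."""
    value: Any


@dataclass(frozen=True)
class Col:
    """Stand-in for a column reference, i.e. what f.col(name) would produce."""
    name: str


def to_column(arg: Any) -> Any:
    """Hypothetical dispatch mirroring the documented convention."""
    if isinstance(arg, (Lit, Col)):
        return arg            # already a column expression: pass through unchanged
    if isinstance(arg, str):
        return Col(arg)       # bare string is a column name, not a literal
    if isinstance(arg, (bool, int, float, bytes)):
        return Lit(arg)       # non-string scalars are auto-wrapped as literals
    raise TypeError(f"unsupported argument type: {type(arg).__name__}")
```

Under this sketch, to_column("1km") yields a column reference named 1km rather than a literal, which is why string values such as resolutions or driver names still need an explicit f.lit(...).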
How to use this page
- Migrating code: Search for the baseline name in your code or config; replace with Current and apply any behavior notes.
- Docs or tests: After a change, add one row here so future readers know what changed and why.
- After approval: Move content into formal release notes (e.g. per-version sections) and keep this page for historical beta-only changes, or retire it.
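For the "Migrating code" step, a small script can do the search-and-replace. The sketch below hard-codes a handful of rows from the tables on this page and rewrites whole-identifier matches only; RENAMES and migrate are illustrative names, and the mapping would need to be extended with the remaining rows before real use.

```python
import re

# A few Baseline -> Current pairs copied from the tables on this page.
RENAMES = {
    "rst_band_metadata": "rst_bandmetadata",
    "rst_pixel_width": "rst_pixelwidth",
    "rst_re_tile": "rst_retile",
    "bng_cell_area": "bng_cellarea",
    "bng_point_as_bng": "bng_pointascell",
    "bng_geometry_kring": "bng_geomkring",
    "st_legacy_as_wkb": "st_legacyaswkb",
}

# The optional gbx_ prefix covers the SQL names. The \b anchors keep matches to
# whole identifiers, so bng_geometry_kring does not match inside
# bng_geometry_kring_explode (which has its own row and is not in this sample).
_PATTERN = re.compile(
    r"\b(gbx_)?("
    + "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True))
    + r")\b"
)


def migrate(source: str) -> str:
    """Rewrite baseline function names in source text to their current names."""
    return _PATTERN.sub(lambda m: (m.group(1) or "") + RENAMES[m.group(2)], source)
```

Names that are already current, or that are not in the mapping, pass through unchanged, so the function is safe to run more than once over the same file.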
Notable improvements and fixes
- Python package rename: Imports changed from geobrix.* to databricks.labs.gbx.* to align with Scala and the published artifact; update all import statements and environment references.
- Init script / NumPy: Init script updated to install NumPy 2.x so GDAL Python array operations execute correctly; fixes runtime failures in gbx_rst_mapalgebra and gbx_rst_ndvi when used with array-based paths.
- Error handling: Functions that previously threw exceptions during execution now surface errors more clearly (e.g. return null or a controlled default with error messages captured) instead of failing with opaque stack traces.
- RasterX rst_bandmetadata: A required band argument was added; call as rst_bandmetadata(tile, band) (e.g. rst_bandmetadata("tile", f.lit(1))) in Python/SQL/Scala.
- GDAL reader column: Raster DataFrames from the GDAL reader use the column name source (not path) for the file path; update any code or docs that assumed path.
- BNG aggregators (bng_cellunion_agg, bng_cellintersection_agg): Fixed a bug where aggregation buffers were shared across partitions (and across tests in the same JVM), causing incorrect core flags when running full test suites or with multiple partitions. Each partition now gets a fresh buffer. Chip fields are resolved by type/name in the union aggregator for robustness to struct field order. Test expectation corrected for “all core chips” intersection: result is now correctly documented as core=true (whole cell).
- rst_clip axis-order fix for EPSG-imported CRS (fixes all-black clips): When the clip geometry's CRS was set via an EPSG code (plain rst_transform-style input, EWKT SRID=4326;..., or EWKB with SRID), GDAL 3+ defaults that SpatialReference to authority-compliant axis order; for EPSG:4326 that means (latitude, longitude). JTS / Databricks / most GIS tooling emit WKT/WKB coordinates in traditional (x, y) = (lon, lat) order, so the reprojection inside rst_clip was silently swapping the axes (e.g. -80 14 interpreted as lat=-80, near the south pole) and the cutline missed the raster entirely, producing all-black output. OSRTransformGeometry.transform now clones both source and destination SpatialReferences and forces OAMS_TRADITIONAL_GIS_ORDER on the clones before running the OGR transform, so JTS-origin WKB is interpreted correctly. Caller-owned SpatialReferences are not mutated.
- EWKT / EWKB support for raster clip (CRS mismatch handling): rst_clip now accepts EWKT (SRID=<epsg>;<WKT>) and EWKB (PostGIS extended WKB) in addition to plain WKT/WKB. Semantics:
  - Plain WKT / WKB (no SRID): the geometry is assumed to already be in the raster's CRS; no reprojection is performed.
  - EWKT / EWKB (SRID set and resolvable via EPSG): the geometry's CRS is used and, if it differs from the raster's CRS, the cutline is reprojected before clipping.
  - If the SRID is 0 or not a valid EPSG code, the code falls back to the raster's CRS (same as the plain case); this restores Mosaic-compatible behavior but no longer silently produces an empty/black clip when a caller forgets to set the SRID.

  JTS.fromWKT / JTS.fromWKB now auto-detect EWKT/EWKB; new JTS.toEWKT / JTS.toEWKB helpers emit SRID-preserving forms. Plain toWKT / toWKB output is unchanged (OGC, no SRID).
- rst_transform invalid SRID: rst_transform(tile, targetSrid) now rejects targetSrid <= 0 and EPSG codes that GDAL cannot resolve with a clear error (surfaced in tile metadata error_message) instead of returning a raster with an uninitialized CRS.
- /vsimem/ path handling hardening: rst_memsize / rst_unlink and the GDAL writer's in-memory byte fetch now use startsWith("/vsimem/") (not contains) and null-check GetMemFileBuffer, so datasets whose description happens to embed the substring (e.g. NetCDF subdataset selectors) are no longer mis-routed through the in-memory branch.
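The rst_clip cutline semantics listed above (plain WKT, EWKT with a resolvable SRID, and SRID 0 or unresolvable) reduce to a small decision. The pure-Python sketch below parses an optional SRID=<epsg>; prefix and decides which CRS the cutline is treated as being in. KNOWN_EPSG is a hypothetical stand-in for a real EPSG lookup, and split_ewkt / plan_clip are illustrative names, not GeoBrix or JTS APIs.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for "GDAL can resolve this EPSG code".
KNOWN_EPSG = {4326, 3857, 27700}

# EWKT is plain WKT with a leading "SRID=<int>;" prefix.
_EWKT = re.compile(r"^\s*SRID=(-?\d+);(.*)$", re.IGNORECASE | re.DOTALL)


def split_ewkt(text: str) -> Tuple[Optional[int], str]:
    """Return (srid, wkt); srid is None when the input is plain WKT."""
    match = _EWKT.match(text)
    if match is None:
        return None, text.strip()
    return int(match.group(1)), match.group(2).strip()


def plan_clip(geom_text: str, raster_srid: int) -> Tuple[int, bool]:
    """Return (cutline_srid, needs_reprojection) following the rules listed above."""
    srid, _wkt = split_ewkt(geom_text)
    if srid is None or srid <= 0 or srid not in KNOWN_EPSG:
        # Plain WKT, SRID=0, or unresolvable code: assume the raster's CRS.
        return raster_srid, False
    # Resolvable SRID: reproject only when it differs from the raster's CRS.
    return srid, srid != raster_srid
```

For example, under these assumptions a cutline given as SRID=4326;POINT(-80 14) against an EPSG:27700 raster is flagged for reprojection, while the same geometry without the SRID prefix is used as-is in the raster's CRS.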