Beta Release Notes

Current version: 0.3.0

The changes on this page are relative to 0.1.0 (and earlier).

This page tracks API and naming changes since the GeoBrix project started. After the project is approved, formal release notes will take over; until then, use this as the single place to look up what changed and why.


What's new in v0.3.0

Released 2026-05-19. The highlights for this version are listed here; full migration tables are in the per-component sections below.

  • rst_clip CRS axis-order fix (all-black clips). GDAL 3+ defaults EPSG-imported SpatialReferences to authority-compliant axis order (lat/lon for EPSG:4326), which silently swapped axes against JTS/Databricks WKT/WKB cutlines so the clip missed the raster entirely. The reprojection now clones the source/destination SpatialReferences and forces OAMS_TRADITIONAL_GIS_ORDER before the OGR transform; caller-owned SpatialReferences are not mutated.
  • EWKT / EWKB support for rst_clip. JTS.fromWKT / JTS.fromWKB auto-detect EWKT/EWKB; new JTS.toEWKT / JTS.toEWKB helpers emit SRID-preserving forms. rst_clip reprojects the cutline when its SRID differs from the raster CRS, and falls back to the raster's CRS (Mosaic-compatible) when the SRID is 0 / unresolvable.
  • rst_transform rejects invalid SRIDs. targetSrid <= 0 and unresolvable EPSG codes now surface a clear error via tile metadata error_message instead of returning a raster with an uninitialized CRS.
  • /vsimem/ path-handling hardening. rst_memsize / rst_unlink / GDAL writer in-memory byte fetch now use startsWith("/vsimem/") (not contains) and null-check GetMemFileBuffer, so datasets whose description embeds the substring (e.g. NetCDF subdataset selectors) aren't mis-routed through the in-memory branch.
  • Scalar args without f.lit(...). Python wrappers auto-wrap bool / int / float / bytes; Scala adds typed overloads. SQL was already natively-typed. String literals still wrap in f.lit(...) per pyspark's column-ref convention. Details and migration examples in Scalar values vs lit(...) wrapping.
  • Example notebooks — EO Series, xView, and enablement diagrams. New end-to-end walkthroughs under docs/examples/ covering EO time-series, xView object-detection rasters, and RasterX architecture diagrams.
  • Supply-chain hardening (lockdown). Jobs pinned to the Databricks-hardened runner group (org-level allowlist, ephemeral VMs, constrained secret access); every Maven dependency, transitive dep, plugin, and plugin dependency is PGP-verified against .maven-keys.list before any compile or test execution; pip and Maven routed through JFrog with OIDC; init script + pinned package versions vetted; new Security page in the docs.
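The EWKT fallback rule for rst_clip in the list above can be sketched in plain Python. This is only an illustration of the decision logic, not the GeoBrix implementation: resolve_cutline and Cutline are hypothetical names, only the textual EWKT form is handled, and "unresolvable EPSG code" is approximated as srid <= 0 because real resolution needs GDAL.

```python
import re
from typing import NamedTuple

# Matches the EWKT prefix "SRID=<int>;" followed by the WKT body.
_EWKT = re.compile(r"^\s*SRID\s*=\s*(-?\d+)\s*;\s*(.+)$", re.IGNORECASE | re.DOTALL)


class Cutline(NamedTuple):
    wkt: str         # geometry text with any SRID prefix removed
    srid: int        # CRS the cutline is treated as being in
    reproject: bool  # True when the cutline must be reprojected to the raster CRS


def resolve_cutline(text: str, raster_srid: int) -> Cutline:
    """Apply the plain-WKT / EWKT / SRID-0 rules to one cutline string."""
    match = _EWKT.match(text)
    if match is None:
        # Plain WKT: assumed to already be in the raster's CRS.
        return Cutline(text.strip(), raster_srid, False)
    srid, wkt = int(match.group(1)), match.group(2).strip()
    if srid <= 0:
        # SRID 0 (assumption: stands in for "unresolvable"): use the raster's CRS.
        return Cutline(wkt, raster_srid, False)
    # SRID set: reproject only when it differs from the raster's CRS.
    return Cutline(wkt, srid, srid != raster_srid)
```

For a raster in EPSG:32617, resolve_cutline("SRID=4326;POINT (-80 14)", 32617) marks the cutline for reprojection, while the same point without the SRID prefix is used as-is.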

Conventions:

  • Baseline: Name or behavior before the change (what to search for in old code or docs).
  • Current: Name or behavior after the change (what to use now).
  • Notes: Short reason (e.g. standardize across languages, underscore standardization, _geometry → _geom).

General

Baseline | Current | Notes
Python import geobrix.* | databricks.labs.gbx.* | Match Scala package and published artifact; avoid namespace clashes.
Extra underscores in function names (multi-word parts spelled with _) | Single underscore between prefix and compound (e.g. rst_pixelwidth, gbx_bng_cellarea) | Underscore standardization: one leading prefix, then one compound word; no _ inside the operation name.
Non-Column value args required f.lit(...) / lit(...) wrapping (e.g. rst_clip(tile, geom, f.lit(True)), bng_pointascell(pt, f.lit(1))) | Plain Python/Scala non-string scalars accepted directly (e.g. rst_clip(tile, geom, True), rst_transform(tile, 4326), bng_pointascell(pt, 1)) | Matches Mosaic/DBR built-in ergonomics for booleans/numerics. Python wrappers auto-wrap bool/int/float/bytes via f.lit; Scala adds typed overloads. Strings still follow pyspark's column-ref convention: rx.rst_width("tile") is still f.col("tile"); wrap in f.lit(...) for string literals (e.g. driver=f.lit("GTiff")).

All specific function renames from that standardization are listed in the component tables below.


RasterX

Baseline | Current | Notes
(GDAL reader output column) path | source | Docs/tests aligned to GDAL reader output column name.
rst_band_metadata / gbx_rst_band_metadata | rst_bandmetadata / gbx_rst_bandmetadata | Underscore standardization.
rst_bounding_box / gbx_rst_bounding_box | rst_boundingbox / gbx_rst_boundingbox | Underscore standardization.
rst_pixel_width / gbx_rst_pixel_width | rst_pixelwidth / gbx_rst_pixelwidth | Underscore standardization.
rst_pixel_height / gbx_rst_pixel_height | rst_pixelheight / gbx_rst_pixelheight | Underscore standardization.
rst_num_bands / gbx_rst_num_bands | rst_numbands / gbx_rst_numbands | Underscore standardization.
rst_pixel_count / gbx_rst_pixel_count | rst_pixelcount / gbx_rst_pixelcount | Underscore standardization.
rst_scale_x / gbx_rst_scale_x | rst_scalex / gbx_rst_scalex | Underscore standardization.
rst_scale_y / gbx_rst_scale_y | rst_scaley / gbx_rst_scaley | Underscore standardization.
rst_upper_left_x / gbx_rst_upper_left_x | rst_upperleftx / gbx_rst_upperleftx | Underscore standardization.
rst_upper_left_y / gbx_rst_upper_left_y | rst_upperlefty / gbx_rst_upperlefty | Underscore standardization.
rst_geo_reference / gbx_rst_geo_reference | rst_georeference / gbx_rst_georeference | Underscore standardization.
rst_get_nodata / gbx_rst_get_nodata | rst_getnodata / gbx_rst_getnodata | Underscore standardization.
rst_get_subdataset / gbx_rst_get_subdataset | rst_getsubdataset / gbx_rst_getsubdataset | Underscore standardization.
rst_mem_size / gbx_rst_mem_size | rst_memsize / gbx_rst_memsize | Underscore standardization.
rst_sub_datasets / gbx_rst_sub_datasets | rst_subdatasets / gbx_rst_subdatasets | Underscore standardization.
rst_combine_avg_agg / gbx_rst_combine_avg_agg | rst_combineavg_agg / gbx_rst_combineavg_agg | Underscore standardization.
rst_derived_band_agg / gbx_rst_derived_band_agg | rst_derivedband_agg / gbx_rst_derivedband_agg | Underscore standardization.
rst_from_content / gbx_rst_from_content | rst_fromcontent / gbx_rst_fromcontent | Underscore standardization.
rst_from_file / gbx_rst_from_file | rst_fromfile / gbx_rst_fromfile | Underscore standardization.
rst_from_bands / gbx_rst_from_bands | rst_frombands / gbx_rst_frombands | Underscore standardization.
rst_make_tiles / gbx_rst_make_tiles | rst_maketiles / gbx_rst_maketiles | Underscore standardization.
rst_re_tile / gbx_rst_re_tile | rst_retile / gbx_rst_retile | Underscore standardization.
rst_separate_bands / gbx_rst_separate_bands | rst_separatebands / gbx_rst_separatebands | Underscore standardization.
rst_to_overlapping_tiles / gbx_rst_to_overlapping_tiles | rst_tooverlappingtiles / gbx_rst_tooverlappingtiles | Underscore standardization.
rst_init_nodata / gbx_rst_init_nodata | rst_initnodata / gbx_rst_initnodata | Underscore standardization.
rst_is_empty / gbx_rst_is_empty | rst_isempty / gbx_rst_isempty | Underscore standardization.
rst_map_algebra / gbx_rst_map_algebra | rst_mapalgebra / gbx_rst_mapalgebra | Underscore standardization.
rst_raster_to_world_coord / gbx_rst_raster_to_world_coord (and X/Y variants) | rst_rastertoworldcoord / gbx_rst_rastertoworldcoord (and X/Y) | Underscore standardization.
rst_world_to_raster_coord / gbx_rst_world_to_raster_coord (and X/Y variants) | rst_worldtorastercoord / gbx_rst_worldtorastercoord (and X/Y) | Underscore standardization.
rst_as_format / gbx_rst_as_format | rst_asformat / gbx_rst_asformat | Underscore standardization.
rst_combine_avg / gbx_rst_combine_avg | rst_combineavg / gbx_rst_combineavg | Underscore standardization.
rst_h3_raster_to_grid_avg (and Count/Max/Min/Median) | rst_h3_rastertogridavg (and Count/Max/Min/Median) | Underscore standardization.
rst_bandmetadata(tile) (single arg) | rst_bandmetadata(tile, band) | Required band parameter added; use e.g. rst_bandmetadata("tile", f.lit(1)).
rst_fromfile raster field was StringType (path) with metadata.size = -1 | rst_fromfile raster field is BinaryType (file bytes) with real metadata.size | rst_fromfile now reads the file into the tile, so tiles are self-contained and downstream ops (e.g. rst_clip) no longer produce orphan temp paths. Matches rst_fromcontent and the GDAL reader.
Default output compression was ZSTD (TIFF tag 50000) | Default output compression is DEFLATE (baseline TIFF) | ZSTD output was not decodable by Java ImageIO and broke the Databricks image preview after operators like rst_clip. DEFLATE is universally previewable and (with PREDICTOR=2/3) still compresses well. Override per-call via tile metadata compression key.

GridX (BNG)

Baseline | Current | Notes
bng_eastnortasbng (Python) / gbx_bng_eastnortasbng (SQL) | bng_eastnorthasbng / gbx_bng_eastnorthasbng | Standardize across languages (Python had typo; Scala already eastnorth).
bng_cell_area / gbx_bng_cell_area | bng_cellarea / gbx_bng_cellarea | Underscore standardization.
bng_cell_intersection / gbx_bng_cell_intersection | bng_cellintersection / gbx_bng_cellintersection | Underscore standardization.
bng_cell_union / gbx_bng_cell_union | bng_cellunion / gbx_bng_cellunion | Underscore standardization.
bng_euclidean_distance / gbx_bng_euclidean_distance | bng_euclideandistance / gbx_bng_euclideandistance | Underscore standardization.
bng_point_as_bng / gbx_bng_point_as_bng | bng_pointascell / gbx_bng_pointascell | Underscore standardization; renamed for clarity: point → cell (not "point as BNG").
bng_cell_intersection_agg / gbx_bng_cell_intersection_agg | bng_cellintersection_agg / gbx_bng_cellintersection_agg | Underscore standardization.
bng_cell_union_agg / gbx_bng_cell_union_agg | bng_cellunion_agg / gbx_bng_cellunion_agg | Underscore standardization.
bng_geometry_kring / gbx_bng_geometry_kring | bng_geomkring / gbx_bng_geomkring | _geometry → _geom in name.
bng_geometry_kloop / gbx_bng_geometry_kloop | bng_geomkloop / gbx_bng_geomkloop | _geometry → _geom in name.
bng_geometry_kring_explode / gbx_bng_geometry_kring_explode | bng_geomkringexplode / gbx_bng_geomkringexplode | _geometry → _geom + underscore standardization.
bng_geometry_kloop_explode / gbx_bng_geometry_kloop_explode | bng_geomkloopexplode / gbx_bng_geomkloopexplode | _geometry → _geom + underscore standardization.
bng_k_ring / gbx_bng_k_ring | bng_kring / gbx_bng_kring | Underscore standardization.
bng_k_loop / gbx_bng_k_loop | bng_kloop / gbx_bng_kloop | Underscore standardization.
bng_k_ring_explode / gbx_bng_k_ring_explode | bng_kringexplode / gbx_bng_kringexplode | Underscore standardization.
bng_k_loop_explode / gbx_bng_k_loop_explode | bng_kloopexplode / gbx_bng_kloopexplode | Underscore standardization.
bng_tessellate_explode / gbx_bng_tessellate_explode | bng_tessellateexplode / gbx_bng_tessellateexplode | Underscore standardization.

VectorX

Baseline | Current | Notes
(Schema/column) _geometry | _geom | Standardize geometry column suffix across readers and examples.
st_legacy_as_wkb / gbx_st_legacy_as_wkb | st_legacyaswkb / gbx_st_legacyaswkb | Underscore standardization.

Readers

Baseline | Current | Notes
shapefile | shapefile_ogr | Reader namespace: format + engine to avoid conflicts with other Spark extensions.
geojson | geojson_ogr | Same.
ogr_gpkg | gpkg_ogr | Same; consistent format_engine order.
file_gdb | file_gdb_ogr | Same.
(none) | gtiff_gdal | New reader: named GDAL reader for GeoTIFF; use instead of gdal with option("driver", "GTiff").

Info: Reader renames above are planned for 0.2.0. Beta (0.1.x) may still expose the baseline names in some contexts.


Scalar values vs lit(...) wrapping

Previously, every non-Column argument had to be wrapped in f.lit(...) (Python) or lit(...) (Scala). That was a regression from Mosaic/DBR built-ins, where booleans and numerics can be passed as plain values. In 0.3.0, plain scalars are accepted across Python, Scala, and SQL bindings.

Python — wrappers accept Column or scalar (bool/int/float/bytes); non-string scalars are auto-wrapped with f.lit(...). Strings still follow pyspark's column-reference convention (bare string ≈ f.col(name)); wrap in f.lit("...") to pass a string literal.

# ✅ Before 0.3.0 — required f.lit for every value
rx.rst_clip("tile", "geom", f.lit(True))
rx.rst_transform("tile", f.lit(4326))
bx.bng_pointascell("pt", f.lit(1))
bx.bng_pointascell("pt", f.lit("1km"))

# ✅ 0.3.0 — scalars accepted directly
rx.rst_clip("tile", "geom", True)
rx.rst_transform("tile", 4326)
bx.bng_pointascell("pt", 1)
bx.bng_pointascell("pt", f.lit("1km")) # string literal — still wrap in f.lit

Scala — typed overloads added for Boolean / Int / Double / String value parameters. Column args (e.g. geometry, tile) still take Column.

// ✅ 0.3.0 — scalar overloads resolve without lit(...)
rst_clip(col("tile"), col("geom"), cutlineAllTouched = true)
rst_transform(col("tile"), 4326)
bng_pointascell(col("pt"), 1)
bng_pointascell(col("pt"), "1km")

SQL — values are already natively accepted by Spark SQL; no change needed:

SELECT gbx_rst_clip(tile, geom, true) FROM ...;
SELECT gbx_bng_pointascell(pt, 1) FROM ...;
SELECT gbx_bng_pointascell(pt, '1km') FROM ...;

When you still need f.lit(...) in Python:

  • String literals: rx.rst_fromfile(f.lit("/path/to.tif"), f.lit("GTiff")) — a bare string is treated as a column reference.
  • Nulls / explicit typing: e.g. f.lit(None).cast("double").
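The Python argument rules in this section can be pictured as a small dispatch function. The sketch below is an assumption about the shape of that logic, not the actual wrapper code: ColRef and Literal are hypothetical stand-ins for f.col(...) and f.lit(...) so the example runs without Spark, and to_column is an illustrative name.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ColRef:
    """Stand-in for f.col(name)."""
    name: str


@dataclass(frozen=True)
class Literal:
    """Stand-in for f.lit(value)."""
    value: Any


def to_column(arg: Any):
    """Mimic the dispatch described above for one wrapper argument."""
    if isinstance(arg, (ColRef, Literal)):
        return arg              # already a Column: pass through unchanged
    if isinstance(arg, str):
        return ColRef(arg)      # bare string is a column reference
    if isinstance(arg, (bool, int, float, bytes)):
        return Literal(arg)     # non-string scalar is auto-wrapped as a literal
    # Anything else (including None) needs an explicit literal in this sketch.
    raise TypeError(f"unsupported argument type: {type(arg).__name__}")
```

With these stand-ins, to_column("tile") yields a column reference and to_column(4326) yields a literal, while a string literal such as "1km" has to arrive already wrapped.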

How to use this page

  • Migrating code: Search for the baseline name in your code or config; replace with Current and apply any behavior notes.
  • Docs or tests: After a change, add one row here so future readers know what changed and why.
  • After approval: Move content into formal release notes (e.g. per-version sections) and keep this page for historical beta-only changes, or retire it.
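For the "Migrating code" step, a search-and-replace over source text can be scripted. The following is a minimal sketch under stated assumptions: RENAMES holds only a handful of Baseline → Current pairs copied from the tables on this page, migrate is a hypothetical helper name, and the optional gbx_ prefix is assumed to be the only difference between the Python/Scala and SQL names.

```python
import re

# A few Baseline -> Current pairs from the tables on this page;
# extend the dict with the rows that apply to your code.
RENAMES = {
    "rst_pixel_width": "rst_pixelwidth",
    "rst_to_overlapping_tiles": "rst_tooverlappingtiles",
    "bng_point_as_bng": "bng_pointascell",
    "bng_geometry_kring_explode": "bng_geomkringexplode",
    "bng_geometry_kring": "bng_geomkring",
    "bng_eastnortasbng": "bng_eastnorthasbng",
}

# Optional "gbx_" prefix covers the SQL names. The lookarounds stop matches
# inside longer identifiers (e.g. my_rst_pixel_width stays untouched).
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])(gbx_)?("
    + "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True))
    + r")(?![A-Za-z0-9_])"
)


def migrate(source: str) -> str:
    """Rewrite baseline function names in a source string to current names."""
    return _PATTERN.sub(lambda m: (m.group(1) or "") + RENAMES[m.group(2)], source)
```

A rename-only pass like this does not cover behavior changes such as the new band argument of rst_bandmetadata; those rows still need a manual review.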

Notable improvements and fixes

  • Python package rename: Imports changed from geobrix.* to databricks.labs.gbx.* to align with Scala and the published artifact; update all import statements and environment references.
  • Init script / NumPy: Init script updated to install NumPy 2.x so GDAL Python array operations execute correctly; fixes runtime failures in gbx_rst_mapalgebra and gbx_rst_ndvi when used with array-based paths.
  • Error handling: Functions that previously threw exceptions during execution now surface errors more clearly (e.g. return null or a controlled default with error messages captured) instead of failing with opaque stack traces.
  • RasterX rst_bandmetadata: A required band argument was added; call as rst_bandmetadata(tile, band) (e.g. rst_bandmetadata("tile", f.lit(1))) in Python/SQL/Scala.
  • GDAL reader column: Raster DataFrames from the GDAL reader use the column name source (not path) for the file path; update any code or docs that assumed path.
  • BNG aggregators (bng_cellunion_agg, bng_cellintersection_agg): Fixed a bug where aggregation buffers were shared across partitions (and across tests in the same JVM), causing incorrect core flags when running full test suites or with multiple partitions. Each partition now gets a fresh buffer. Chip fields are resolved by type/name in the union aggregator for robustness to struct field order. Test expectation corrected for “all core chips” intersection: result is now correctly documented as core=true (whole cell).
  • rst_clip axis-order fix for EPSG-imported CRS (fixes all-black clips): When the clip geometry's CRS was set via an EPSG code (plain rst_transform-style input, EWKT SRID=4326;..., or EWKB with SRID), GDAL 3+ defaults that SpatialReference to authority-compliant axis order — for EPSG:4326 that means (latitude, longitude). JTS / Databricks / most GIS tooling emit WKT/WKB coordinates in traditional (x, y) = (lon, lat) order, so the reprojection inside rst_clip was silently swapping the axes (e.g. -80 14 interpreted as lat=-80, near the south pole) and the cutline missed the raster entirely, producing all-black output. OSRTransformGeometry.transform now clones both source and destination SpatialReferences and forces OAMS_TRADITIONAL_GIS_ORDER on the clones before running the OGR transform, so JTS-origin WKB is interpreted correctly. Caller-owned SpatialReferences are not mutated.
  • EWKT / EWKB support for raster clip (CRS mismatch handling): rst_clip now accepts EWKT (SRID=<epsg>;<WKT>) and EWKB (PostGIS extended WKB) in addition to plain WKT/WKB. Semantics:
    • Plain WKT / WKB (no SRID): the geometry is assumed to already be in the raster's CRS; no reprojection is performed.
    • EWKT / EWKB (SRID set and resolvable via EPSG): the geometry's CRS is used and, if it differs from the raster's CRS, the cutline is reprojected before clipping.
    • If the SRID is 0 or not a valid EPSG code, the code falls back to the raster's CRS (same as the plain case). This restores Mosaic-compatible behavior, and a caller who forgets to set the SRID no longer silently gets an empty/black clip.
  • JTS EWKT / EWKB helpers: JTS.fromWKT / JTS.fromWKB now auto-detect EWKT/EWKB; new JTS.toEWKT / JTS.toEWKB helpers emit SRID-preserving forms. Plain toWKT / toWKB output is unchanged (OGC, no SRID).
  • rst_transform invalid SRID: rst_transform(tile, targetSrid) now rejects targetSrid <= 0 and EPSG codes that GDAL cannot resolve with a clear error (surfaced in tile metadata error_message) instead of returning a raster with an uninitialized CRS.
  • /vsimem/ path handling hardening: rst_memsize / rst_unlink and the GDAL writer's in-memory byte fetch now use startsWith("/vsimem/") (not contains) and null-check GetMemFileBuffer, so datasets whose description happens to embed the substring (e.g. NetCDF subdataset selectors) are no longer mis-routed through the in-memory branch.
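The difference between the contains and startsWith checks in the /vsimem/ item above is easy to see in a few lines. This is a plain-Python illustration rather than the Scala code in GeoBrix; the function names and both example descriptions are hypothetical.

```python
VSIMEM_PREFIX = "/vsimem/"


def is_vsimem_contains(description: str) -> bool:
    """Old-style check: true whenever the substring appears anywhere."""
    return VSIMEM_PREFIX in description


def is_vsimem_path(description: str) -> bool:
    """Hardened check: only a description that starts with /vsimem/ counts."""
    return description.startswith(VSIMEM_PREFIX)


# Hypothetical dataset descriptions for illustration.
in_memory = "/vsimem/tile_0001.tif"
subdataset = 'NETCDF:"/vsimem/precip.nc":rainfall'

for desc in (in_memory, subdataset):
    print(desc, is_vsimem_contains(desc), is_vsimem_path(desc))
```

Both checks agree on the plain in-memory path; only the prefix check keeps the subdataset selector out of the in-memory branch.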