Quick Start
This guide gets you running with GeoBrix right after installation. Every snippet below can be pasted and run as-is, provided you use the sample data paths shown.
Choose your execution tier
GeoBrix ships in two tiers that share the same function names. Pick one with the toggle below and the rest of this page follows your choice: all tabs grouped under the same tier toggle stay in sync.
- Lightweight — recommended
- Heavyweight
The lightweight tier is pure-Python (rasterio / pyogrio / NumPy), with no JAR and no GDAL or OGR to install. It installs as a single wheel, via %pip or as a cluster library, and runs on serverless, standard/shared, ARM, and Lakeflow declarative pipelines. It covers RasterX (every rst_* function), VectorX (gbx_st_*), and the GridX quadbin grid (gbx_quadbin_*), plus the lightweight readers.
# Stage the wheel in a Unity Catalog Volume, then install the [light] extra.
# Use the quoted PEP 508 "name[extra] @ file://" form (Serverless-safe):
%pip install "geobrix[light] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix-<version>-py3-none-any.whl"
from databricks.labs.gbx.pyrx import functions as rx
rx.register(spark) # installs the gbx_rst_* SQL names, pyspark-backed
geobrix[light] @ file://… requirement
Install with the named, quoted form above. Do not put the extra on the path ('/Volumes/…/…whl[light]'): on Serverless, %pip writes the requirement to a file including the quotes, so pip reads [light] as part of the filename and fails with "Expected package name at the start of dependency specifier." The named form installs cleanly on Serverless, standard/shared, and ARM.
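The named requirement form can also be assembled in code, which keeps the extra attached to the package name rather than to the wheel path. The following is a minimal sketch, not part of GeoBrix: the helper name light_requirement, the wheels volume, and the 0.0.0 version are hypothetical placeholders.

```python
def light_requirement(wheel_path: str, extra: str = "light", name: str = "geobrix") -> str:
    """Build a PEP 508 direct reference: '<name>[<extra>] @ file://<absolute wheel path>'."""
    if not wheel_path.startswith("/"):
        raise ValueError("wheel_path must be an absolute path, e.g. under /Volumes/")
    if wheel_path.endswith("]"):
        # This is the failing form described above: the extra glued onto the file name.
        raise ValueError("put the extra on the package name, not on the wheel path")
    return f"{name}[{extra}] @ file://{wheel_path}"


# Hypothetical catalog/schema/volume and version, for illustration only.
req = light_requirement("/Volumes/main/default/wheels/geobrix-0.0.0-py3-none-any.whl")

# The line to paste into a notebook cell, with the requirement quoted as a whole.
print(f'%pip install "{req}"')
```

Because the absolute path already starts with a slash, appending it to file:// yields the three-slash file:/// form used in the install command above.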
The heavyweight tier is JVM-native (Scala + GDAL/OGR) and is installed via the JAR + init script. It covers everything the lightweight tier does plus the heavy-only deltas: the GDAL/OGR readers (gtiff_gdal, gdal, shapefile_ogr, geojson_ogr, gpkg_ogr, file_gdb_ogr). Choose this tier when your environment is already JVM/GDAL-based or when you need one of those deltas.
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
The two tiers expose the same module alias (rx / vx / gx) and the same gbx_* SQL names, so most code on this page is identical across tiers — only the import/register line and the reader format strings differ. The tabs below sync to your choice; where a block has no tabs, the code is identical on both tiers.
See Installation for the two install paths and Choosing an Execution Tier for the full comparison.
The examples on this page use sample data paths (e.g. /Volumes/.../geobrix-examples/nyc/...). To run them as-is, set up sample data first: Sample Data Guide — then use the same paths or volume in your notebook.
Prerequisites
- Install GeoBrix on your Databricks cluster.
- (Recommended) Set up sample data so the snippets below run without changing paths.
You do not need to register functions if you are only using the included readers.
Register Functions
The import/register line is the one place each package genuinely differs by tier. Pick your tier; the function names you call afterward are the same either way.
- Lightweight
- Heavyweight
# Register RasterX (lightweight pyrx) — gbx_rst_* SQL names, pyspark-backed
from databricks.labs.gbx.pyrx import functions as rx
rx.register(spark)
Register VectorX and the GridX quadbin grid the same way:
# Register VectorX (lightweight pyvx) — gbx_st_* SQL names
from databricks.labs.gbx.pyvx import functions as vx
vx.register(spark)
# Register GridX quadbin (lightweight pygx) — gbx_quadbin_* SQL names
from databricks.labs.gbx.pygx import functions as gx
gx.register(spark)
# Register RasterX functions (heavyweight rasterx; required for gbx_rst_* in SQL)
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
The same in Scala:
import com.databricks.labs.gbx.rasterx.{functions => rx}
rx.register(spark)
Register GridX (BNG) and VectorX the same way:
# Register GridX BNG functions (required for gbx_bng_* in SQL)
from databricks.labs.gbx.gridx.bng import functions as bx
bx.register(spark)
# Register VectorX functions (required for gbx_st_* in SQL)
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
vx.register(spark)
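Since the import line is the only tier-specific part, a notebook shared across clusters could select it from a small lookup table. The sketch below is illustrative only: the table lists just the packages imported on this page, the helper names package_for and register are hypothetical, and it assumes each package exposes functions as an importable submodule with a register(spark) function, as the imports above suggest.

```python
import importlib

# Package paths copied from the import lines on this page, keyed by tier.
PACKAGES = {
    "light": {
        "rasterx": "databricks.labs.gbx.pyrx",
        "vectorx": "databricks.labs.gbx.pyvx",
        "gridx_quadbin": "databricks.labs.gbx.pygx",
    },
    "heavy": {
        "rasterx": "databricks.labs.gbx.rasterx",
        "vectorx": "databricks.labs.gbx.vectorx.jts.legacy",
        "gridx_bng": "databricks.labs.gbx.gridx.bng",
    },
}


def package_for(component: str, tier: str) -> str:
    """Return the package whose `functions` module should be imported."""
    try:
        return PACKAGES[tier][component]
    except KeyError:
        raise ValueError(f"no import listed for {component!r} on tier {tier!r}") from None


def register(spark, component: str, tier: str):
    """Rough equivalent of `from <package> import functions as f; f.register(spark)`."""
    # Assumption: `functions` is a submodule, so it can be imported by dotted path.
    module = importlib.import_module(package_for(component, tier) + ".functions")
    module.register(spark)
    return module


print(package_for("rasterx", "light"))
print(package_for("rasterx", "heavy"))
```

In a notebook, rx = register(spark, "rasterx", "light") would then stand in for the two-line import and register shown above.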
List and Describe Functions
-- List GeoBrix functions
SHOW FUNCTIONS LIKE 'gbx_rst_*';
SHOW FUNCTIONS LIKE 'gbx_bng_*';
SHOW FUNCTIONS LIKE 'gbx_st_*';
+--------------------+
|function |
+--------------------+
|gbx_rst_asformat |
|gbx_rst_avg |
|gbx_rst_bandmetadata|
...
-- Describe a function
DESCRIBE FUNCTION EXTENDED gbx_rst_boundingbox;
-DESCRIBE FUNCTION EXTENDED gbx_rst_boundingbox
Function: gbx_rst_boundingbox
Type: ...
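To check from Python which function families actually got registered, the names returned by SHOW FUNCTIONS can be bucketed by prefix. This is a minimal sketch using only the standard library; the helper name registered_families is hypothetical, and the sample list simply reuses the three names from the output above plus one unrelated built-in.

```python
from fnmatch import fnmatchcase

# Name patterns taken from the SQL prefixes mentioned on this page.
FAMILIES = {
    "RasterX": "gbx_rst_*",
    "GridX BNG": "gbx_bng_*",
    "GridX quadbin": "gbx_quadbin_*",
    "VectorX": "gbx_st_*",
}


def registered_families(function_names):
    """Count how many of the given function names fall into each GeoBrix family."""
    counts = {family: 0 for family in FAMILIES}
    for name in function_names:
        for family, pattern in FAMILIES.items():
            if fnmatchcase(name.lower(), pattern):
                counts[family] += 1
    return counts


# In a notebook the names could come from something like:
#   [row.function for row in spark.sql("SHOW FUNCTIONS LIKE 'gbx_*'").collect()]
names = ["gbx_rst_asformat", "gbx_rst_avg", "gbx_rst_bandmetadata", "abs"]
print(registered_families(names))
```

A family with a count of zero usually means its register(spark) call has not run in the current session.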
Reading Data
Paths below assume the Sample Data layout (e.g. Essential Bundle). The reader format string is the part that differs by tier: lightweight uses the pure-Python *_gbx readers (no GDAL/OGR install); heavyweight uses the GDAL/OGR readers.
- Lightweight
- Heavyweight
Register the lightweight readers once per session, then load by format string (gtiff_gbx / raster_gbx for rasters; shapefile_gbx / geojson_gbx / gpkg_gbx / file_gdb_gbx for vectors).
Read GeoTIFF
# Read GeoTIFF rasters (lightweight gtiff_gbx reader — pure-Python, no GDAL install)
from databricks.labs.gbx.ds import register as gbx_readers
gbx_readers.register(spark) # register the lightweight readers once per session
rasters = spark.read.format("gtiff_gbx").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.limit(3).show()
+--------------------+----+-----+------+
|source |bbox|width|height|
+--------------------+----+-----+------+
|.../nyc/sentinel2/..|... |10980|10980 |
+--------------------+----+-----+------+
Read Shapefile
# Read shapefile (lightweight shapefile_gbx reader — pyogrio-backed, no OGR install)
from databricks.labs.gbx.ds import register as gbx_readers
gbx_readers.register(spark) # register the lightweight readers once per session
shapes = spark.read.format("shapefile_gbx").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
shapes.limit(3).show()
+----+--------+-----+
|path|geom_0 |... |
+----+--------+-----+
|... |[BINARY]|... |
+----+--------+-----+
The heavyweight tier reads through the GDAL/OGR readers — gdal / gtiff_gdal for rasters; shapefile_ogr / geojson_ogr / gpkg_ogr / file_gdb_ogr for vectors.
Read GeoTIFF
# Read GeoTIFF rasters (GDAL reader)
rasters = spark.read.format("gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.limit(3).show()
+--------------------+----+-----+------+
|source |bbox|width|height|
+--------------------+----+-----+------+
|.../nyc/sentinel2/..|... |10980|10980 |
+--------------------+----+-----+------+
Read Shapefile
# Read shapefile (supports .zip)
shapes = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
shapes.limit(3).show()
+----+--------+-----+
|path|geom_0 |... |
+----+--------+-----+
|... |[BINARY]|... |
+----+--------+-----+
Read GeoJSON
# Read GeoJSON
geojson_df = spark.read.format("geojson_ogr").option("multi", "false").load(
"/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/taxi-zones/nyc_taxi_zones.geojson"
)
geojson_df.limit(3).show()
+----------+--------+-----+
|path |geom_0 |... |
+----------+--------+-----+
|... |[BINARY]|... |
+----------+--------+-----+
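For the format-specific readers named in this section, the tier only changes the suffix of the format string. The sketch below captures that naming pattern in a hypothetical helper, reader_format. It deliberately leaves out the generic raster readers (raster_gbx and gdal), which do not follow the pattern.

```python
# Suffix per tier and reader family, as used by the format strings in this section.
SUFFIX = {
    "light": {"raster": "gbx", "vector": "gbx"},
    "heavy": {"raster": "gdal", "vector": "ogr"},
}
RASTER_KINDS = {"gtiff"}
VECTOR_KINDS = {"shapefile", "geojson", "gpkg", "file_gdb"}


def reader_format(kind: str, tier: str) -> str:
    """Return the reader format string for a file kind on the given tier."""
    if kind in RASTER_KINDS:
        family = "raster"
    elif kind in VECTOR_KINDS:
        family = "vector"
    else:
        raise ValueError(f"unknown reader kind: {kind!r}")
    if tier not in SUFFIX:
        raise ValueError(f"unknown tier: {tier!r}")
    return f"{kind}_{SUFFIX[tier][family]}"


for tier in ("light", "heavy"):
    print(tier, reader_format("gtiff", tier), reader_format("shapefile", tier))
```

With this in place, spark.read.format(reader_format("shapefile", tier)) would let the same load call serve either tier. On the lightweight tier the readers still need to be registered once per session first.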
Using RasterX, GridX, and VectorX
RasterX
Identical on both tiers — only the import/register differs (see the tier toggle above). Once registered, the rx.rst_* calls are the same.
# Register, load raster, apply RasterX
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
rasters = spark.read.format("gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
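# Alias every column so the result header matches the output shown below
rasters.select(
    rx.rst_boundingbox("tile").alias("bbox"),
    rx.rst_width("tile").alias("width"),
    rx.rst_height("tile").alias("height"),
).limit(3).show()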
+--------------------+-----+------+
|bbox |width|height|
+--------------------+-----+------+
|POLYGON ((...)) |10980|10980 |
+--------------------+-----+------+
GridX
All GridX grids — quadbin (gx.quadbin_* / gbx_quadbin_*), BNG (gbx_bng_*), and custom grids (gbx_custom_*) — are available on both tiers.
- Lightweight
- Heavyweight
Index a point to a quadbin cell and read its resolution:
# Register GridX quadbin (works on BOTH tiers), index a point, inspect the cell
from databricks.labs.gbx.pygx import functions as gx
from pyspark.sql import Row
from pyspark.sql.functions import lit
gx.register(spark)
# A point in NYC (WGS84 lon/lat) -> a resolution-15 quadbin cell, then its resolution
points = spark.createDataFrame([Row(lon=-73.985, lat=40.748)])
points.select(
gx.quadbin_pointascell("lon", "lat", lit(15)).alias("cell")
).select(
"cell",
gx.quadbin_resolution("cell").alias("res"),
).show()
+-------------------+---+
|cell |res|
+-------------------+---+
|5256690677307146239|15 |
+-------------------+---+
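The res column can be cross-checked without Spark, because a quadbin id carries its resolution in its own bits. The sketch below assumes GeoBrix ids follow the CARTO quadbin layout, where bits 52 to 56 hold the resolution and the index bits not needed at that resolution are filled with ones. The cell id shown above is consistent with that layout, but the function names here are illustrative and not part of the GeoBrix API.

```python
def quadbin_resolution_bits(cell: int) -> int:
    """Read the resolution stored in bits 52-56 (assumed CARTO quadbin layout)."""
    return (cell >> 52) & 0x1F


def unused_index_bits_are_ones(cell: int) -> bool:
    """Check that the index bits below the cell's resolution are padded with ones."""
    resolution = quadbin_resolution_bits(cell)
    unused = 52 - 2 * resolution          # two index bits are consumed per level
    mask = (1 << unused) - 1
    return (cell & mask) == mask


cell = 5256690677307146239                # the cell id from the output above
print(quadbin_resolution_bits(cell))      # 15
print(unused_index_bits_are_ones(cell))   # True
```

This also explains why the ids of coarse cells end in long runs of set bits: the lower the resolution, the more of the 52 index bits are padding.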
The heavyweight tier adds the BNG grid (gbx_bng_*) and custom grids on top of quadbin:
# Register GridX BNG, then use in SQL (gbx_bng_cellarea returns square kilometres)
from databricks.labs.gbx.gridx.bng import functions as bx
bx.register(spark)
spark.sql("SELECT gbx_bng_cellarea('TQ3080') as area_km2").show()
+----------+
|area_km2 |
+----------+
|1.0 |
+----------+
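The 1.0 result follows from how British National Grid references encode resolution: two letters name a 100 km square, and each additional pair of digits divides the cell edge by ten. The sketch below reproduces that arithmetic in plain Python. The function name bng_cell_area_km2 is hypothetical, and it handles only plain letter-plus-digits references, not quadrant-suffixed forms such as 'TQ38SW', which GeoBrix itself may accept.

```python
def bng_cell_area_km2(ref: str) -> float:
    """Area in square kilometres implied by a plain BNG grid reference."""
    ref = ref.strip().upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha():
        raise ValueError("expected two leading letters, e.g. 'TQ'")
    if digits and (not digits.isdigit() or len(digits) % 2 != 0):
        raise ValueError("expected an even number of digits after the letters")
    # 100 km square, shrunk by a factor of 10 per easting/northing digit pair.
    edge_km = 100.0 / 10 ** (len(digits) // 2)
    return edge_km ** 2


print(bng_cell_area_km2("TQ3080"))  # 1.0 (a 1 km x 1 km cell)
print(bng_cell_area_km2("TQ38"))    # 100.0 (a 10 km x 10 km cell)
```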
VectorX
Identical on both tiers — only the import/register differs (see the tier toggle above). The vx.st_* calls and gbx_st_* SQL names are the same.
# Register VectorX, create legacy point struct, convert to WKB
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
from pyspark.sql import Row
from pyspark.sql.types import ArrayType, DoubleType, IntegerType, StructField, StructType
vx.register(spark)
legacy_schema = StructType([
StructField("typeId", IntegerType()),
StructField("srid", IntegerType()),
StructField("boundaries", ArrayType(ArrayType(ArrayType(DoubleType())))),
StructField("holes", ArrayType(ArrayType(ArrayType(ArrayType(DoubleType()))))),
])
row = Row(geom_legacy=(1, 0, [[[30.0, 10.0]]], []))
shapes = spark.createDataFrame([row], StructType([StructField("geom_legacy", legacy_schema)]))
shapes.select(vx.st_legacyaswkb("geom_legacy").alias("wkb")).show()
+-----------+
|wkb |
+-----------+
|[BINARY] |
+-----------+
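The [BINARY] value is well-known binary (WKB). To give a sense of what such a value looks like, the sketch below encodes the same point, (30.0, 10.0), as standard little-endian OGC WKB using only the struct module. The bytes GeoBrix actually emits may differ, for example in byte order or in whether an SRID is embedded, so treat this as an illustration of the format and not of the library's exact output.

```python
import struct


def point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian OGC WKB."""
    # "<"  little-endian, no padding
    # "B"  byte-order flag: 1 means little-endian
    # "I"  geometry type: 1 means Point
    # "dd" x and y as 64-bit floats
    return struct.pack("<BIdd", 1, 1, x, y)


wkb = point_wkb(30.0, 10.0)
print(len(wkb))    # 21 bytes: 1 + 4 + 8 + 8
print(wkb.hex())

# Decoding with the same layout recovers the flag, type, and coordinates.
print(struct.unpack("<BIdd", wkb))
```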
Using SQL
The gbx_* SQL names are identical on both tiers (only the reader format string differs — see Reading Data). This example uses a heavyweight reader; swap shapefile_ogr for shapefile_gbx on the lightweight tier.
-- Read shapefile and query in SQL
CREATE OR REPLACE TEMP VIEW my_shapes AS
SELECT * FROM shapefile_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip`;
SELECT * FROM my_shapes LIMIT 3;
+----+--------+-----+
|path|geom_0 |... |
+----+--------+-----+
|... |[BINARY]|... |
+----+--------+-----+
Next Steps
- Sample Data — Download and use sample data in examples
- Readers — Shapefile, GeoJSON, GDAL, and more
- API Reference — Function reference
- Raster Functions · GridX · VectorX