
Quick Start

This guide gets you started with GeoBrix once it is installed. Every code block below runs as-is when you use the sample data paths shown.

Choose your execution tier

GeoBrix ships in two tiers, lightweight and heavyweight, that share the same function names. Pick one before you start: the import/register lines and the reader format strings on this page depend on that choice.

Lightweight tier: pure-Python (rasterio / pyogrio / NumPy), with no JAR and no GDAL or OGR to install. It ships as a single wheel, installed via %pip or as a cluster library, and runs on serverless, standard/shared, ARM, and Lakeflow declarative pipelines. It covers RasterX (every rst_* function), VectorX (gbx_st_*), and the GridX quadbin grid (gbx_quadbin_*), plus the lightweight readers.

# Stage the wheel in a Unity Catalog Volume, then install the [light] extra.
# Use the quoted PEP 508 "name[extra] @ file://" form (Serverless-safe):
%pip install "geobrix[light] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix-<version>-py3-none-any.whl"

from databricks.labs.gbx.pyrx import functions as rx
rx.register(spark) # installs the gbx_rst_* SQL names, pyspark-backed
Quote the geobrix[light] @ file://… requirement

Install with the named, quoted form above. Do not put the extra on the path ('/Volumes/…/…whl[light]'): on Serverless, %pip writes the requirement to a file including the quotes, so pip reads [light] as part of the filename and fails with "Expected package name at the start of dependency specifier." The named form installs cleanly on Serverless, standard/shared, and ARM.
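As a rough illustration of the named form, the hypothetical helper below assembles the requirement string from its parts. The function name and the example catalog, schema, volume, and version values are placeholders for illustration, not part of GeoBrix.

```python
def geobrix_light_requirement(catalog: str, schema: str, volume: str, version: str) -> str:
    """Build the PEP 508 'name[extra] @ file://...' requirement for the [light] extra."""
    wheel = f"geobrix-{version}-py3-none-any.whl"
    # The extra sits on the package name, never on the wheel path.
    return f"geobrix[light] @ file:///Volumes/{catalog}/{schema}/{volume}/{wheel}"


# Placeholder values (assumptions): substitute your own Volume location and wheel version.
requirement = geobrix_light_requirement("main", "default", "geobrix_samples", "0.0.0")
print(f'%pip install "{requirement}"')
```

Keeping the extra attached to the package name means the path part stays a plain file URL, which is the property the Serverless note above depends on.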

The two tiers expose the same module aliases (rx / vx / gx) and the same gbx_* SQL names, so most code on this page is identical across tiers; only the import/register line and the reader format strings differ.

See Installation for the two install paths and Choosing an Execution Tier for the full comparison.

Set up sample data first

The examples on this page use sample data paths (e.g. /Volumes/.../geobrix-examples/nyc/...). To run them as-is, set up sample data first by following the Sample Data Guide, then use the same paths or volume in your notebook.

Prerequisites

Note: You do not need to register functions if you are only using the included readers.

Register Functions

The import/register line is the one place each package genuinely differs by tier. Pick your tier; the function names you call afterward are the same either way.

# Register RasterX (lightweight pyrx) — gbx_rst_* SQL names, pyspark-backed
from databricks.labs.gbx.pyrx import functions as rx
rx.register(spark)

Register VectorX and the GridX quadbin grid the same way:

# Register VectorX (lightweight pyvx) — gbx_st_* SQL names
from databricks.labs.gbx.pyvx import functions as vx
vx.register(spark)
# Register GridX quadbin (lightweight pygx) — gbx_quadbin_* SQL names
from databricks.labs.gbx.pygx import functions as gx
gx.register(spark)
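
If you use all three lightweight packages, the registrations above can be grouped into one call. The sketch below is a convenience wrapper, not a GeoBrix API: register_lightweight and its importer parameter are hypothetical names, and it assumes each `functions` object shown above is an importable submodule exposing register(spark).

```python
import importlib
from typing import Any, Callable, List

# Module paths taken from the registration snippets above
# (assumes `functions` is a submodule of each package).
LIGHTWEIGHT_MODULES = [
    "databricks.labs.gbx.pyrx.functions",  # RasterX -> gbx_rst_*
    "databricks.labs.gbx.pyvx.functions",  # VectorX -> gbx_st_*
    "databricks.labs.gbx.pygx.functions",  # GridX quadbin -> gbx_quadbin_*
]


def register_lightweight(
    spark: Any,
    importer: Callable[[str], Any] = importlib.import_module,
) -> List[str]:
    """Import each lightweight module and call its register(spark).

    Returns the module paths that registered. Modules that fail to import
    (for example, when the wheel is not installed) are skipped.
    """
    registered = []
    for path in LIGHTWEIGHT_MODULES:
        try:
            module = importer(path)
        except ImportError:
            continue
        module.register(spark)
        registered.append(path)
    return registered
```

In a notebook, `register_lightweight(spark)` would stand in for the three separate register calls; the returned list shows which packages were actually available.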

List and Describe Functions

-- List GeoBrix functions
SHOW FUNCTIONS LIKE 'gbx_rst_*';
SHOW FUNCTIONS LIKE 'gbx_bng_*';
SHOW FUNCTIONS LIKE 'gbx_st_*';
Example output
+--------------------+
|function |
+--------------------+
|gbx_rst_asformat |
|gbx_rst_avg |
|gbx_rst_bandmetadata|
...
-- Describe a function
DESCRIBE FUNCTION EXTENDED gbx_rst_boundingbox;
Example output
-- DESCRIBE FUNCTION EXTENDED gbx_rst_boundingbox
Function: gbx_rst_boundingbox
Type: ...
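
To see at a glance which GeoBrix families are registered, the SHOW FUNCTIONS output can be grouped by prefix. A minimal sketch in plain Python follows; group_by_family is a hypothetical helper, and the input list would come from your own session.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_family(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group gbx_* function names by family prefix, e.g. 'gbx_rst'."""
    families: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        parts = name.split("_")
        # Expect names shaped like gbx_<family>_<function>; ignore anything else.
        if len(parts) >= 3 and parts[0] == "gbx":
            families["_".join(parts[:2])].append(name)
    return {family: sorted(funcs) for family, funcs in families.items()}


# Names copied from the example output above, plus one non-GeoBrix name.
sample = ["gbx_rst_avg", "gbx_rst_asformat", "gbx_rst_bandmetadata", "abs"]
print(group_by_family(sample))
```

In a session, the names could be collected with something like `[row[0] for row in spark.sql("SHOW FUNCTIONS LIKE 'gbx_*'").collect()]` before being passed in.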

Reading Data

Paths below assume the Sample Data layout (e.g. Essential Bundle). The reader format string is the part that differs by tier: lightweight uses the pure-Python *_gbx readers (no GDAL/OGR install); heavyweight uses the GDAL/OGR readers.

Register the lightweight readers once per session, then load by format string (gtiff_gbx / raster_gbx for rasters; shapefile_gbx / geojson_gbx / gpkg_gbx / file_gdb_gbx for vectors).
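
One way to keep reader selection in one place is a small lookup from file suffix to lightweight format string. This is only a sketch: the format strings come from the list above, but the suffix mapping and the helper name lightweight_format are assumptions, so adjust them to your data. file_gdb_gbx and the generic raster_gbx are left out because this guide does not state which paths they expect.

```python
# Suffix -> lightweight reader format. Format names are from this guide;
# the suffix choices are assumptions for illustration.
SUFFIX_TO_FORMAT = {
    ".tif": "gtiff_gbx",
    ".tiff": "gtiff_gbx",
    ".shp": "shapefile_gbx",
    ".shp.zip": "shapefile_gbx",
    ".geojson": "geojson_gbx",
    ".gpkg": "gpkg_gbx",
}


def lightweight_format(path: str) -> str:
    """Return the lightweight reader format string for a file path."""
    lowered = path.lower()
    # Check longer suffixes first so '.shp.zip' is matched as a whole.
    for suffix in sorted(SUFFIX_TO_FORMAT, key=len, reverse=True):
        if lowered.endswith(suffix):
            return SUFFIX_TO_FORMAT[suffix]
    raise ValueError(f"No lightweight reader mapped for: {path}")


subway = "/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip"
red_band = "/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif"
print(lightweight_format(subway))
print(lightweight_format(red_band))
```

Once the lightweight readers are registered, the result can be passed straight to `spark.read.format(...)` with the same path.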

Read GeoTIFF

# Read GeoTIFF rasters (lightweight gtiff_gbx reader — pure-Python, no GDAL install)
from databricks.labs.gbx.ds import register as gbx_readers
gbx_readers.register(spark) # register the lightweight readers once per session
rasters = spark.read.format("gtiff_gbx").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.limit(3).show()
Example output
+--------------------+----+-----+------+
|source |bbox|width|height|
+--------------------+----+-----+------+
|.../nyc/sentinel2/..|... |10980|10980 |
+--------------------+----+-----+------+

Read Shapefile

# Read shapefile (lightweight shapefile_gbx reader — pyogrio-backed, no OGR install)
from databricks.labs.gbx.ds import register as gbx_readers
gbx_readers.register(spark) # register the lightweight readers once per session
shapes = spark.read.format("shapefile_gbx").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
shapes.limit(3).show()
Example output
+----+--------+-----+
|path|geom_0 |... |
+----+--------+-----+
|... |[BINARY]|... |
+----+--------+-----+

Using RasterX, GridX, and VectorX

RasterX

Identical on both tiers — only the import/register differs (see the tier toggle above). Once registered, the rx.rst_* calls are the same.

# Register, load raster, apply RasterX (heavyweight import and "gdal" reader shown)
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
rasters = spark.read.format("gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.select(
    rx.rst_boundingbox("tile").alias("bbox"),
    rx.rst_width("tile").alias("width"),
    rx.rst_height("tile").alias("height"),
).limit(3).show()
Example output
+--------------------+-----+------+
|bbox |width|height|
+--------------------+-----+------+
|POLYGON ((...)) |10980|10980 |
+--------------------+-----+------+

GridX

All GridX grids — quadbin (gx.quadbin_* / gbx_quadbin_*), BNG (gbx_bng_*), and custom grids (gbx_custom_*) — are available on both tiers.

Index a point to a quadbin cell and read its resolution:

# Register GridX quadbin (works on BOTH tiers), index a point, inspect the cell
from databricks.labs.gbx.pygx import functions as gx
from pyspark.sql import Row
from pyspark.sql.functions import lit
gx.register(spark)
# A point in NYC (WGS84 lon/lat) -> a resolution-15 quadbin cell, then its resolution
points = spark.createDataFrame([Row(lon=-73.985, lat=40.748)])
points.select(
    gx.quadbin_pointascell("lon", "lat", lit(15)).alias("cell")
).select(
    "cell",
    gx.quadbin_resolution("cell").alias("res"),
).show()
Example output
+-------------------+---+
|cell |res|
+-------------------+---+
|5256690677307146239|15 |
+-------------------+---+
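
The resolution is stored inside the cell ID itself. As a sanity check on the example output, the sketch below decodes it in plain Python, assuming GeoBrix follows the public quadbin bit layout in which bits 52 to 56 hold the resolution. That layout is an assumption here, and quadbin_resolution_from_id is a hypothetical helper, not a GeoBrix function.

```python
def quadbin_resolution_from_id(cell: int) -> int:
    """Extract the resolution from a quadbin cell ID (assumed layout: bits 52-56)."""
    return (cell >> 52) & 0x1F


# Cell ID copied from the example output above.
cell = 5256690677307146239
print(quadbin_resolution_from_id(cell))
```

For the example cell this yields 15, matching the res column.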

VectorX

Identical on both tiers — only the import/register differs (see the tier toggle above). The vx.st_* calls and gbx_st_* SQL names are the same.

# Register VectorX (heavyweight import shown), create legacy point struct, convert to WKB
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
from pyspark.sql import Row
from pyspark.sql.types import ArrayType, DoubleType, IntegerType, StructField, StructType
vx.register(spark)
legacy_schema = StructType([
    StructField("typeId", IntegerType()),
    StructField("srid", IntegerType()),
    StructField("boundaries", ArrayType(ArrayType(ArrayType(DoubleType())))),
    StructField("holes", ArrayType(ArrayType(ArrayType(ArrayType(DoubleType()))))),
])
row = Row(geom_legacy=(1, 0, [[[30.0, 10.0]]], []))
shapes = spark.createDataFrame([row], StructType([StructField("geom_legacy", legacy_schema)]))
shapes.select(vx.st_legacyaswkb("geom_legacy").alias("wkb")).show()
Example output
+-----------+
|wkb |
+-----------+
|[BINARY] |
+-----------+
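
The [BINARY] value is the WKB encoding of the point. For reference, the sketch below builds the standard WKB for POINT (30 10) with the Python standard library, assuming little-endian byte order and a plain 2D point. The bytes GeoBrix actually emits may differ (for example in byte order), so treat this as a guide to the layout and not as the exact output.

```python
import struct

# Standard WKB for a 2D point: byte-order flag, geometry type, then x and y.
# '<' = little-endian, B = 1-byte order flag (1 = little-endian),
# I = 4-byte geometry type (1 = Point), d = 8-byte double.
wkb = struct.pack("<BIdd", 1, 1, 30.0, 10.0)

print(len(wkb))  # 1 + 4 + 8 + 8 bytes
print(wkb.hex())

# Decode it again to confirm the layout round-trips.
order_flag, geom_type, x, y = struct.unpack("<BIdd", wkb)
print(order_flag, geom_type, x, y)
```

Comparing these bytes with the wkb column of your own result is a quick way to confirm that the legacy struct was interpreted as the point you intended.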

Using SQL

The gbx_* SQL names are identical on both tiers (only the reader format string differs — see Reading Data). This example uses a heavyweight reader; swap shapefile_ogr for shapefile_gbx on the lightweight tier.

-- Read shapefile and query in SQL
CREATE OR REPLACE TEMP VIEW my_shapes AS
SELECT * FROM shapefile_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip`;

SELECT * FROM my_shapes LIMIT 3;
Example output
+----+--------+-----+
|path|geom_0 |... |
+----+--------+-----+
|... |[BINARY]|... |
+----+--------+-----+

Next Steps