Quick Start
This guide helps you get started with GeoBrix after installation. All code below is copy-paste executable when you use the sample data paths shown.
Set up sample data first
The examples on this page use sample data paths (e.g. /Volumes/.../geobrix-examples/nyc/...). To run them as-is, set up the sample data first (see the Sample Data Guide), then use the same paths or volume in your notebook.
Prerequisites
- Install GeoBrix on your Databricks cluster.
- (Recommended) Set up sample data so the snippets below run without changing paths.
note
You do not need to register functions if you are only using the included readers.
Register Functions
Python/PySpark
# Register RasterX functions (required for gbx_rst_* in SQL)
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
Scala
Register RasterX (Scala)
import com.databricks.labs.gbx.rasterx.{functions => rx}
rx.register(spark)
Register GridX and VectorX the same way:
# Register GridX BNG functions (required for gbx_bng_* in SQL)
from databricks.labs.gbx.gridx.bng import functions as bx
bx.register(spark)
# Register VectorX functions (required for gbx_st_* in SQL)
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
vx.register(spark)
List and Describe Functions
-- List GeoBrix functions
SHOW FUNCTIONS LIKE 'gbx_rst_*';
SHOW FUNCTIONS LIKE 'gbx_bng_*';
SHOW FUNCTIONS LIKE 'gbx_st_*';
Example output
+--------------------+
|function |
+--------------------+
|gbx_rst_asformat |
|gbx_rst_avg |
|gbx_rst_bandmetadata|
...
-- Describe a function
DESCRIBE FUNCTION EXTENDED gbx_rst_boundingbox;
Example output
Function: gbx_rst_boundingbox
Type: ...
Reading Data
Paths below assume the Sample Data layout (e.g. Essential Bundle).
Read GeoTIFF
# Read GeoTIFF rasters (GDAL reader)
rasters = spark.read.format("gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.limit(3).show()
Example output
+--------------------+----+-----+------+
|source |bbox|width|height|
+--------------------+----+-----+------+
|.../nyc/sentinel2/..|... |10980|10980 |
+--------------------+----+-----+------+
Read Shapefile
# Read shapefile (supports .zip)
shapes = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
shapes.limit(3).show()
Example output
+----+--------+-----+
|path|geom_0 |... |
+----+--------+-----+
|... |[BINARY]|... |
+----+--------+-----+
Read GeoJSON
# Read GeoJSON
geojson_df = spark.read.format("geojson_ogr").option("multi", "false").load(
"/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/taxi-zones/nyc_taxi_zones.geojson"
)
geojson_df.limit(3).show()
Example output
+----------+--------+-----+
|path |geom_0 |... |
+----------+--------+-----+
|... |[BINARY]|... |
+----------+--------+-----+
Using RasterX, GridX, and VectorX
RasterX
# Register, load raster, apply RasterX
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
rasters = spark.read.format("gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.select(rx.rst_boundingbox("tile").alias("bbox"), rx.rst_width("tile").alias("width"), rx.rst_height("tile").alias("height")).limit(3).show()
Example output
+--------------------+-----+------+
|bbox |width|height|
+--------------------+-----+------+
|POLYGON ((...)) |10980|10980 |
+--------------------+-----+------+
GridX (BNG)
# Register GridX BNG, then use in SQL (gbx_bng_cellarea returns square kilometres)
from databricks.labs.gbx.gridx.bng import functions as bx
bx.register(spark)
spark.sql("SELECT gbx_bng_cellarea('TQ3080') as area_km2").show()
Example output
+----------+
|area_km2 |
+----------+
|1.0 |
+----------+
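The 1.0 result follows from how British National Grid references encode resolution: 'TQ3080' carries four digits (two for easting, two for northing), which names a 1 km square. A minimal pure-Python sketch of that convention (an illustration of standard BNG reference arithmetic, not the GridX implementation):

```python
def bng_resolution_m(ref: str) -> int:
    """Side length in metres of the cell named by a BNG reference.

    The two-letter prefix alone names a 100 km square; each extra
    digit pair divides the side by 10 (e.g. 'TQ3080' -> 1000 m).
    """
    digits = ref[2:]
    if len(digits) % 2 != 0:
        raise ValueError("BNG reference needs an even number of digits")
    return 100_000 // 10 ** (len(digits) // 2)

def bng_cellarea_km2(ref: str) -> float:
    # Cells are square, so area is just the side length squared
    side_km = bng_resolution_m(ref) / 1000
    return side_km * side_km

print(bng_cellarea_km2("TQ3080"))  # 1.0, matching gbx_bng_cellarea above
```

Six-digit references ('TQ' plus three digit pairs) name 100 m cells, and so on down.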
VectorX
# Register VectorX, create legacy point struct, convert to WKB
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
from pyspark.sql import Row
from pyspark.sql.types import ArrayType, DoubleType, IntegerType, StructField, StructType
vx.register(spark)
legacy_schema = StructType([
StructField("typeId", IntegerType()),
StructField("srid", IntegerType()),
StructField("boundaries", ArrayType(ArrayType(ArrayType(DoubleType())))),
StructField("holes", ArrayType(ArrayType(ArrayType(ArrayType(DoubleType()))))),
])
row = Row(geom_legacy=(1, 0, [[[30.0, 10.0]]], []))
shapes = spark.createDataFrame([row], StructType([StructField("geom_legacy", legacy_schema)]))
shapes.select(vx.st_legacyaswkb("geom_legacy").alias("wkb")).show()
Example output
+-----------+
|wkb |
+-----------+
|[BINARY] |
+-----------+
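The legacy struct above is just a plain tuple matching legacy_schema. If you build many geometries, a small helper keeps the nesting straight (a hypothetical convenience function, not part of VectorX; typeId 1 is taken to mean a point, as in the row above):

```python
def legacy_point(x: float, y: float, srid: int = 0) -> tuple:
    # (typeId, srid, boundaries, holes) matching legacy_schema:
    # a point has a single one-coordinate boundary and no holes
    return (1, srid, [[[float(x), float(y)]]], [])

print(legacy_point(30.0, 10.0))  # (1, 0, [[[30.0, 10.0]]], [])
```

Rows built this way can be passed to spark.createDataFrame with legacy_schema exactly as in the example above.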
Using SQL
-- Read shapefile and query in SQL
CREATE OR REPLACE TEMP VIEW my_shapes AS
SELECT * FROM shapefile_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip`;
SELECT * FROM my_shapes LIMIT 3;
Example output
+----+--------+-----+
|path|geom_0 |... |
+----+--------+-----+
|... |[BINARY]|... |
+----+--------+-----+
Next Steps
- Sample Data — Download and use sample data in examples
- Readers — Shapefile, GeoJSON, GDAL, and more
- API Reference — Function reference
- RasterX · GridX · VectorX