Quick Start

This guide gets you started with GeoBrix once it is installed. Every snippet below can be copied and run as-is, provided you use the sample data paths shown.

Set up sample data first

The examples on this page use sample data paths (e.g. /Volumes/.../geobrix-examples/nyc/...). To run them as-is, first set up sample data by following the Sample Data Guide, then use the same paths or volume in your notebook.
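Because the snippets on this page share one volume root, it can help to define that root once and build paths from it. The following is a minimal sketch in plain Python: the constant SAMPLE_ROOT and the helper sample_path are hypothetical conveniences, not part of GeoBrix, and the root simply mirrors the paths used in the examples on this page.

```python
import posixpath

# Assumption: mirrors the volume used by the examples on this page.
# Change it if your sample data lives in a different catalog/schema/volume.
SAMPLE_ROOT = "/Volumes/main/default/geobrix_samples/geobrix-examples"


def sample_path(*parts, root=SAMPLE_ROOT):
    """Join path segments under the sample data root (hypothetical helper)."""
    return posixpath.join(root, *parts)


# Path to the red-band GeoTIFF used in the reader examples.
red_band = sample_path("nyc", "sentinel2", "nyc_sentinel2_red.tif")
print(red_band)
```

If your sample data sits in another volume, passing root="..." (or editing SAMPLE_ROOT) keeps the remaining snippets unchanged.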

Prerequisites

Install GeoBrix on your Databricks cluster.
(Recommended) Set up sample data so the snippets below run without changing paths.

Note: You do not need to register functions if you are only using the included readers.

Register Functions

Python/PySpark

# Register RasterX functions (required for gbx_rst_* in SQL)
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)

# Register GridX BNG functions (required for gbx_bng_* in SQL)
from databricks.labs.gbx.gridx.bng import functions as bx
bx.register(spark)

# Register VectorX functions (required for gbx_st_* in SQL)
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
vx.register(spark)

Scala

// Register RasterX (Scala)
import com.databricks.labs.gbx.rasterx.{functions => rx}
rx.register(spark)

List and Describe Functions

-- List GeoBrix functions
SHOW FUNCTIONS LIKE 'gbx_rst_*';
SHOW FUNCTIONS LIKE 'gbx_bng_*';
SHOW FUNCTIONS LIKE 'gbx_st_*';

Example output
+--------------------+
|function            |
+--------------------+
|gbx_rst_asformat    |
|gbx_rst_avg         |
|gbx_rst_bandmetadata|
...

-- Describe a function
DESCRIBE FUNCTION EXTENDED gbx_rst_boundingbox;

Example output
-- DESCRIBE FUNCTION EXTENDED gbx_rst_boundingbox
Function: gbx_rst_boundingbox
Type: ...

Reading Data

The paths below assume the layout produced by the Sample Data setup (e.g. the Essential Bundle).

Read GeoTIFF

# Read GeoTIFF rasters (GDAL reader)
rasters = spark.read.format("gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.limit(3).show()

Example output
+--------------------+----+-----+------+
|source              |bbox|width|height|
+--------------------+----+-----+------+
|.../nyc/sentinel2/..|... |10980|10980 |
+--------------------+----+-----+------+

Read Shapefile

# Read shapefile (supports .zip)
shapes = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
shapes.limit(3).show()

Example output
+----+--------+-----+
|path|geom_0  |...  |
+----+--------+-----+
|... |[BINARY]|...  |
+----+--------+-----+

Read GeoJSON

# Read GeoJSON
geojson_df = spark.read.format("geojson_ogr").option("multi", "false").load(
    "/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/taxi-zones/nyc_taxi_zones.geojson"
)
geojson_df.limit(3).show()

Example output
+----------+--------+-----+
|path      |geom_0  |...  |
+----------+--------+-----+
|...       |[BINARY]|...  |
+----------+--------+-----+

Using RasterX, GridX, and VectorX

RasterX

# Register, load raster, apply RasterX
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
rasters = spark.read.format("gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.select(rx.rst_boundingbox("tile").alias("bbox"), rx.rst_width("tile"), rx.rst_height("tile")).limit(3).show()

Example output
+--------------------+-----+------+
|bbox                |width|height|
+--------------------+-----+------+
|POLYGON ((...))     |10980|10980 |
+--------------------+-----+------+

GridX (BNG)

# Register GridX BNG, then use in SQL (gbx_bng_cellarea returns square kilometres)
from databricks.labs.gbx.gridx.bng import functions as bx
bx.register(spark)
spark.sql("SELECT gbx_bng_cellarea('TQ3080') as area_km2").show()

Example output
+----------+
|area_km2  |
+----------+
|1.0       |
+----------+
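The 1.0 result follows from the resolution implied by the reference itself. A standard British National Grid reference is two letters (a 100 km square) followed by an even number of digits, split equally between eastings and northings, and each digit per axis divides the cell side by ten. The sketch below reproduces that arithmetic in plain Python. It is not GeoBrix code: the function name bng_cell_area_km2 is hypothetical, it does not validate the letter pair, and it deliberately ignores any non-standard forms (such as quadrant suffixes) that gbx_bng_cellarea may also accept.

```python
import re


def bng_cell_area_km2(ref):
    """Area in km^2 of a standard BNG reference such as 'TQ3080' (illustrative only)."""
    match = re.fullmatch(r"([A-Z]{2})(\d*)", ref.replace(" ", "").upper())
    if match is None or len(match.group(2)) % 2 != 0:
        raise ValueError(f"not a standard BNG reference: {ref!r}")
    # Digits are shared equally between eastings and northings.
    digits_per_axis = len(match.group(2)) // 2
    # The two-letter square is 100 km wide; each digit per axis divides by 10.
    side_km = 100.0 / (10 ** digits_per_axis)
    return side_km * side_km


print(bng_cell_area_km2("TQ3080"))  # 1.0     (1 km x 1 km cell)
print(bng_cell_area_km2("TQ"))      # 10000.0 (100 km x 100 km square)
```

For 'TQ3080' there are two digits per axis, so the side is 100 km / 100 = 1 km, which matches the area_km2 value in the example output.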

VectorX

# Register VectorX, create legacy point struct, convert to WKB
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
from pyspark.sql import Row
from pyspark.sql.types import ArrayType, DoubleType, IntegerType, StructField, StructType
vx.register(spark)
legacy_schema = StructType([
    StructField("typeId", IntegerType()),
    StructField("srid", IntegerType()),
    StructField("boundaries", ArrayType(ArrayType(ArrayType(DoubleType())))),
    StructField("holes", ArrayType(ArrayType(ArrayType(ArrayType(DoubleType()))))),
])
row = Row(geom_legacy=(1, 0, [[[30.0, 10.0]]], []))
shapes = spark.createDataFrame([row], StructType([StructField("geom_legacy", legacy_schema)]))
shapes.select(vx.st_legacyaswkb("geom_legacy").alias("wkb")).show()

Example output
+-----------+
|wkb        |
+-----------+
|[BINARY]   |
+-----------+
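The [BINARY] column holds well-known binary (WKB). To make that concrete, the sketch below builds and decodes the standard little-endian WKB encoding of a 2D point at (30, 10), the coordinates used in the legacy struct above, using only the Python standard library. It illustrates the WKB layout and is not GeoBrix code: the bytes returned by st_legacyaswkb may use a different byte order or carry extra fields, and both helper names are hypothetical.

```python
import struct


def encode_wkb_point(x, y):
    """Standard little-endian WKB for a 2D point: byte order, type, x, y."""
    # 1 = little-endian flag, 1 = geometry type Point, then two float64 values.
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(wkb):
    """Decode a 2D WKB point, honouring the byte-order flag in the first byte."""
    order = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(order + "Idd", wkb[1:21])
    if geom_type != 1:
        raise ValueError(f"not a 2D point (type {geom_type})")
    return x, y


wkb = encode_wkb_point(30.0, 10.0)
print(len(wkb))               # 21
print(decode_wkb_point(wkb))  # (30.0, 10.0)
```

In a notebook, a similar decoder could be applied to a collected wkb value to sanity-check the conversion, assuming the output is plain 2D WKB.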

Using SQL

-- Read shapefile and query in SQL
CREATE OR REPLACE TEMP VIEW my_shapes AS
SELECT * FROM shapefile_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip`;

SELECT * FROM my_shapes LIMIT 3;

Example output
+----+--------+-----+
|path|geom_0  |...  |
+----+--------+-----+
|... |[BINARY]|...  |
+----+--------+-----+

Next Steps

Sample Data — Download and use sample data in examples
Readers — Shapefile, GeoJSON, GDAL, and more
API Reference — Function reference
RasterX · GridX · VectorX
