
Quick Start

This guide helps you get started with GeoBrix after installation. All code below runs as-is when you use the sample data paths shown.

Set up sample data first

The examples on this page use sample data paths (e.g. /Volumes/.../geobrix-examples/nyc/...). To run them as-is, first follow the Sample Data Guide, then use the same paths or volume in your notebook.

Prerequisites

Note: You do not need to register functions if you are only using the included readers.

Register Functions

Python/PySpark

# Register RasterX functions (required for gbx_rst_* in SQL)
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)

Scala

Register RasterX (Scala)
import com.databricks.labs.gbx.rasterx.{functions => rx}
rx.register(spark)

Register GridX and VectorX the same way:

# Register GridX BNG functions (required for gbx_bng_* in SQL)
from databricks.labs.gbx.gridx.bng import functions as bx
bx.register(spark)
# Register VectorX functions (required for gbx_st_* in SQL)
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
vx.register(spark)

List and Describe Functions

-- List GeoBrix functions
SHOW FUNCTIONS LIKE 'gbx_rst_*';
SHOW FUNCTIONS LIKE 'gbx_bng_*';
SHOW FUNCTIONS LIKE 'gbx_st_*';
Example output
+--------------------+
|function |
+--------------------+
|gbx_rst_asformat |
|gbx_rst_avg |
|gbx_rst_bandmetadata|
...
-- Describe a function
DESCRIBE FUNCTION EXTENDED gbx_rst_boundingbox;
Example output
Function: gbx_rst_boundingbox
Type: ...

Reading Data

Paths below assume the Sample Data layout (e.g. Essential Bundle).

Read GeoTIFF

# Read GeoTIFF rasters (GDAL reader)
rasters = spark.read.format("gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.limit(3).show()
Example output
+--------------------+----+-----+------+
|source |bbox|width|height|
+--------------------+----+-----+------+
|.../nyc/sentinel2/..|... |10980|10980 |
+--------------------+----+-----+------+
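The width and height columns come from the raster file's own metadata, which the GDAL reader extracts for you. As a rough, stdlib-only illustration of where such numbers live (this is not the GeoBrix reader, which delegates to GDAL), the sketch below builds a minimal little-endian TIFF header and reads the ImageWidth/ImageLength tags back out:

```python
import struct

def make_minimal_tiff(width, height):
    # Little-endian TIFF header: "II", magic 42, offset of first IFD (8)
    header = struct.pack("<2sHI", b"II", 42, 8)
    # IFD with two entries: ImageWidth (tag 256) and ImageLength (tag 257),
    # both stored as SHORT (type 3), count 1, value inline
    entries = struct.pack("<H", 2)
    entries += struct.pack("<HHII", 256, 3, 1, width)
    entries += struct.pack("<HHII", 257, 3, 1, height)
    entries += struct.pack("<I", 0)  # no next IFD
    return header + entries

def read_tiff_dimensions(data):
    # Byte-order flag: "II" = little-endian, "MM" = big-endian
    byte_order = "<" if data[:2] == b"II" else ">"
    ifd_offset, = struct.unpack_from(byte_order + "I", data, 4)
    count, = struct.unpack_from(byte_order + "H", data, ifd_offset)
    dims = {}
    for i in range(count):
        tag, typ, n, value = struct.unpack_from(
            byte_order + "HHII", data, ifd_offset + 2 + i * 12)
        if tag == 256:
            dims["width"] = value
        elif tag == 257:
            dims["height"] = value
    return dims

print(read_tiff_dimensions(make_minimal_tiff(10980, 10980)))
# {'width': 10980, 'height': 10980}
```

Real GeoTIFFs carry many more tags (geotransform, CRS, band layout), which is why the reader exposes richer columns than this sketch.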

Read Shapefile

# Read shapefile (supports .zip)
shapes = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
shapes.limit(3).show()
Example output
+----+--------+-----+
|path|geom_0 |... |
+----+--------+-----+
|... |[BINARY]|... |
+----+--------+-----+

Read GeoJSON

# Read GeoJSON
geojson_df = spark.read.format("geojson_ogr").option("multi", "false").load(
    "/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/taxi-zones/nyc_taxi_zones.geojson"
)
geojson_df.limit(3).show()
Example output
+----------+--------+-----+
|path |geom_0 |... |
+----------+--------+-----+
|... |[BINARY]|... |
+----------+--------+-----+
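To see the structure the OGR reader is flattening into rows, it can help to look at raw GeoJSON with nothing but the stdlib. The sketch below parses a tiny inline FeatureCollection (the zone name and coordinates are made up for illustration, not taken from the sample dataset):

```python
import json

# A tiny inline FeatureCollection standing in for a GeoJSON file on disk
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"zone": "Newark Airport"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-74.18, 40.69], [-74.17, 40.69],
                                   [-74.17, 40.70], [-74.18, 40.69]]]}}
  ]
}
""")

# One row per feature: property value, geometry type, ring vertex count
rows = [
    (f["properties"]["zone"], f["geometry"]["type"],
     len(f["geometry"]["coordinates"][0]))
    for f in sample["features"]
]
print(rows)  # [('Newark Airport', 'Polygon', 4)]
```

The reader does this flattening for you at scale, and additionally encodes each geometry as binary (the geom_0 column) rather than nested JSON arrays.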

Using RasterX, GridX, and VectorX

RasterX

# Register, load raster, apply RasterX
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
rasters = spark.read.format("gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
rasters.select(rx.rst_boundingbox("tile").alias("bbox"), rx.rst_width("tile").alias("width"), rx.rst_height("tile").alias("height")).limit(3).show()
Example output
+--------------------+-----+------+
|bbox |width|height|
+--------------------+-----+------+
|POLYGON ((...)) |10980|10980 |
+--------------------+-----+------+
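Conceptually, a raster's bounding box follows from its geotransform (origin plus pixel size) and its pixel dimensions. The sketch below shows that computation in plain Python; it is illustrative only, and the sample geotransform values are assumed for a Sentinel-2-style 10 m tile, not read from the NYC raster:

```python
def bounding_box_wkt(geotransform, width, height):
    # GDAL-style geotransform: (origin_x, pixel_w, 0, origin_y, 0, -pixel_h)
    x0, pw, _, y0, _, ph = geotransform
    x1 = x0 + pw * width
    y1 = y0 + ph * height
    xmin, xmax = sorted((x0, x1))
    ymin, ymax = sorted((y0, y1))
    # Closed ring, counter-clockwise, first vertex repeated at the end
    return (f"POLYGON (({xmin} {ymin}, {xmax} {ymin}, "
            f"{xmax} {ymax}, {xmin} {ymax}, {xmin} {ymin}))")

# Assumed values: 10 m pixels, 10980 x 10980 tile, UTM-style origin
print(bounding_box_wkt((500000.0, 10.0, 0.0, 4500000.0, 0.0, -10.0),
                       10980, 10980))
```

The actual rst_boundingbox function reads the geotransform from the tile itself, so you never supply these numbers by hand.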

GridX (BNG)

# Register GridX BNG, then use in SQL (gbx_bng_cellarea returns square kilometres)
from databricks.labs.gbx.gridx.bng import functions as bx
bx.register(spark)
spark.sql("SELECT gbx_bng_cellarea('TQ3080') as area_km2").show()
Example output
+----------+
|area_km2 |
+----------+
|1.0 |
+----------+
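The 1.0 km² result follows from the length of the grid reference: a British National Grid reference is two letters plus 2n digits, and the cell is 10^(5 - n) metres on a side, so the four-digit 'TQ3080' names a 1 km cell. A hedged sketch of that arithmetic (this mirrors the BNG convention, not GeoBrix internals):

```python
def bng_cell_area_km2(ref):
    # Two grid letters, then 2n digits; cell size is 10**(5 - n) metres
    # per side, e.g. 'TQ3080' -> n=2 -> 1000 m -> 1.0 km^2
    digits = ref[2:]
    n = len(digits) // 2
    size_m = 10 ** (5 - n)
    return size_m * size_m / 1_000_000  # m^2 -> km^2

print(bng_cell_area_km2("TQ3080"))    # 1.0
print(bng_cell_area_km2("TQ308013"))  # 0.01 (100 m cell)
```

Note this sketch assumes a plain all-digit reference; the real function also validates the letters and handles resolutions the sketch ignores.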

VectorX

# Register VectorX, create legacy point struct, convert to WKB
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
from pyspark.sql import Row
from pyspark.sql.types import ArrayType, DoubleType, IntegerType, StructField, StructType
vx.register(spark)
legacy_schema = StructType([
    StructField("typeId", IntegerType()),
    StructField("srid", IntegerType()),
    StructField("boundaries", ArrayType(ArrayType(ArrayType(DoubleType())))),
    StructField("holes", ArrayType(ArrayType(ArrayType(ArrayType(DoubleType()))))),
])
row = Row(geom_legacy=(1, 0, [[[30.0, 10.0]]], []))
shapes = spark.createDataFrame([row], StructType([StructField("geom_legacy", legacy_schema)]))
shapes.select(vx.st_legacyaswkb("geom_legacy").alias("wkb")).show()
Example output
+-----------+
|wkb |
+-----------+
|[BINARY] |
+-----------+
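The binary in the wkb column follows the standard well-known-binary (WKB) layout: a byte-order flag, a geometry-type code, then the coordinates as doubles. As an illustration of that layout (not the GeoBrix encoder), a single point can be written with the stdlib:

```python
import struct

def point_to_wkb(x, y):
    # Little-endian WKB: byte-order flag 1, geometry type 1 (Point),
    # then x and y as IEEE 754 doubles
    return struct.pack("<BIdd", 1, 1, x, y)

wkb = point_to_wkb(30.0, 10.0)  # same point as the legacy struct above
print(wkb.hex())
# 01010000000000000000003e400000000000002440
```

Tools such as WKT viewers or other geometry libraries can decode these 21 bytes back into POINT (30 10), which is a handy way to spot-check round-trips.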

Using SQL

-- Read shapefile and query in SQL
CREATE OR REPLACE TEMP VIEW my_shapes AS
SELECT * FROM shapefile_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip`;

SELECT * FROM my_shapes LIMIT 3;
Example output
+----+--------+-----+
|path|geom_0 |... |
+----+--------+-----+
|... |[BINARY]|... |
+----+--------+-----+

Next Steps