Advanced Usage Overview
This section covers advanced GeoBrix usage patterns for users who need to go beyond the standard out-of-the-box functionality.
What's Covered
🔧 Custom Spark UDFs
Learn how to leverage GeoBrix's execute methods to build custom Spark User-Defined Functions (UDFs) that interact directly with GDAL operations.
🖥️ GDAL Command Line Integration
Understand how to complement GeoBrix with GDAL command-line utilities for preprocessing, format conversion, and specialized operations.
📦 Third-Party Library Integration
Integrate GeoBrix with popular Python geospatial libraries like rasterio, xarray, and PDAL for extended functionality.
Understanding GeoBrix Architecture
GeoBrix is built on top of GDAL and provides two primary interfaces:
Spark Expressions (Standard Usage)
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
# Uses Spark's columnar expression engine
rasters = spark.read.format("gdal").load(_RASTER_PATH)
df = rasters.select(rx.rst_boundingbox("tile").alias("bbox"))
df.limit(3).show(truncate=40)
Example output
+--------------------+
|bbox |
+--------------------+
|POLYGON ((...)) |
|POLYGON ((...)) |
|... |
+--------------------+
Characteristics:
- ✅ Optimized by Spark's Catalyst query optimizer
- ✅ Automatic parallelization
- ✅ Column-oriented operations
- ✅ Best for standard workflows
Execute Methods (Advanced Usage)
Execute Methods Example
import com.databricks.labs.gbx.rasterx.expressions.accessors.RST_BoundingBox
import org.gdal.gdal.Dataset
// Direct GDAL dataset manipulation
val bbox = RST_BoundingBox.execute(dataset)
Characteristics:
- ✅ Direct GDAL access
- ✅ Fine-grained control
- ✅ Custom UDF building blocks
- ✅ Best for specialized operations
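To make "custom UDF building blocks" concrete, here is a minimal, hypothetical sketch of the kind of logic an execute-style method such as RST_BoundingBox wraps: deriving a bounding-box WKT from a GDAL-style geotransform. The function name and the hard-coded example values are illustrative assumptions, not GeoBrix API.

```python
# Hypothetical sketch of what a bounding-box execute method computes.
# A GDAL geotransform is a 6-tuple:
#   (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height)
# pixel_height is typically negative for north-up rasters.

def bbox_wkt(geotransform, xsize, ysize):
    """Return the raster's bounding box as a WKT POLYGON (north-up assumed)."""
    ox, pw, _, oy, _, ph = geotransform
    x_min, x_max = ox, ox + pw * xsize
    y_a, y_b = oy, oy + ph * ysize
    y_min, y_max = min(y_a, y_b), max(y_a, y_b)
    # Closed ring, counter-clockwise from the lower-left corner
    return (
        f"POLYGON (({x_min} {y_min}, {x_max} {y_min}, "
        f"{x_max} {y_max}, {x_min} {y_max}, {x_min} {y_min}))"
    )

# Example: a 100x100-pixel raster at 0.1-degree resolution anchored at (10, 50)
print(bbox_wkt((10.0, 0.1, 0.0, 50.0, 0.0, -0.1), 100, 100))
# -> POLYGON ((10.0 40.0, 20.0 40.0, 20.0 50.0, 10.0 50.0, 10.0 40.0))
```

Wrapping logic like this in a Spark UDF is what the Custom UDFs guide covers in detail.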
When to Use Advanced Patterns
Use Standard GeoBrix When:
- ✅ Built-in operations fit your use case
- ✅ Working with DataFrames
- ✅ Need automatic parallelization
- ✅ Want Spark optimization
Use Advanced Patterns When:
- 🔧 Need custom business logic
- 🔧 Require specialized GDAL operations
- 🔧 Integrating with external tools
- 🔧 Building custom functions
- 🔧 Need preprocessing/postprocessing
- 🔧 Working with formats GeoBrix does not fully support
Architecture Diagram
┌──────────────────────────────────────────────────────────┐
│                     Your Application                     │
│  ┌───────────────┐  ┌───────────────┐  ┌────────────┐    │
│  │ Standard API  │  │  Custom UDFs  │  │  GDAL CLI  │    │
│  └───────┬───────┘  └───────┬───────┘  └─────┬──────┘    │
└──────────┼──────────────────┼────────────────┼───────────┘
           │                  │                │
           ▼                  ▼                ▼
┌──────────────────────────────────────────────────────────┐
│                       GeoBrix Layer                      │
│   ┌───────────────────┐      ┌──────────────────────┐    │
│   │ Spark Expressions │      │   Execute Methods    │    │
│   │  (eval functions) │      │ (direct GDAL calls)  │    │
│   └─────────┬─────────┘      └───────────┬──────────┘    │
└─────────────┼────────────────────────────┼───────────────┘
              │                            │
              ▼                            ▼
┌──────────────────────────────────────────────────────────┐
│                      GDAL/OGR Layer                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │ Drivers  │  │ Dataset  │  │   Band   │  │ Geometry │  │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘  │
└──────────────────────────────────────────────────────────┘
Typical Advanced Workflow
- Read Data - Use GeoBrix readers or GDAL CLI for format conversion
- Preprocess - Apply GDAL utilities for reprojection, tiling, etc.
- Custom Processing - Build UDFs using execute methods for specialized logic
- Standard Processing - Use GeoBrix Spark expressions for distributed operations
- Integration - Leverage third-party libraries for specialized analysis
- Post-process - Use GDAL CLI or libraries for final formatting
Example: End-to-End Advanced Pipeline
# 1. Preprocess with GDAL CLI (via subprocess or notebook magic)
# !gdalwarp -t_srs EPSG:4326 input.tif reprojected.tif

# 2. Read with GeoBrix
rasters = spark.read.format("gdal").load("/data/reprojected.tif")

# 3. Apply a custom UDF for specialized logic
from pyspark.sql.functions import udf, lit
from pyspark.sql.types import MapType, StringType
from databricks.labs.gbx.rasterx.expressions.accessors import RST_Metadata

@udf(MapType(StringType(), StringType()))
def extract_custom_metadata(tile_binary):
    # Custom logic using execute methods
    # (This is simplified - see the Custom UDFs guide for details)
    from datetime import datetime
    dataset = None  # load_dataset_from_binary(tile_binary)
    # metadata = RST_Metadata.execute(dataset)
    # Add custom processing
    metadata = {}
    metadata["processed_date"] = datetime.now().isoformat()
    return metadata

enriched = rasters.withColumn("custom_metadata", extract_custom_metadata("tile"))

# 4. Use standard GeoBrix for distributed operations
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)

aoi_geometry = None  # Placeholder
result = enriched.select(
    "*",
    rx.rst_boundingbox("tile").alias("bbox"),
    rx.rst_clip("tile", aoi_geometry, lit(True)).alias("clipped"),
)

# 5. Integrate with xarray for analysis (see Library Integration guide)
# Convert to xarray for advanced array operations
# ...

# 6. Save results (optional: result.limit(3).show() to inspect)
result.limit(3).show(truncate=30)
Example output
+----+--------------------+-------+-------+
|path|bbox |... |clipped|
+----+--------------------+-------+-------+
|... |POLYGON ((...)) |... |... |
+----+--------------------+-------+-------+
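Step 1 above uses notebook magic, but in a scheduled job you would typically shell out to the GDAL CLI from Python instead. Here is a hedged sketch of that pattern; `build_gdalwarp_cmd` is a hypothetical helper (the `-t_srs` flag is standard gdalwarp), and the availability of GDAL binaries on the cluster is an assumption.

```python
import shutil
import subprocess

def build_gdalwarp_cmd(src, dst, t_srs="EPSG:4326"):
    """Build a gdalwarp reprojection command (pipeline step 1)."""
    # -t_srs sets the target spatial reference system
    return ["gdalwarp", "-t_srs", t_srs, src, dst]

cmd = build_gdalwarp_cmd("/data/input.tif", "/data/reprojected.tif")

# Only invoke the CLI if GDAL binaries are installed on this node
if shutil.which("gdalwarp"):
    subprocess.run(cmd, check=True)
else:
    print("gdalwarp not found; install GDAL binaries on the cluster first")
```

Building the argument list explicitly (rather than a shell string) avoids quoting issues with paths; see the GDAL CLI guide for more preprocessing recipes.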
Next Steps
Explore each advanced topic in detail:
- Custom UDFs - Start here to understand execute methods
- GDAL CLI - Learn preprocessing and postprocessing
- Library Integration - Connect with rasterio, xarray, PDAL
Prerequisites
For advanced usage, you should be familiar with:
- ✅ Basic GeoBrix usage (see Quick Start)
- ✅ Spark DataFrames and UDFs
- ✅ Python and/or Scala
- ✅ Basic GDAL concepts
- ✅ Databricks notebooks and clusters
Support
For advanced usage questions:
- Review test cases in the repository
- File issues on GitHub