
Advanced Usage Overview

This section covers advanced GeoBrix usage patterns for users who need to go beyond the standard out-of-the-box functionality.

What's Covered

🔧 Custom Spark UDFs

Learn how to leverage GeoBrix's execute methods to build custom Spark User-Defined Functions (UDFs) that directly interact with GDAL operations.

Custom UDFs Guide →

🖥️ GDAL Command Line Integration

Understand how to complement GeoBrix with GDAL command-line utilities for preprocessing, format conversion, and specialized operations.

GDAL CLI Guide →

📦 Third-Party Library Integration

Integrate GeoBrix with popular Python geospatial libraries like rasterio, xarray, and PDAL for extended functionality.

Library Integration Guide →

Understanding GeoBrix Architecture

GeoBrix is built on top of GDAL and provides two primary interfaces:

Spark Expressions (Standard Usage)

from databricks.labs.gbx.rasterx import functions as rx

rx.register(spark)
# _RASTER_PATH is a placeholder for the path to your raster files
rasters = spark.read.format("gdal").load(_RASTER_PATH)
# Uses Spark's columnar expression engine
df = rasters.select(rx.rst_boundingbox("tile").alias("bbox"))
df.limit(3).show(truncate=40)
Example output
+--------------------+
|bbox                |
+--------------------+
|POLYGON ((...))     |
|POLYGON ((...))     |
|...                 |
+--------------------+

Characteristics:

  • ✅ Optimized by Spark's Catalyst optimizer
  • ✅ Automatic parallelization
  • ✅ Column-oriented operations
  • ✅ Best for standard workflows

Execute Methods (Advanced Usage)

Execute Methods Example
import com.databricks.labs.gbx.rasterx.expressions.accessors.RST_BoundingBox
import org.gdal.gdal.Dataset

// Direct GDAL dataset manipulation
// (`dataset` is assumed to be an already-opened org.gdal.gdal.Dataset)
val bbox = RST_BoundingBox.execute(dataset)

Characteristics:

  • ✅ Direct GDAL access
  • ✅ Fine-grained control
  • ✅ Custom UDF building blocks
  • ✅ Best for specialized operations
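
To make the idea of a dataset-level accessor more concrete, here is a minimal sketch, in plain Python, of the kind of computation a bounding-box accessor performs once it has a raster's size and geotransform. It is not GeoBrix's implementation. The function names (pixel_to_geo, footprint_corners, footprint_wkt) are hypothetical, and the only assumption is GDAL's six-element affine geotransform convention.

```python
from typing import List, Sequence, Tuple


def pixel_to_geo(gt: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Apply a GDAL-style affine geotransform to a pixel/line coordinate."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return (x, y)


def footprint_corners(gt: Sequence[float], width: int, height: int) -> List[Tuple[float, float]]:
    """Return the raster's four corners as a closed ring (first point repeated)."""
    ring = [
        pixel_to_geo(gt, col, row)
        for col, row in ((0, 0), (width, 0), (width, height), (0, height))
    ]
    ring.append(ring[0])
    return ring


def footprint_wkt(gt: Sequence[float], width: int, height: int) -> str:
    """Format the corner ring as a WKT polygon."""
    coords = ", ".join(f"{x} {y}" for x, y in footprint_corners(gt, width, height))
    return f"POLYGON (({coords}))"
```

For a north-up raster (no rotation terms) this footprint coincides with the axis-aligned bounding box. For a rotated raster it is the tilted footprint instead.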

When to Use Advanced Patterns

Use Standard GeoBrix When:

  • ✅ Built-in operations fit your use case
  • ✅ Working with DataFrames
  • ✅ Need automatic parallelization
  • ✅ Want Spark optimization

Use Advanced Patterns When:

  • 🔧 Need custom business logic
  • 🔧 Require specialized GDAL operations
  • 🔧 Integrating with external tools
  • 🔧 Building custom functions
  • 🔧 Need preprocessing/postprocessing
  • 🔧 Working with formats not fully supported

Architecture Diagram

┌───────────────────────────────────────────────────────────┐
│                     Your Application                      │
│  ┌────────────────┐  ┌────────────────┐  ┌───────────┐    │
│  │  Standard API  │  │  Custom UDFs   │  │ GDAL CLI  │    │
│  └────────┬───────┘  └───────┬────────┘  └─────┬─────┘    │
└───────────┼──────────────────┼─────────────────┼──────────┘
            │                  │                 │
            ▼                  ▼                 ▼
┌───────────────────────────────────────────────────────────┐
│                       GeoBrix Layer                       │
│  ┌───────────────────┐         ┌──────────────────────┐   │
│  │ Spark Expressions │         │   Execute Methods    │   │
│  │ (eval functions)  │         │ (direct GDAL calls)  │   │
│  └────────┬──────────┘         └───────────┬──────────┘   │
└───────────┼────────────────────────────────┼──────────────┘
            │                                │
            ▼                                ▼
┌───────────────────────────────────────────────────────────┐
│                      GDAL/OGR Layer                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌─────────┐    │
│  │ Drivers  │  │ Dataset  │  │   Band   │  │ Geometry│    │
│  └──────────┘  └──────────┘  └──────────┘  └─────────┘    │
└───────────────────────────────────────────────────────────┘

Typical Advanced Workflow

  1. Read Data - Use GeoBrix readers or GDAL CLI for format conversion
  2. Preprocess - Apply GDAL utilities for reprojection, tiling, etc.
  3. Custom Processing - Build UDFs using execute methods for specialized logic
  4. Standard Processing - Use GeoBrix Spark expressions for distributed operations
  5. Integration - Leverage third-party libraries for specialized analysis
  6. Post-process - Use GDAL CLI or libraries for final formatting
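
As a rough illustration of the preprocessing step, the sketch below wraps a gdalwarp reprojection in Python's subprocess module. The helper names (build_gdalwarp_cmd, reproject) are hypothetical and not part of GeoBrix. The sketch assumes the GDAL command-line tools are installed and on the PATH of the machine where it runs.

```python
import shutil
import subprocess
from typing import List


def build_gdalwarp_cmd(src: str, dst: str, target_srs: str = "EPSG:4326") -> List[str]:
    """Build the argument list for a gdalwarp reprojection."""
    return ["gdalwarp", "-t_srs", target_srs, src, dst]


def reproject(src: str, dst: str, target_srs: str = "EPSG:4326") -> None:
    """Run gdalwarp; raises if the tool is missing or exits with an error."""
    if shutil.which("gdalwarp") is None:
        raise RuntimeError("gdalwarp not found on PATH")
    # check=True turns a non-zero exit code into CalledProcessError
    subprocess.run(build_gdalwarp_cmd(src, dst, target_srs), check=True)
```

Calling reproject("input.tif", "reprojected.tif") would run the same gdalwarp -t_srs EPSG:4326 command that appears in the pipeline example on this page. Keeping the command builder separate from the runner makes the arguments easy to inspect or log before anything is executed.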

Example: End-to-End Advanced Pipeline

# 1. Preprocess with GDAL CLI (via subprocess or notebook magic)
# !gdalwarp -t_srs EPSG:4326 input.tif reprojected.tif

from pyspark.sql.functions import lit, udf
from pyspark.sql.types import MapType, StringType

# 2. Read with GeoBrix
rasters = spark.read.format("gdal").load("/data/reprojected.tif")

# 3. Apply custom UDF for specialized logic
from databricks.labs.gbx.rasterx.expressions.accessors import RST_Metadata

@udf(MapType(StringType(), StringType()))
def extract_custom_metadata(tile_binary):
    # Custom logic using execute methods
    # (This is simplified - see Custom UDFs guide for details)
    from datetime import datetime

    dataset = None  # load_dataset_from_binary(tile_binary)
    # metadata = RST_Metadata.execute(dataset)
    # Add custom processing
    metadata = {}
    metadata["processed_date"] = datetime.now().isoformat()
    return metadata

enriched = rasters.withColumn("custom_metadata", extract_custom_metadata("tile"))

# 4. Use standard GeoBrix for distributed operations
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)

aoi_geometry = None  # Placeholder: replace with your area-of-interest geometry
result = enriched.select(
    "*",
    rx.rst_boundingbox("tile").alias("bbox"),
    rx.rst_clip("tile", aoi_geometry, lit(True)).alias("clipped"),
)

# 5. Integrate with xarray for analysis (see Library Integration guide)
# Convert to xarray for advanced array operations
# ...

# 6. Save results (here we only inspect a few rows)
result.limit(3).show(truncate=30)
Example output
+----+--------------------+-------+-------+
|path|bbox                |...    |clipped|
+----+--------------------+-------+-------+
|... |POLYGON ((...))     |...    |...    |
+----+--------------------+-------+-------+
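
When a UDF carries custom logic such as the processed_date enrichment above, one option is to factor that logic into a plain function so it can be checked without a Spark session. The sketch below is an assumption about how you might structure this, not a GeoBrix convention. The enrich_metadata name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def enrich_metadata(
    metadata: Optional[Dict[str, str]],
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Return a copy of `metadata` with a `processed_date` entry added."""
    enriched = dict(metadata or {})
    # Allow the timestamp to be injected so results are reproducible in tests
    stamp = now if now is not None else datetime.now(timezone.utc)
    enriched["processed_date"] = stamp.isoformat()
    return enriched
```

Inside the UDF body you would then return enrich_metadata(metadata). The function copies its input rather than mutating it, so the caller's dictionary stays unchanged.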

Next Steps

Explore each advanced topic in detail:

  1. Custom UDFs - Start here to understand execute methods
  2. GDAL CLI - Learn preprocessing and postprocessing
  3. Library Integration - Connect with rasterio, xarray, PDAL

Prerequisites

For advanced usage, you should be familiar with:

  • ✅ Basic GeoBrix usage (see Quick Start)
  • ✅ Spark DataFrames and UDFs
  • ✅ Python and/or Scala
  • ✅ Basic GDAL concepts
  • ✅ Databricks notebooks and clusters

Support

For advanced usage questions: