GeoTIFF Reader
Read GeoTIFFs into the shared (source, tile) schema. The lightweight
gtiff_gbx reader (the raster_gbx catch-all with the GeoTIFF driver preset,
rasterio-backed, JAR-free) and the heavyweight gtiff_gdal reader are
interchangeable — see Choosing an Execution Tier.
The lightweight (*_gbx) and heavyweight readers emit the same schema, but your
compute usually decides the tier: the lightweight tier needs no JAR or init script
and is the only option on Serverless, standard (shared), and ARM clusters. The
heavyweight tier requires a classic x86 cluster (JAR + GDAL init script); where it is
available it uses native GDAL on the JVM and tends to pull ahead on large workloads. See
the Benchmarking page for light-vs-heavy timings and methodology.
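The tier rule above can be written down as a small decision helper. This is only a sketch: `choose_geotiff_format`, the compute labels, and the `heavyweight_installed` flag are hypothetical names introduced for illustration and are not part of GeoBrix.

```python
# Hypothetical helper: maps a compute description to a reader format name.
# The compute labels below are illustrative, not GeoBrix identifiers.
LIGHTWEIGHT_ONLY = {"serverless", "standard", "arm"}


def choose_geotiff_format(compute: str, heavyweight_installed: bool = False) -> str:
    """Return "gtiff_gbx" or "gtiff_gdal" for a given compute type."""
    kind = compute.strip().lower()
    if kind in LIGHTWEIGHT_ONLY:
        # No JAR or init script is possible here, so only the lightweight tier applies.
        return "gtiff_gbx"
    if kind == "classic-x86":
        # The heavyweight tier also needs the JAR and the GDAL init script.
        return "gtiff_gdal" if heavyweight_installed else "gtiff_gbx"
    raise ValueError(f"unrecognized compute type: {compute!r}")


fmt = choose_geotiff_format("classic-x86", heavyweight_installed=True)
print(fmt)  # gtiff_gdal
```

Because both readers emit the same schema, the returned name can be passed straight to `spark.read.format(...)` without changing downstream code.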
Options
Both named readers preset the GeoTIFF driver and inherit their respective generic raster reader's options.
Lightweight (gtiff_gbx)
Inherits the lightweight raster_gbx options:
| Option | Default | Description |
|---|---|---|
| sizeInMB | "-1" | Default (<= 0) = no split: one whole-image tile per file. Set a positive MB value to tile large rasters into multiple tiles. |
| filterRegex | ".*" | When loading a directory, keep files whose full path matches this regex. |
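Before loading a large directory, it can help to preview which files a `filterRegex` value would keep. The sketch below assumes the reader requires the regex to match the entire path and uses Python `re` syntax; the reader's actual matching semantics are not stated here, so treat `preview_filter` as a hypothetical local check rather than the reader's implementation.

```python
import re


def preview_filter(paths, filter_regex=".*"):
    """Return the paths that the given regex matches in full (assumed semantics)."""
    pattern = re.compile(filter_regex)
    return [p for p in paths if pattern.fullmatch(p)]


# Illustrative paths only; not real sample data.
candidates = [
    "/data/scenes/a.tif",
    "/data/scenes/b.tiff",
    "/data/scenes/a.tif.aux.xml",
    "/data/scenes/readme.txt",
]

kept = preview_filter(candidates, r".*\.tiff?")
print(kept)  # ['/data/scenes/a.tif', '/data/scenes/b.tiff']

# Option values are passed as strings, matching the defaults in the table.
options = {"sizeInMB": "64", "filterRegex": r".*\.tiff?"}
```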
Heavyweight (gtiff_gdal)
Inherits all heavyweight gdal reader options. Common options include:
| Option | Default | Description |
|---|---|---|
readSubdatasets | "false" | Read subdatasets if present |
rasterAsGrid | "false" | Read as grid instead of tiles |
retile | "false" | Retile rasters for optimal processing |
tileSize | "256" | Tile size in pixels (if retiling) |
Example — reading with options set:
# Read GeoTIFF with options (sample-data Volumes path)
df = spark.read.format("gtiff_gdal") \
    .option("readSubdatasets", "false") \
    .load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
df.show()
+--------------------------------------------------+-----+
|path |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif |{...}|
+--------------------------------------------------+-----+
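When `retile` is enabled, a rough tile count helps with sizing partitions. The estimate below assumes square tiles of `tileSize` pixels and that partial tiles along the right and bottom edges are kept; both are assumptions about the retiling behavior, and `estimate_tile_count` is a hypothetical helper.

```python
def estimate_tile_count(width_px: int, height_px: int, tile_size: int = 256) -> int:
    """Estimate tiles produced when cutting a raster into square tiles."""
    if min(width_px, height_px, tile_size) <= 0:
        raise ValueError("dimensions and tile size must be positive")
    # Integer ceiling division, so a partial edge tile still counts as one tile.
    cols = (width_px + tile_size - 1) // tile_size
    rows = (height_px + tile_size - 1) // tile_size
    return cols * rows


# A 10980 x 10980 raster (the size of a full Sentinel-2 10 m band) at the default tile size.
print(estimate_tile_count(10980, 10980))  # 1849
```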
- Lightweight · gtiff_gbx
- Heavyweight · gtiff_gdal
gtiff_gbx is the raster_gbx catch-all reader with the GeoTIFF
driver preset — use it to make GeoTIFF reads explicit. It is pure-Python (no JAR)
and emits the same (source, tile) schema as the other readers.
Register the lightweight DataSources first (see Lightweight Raster Readers → Register).
# Named lightweight GeoTIFF reader (preset for GeoTIFF)
df = spark.read.format("gtiff_gbx").load("{SAMPLE_RASTER_PATH}")
It is the lightweight counterpart of the heavyweight gtiff_gdal reader, supporting Python and SQL bindings (not Scala).
Read GeoTIFF raster files (.tif, .tiff) - the most common geospatial raster format.
Format Name
gtiff_gdal
Overview
This is a named GDAL Reader that presets the driver option to "GTiff". GeoTIFF is the de facto standard for geospatial raster data, combining the TIFF image format with geospatial metadata.
Key Features:
- Industry standard format for geospatial rasters
- Supports multiple bands (RGB, multispectral, etc.)
- Embedded spatial reference and geotransform
- Compression options (LZW, DEFLATE, JPEG, etc.)
- Cloud-optimized variant (COG) support
Basic Usage
Python
# Read GeoTIFF file (sample-data Volumes path)
df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
df.show()
+--------------------------------------------------+-----+
|path |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif |{...}|
+--------------------------------------------------+-----+
Scala
val df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
df.show()
+--------------------------------------------------+-----+
|path |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif |{...}|
+--------------------------------------------------+-----+
SQL
-- Read GeoTIFF in SQL (sample-data Volumes path)
SELECT * FROM gtiff_gdal.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif` LIMIT 10;
+--------------------------------------------------+-----+
|path |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif |{...}|
+--------------------------------------------------+-----+
Output Schema
root
 |-- tile: struct (GeoBrix raster tile structure)
 |    |-- cellid: bigint (grid cell ID, nullable)
 |    |-- raster: binary (raster file content)
 |    |-- metadata: map<string,string> (driver, extension, etc.)
The tile column contains the complete raster data structure. See Tile Structure for detailed field descriptions.
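To make the struct concrete, the sketch below mirrors the three tile fields in a plain Python dataclass and runs a cheap sanity check on the `raster` bytes. It assumes `raster` holds the file content as described above, so a GeoTIFF tile should begin with a TIFF or BigTIFF signature. `Tile` and `looks_like_tiff` are hypothetical names for illustration, not GeoBrix classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Classic TIFF (little/big endian) and BigTIFF (little/big endian) signatures.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


@dataclass
class Tile:
    """Local stand-in for the tile struct: cellid, raster, metadata."""
    cellid: Optional[int]
    raster: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


def looks_like_tiff(tile: Tile) -> bool:
    """Check whether the raster payload starts with a TIFF signature."""
    return tile.raster[:4] in TIFF_SIGNATURES


# Synthetic payload: a little-endian TIFF header followed by padding bytes.
sample = Tile(cellid=None, raster=b"II*\x00" + bytes(8), metadata={"driver": "GTiff"})
print(looks_like_tiff(sample))  # True
```

In a notebook, a collected row's `tile` field could be unpacked into this shape for quick inspection, though collecting large binary columns to the driver should be limited to a handful of rows.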
Common Use Cases
Satellite Imagery
GeoTIFF is the primary format for satellite imagery:
- Sentinel-2: Multispectral Earth observation
- Landsat: Long-term Earth monitoring
- Planet: High-resolution daily imagery
Digital Elevation Models (DEMs)
Many elevation datasets are distributed as GeoTIFF:
- SRTM: 30m/90m near-global coverage (roughly 60°N to 56°S)
- ASTER GDEM: 30m global digital elevation
- LiDAR-derived: High-resolution terrain models
Aerial Photography
Orthophotos and aerial surveys commonly use GeoTIFF:
- RGB imagery
- Near-infrared (NIR) bands
- Thermal imagery
GeoTIFF vs GDAL Reader
When to use each (sample-data Volumes path):
# GeoTIFF reader (recommended for .tif files)
df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
# GDAL reader (same result, explicit driver)
df = spark.read.format("gdal").option("driver", "GTiff").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
Use gtiff_gdal when:
- ✅ Working primarily with GeoTIFF files
- ✅ Want cleaner, more readable code
- ✅ Following GeoBrix naming conventions
Use gdal when:
- ✅ Working with multiple raster formats
- ✅ Need format-specific driver options
- ✅ The format has no dedicated named reader
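For pipelines that handle mixed inputs, the guidance above can be reduced to a lookup on the file extension. This is a minimal sketch: `reader_format_for` is a hypothetical helper, and it only knows the two format names used on this page.

```python
from pathlib import PurePosixPath

GEOTIFF_EXTENSIONS = {".tif", ".tiff"}


def reader_format_for(path: str) -> str:
    """Pick the named GeoTIFF reader for .tif/.tiff files, else the generic reader."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in GEOTIFF_EXTENSIONS:
        return "gtiff_gdal"
    # Fall back to the generic reader; the caller sets any driver option it needs.
    return "gdal"


print(reader_format_for("/data/scene.TIF"))  # gtiff_gdal
print(reader_format_for("/data/model.nc"))   # gdal
```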
Cloud-Optimized GeoTIFF (COG)
Cloud-Optimized GeoTIFFs work seamlessly with the GeoTIFF reader. The example uses the sample-data path; for cloud storage, use an s3://, abfs://, or gs:// path.
# COG files read like regular GeoTIFFs (sample-data path for local)
cog_df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
# For cloud: spark.read.format("gtiff_gdal").load("s3://bucket/cog-file.tif")
For cloud storage (S3, Azure Blob, GCS), use Cloud-Optimized GeoTIFF (COG) format for best performance. COGs enable efficient partial reads without downloading the entire file.
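A pipeline can flag which inputs sit on cloud object storage, for example to warn when a non-COG file is about to be read remotely. The sketch below only inspects the URI scheme and recognizes the three schemes named above; `is_cloud_path` is a hypothetical helper, and the set can be extended for other schemes in use.

```python
from urllib.parse import urlparse

# Schemes taken from the text above; extend as needed for your environment.
CLOUD_SCHEMES = {"s3", "abfs", "gs"}


def is_cloud_path(path: str) -> bool:
    """Return True when the path uses one of the listed object-storage schemes."""
    return urlparse(path).scheme.lower() in CLOUD_SCHEMES


print(is_cloud_path("s3://bucket/cog-file.tif"))       # True
print(is_cloud_path("/Volumes/main/default/x.tif"))    # False
```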
Compression Formats
GeoTIFF supports various compression options:
| Compression | Use Case | Pros | Cons |
|---|---|---|---|
| None | Quick access | Fast read/write | Large files |
| LZW | General purpose | Good compression, lossless | Moderate speed |
| DEFLATE | General purpose | Better compression, lossless | Slower than LZW |
| JPEG | RGB imagery | High compression | Lossy |
| JPEG2000 | High-quality | Very high compression | Slower |
GeoBrix reads compressed GeoTIFFs transparently. The compression format is automatically detected and handled by GDAL.
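Automatic detection is possible because a TIFF records its compression scheme in the Compression tag (tag 259) of its image file directory. The sketch below reads that tag from the first directory of a classic TIFF using only the standard library. It illustrates the file format rather than how GDAL or GeoBrix do it internally, handles neither BigTIFF nor every compression code, and `tiff_compression` is a hypothetical helper.

```python
import struct

# Subset of TIFF Compression tag values; other codes are reported as unknown.
COMPRESSION_NAMES = {
    1: "None",
    5: "LZW",
    7: "JPEG",
    8: "DEFLATE",
    32946: "DEFLATE",  # legacy Deflate code
}


def tiff_compression(data: bytes) -> str:
    """Return the compression name from the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == 259:
            # Compression is a single SHORT stored inline in the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return COMPRESSION_NAMES.get(value, f"unknown ({value})")
    return "None"  # tag absent: the TIFF default is uncompressed
```

For a file on local storage, `tiff_compression(open(path, "rb").read(65536))` is usually enough, provided the first directory sits within the bytes read; files that place it later would need a seek to `ifd_offset`.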
Next Steps
- GDAL Reader - Generic raster reader for all GDAL formats
- Raster Functions - Raster processing operations
- Quick Start - Get started with GeoBrix