GeoTIFF Reader

Read GeoTIFFs into the shared (source, tile) schema. The lightweight gtiff_gbx reader (the raster_gbx catch-all with the GeoTIFF driver preset, rasterio-backed, JAR-free) and the heavyweight gtiff_gdal reader are interchangeable — see Choosing an Execution Tier.

Benchmark & tradeoff

The lightweight (*_gbx) and heavyweight readers emit the same schema, but your compute usually decides the tier: the lightweight tier needs no JAR or init script and is the only option on Serverless, standard (shared), and ARM clusters. The heavyweight tier requires a classic x86 cluster (JAR + GDAL init script); where it is available it uses native GDAL on the JVM and tends to pull ahead on large workloads. See the Benchmarking page for light-vs-heavy timings and methodology.
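The tier rules above can be sketched as a small helper. This is only an illustration: the function name `choose_geotiff_format` and its boolean flags are hypothetical and not part of the library; only the two format names come from this page.

```python
def choose_geotiff_format(serverless: bool, shared_access: bool,
                          arm: bool, gdal_installed: bool) -> str:
    """Pick a GeoTIFF reader format name from the compute type.

    Hypothetical helper: the flags are assumptions about how you
    describe your own cluster, not values the library reports.
    """
    # Serverless, standard (shared), and ARM compute can only run the
    # lightweight tier (no JAR, no init script).
    if serverless or shared_access or arm:
        return "gtiff_gbx"
    # On a classic x86 cluster, use the heavyweight tier only when the
    # JAR + GDAL init script are actually installed.
    return "gtiff_gdal" if gdal_installed else "gtiff_gbx"


# Example: a classic x86 cluster with the JAR and GDAL init script installed.
fmt = choose_geotiff_format(serverless=False, shared_access=False,
                            arm=False, gdal_installed=True)
print(fmt)
```

Because both readers emit the same schema, the returned name can be passed straight to `spark.read.format(...)` without changing downstream code.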

Options

Both named readers preset the GeoTIFF driver and inherit their respective generic raster reader's options.

Lightweight (gtiff_gbx)

Inherits the lightweight raster_gbx options:

| Option | Default | Description |
|---|---|---|
| sizeInMB | "-1" | Default (<= 0) = no split: one whole-image tile per file. Set a positive MB value to tile large rasters into multiple tiles. |
| filterRegex | ".*" | When loading a directory, keep files whose full path matches this regex. |
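A minimal sketch of passing both lightweight options, assuming the lightweight DataSources are already registered. The helper name `read_geotiffs_light` and its parameters are hypothetical; the option names, their defaults, and the `gtiff_gbx` format name come from the table above, and `format`, `options`, and `load` are standard Spark `DataFrameReader` calls.

```python
def read_geotiffs_light(spark, path, size_in_mb=-1, filter_regex=".*"):
    """Read GeoTIFFs with the lightweight reader (hypothetical helper).

    size_in_mb <= 0 keeps one whole-image tile per file; a positive
    value asks the reader to split large rasters into multiple tiles.
    filter_regex is applied to full file paths when `path` is a directory.
    """
    # Option values are passed as strings, matching the quoted defaults.
    opts = {"sizeInMB": str(size_in_mb), "filterRegex": filter_regex}
    return spark.read.format("gtiff_gbx").options(**opts).load(path)


# Usage on a cluster (the directory is a placeholder, not a real path):
# df = read_geotiffs_light(spark, "<directory-of-geotiffs>",
#                          size_in_mb=16, filter_regex=r".*\.tif")
```

Keeping the defaults in the signature means a call with no extra arguments behaves like the plain reader: no splitting and no path filtering.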

Heavyweight (gtiff_gdal)

Inherits all heavyweight gdal reader options. Common options include:

| Option | Default | Description |
|---|---|---|
| readSubdatasets | "false" | Read subdatasets if present |
| rasterAsGrid | "false" | Read as grid instead of tiles |
| retile | "false" | Retile rasters for optimal processing |
| tileSize | "256" | Tile size in pixels (if retiling) |

Example — reading with options set:

# Read GeoTIFF with options (sample-data Volumes path)
df = spark.read.format("gtiff_gdal") \
    .option("readSubdatasets", "false") \
    .load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
df.show()

Example output

+--------------------------------------------------+-----+
|path                                              |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif                |{...}|
+--------------------------------------------------+-----+

gtiff_gbx is the raster_gbx catch-all reader with the GeoTIFF driver preset — use it to make GeoTIFF reads explicit. It is pure-Python (no JAR) and emits the same (source, tile) schema as the other readers.

Register the lightweight DataSources first (see Lightweight Raster Readers → Register).

# Named lightweight GeoTIFF reader (preset for GeoTIFF)
df = spark.read.format("gtiff_gbx").load("{SAMPLE_RASTER_PATH}")

It is the lightweight counterpart of the heavyweight gtiff_gdal reader, supporting Python and SQL bindings (not Scala).

Next Steps