GeoTIFF Reader

Read GeoTIFF raster files (.tif, .tiff) - the most common geospatial raster format.

Format Name

gtiff_gdal

Overview

This is a named GDAL reader that presets the driver option to "GTiff". GeoTIFF is the de facto standard for geospatial raster data, combining the TIFF image format with embedded geospatial metadata.

Key Features:

  • Industry standard format for geospatial rasters
  • Supports multiple bands (RGB, multispectral, etc.)
  • Embedded spatial reference and geotransform
  • Compression options (LZW, DEFLATE, JPEG, etc.)
  • Cloud-optimized variant (COG) support
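
The embedded geotransform listed above is, in GDAL's convention, a six-coefficient affine transform from pixel/line indices to georeferenced coordinates. A minimal sketch in plain Python, with no GeoBrix dependency; the helper name `pixel_to_coords` and the sample coefficients are illustrative assumptions:

```python
from typing import Sequence, Tuple


def pixel_to_coords(gt: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Apply a GDAL-style geotransform to a pixel position.

    gt = (origin_x, pixel_width, row_rotation,
          origin_y, column_rotation, pixel_height)
    pixel_height is negative for north-up images.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


# Hypothetical north-up raster: 10 m pixels, top-left corner at (500000, 4500000).
example_gt = (500000.0, 10.0, 0.0, 4500000.0, 0.0, -10.0)

top_left = pixel_to_coords(example_gt, 0, 0)
first_pixel_center = pixel_to_coords(example_gt, 0.5, 0.5)
print(top_left, first_pixel_center)
```

Integer indices address the top-left corner of a pixel; adding 0.5 to both indices gives the pixel center.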

Basic Usage

Python

# Read GeoTIFF file (sample-data Volumes path)
df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
df.show()

Example output
+--------------------------------------------------+-----+
|path                                              |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif                |{...}|
+--------------------------------------------------+-----+

Scala

val df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")

Example output
+--------------------------------------------------+-----+
|path                                              |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif                |{...}|
+--------------------------------------------------+-----+

SQL

-- Read GeoTIFF in SQL (sample-data Volumes path)
SELECT * FROM gtiff_gdal.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif` LIMIT 10;

Example output
+--------------------------------------------------+-----+
|path                                              |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif                |{...}|
+--------------------------------------------------+-----+

Options

The GeoTIFF reader inherits all GDAL reader options. Common options include:

Option           Default   Description
readSubdatasets  "false"   Read subdatasets if present
rasterAsGrid     "false"   Read as grid instead of tiles
retile           "false"   Retile rasters for optimal processing
tileSize         "256"     Tile size in pixels (if retiling)
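
To get a feel for how `tileSize` affects the number of tiles when `retile` is enabled, the sketch below counts the tiles a raster would be cut into. It assumes square tiles with partial tiles kept at the right and bottom edges; that is a common scheme, but it is not confirmed here as GeoBrix's exact behavior, and `tile_count` is a hypothetical helper.

```python
import math


def tile_count(width: int, height: int, tile_size: int = 256) -> int:
    """Count square tiles needed to cover a width x height raster.

    Assumption: edge tiles that are only partly filled still count as tiles.
    """
    if width <= 0 or height <= 0 or tile_size <= 0:
        raise ValueError("width, height and tile_size must be positive")
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    return cols * rows


# Hypothetical 10980 x 10980 pixel raster at two candidate tile sizes.
for size in (256, 512):
    print(size, tile_count(10980, 10980, size))
```

Smaller tiles give more rows to parallelize over but add per-tile overhead, so the best value depends on the workload.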

Example with Options

# Read GeoTIFF with options (sample-data Volumes path)
df = spark.read.format("gtiff_gdal") \
.option("readSubdatasets", "false") \
.load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
df.show()

Example output
+--------------------------------------------------+-----+
|path                                              |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif                |{...}|
+--------------------------------------------------+-----+

Output Schema

root
 |-- tile: struct (GeoBrix raster tile structure)
 |    |-- cellid: bigint (grid cell ID, nullable)
 |    |-- raster: binary (raster file content)
 |    |-- metadata: map<string,string> (driver, extension, etc.)

The tile column contains the complete raster data structure. See Tile Structure for detailed field descriptions.
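
If `tile.raster` holds the raw file bytes, as the schema comment suggests, one quick sanity check is to inspect the TIFF signature in the first four bytes. The sketch below is plain Python with no GeoBrix dependency; `tiff_flavor` is a hypothetical helper, and the sample bytes are made up for illustration.

```python
from typing import Optional

# The first four bytes of a TIFF file: byte-order mark plus version number
# (42 for classic TIFF, 43 for BigTIFF).
_TIFF_SIGNATURES = {
    b"II\x2a\x00": "classic TIFF, little-endian",
    b"MM\x00\x2a": "classic TIFF, big-endian",
    b"II\x2b\x00": "BigTIFF, little-endian",
    b"MM\x00\x2b": "BigTIFF, big-endian",
}


def tiff_flavor(raster: bytes) -> Optional[str]:
    """Return a description of the TIFF variant, or None if not a TIFF."""
    return _TIFF_SIGNATURES.get(bytes(raster[:4]))


# Made-up payloads standing in for the binary `raster` field.
print(tiff_flavor(b"II\x2a\x00" + b"\x00" * 16))
print(tiff_flavor(b"\x89PNG\r\n\x1a\n"))
```

With a collected row, the same check could be applied to something like `row["tile"]["raster"]`, assuming the field is returned as `bytes` or `bytearray`.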

Common Use Cases

Satellite Imagery

GeoTIFF is widely used to distribute and exchange satellite imagery:

  • Sentinel-2: Multispectral Earth observation
  • Landsat: Long-term Earth monitoring
  • Planet: High-resolution daily imagery

Digital Elevation Models (DEMs)

Many elevation datasets are distributed as GeoTIFF:

  • SRTM: 30m/90m near-global coverage
  • ASTER GDEM: 30m global digital elevation
  • LiDAR-derived: High-resolution terrain models

Aerial Photography

Orthophotos and aerial surveys commonly use GeoTIFF:

  • RGB imagery
  • Near-infrared (NIR) bands
  • Thermal imagery

GeoTIFF vs GDAL Reader

When to use each (sample-data Volumes path):

# GeoTIFF reader (recommended for .tif files)
df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")

# GDAL reader (same result, explicit driver)
df = spark.read.format("gdal").option("driver", "GTiff").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")

Use gtiff_gdal when:

  • ✅ Working primarily with GeoTIFF files
  • ✅ Want cleaner, more readable code
  • ✅ Following GeoBrix naming conventions

Use gdal when:

  • ✅ Working with multiple raster formats
  • ✅ Need format-specific driver options
  • ✅ The format has no dedicated named reader
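
For pipelines that handle mixed inputs, the choice above can be made from the file extension. This is a minimal sketch: `reader_format_for` is a hypothetical helper, not a GeoBrix function, and the paths in the loop are placeholders.

```python
import os

# Extensions the GeoTIFF reader is documented to handle.
GEOTIFF_EXTENSIONS = {".tif", ".tiff"}


def reader_format_for(path: str) -> str:
    """Pick the named GeoTIFF reader for .tif/.tiff, else the generic GDAL reader."""
    extension = os.path.splitext(path)[1].lower()
    return "gtiff_gdal" if extension in GEOTIFF_EXTENSIONS else "gdal"


# Placeholder paths for illustration only.
for candidate in ("/data/scene.tif", "/data/scene.TIFF", "/data/elevation.nc"):
    print(candidate, "->", reader_format_for(candidate))
```

The result can be passed to `spark.read.format(...)`; when it falls back to `gdal`, a `driver` option may still need to be set explicitly.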

Cloud-Optimized GeoTIFF (COG)

Cloud-Optimized GeoTIFFs are read the same way as regular GeoTIFFs. The example uses the sample-data path; for cloud storage, use an s3://, abfs://, or gs:// path instead.

# COG files read like regular GeoTIFFs (sample-data path for local)
cog_df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
# For cloud: spark.read.format("gtiff_gdal").load("s3://bucket/cog-file.tif")

Note: For cloud storage (S3, Azure Blob, GCS), use the Cloud-Optimized GeoTIFF (COG) format for best performance. COGs enable efficient partial reads without downloading the entire file.
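
Partial reads work because a COG stores its pixels in internal tiles, so a reader only has to fetch the tiles that overlap the requested window. The sketch below lists those tiles for a pixel window. The 512-pixel internal tile size and the window values are assumptions for illustration, and `tiles_for_window` is a hypothetical helper rather than a GeoBrix or GDAL function.

```python
from typing import List, Tuple


def tiles_for_window(col_off: int, row_off: int, width: int, height: int,
                     tile_size: int = 512) -> List[Tuple[int, int]]:
    """Return (tile_col, tile_row) indices of internal tiles touched by a window."""
    if width <= 0 or height <= 0 or tile_size <= 0:
        raise ValueError("width, height and tile_size must be positive")
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# A 600 x 300 pixel window starting at column 1000, row 2000.
needed = tiles_for_window(1000, 2000, 600, 300)
print(len(needed), needed)
```

Only the listed tiles need to be fetched, which is why a small window over a large COG transfers far less data than the whole file.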

Compression Formats

GeoTIFF supports various compression options:

Compression  Use Case         Pros                          Cons
None         Quick access     Fast read/write               Large files
LZW          General purpose  Good compression, lossless    Moderate speed
DEFLATE      General purpose  Better compression, lossless  Slower than LZW
JPEG         RGB imagery      High compression              Lossy
JPEG2000     High-quality     Very high compression         Slower

Note: GeoBrix reads compressed GeoTIFFs transparently. GDAL detects and handles the compression format automatically.
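
To illustrate what "lossless" means for DEFLATE, the sketch below compresses a synthetic single-band raster with Python's standard `zlib` module, which implements the DEFLATE algorithm, and checks that decompression restores every byte. The synthetic gradient is an assumption for illustration; with real GeoTIFFs, GDAL does the decoding, and compression ratios depend heavily on the data.

```python
import zlib

# Synthetic 256 x 256 single-band, 8-bit "raster": a smooth horizontal gradient.
width, height = 256, 256
band = bytes((col // 4) % 256 for _ in range(height) for col in range(width))

# Compress with DEFLATE at level 6, then restore.
compressed = zlib.compress(band, 6)
restored = zlib.decompress(compressed)

ratio = len(band) / len(compressed)
print(f"raw={len(band)} bytes, deflate={len(compressed)} bytes, ratio={ratio:.1f}x")
print("bit-for-bit identical after round trip:", restored == band)
```

A lossy codec such as JPEG would not pass the equality check, which is why it suits visual RGB imagery better than measurement data such as elevation.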

Next Steps