GeoTIFF Reader
Read GeoTIFF raster files (.tif, .tiff) - the most common geospatial raster format.
Format Name
gtiff_gdal
Overview
This is a named GDAL Reader that presets the driver option to "GTiff". GeoTIFF is the de facto standard for geospatial raster data, combining the TIFF image format with embedded geospatial metadata.
Key Features:
- Industry standard format for geospatial rasters
- Supports multiple bands (RGB, multispectral, etc.)
- Embedded spatial reference and geotransform
- Compression options (LZW, DEFLATE, JPEG, etc.)
- Cloud-optimized variant (COG) support
Basic Usage
Python
# Read GeoTIFF file (sample-data Volumes path)
df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
df.show()
+--------------------------------------------------+-----+
|path |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif |{...}|
+--------------------------------------------------+-----+
Scala
// Read GeoTIFF file (sample-data Volumes path)
val df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
df.show()
+--------------------------------------------------+-----+
|path |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif |{...}|
+--------------------------------------------------+-----+
SQL
-- Read GeoTIFF in SQL (sample-data Volumes path)
SELECT * FROM gtiff_gdal.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif` LIMIT 10;
+--------------------------------------------------+-----+
|path |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif |{...}|
+--------------------------------------------------+-----+
Options
The GeoTIFF reader inherits all GDAL reader options. Common options include:
| Option | Default | Description |
|---|---|---|
| readSubdatasets | "false" | Read subdatasets if present |
| rasterAsGrid | "false" | Read as grid instead of tiles |
| retile | "false" | Retile rasters for optimal processing |
| tileSize | "256" | Tile size in pixels (if retiling) |
Example with Options
# Read GeoTIFF with options (sample-data Volumes path)
df = spark.read.format("gtiff_gdal") \
.option("readSubdatasets", "false") \
.load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
df.show()
+--------------------------------------------------+-----+
|path |tile |
+--------------------------------------------------+-----+
|/Volumes/.../nyc_sentinel2_red.tif |{...}|
+--------------------------------------------------+-----+
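The retiling options from the table can be combined in the same way. The following is a minimal sketch, not part of GeoBrix: `read_geotiff` is a hypothetical convenience wrapper, and the values `retile=True` and `tileSize=512` are illustrative assumptions. It only relies on the `gtiff_gdal` format name and the option names listed above, and passes every value as a string, matching how the defaults are written in the table.

```python
def read_geotiff(spark, path, **options):
    """Hypothetical helper: read a GeoTIFF with the gtiff_gdal reader.

    Option names are passed through unchanged; values are converted to the
    string form used in the options table ("true"/"false", "256", ...).
    """
    reader = spark.read.format("gtiff_gdal")
    for name, value in options.items():
        if isinstance(value, bool):
            # Python booleans become lowercase "true"/"false"
            value = "true" if value else "false"
        reader = reader.option(name, str(value))
    return reader.load(path)


# Usage (requires an active SparkSession named `spark` with GeoBrix installed):
# df = read_geotiff(
#     spark,
#     "/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif",
#     retile=True,     # assumption: enable retiling
#     tileSize=512,    # assumption: illustrative tile size in pixels
# )
```

Keeping the options as keyword arguments makes it easy to reuse one call site for different tiling settings.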
Output Schema
root
 |-- tile: struct (GeoBrix raster tile structure)
 |    |-- cellid: bigint (grid cell ID, nullable)
 |    |-- raster: binary (raster file content)
 |    |-- metadata: map<string,string> (driver, extension, etc.)
The tile column contains the complete raster data structure. See Tile Structure for detailed field descriptions.
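Because `tile` is a struct, its fields can be reached with dot notation. The sketch below is one way to flatten the schema above into scalar columns for a quick inspection. `summarize_tiles` is a hypothetical helper name, and the `'driver'` metadata key is an assumption based on the "driver, extension, etc." note in the schema; `length` is the Spark SQL built-in, which returns the byte count for binary values.

```python
def summarize_tiles(df):
    """Hypothetical helper: flatten the tile struct into scalar columns."""
    return df.selectExpr(
        "path",
        "tile.cellid AS cellid",                # null unless a grid cell ID is set
        "length(tile.raster) AS raster_bytes",  # size of the raster content in bytes
        "tile.metadata['driver'] AS driver",    # assumed key, see lead-in
    )


# Usage (requires a DataFrame produced by the gtiff_gdal reader):
# summarize_tiles(df).show(truncate=False)
```

Selecting the byte length rather than the `raster` column itself avoids pulling the full binary content into the driver when calling `show()`.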
Common Use Cases
Satellite Imagery
GeoTIFF is the primary format for satellite imagery:
- Sentinel-2: Multispectral Earth observation
- Landsat: Long-term Earth monitoring
- Planet: High-resolution daily imagery
Digital Elevation Models (DEMs)
Many elevation datasets are distributed as GeoTIFF:
- SRTM: 30m/90m near-global coverage
- ASTER GDEM: 30m global digital elevation
- LiDAR-derived: High-resolution terrain models
Aerial Photography
Orthophotos and aerial surveys commonly use GeoTIFF:
- RGB imagery
- Near-infrared (NIR) bands
- Thermal imagery
GeoTIFF vs GDAL Reader
When to use each (sample-data Volumes path):
# GeoTIFF reader (recommended for .tif files)
df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
# GDAL reader (same result, explicit driver)
df = spark.read.format("gdal").option("driver", "GTiff").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
Use gtiff_gdal when:
- ✅ Working primarily with GeoTIFF files
- ✅ Want cleaner, more readable code
- ✅ Following GeoBrix naming conventions
Use gdal when:
- ✅ Working with multiple raster formats
- ✅ Need format-specific driver options
- ✅ The format does not have its own named reader
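For pipelines that handle mixed inputs, the choice between the two readers can be made from the file extension. This is a minimal sketch under that assumption: `reader_format_for` and `GEOTIFF_EXTENSIONS` are hypothetical names, and the extension list comes from the `.tif` and `.tiff` extensions mentioned at the top of this page.

```python
import os

# Extensions handled by the named GeoTIFF reader (from this page's intro)
GEOTIFF_EXTENSIONS = {".tif", ".tiff"}


def reader_format_for(path):
    """Hypothetical helper: pick "gtiff_gdal" for GeoTIFF paths, else "gdal"."""
    extension = os.path.splitext(path)[1].lower()
    return "gtiff_gdal" if extension in GEOTIFF_EXTENSIONS else "gdal"


# Usage (requires an active SparkSession named `spark`):
# path = "/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif"
# df = spark.read.format(reader_format_for(path)).load(path)
```

When the helper falls back to `gdal`, a `driver` option may still be needed, as in the explicit-driver example above.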
Cloud-Optimized GeoTIFF (COG)
Cloud-Optimized GeoTIFFs work seamlessly with the GeoTIFF reader. The example uses the sample-data path; for cloud storage, use an s3://, abfs://, or gs:// path instead.
# COG files read like regular GeoTIFFs (sample-data path for local)
cog_df = spark.read.format("gtiff_gdal").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/sentinel2/nyc_sentinel2_red.tif")
# For cloud: spark.read.format("gtiff_gdal").load("s3://bucket/cog-file.tif")
For cloud storage (S3, Azure Blob, GCS), use Cloud-Optimized GeoTIFF (COG) format for best performance. COGs enable efficient partial reads without downloading the entire file.
Compression Formats
GeoTIFF supports various compression options:
| Compression | Use Case | Pros | Cons |
|---|---|---|---|
| None | Quick access | Fast read/write | Large files |
| LZW | General purpose | Good compression, lossless | Moderate speed |
| DEFLATE | General purpose | Better compression, lossless | Slower than LZW |
| JPEG | RGB imagery | High compression | Lossy |
| JPEG2000 | High-quality | Very high compression | Slower |
GeoBrix reads compressed GeoTIFFs transparently. The compression format is automatically detected and handled by GDAL.
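To see why no option is needed, it helps to know that a TIFF file records its compression scheme in the Compression tag (tag 259) of its image file directory. The sketch below is purely illustrative and is not how GeoBrix or GDAL is implemented: it parses the header of a classic (non-BigTIFF) TIFF with the standard library and reports the tag value. `tiff_compression` and `COMPRESSION_NAMES` are hypothetical names, and only a few well-known codes are mapped.

```python
import struct

# A few well-known TIFF Compression tag values (not an exhaustive list)
COMPRESSION_NAMES = {1: "None", 5: "LZW", 7: "JPEG", 8: "DEFLATE", 32946: "DEFLATE"}


def tiff_compression(data):
    """Return the compression name stored in the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"   # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"   # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("only classic TIFF (magic 42) is handled in this sketch")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == 259:
            # A single SHORT value sits in the first 2 bytes of the value field
            (code,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return COMPRESSION_NAMES.get(code, "code %d" % code)
    return "None"  # the TIFF default when the tag is absent is 1 (uncompressed)


# Usage on a local file (only the header and first IFD are needed):
# with open("example.tif", "rb") as f:
#     print(tiff_compression(f.read()))
```

Since the scheme is stored in the file itself, a reader can select the matching decoder without any user-supplied setting.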
Next Steps
- GDAL Reader - Generic raster reader for all GDAL formats
- RasterX Functions - Raster processing operations
- Quick Start - Get started with GeoBrix