
GeoTIFF Writer

Write GeoTIFF tiles. The lightweight gtiff_gbx writer (the raster_gbx writer with the GeoTIFF driver forced) and the heavyweight gtiff_gdal writer (the GDAL writer restricted to the GTiff driver) are interchangeable — see Choosing an Execution Tier.

Benchmark & tradeoff

The lightweight (*_gbx) writers need no JAR or init script and are the only option on Serverless, standard (shared), and ARM clusters. The heavyweight raster/PMTiles writers require a classic x86 cluster with the JAR and the GDAL init script; where available, they use native GDAL on the JVM. In practice, your compute type usually decides the tier, and data scale is the secondary consideration. See the Benchmarking page for timings and methodology.
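As a rough sketch, the compute-first rule above can be written as a helper that lists which GeoTIFF writers a given cluster can run. The function `eligible_geotiff_writers` and the compute labels (`"serverless"`, `"standard"`, `"arm"`, `"classic_x86"`) are hypothetical names chosen for illustration; they are not part of the library.

```python
from typing import List

# Hypothetical compute labels; illustrative names, not library constants.
LIGHTWEIGHT_ONLY = {"serverless", "standard", "arm"}


def eligible_geotiff_writers(compute: str, has_jar_and_gdal_init: bool = False) -> List[str]:
    """Return the GeoTIFF writer formats that can run on the given compute."""
    compute = compute.lower()
    writers = ["gtiff_gbx"]  # lightweight tier: no JAR or init script needed
    if compute in LIGHTWEIGHT_ONLY:
        return writers  # Serverless, standard (shared), ARM: lightweight only
    # Heavyweight tier: classic x86 cluster with the JAR and GDAL init script installed.
    if compute == "classic_x86" and has_jar_and_gdal_init:
        writers.append("gtiff_gdal")
    return writers


if __name__ == "__main__":
    print(eligible_geotiff_writers("serverless"))         # ['gtiff_gbx']
    print(eligible_geotiff_writers("classic_x86", True))  # ['gtiff_gbx', 'gtiff_gdal']
```

When both writers are eligible, data scale becomes the deciding factor; the Benchmarking page has the timings.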

Schema

Input schema — exactly (source, tile):

root
|-- source: string
|-- tile: struct
| |-- cellid: bigint
| |-- raster: binary
| |-- metadata: map<string,string>

This is the exact schema the raster readers emit. The writer requires these two columns and nothing else: extra or missing columns both raise an error (there is no implicit projection). For the catch-all raster_gbx writer, the on-disk format comes from tile.metadata (the GDAL driver/extension recorded at read time); gtiff_gbx always writes GeoTIFF. In either case, the ext option controls only the filename suffix.
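If you transform a reader's output before writing, a cheap pre-write check can catch schema drift before the job runs. The helper below, `check_writer_columns`, is a hypothetical sketch that mirrors the exact-schema rule on a list of column names (with Spark you would pass `df.columns`). It ignores column order, which this page does not specify.

```python
from typing import Iterable

REQUIRED_COLUMNS = ("source", "tile")


def check_writer_columns(columns: Iterable[str]) -> None:
    """Raise ValueError unless the columns are exactly (source, tile), in any order."""
    cols = list(columns)
    missing = [c for c in REQUIRED_COLUMNS if c not in cols]
    extra = [c for c in cols if c not in REQUIRED_COLUMNS]
    if missing or extra:
        raise ValueError(
            f"expected exactly {REQUIRED_COLUMNS}; missing={missing}, extra={extra}"
        )


# With a Spark DataFrame: check_writer_columns(df.columns)
check_writer_columns(["source", "tile"])  # passes silently
```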

Typically you write a reader's output unchanged. To control output filenames, use the nameCol option, which overwrites the source value in place. Do not add a column for this, because an extra column breaks the exact-schema check:

# (source, tile) straight from a reader -> write as-is
df.write.format("gtiff_gbx").mode("append").save("/Volumes/cat/sch/vol/out")

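# name outputs from another column: put its values in `source`, then keep exactly (source, tile)
# (withColumn alone would keep `scene_id` as an extra column and fail the schema check)
from pyspark.sql import functions as F
df.select(F.col("scene_id").alias("source"), "tile") \
    .write.format("gtiff_gbx").mode("append").save("/Volumes/cat/sch/vol/out")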

Output: one GeoTIFF (.tif) file per input row, written under the target directory.

gtiff_gbx is the raster_gbx catch-all writer with the GeoTIFF driver forced — output is always GeoTIFF regardless of tile.metadata. Register the lightweight DataSources first (see Lightweight Raster Writer).

# Read then write GeoTIFF tiles (lightweight)
from databricks.labs.gbx.ds.register import register
register(spark)
df = spark.read.format("raster_gbx").load("{SAMPLE_RASTER_PATH}")
OUT_DIR = "/Volumes/cat/sch/vol/out"  # target directory for the .tif files
df.write.format("gtiff_gbx").mode("overwrite").save(OUT_DIR)
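The difference between gtiff_gbx and the catch-all raster_gbx writer can be sketched in plain Python. This is a hypothetical illustration, not the library's code: `resolve_driver` and the metadata key `"driver"` are assumptions. Only the behavior comes from this page: GeoTIFF is forced for gtiff_gbx, the format is taken from tile.metadata for raster_gbx, and ext is never consulted.

```python
from typing import Mapping

# Hypothetical metadata key for the driver recorded at read time (assumption).
DRIVER_KEY = "driver"


def resolve_driver(writer: str, metadata: Mapping[str, str]) -> str:
    """Return the GDAL driver a writer would use for one tile (illustrative only)."""
    if writer == "gtiff_gbx":
        return "GTiff"  # GeoTIFF is forced, whatever tile.metadata records
    if writer == "raster_gbx":
        return metadata[DRIVER_KEY]  # catch-all writer: format recorded at read time
    raise ValueError(f"unknown writer: {writer}")


tile_metadata = {DRIVER_KEY: "PNG"}
print(resolve_driver("gtiff_gbx", tile_metadata))   # GTiff
print(resolve_driver("raster_gbx", tile_metadata))  # PNG
```

Note that `ext` is deliberately not an input: it changes only the filename suffix, never the driver.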

The nameCol and ext options, and the performance characteristics, are the same as for the catch-all raster_gbx writer.

gtiff_gbx is the lightweight counterpart of the heavyweight gtiff_gdal writer. It has Python and SQL bindings, but no Scala binding.