GeoTIFF Writer
Write GeoTIFF tiles. The lightweight gtiff_gbx writer (the raster_gbx
writer with the GeoTIFF driver forced) and the heavyweight gtiff_gdal writer
(the GDAL writer restricted to the GTiff driver) are interchangeable — see
Choosing an Execution Tier.
The lightweight (*_gbx) writers need no JAR or init script and are the only option
on Serverless, standard (shared), and ARM clusters. The heavyweight raster/PMTiles writers
require a classic x86 cluster (JAR + GDAL init script); where available they use native
GDAL on the JVM. Your compute therefore usually decides the tier, with data scale as the
second consideration. See the Benchmarking page for timings and methodology.
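The tier rule above can be sketched as a small lookup. This is a minimal illustration, not part of the library: the function name `choose_gtiff_format`, the compute labels, and the `has_gdal_init` flag are all hypothetical, and it assumes the lightweight writer is also usable on classic x86 clusters (the text only says it is the sole option elsewhere).

```python
# Hypothetical helper: maps a compute type to a GeoTIFF DataSource name.
# The labels below are illustrative, not values the library defines.
LIGHTWEIGHT_ONLY = {"serverless", "standard", "arm"}


def choose_gtiff_format(compute: str, has_gdal_init: bool = False) -> str:
    """Return "gtiff_gbx" or "gtiff_gdal" for the given compute type.

    has_gdal_init: True when the JAR and GDAL init script are installed
    (assumed to be required before the heavyweight writer can be used).
    """
    kind = compute.strip().lower()
    if kind in LIGHTWEIGHT_ONLY:
        # Only the lightweight tier is available on these clusters.
        return "gtiff_gbx"
    if kind == "classic_x86":
        # Heavyweight needs the JAR + init script; otherwise fall back.
        return "gtiff_gdal" if has_gdal_init else "gtiff_gbx"
    raise ValueError(f"unknown compute type: {compute!r}")
```

The returned name can be passed straight to `df.write.format(...)`. Data scale, the second consideration, is not modelled here.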
Schema
Input schema — exactly (source, tile):
root
 |-- source: string
 |-- tile: struct
 |    |-- cellid: bigint
 |    |-- raster: binary
 |    |-- metadata: map<string,string>
This is the exact schema the raster readers emit. The writer requires these two columns and nothing else: both extra and missing columns raise an error, because there is no implicit projection. The on-disk format comes from tile.metadata (the GDAL driver/extension recorded at read time); the ext option controls only the filename suffix.
Typically you write a reader's output unchanged. To control output filenames, use the nameCol option, which overwrites the source value in place. Do not add a column, since that breaks the exact-schema check:
# (source, tile) straight from a reader -> write as-is
df.write.format("gtiff_gbx").mode("append").save("/Volumes/cat/sch/vol/out")
# name outputs from an existing column (overwrite `source`, don't add a column):
df.withColumn("source", df["scene_id"]) \
    .write.format("gtiff_gbx").mode("append").save("/Volumes/cat/sch/vol/out")
Output: one GeoTIFF (.tif) file per input row, written under the target directory.
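Because the writer rejects any DataFrame that is not exactly (source, tile), it can help to fail early with a clearer message. The following is a minimal sketch: `check_exact_columns` is a hypothetical helper, not a library function, and it ignores column order, which the writer may or may not enforce.

```python
# Hypothetical pre-flight check for the exact (source, tile) schema.
REQUIRED_COLUMNS = ("source", "tile")


def check_exact_columns(columns):
    """Raise ValueError unless `columns` is exactly source and tile.

    Pass `df.columns` from a Spark DataFrame. Only top-level column
    names are checked; the nested tile struct is not inspected.
    """
    cols = list(columns)
    missing = [c for c in REQUIRED_COLUMNS if c not in cols]
    extra = [c for c in cols if c not in REQUIRED_COLUMNS]
    if missing or extra:
        raise ValueError(
            f"expected exactly {REQUIRED_COLUMNS}; "
            f"missing={missing}, extra={extra}"
        )
    return True
```

Calling `check_exact_columns(df.columns)` just before `df.write` surfaces a stray helper column (for example a leftover `scene_id`) before the job starts.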
Lightweight · gtiff_gbx
gtiff_gbx is the raster_gbx catch-all writer with the GeoTIFF
driver forced — output is always GeoTIFF regardless of tile.metadata. Register
the lightweight DataSources first (see
Lightweight Raster Writer).
# Read then write GeoTIFF tiles (lightweight)
from databricks.labs.gbx.ds.register import register
register(spark)
df = spark.read.format("raster_gbx").load(SAMPLE_RASTER_PATH)
df.write.format("gtiff_gbx").mode("overwrite").save(OUT_DIR)
nameCol / ext and the performance characteristics are the same as for the
catch-all raster_gbx writer.
gtiff_gbx is the lightweight counterpart of the heavyweight gtiff_gdal writer and supports Python and SQL bindings (not Scala).
Heavyweight · gtiff_gdal
gtiff_gdal is the GDAL writer restricted to the GeoTIFF driver — it reads and
writes .tif rasters and is append-only (use the lightweight writer for
overwrite). It takes the same (source, tile) schema as the other writers.
# Named GeoTIFF writer (gtiff_gdal = gdal writer with driver preset)
spark.read.format("gtiff_gdal").load(SAMPLE_RASTER_PATH) \
    .write.format("gtiff_gdal").mode("append").option("ext", "tif").save(OUT_DIR)
See the Raster Writer → Heavyweight tab for the full
GDAL-writer options (path / nameCol / ext, format & compression from
tile.metadata).
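Since gtiff_gdal is append-only while gtiff_gbx also accepts overwrite, a small guard can catch an unsupported combination before the write starts. This is a sketch under stated assumptions: `resolve_save_mode` is a hypothetical helper, and it considers only the two modes used on this page.

```python
# Hypothetical guard for the append-only restriction described above.
APPEND_ONLY_FORMATS = {"gtiff_gdal"}
KNOWN_MODES = {"append", "overwrite"}


def resolve_save_mode(fmt: str, mode: str) -> str:
    """Return the normalized save mode, or raise if `fmt` cannot use it."""
    normalized = mode.strip().lower()
    if normalized not in KNOWN_MODES:
        raise ValueError(f"unsupported mode for this sketch: {mode!r}")
    if fmt in APPEND_ONLY_FORMATS and normalized != "append":
        # The heavyweight GeoTIFF writer is append-only; use gtiff_gbx
        # when overwrite semantics are needed.
        raise ValueError(f"{fmt} is append-only; use gtiff_gbx for overwrite")
    return normalized
```

It would be used as `df.write.format(fmt).mode(resolve_save_mode(fmt, "append")).save(OUT_DIR)`.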