GeoDatabase Writer
file_gdb_gbx — a hybrid writer. It uses the pure-Python DataSource V2 framework
like the other *_gbx writers, but because pyogrio's bundled GDAL ships a read-only
OpenFileGDB driver, it encodes the .gdb via the native GDAL (osgeo.ogr) supplied
by the heavyweight GDAL init script. It round-trips with the file_gdb_gbx reader; the
schema (attrs..., geom_0, geom_0_srid, geom_0_srid_proj) is shared across the
lightweight vector readers and writers.
Writing a File Geodatabase needs the native GDAL Python bindings (osgeo) from the
heavyweight GDAL init script (see Installation) — pyogrio's bundled
GDAL ships a read-only OpenFileGDB driver. On compute with those natives, file_gdb_gbx
writes the .gdb natively; without them it raises a clear error (use gpkg_gbx /
geojson_gbx instead). Reading a FileGDB needs only the lightweight stack (pyogrio reads it). The
output path must end in .gdb.
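Before scheduling a FileGDB export, it can help to confirm that the native bindings are present on the compute. The following is a minimal sketch, not the writer's own check: the helper name native_filegdb_write_available is hypothetical, and it only asks GDAL whether the OpenFileGDB driver advertises create support.

```python
def native_filegdb_write_available() -> bool:
    """Return True if osgeo is importable and OpenFileGDB reports create support.

    Hypothetical helper for illustration; the writer performs its own check
    and raises its own error.
    """
    try:
        # Native bindings; typically absent on lightweight-only compute.
        from osgeo import gdal
    except ImportError:
        return False

    driver = gdal.GetDriverByName("OpenFileGDB")
    if driver is None:
        return False

    # DCAP_CREATE is "YES" only when the driver can create new datasets.
    return driver.GetMetadataItem(gdal.DCAP_CREATE) == "YES"
```

If this returns False, falling back to gpkg_gbx or geojson_gbx avoids the error described above.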
Call register(spark) once before using any *_gbx format (see the
Writers Overview).
The vector writers take a geometry column encoded as WKB (binary) or WKT (text) —
the same encodings the *_gbx readers emit (controlled by the reader's asWKB option). The
SRID/CRS is taken from the companion *_srid / *_srid_proj columns, so plain WKB or WKT is
all that is needed. Extended forms (EWKB, EWKT) and GeoJSON-encoded geometry are not
accepted as writer input.
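To catch extended encodings before they reach the writer, a pre-check along these lines may be useful. It is a sketch under stated assumptions: the helper names are hypothetical, the WKB check looks only at the EWKB SRID flag in the geometry-type word, and the WKT check looks only for a leading SRID= prefix. How the writer itself validates input is not described here.

```python
import struct

# In EWKB, this bit of the 32-bit geometry-type word marks an embedded SRID.
EWKB_SRID_FLAG = 0x20000000


def is_ewkb_with_srid(wkb: bytes) -> bool:
    """Return True if the WKB header carries the EWKB SRID flag."""
    if len(wkb) < 5:
        raise ValueError("WKB needs at least 5 bytes (byte order + geometry type)")
    # First byte: 1 = little-endian, 0 = big-endian.
    endian = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack(endian + "I", wkb[1:5])
    return bool(geom_type & EWKB_SRID_FLAG)


def is_ewkt(wkt: str) -> bool:
    """Return True if the text starts with an EWKT 'SRID=' prefix."""
    return wkt.lstrip().upper().startswith("SRID=")


# Hand-built little-endian points: plain WKB, and EWKB with SRID 4326.
plain_point = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
ewkb_point = struct.pack("<BIIdd", 1, 1 | EWKB_SRID_FLAG, 4326, 1.0, 2.0)
```

Rows flagged by either check would need converting to plain WKB or WKT, with the SRID moved into the companion *_srid column.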
Writing from a Databricks GEOMETRY / GEOGRAPHY column? Convert it to a supported interchange
format first with an ST export function: use ST_AsBinary(geom) for WKB (recommended) or
ST_AsText(geom) for WKT. Avoid ST_AsEWKB, ST_AsEWKT, and ST_AsGeoJSON; those encodings are
not accepted as input.
file_gdb_gbx writing is the exception among the lightweight writers. Because it encodes
via the native GDAL (osgeo), writing a File Geodatabase is constrained to a classic x86
cluster carrying the heavyweight GDAL natives — it does not run on Serverless or ARM,
the same compute limits as the heavyweight tier. (FileGDB reading is fully lightweight and
runs anywhere.) For vector output on Serverless/ARM, use one of the other writers, such as gpkg_gbx or geojson_gbx. See the
Benchmarking page for timings and methodology.
Options
| Option | Default | Behavior |
|---|---|---|
| driverName | OpenFileGDB (preset) | OGR driver. Preset by this named writer; override only if needed. |
| mode | overwrite | overwrite only; append is rejected. |
| geometryType | inferred from the data | Override the OGR geometry type. |
| layerName | driver default | Output layer name where supported. |
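Two of the constraints on this page (overwrite-only mode and the .gdb path suffix) are cheap to verify before a job is submitted. The sketch below is a hypothetical pre-flight helper, not part of the library; it assumes a strict suffix match with no trailing slash, which may be stricter than the writer itself.

```python
SUPPORTED_MODES = {"overwrite"}  # per the options table: append is rejected


def check_gdb_write_args(path: str, mode: str = "overwrite") -> None:
    """Raise ValueError if the arguments break the documented constraints.

    Hypothetical helper for illustration only.
    """
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"file_gdb_gbx supports only mode='overwrite', got {mode!r}")
    if not path.endswith(".gdb"):
        raise ValueError(f"FileGDB output path must end in .gdb, got {path!r}")
```

Calling check_gdb_write_args("/Volumes/cat/sch/vol/out.gdb") passes silently, while a path without the suffix or mode="append" fails fast on the driver instead of inside the write.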
Schema
Input schema — a geometry column plus its SRID companion:
root
|-- <geom>: binary (WKB) or string (WKT) # the geometry
|-- <geom>_srid: string # REQUIRED — CRS authority code, e.g. "4326" ("0" if unknown)
|-- <geom>_srid_proj: string # optional — PROJ4 string, used as a CRS fallback when srid is "0"
|-- ...any other columns # written as feature attributes
The writer locates the geometry as the column X that has a companion X_srid column, so the geometry column may be named anything (geom_0 by convention — what the *_gbx readers emit). <geom>_srid is required: it identifies the geometry column and supplies the CRS. The _srid / _srid_proj columns are consumed for the CRS and are not written as fields. Every other column is written as a feature attribute.
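The companion-column rule can be illustrated with a small sketch over a list of column names. The function names are hypothetical, and the tie-breaking (first match in column order) is an assumption; the text above does not say how the writer resolves multiple candidates.

```python
from typing import List, Optional, Sequence


def find_geometry_column(columns: Sequence[str]) -> Optional[str]:
    """Return the first column X that has a companion X_srid column, else None."""
    names = set(columns)
    for col in columns:
        if f"{col}_srid" in names:
            return col
    return None


def attribute_columns(columns: Sequence[str]) -> List[str]:
    """Return the columns that would remain as feature attributes."""
    geom = find_geometry_column(columns)
    if geom is None:
        return list(columns)
    # The geometry and its CRS companions are consumed, not written as fields.
    consumed = {geom, f"{geom}_srid", f"{geom}_srid_proj"}
    return [c for c in columns if c not in consumed]


cols = ["name", "geom_0", "geom_0_srid", "geom_0_srid_proj", "population"]
geometry = find_geometry_column(cols)
attributes = attribute_columns(cols)
```

With the conventional reader schema, geom_0 is picked as the geometry and only name and population are left as attributes.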
Coerce your DataFrame/table to this shape before writing:
from pyspark.sql import functions as F
df.select(
F.col("my_geom_wkb").alias("geom_0"), # geometry as WKB (or WKT)
F.lit("4326").alias("geom_0_srid"), # CRS authority code
F.lit("").alias("geom_0_srid_proj"), # optional PROJ4 fallback
"name", "population", # -> written as feature attributes
).write.format("file_gdb_gbx").mode("overwrite").save("/Volumes/cat/sch/vol/out.gdb")  # path must end in .gdb
From a Databricks GEOMETRY / GEOGRAPHY column. Databricks native spatial types are not
a writer input format directly — export them to WKB plus an SRID first. Use ST_AsWKB (the
equivalent of ST_AsBinary) for the geometry column and ST_SRID for the CRS, casting the
SRID to a string:
df.selectExpr(
"ST_AsWKB(my_geom) AS geom_0", # GEOMETRY/GEOGRAPHY -> WKB
"CAST(ST_SRID(my_geom) AS STRING) AS geom_0_srid", # SRID -> string
"'' AS geom_0_srid_proj",
"name", "population", # -> feature attributes
).write.format("file_gdb_gbx").mode("overwrite").save("/Volumes/cat/sch/vol/out.gdb")  # path must end in .gdb
See Databricks Spatial for the full ST function reference.
The geometry must be WKB or WKT (see the input-geometry note above; convert a Databricks GEOMETRY with ST_AsBinary). The written file round-trips with the matching *_gbx reader — reading it back yields (…attributes, geom_0, geom_0_srid, geom_0_srid_proj).
Output: an ESRI File Geodatabase (a .gdb directory); attributes are preserved.
How it scales
Unlike a single-node pyogrio.write_* call that serializes one file on one machine, the
file_gdb_gbx writer runs as a Spark DataSource V2 two-phase write: each partition is
written concurrently by its executor to a scratch fragment, then the driver merges the
fragments into one output file. The merge is sequential and rename-free, so it is safe on
FUSE-mounted cloud storage (Unity Catalog Volumes, DBFS). Repartition the input to control
write parallelism.
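The two-phase pattern can be pictured with a toy model. This is an illustration only, not the writer's implementation: plain text files stand in for GDAL fragments, a thread pool stands in for Spark executors, and the function names are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Sequence


def write_fragment(scratch: Path, index: int, rows: Sequence[str]) -> Path:
    """Phase 1: write one partition to its own scratch fragment."""
    fragment = scratch / f"part-{index:05d}.txt"
    fragment.write_text("".join(f"{row}\n" for row in rows))
    return fragment


def two_phase_write(partitions: Sequence[Sequence[str]], out_path: Path) -> None:
    """Write fragments concurrently, then merge them sequentially into out_path."""
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp)
        # Phase 1: fragments are written in parallel; map() keeps partition order.
        with ThreadPoolExecutor() as pool:
            fragments: List[Path] = list(
                pool.map(
                    lambda item: write_fragment(scratch, item[0], item[1]),
                    enumerate(partitions),
                )
            )
        # Phase 2: one sequential pass appends each fragment directly to the
        # output, so no rename of the final file is needed.
        with out_path.open("w") as out:
            for fragment in fragments:
                out.write(fragment.read_text())
```

In this model the number of partitions sets how many fragments are written in parallel, which mirrors how repartitioning the input DataFrame (for example with the standard df.repartition(n)) controls write parallelism, while the merge cost stays sequential.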
Example
# Hybrid GeoDatabase writer (encodes via native GDAL / osgeo; OGR driver preset to "OpenFileGDB")
from databricks.labs.gbx.ds.register import register
register(spark)
src = f"{SAMPLE_DATA_BASE}/nyc/filegdb/NYC_Sample.gdb.zip"
df = spark.read.format("file_gdb_gbx").load(src)
out = "/tmp/NYC_Sample.gdb" # FileGDB output path must end in .gdb
df.write.format("file_gdb_gbx").mode("overwrite").save(out)
back = spark.read.format("file_gdb_gbx").load(out)
assert back.count() == df.count()
Typical pipeline: export a table to a single file
Export a table of vector data to a File Geodatabase. No coalesce is needed: the writer merges partitions into a single .gdb output on commit. The output path must end in .gdb. Requires the native GDAL init script (see the hybrid warning above):
df = spark.table("main.geo.parcels") # vector data in a (Delta) table
df.write.format("file_gdb_gbx").mode("overwrite").save("/Volumes/main/geo/exports/parcels.gdb")
Each partition is written concurrently, then merged into one output file. See Benchmarking for light-vs-heavy export figures.