GeoDatabase Writer

file_gdb_gbx — a hybrid writer. It uses the pure-Python DataSource V2 framework like the other *_gbx writers, but because pyogrio's bundled GDAL ships a read-only OpenFileGDB driver, it encodes the .gdb via the native GDAL (osgeo.ogr) supplied by the heavyweight GDAL init script. It round-trips with the file_gdb_gbx reader; the schema (attrs..., geom_0, geom_0_srid, geom_0_srid_proj) is shared across the lightweight vector readers and writers.

Compute & scale

Large datasets: partitions are written in parallel, but the final .gdb is assembled on a single node (the driver merges the fragments). For very large outputs, use a classic cluster with ample memory or switch to the distributed geojsonl_gbx writer. See Choosing a writer for large datasets.
Tier & compute: file_gdb_gbx writing is the exception among the *_gbx writers — because it encodes via native GDAL (osgeo), it requires a classic x86 cluster with the heavyweight GDAL init script and does not run on Serverless or ARM. (FileGDB reading is fully lightweight and runs anywhere.) For vector output on Serverless/ARM, use gpkg_gbx, geojson_gbx, or geojsonl_gbx. See Benchmarking for timings and methodology.
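The two requirements above (native GDAL bindings and an x86 machine) can be checked before a write is attempted. The following is a minimal sketch, not part of the library: `native_gdal_preflight` and `can_write_file_gdb` are hypothetical helper names, and the check only inspects the Python process it runs in (typically the driver), so it does not prove that every executor has the init script applied.

```python
import importlib.util
import platform


def native_gdal_preflight() -> dict:
    """Report whether this Python process looks able to run the hybrid writer.

    Hypothetical helper: it only checks that the osgeo package is importable
    and that the CPU architecture is x86-64.
    """
    machine = platform.machine().lower()
    return {
        "osgeo_importable": importlib.util.find_spec("osgeo") is not None,
        "is_x86_64": machine in ("x86_64", "amd64"),
        "machine": machine,
    }


def can_write_file_gdb(report: dict) -> bool:
    # Both conditions from the note above must hold.
    return bool(report["osgeo_importable"] and report["is_x86_64"])


if __name__ == "__main__":
    report = native_gdal_preflight()
    print(report, "->", can_write_file_gdb(report))
```

If the check fails, the lightweight writers named above (gpkg_gbx, geojson_gbx, geojsonl_gbx) are the alternatives for Serverless or ARM compute.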

Before you write

Register first — call register(spark) once before using any *_gbx format (see the Writers Overview).
Hybrid writer — uses the pure-Python DataSource V2 framework but encodes via native GDAL (osgeo.ogr), so it requires the heavyweight GDAL init script (see Installation). Without the native GDAL bindings it raises a clear error. The output path must end in .gdb.
Input geometry — provide geometry as WKB (binary) or WKT (text), with the CRS in the companion *_srid / *_srid_proj columns. EWKB, EWKT, and GeoJSON-encoded geometry are not accepted. Writing from a Databricks GEOMETRY / GEOGRAPHY column? Export to WKB first with ST_AsBinary (or ST_AsText for WKT) — see the Schema section below — and avoid ST_GeomAsEWKB / ST_AsEWKT / ST_AsGeoJSON.
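To make the input-geometry rule concrete, here is a small sketch that tells plain WKB/WKT apart from the rejected encodings. It assumes the PostGIS EWKB convention, in which an embedded SRID is flagged by bit `0x20000000` of the geometry-type word, and the EWKT `SRID=...;` prefix. The helper names `is_plain_wkb` and `is_plain_wkt` are hypothetical, and the checks look only at headers; they do not validate the geometry itself or reproduce the writer's own validation.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # PostGIS EWKB marks an embedded SRID with this bit


def is_plain_wkb(blob: bytes) -> bool:
    """True if blob has a WKB header without the EWKB SRID flag."""
    if len(blob) < 5 or blob[0] not in (0, 1):
        return False
    # Byte 0 is the byte order: 1 = little-endian, 0 = big-endian.
    fmt = "<I" if blob[0] == 1 else ">I"
    (geom_type,) = struct.unpack_from(fmt, blob, 1)
    return not (geom_type & EWKB_SRID_FLAG)


def is_plain_wkt(text: str) -> bool:
    """True if text lacks the EWKT 'SRID=...;' prefix and does not look like JSON."""
    head = text.lstrip().upper()
    return bool(head) and not head.startswith("SRID=") and not head.startswith("{")
```

A check like this could run on a sample of rows before the write, so that EWKB or GeoJSON input is caught early and the SRID is moved into the companion *_srid column instead.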

Options

Option	Default	Behavior
`driverName`	`OpenFileGDB` (preset)	OGR driver. Preset by this named writer; override only if needed.
`mode`	`overwrite`	`overwrite` only; `append` is rejected.
`fileName`	(none)	Name the output unit explicitly. When set, `.save(path)` treats `path` as the parent directory (created if missing) and writes `path/<fileName>.gdb` (or `path/<fileName>.gdb.zip` when `zip=true`; extension auto-completed). See Output naming.
`zip`	`false`	When `true`, produce a single `.gdb.zip` archive (the `.gdb` directory packed at the archive root) instead of a `.gdb` directory. The output path is normalised automatically: `.save("roads")` → `roads.gdb.zip`; `.save("roads.gdb")` → `roads.gdb.zip`. The reader opens `.gdb.zip` transparently. Ignored for non-FileGDB drivers.
`geometryType`	inferred from the data	Override the OGR geometry type.
`layerName`	driver default	Output layer name where supported.
`geomCol`	auto-detected from the `*_srid` column	Override the geometry column name. Locates the geometry and its SRID companion (`<geomCol>_srid`). See Named Vector Formats.
`sridCol`	`<geomCol>_srid`	Override the SRID column name. Required — supplies the CRS authority code (e.g. `"4326"`, or `"0"` if unknown).
`projCol`	`<geomCol>_srid_proj`	Override the PROJ4 column name (optional fallback CRS when `sridCol` is `"0"`).

Schema

file_gdb_gbx uses the shared vector column contract — a geometry column plus its *_srid (required) / *_srid_proj (optional) CRS companions; every other column becomes an attribute. Point the writer at your columns with geomCol / sridCol / projCol, or coerce to the geom_0 convention.

(df  # geometry + CRS under your own column names
 .write.format("file_gdb_gbx")
 .option("geomCol", "the_geom")   # WKB or WKT
 .option("sridCol", "epsg")       # REQUIRED — CRS authority code
 .option("projCol", "crs_proj")   # optional — PROJ4 fallback
 .mode("overwrite").save("/Volumes/cat/sch/vol/parcels.gdb"))

For coercing an arbitrary frame to this shape and for exporting Databricks GEOMETRY / GEOGRAPHY columns, see the Vector Writer schema.

Output: an ESRI File Geodatabase (a .gdb directory, or a single .gdb.zip with zip=true; see below). The geometry is stored in a named field (default SHAPE), and Esri field name and type limits apply. Because this hybrid writer encodes via native GDAL (osgeo.ogr), the heavyweight GDAL init script is required.

Producing a portable ZIP archive

Pass zip=true to write a single self-contained .gdb.zip instead of a .gdb directory — the .gdb folder is packed at the archive root, the layout GDAL expects. The output path is normalised regardless of the extension you supply — .save("roads"), .save("roads.gdb"), and .save("roads.gdb.zip") all land at roads.gdb.zip:

(df.write.format("file_gdb_gbx")
   .option("zip", "true")
   .mode("overwrite").save("/Volumes/cat/sch/vol/parcels"))   # -> parcels.gdb.zip

The file_gdb_gbx / file_gdb_ogr reader opens a .gdb.zip directly (via GDAL's /vsizip/), so the archive round-trips with no extra step.
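The archive layout described above (the .gdb folder packed at the archive root) can be inspected with the standard library alone. This is a sketch for checking an archive by hand; `gdb_dirs_at_archive_root` and `vsizip_path` are hypothetical helper names, not library functions. The `/vsizip/` prefix is GDAL's virtual file system syntax, and an absolute archive path therefore yields a double slash.

```python
import zipfile


def gdb_dirs_at_archive_root(zip_path: str) -> list:
    """Return the names of top-level *.gdb folders inside a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # Keep only entries that live inside a folder, then take the first path segment.
        top_level = {name.split("/", 1)[0] for name in zf.namelist() if "/" in name}
    return sorted(n for n in top_level if n.lower().endswith(".gdb"))


def vsizip_path(zip_path: str) -> str:
    """Build the GDAL virtual file system path for a zip archive."""
    return "/vsizip/" + zip_path
```

An archive produced with zip=true would be expected to report exactly one entry here, such as `parcels.gdb`.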

How it scales

Unlike a single-node pyogrio.write_* call that serializes one file on one machine, the file_gdb_gbx writer runs as a Spark DataSource V2 two-phase write: each partition is written concurrently by its executor to a scratch fragment, then the driver merges the fragments into one output file. The merge is sequential and rename-free, so it is safe on FUSE-mounted cloud storage (Unity Catalog Volumes, DBFS). Repartition the input to control write parallelism.
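The two-phase pattern can be illustrated with a toy analogy in plain Python. This is not the writer's implementation: threads stand in for executors, text files stand in for FileGDB fragments, and `write_fragment` and `two_phase_write` are hypothetical names. What it does show is the shape of the process: fragments are written concurrently, then one sequential pass copies them into the final output without any rename.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_fragment(scratch_dir: str, index: int, rows: list) -> str:
    """Phase 1: one 'partition' writes its own scratch fragment."""
    path = os.path.join(scratch_dir, f"part-{index:05d}.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(f"{row}\n" for row in rows)
    return path


def two_phase_write(partitions: list, out_path: str) -> int:
    """Write fragments concurrently, then merge them sequentially into out_path."""
    with tempfile.TemporaryDirectory() as scratch:
        with ThreadPoolExecutor() as pool:
            # pool.map preserves input order, so fragments stay in partition order.
            fragments = list(pool.map(
                lambda item: write_fragment(scratch, item[0], item[1]),
                enumerate(partitions),
            ))
        # Phase 2: sequential merge; data is copied into the final file, never renamed.
        merged = 0
        with open(out_path, "w", encoding="utf-8") as out:
            for fragment in fragments:
                with open(fragment, encoding="utf-8") as fh:
                    for line in fh:
                        out.write(line)
                        merged += 1
    return merged
```

In the real writer the number of fragments follows the number of Spark partitions, which is why repartitioning the input DataFrame before the write changes the degree of parallelism in the first phase.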

Output naming

file_gdb_gbx applies the standard single-file output naming contract. The canonical extension is .gdb (or .gdb.zip when zip=true). The rules are evaluated in order:

Case	`.save(path)` / `fileName`	Resolved output
`fileName` given	`.option("fileName","parcels").save("/out/exports")`	`/out/exports/parcels.gdb`
No `fileName`; `path` is an existing directory	`.save("/out/exports")`	`/out/exports/exports.gdb`
No `fileName`; `path` is a stem	`.save("/out/exports/parcels")`	`/out/exports/parcels.gdb`

With zip=true replace .gdb with .gdb.zip in the table above. Extension auto-completion: parcels → parcels.gdb; parcels.gdb → unchanged; parcels.gdb.zip → unchanged (when zip=true). Passing a name ending in a different recognized geo extension (e.g. .gpkg) raises a clear error.
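The naming rules above can be restated as a small pure-Python function. This is a sketch of the documented behavior, not the library's code: `resolve_output` and `complete_extension` are hypothetical names, the list of other geo extensions is illustrative (the source names only .gpkg), and whether the path is an existing directory is passed in as a flag rather than read from storage. Two cases are assumptions the source does not cover: a .gdb.zip name with zip=false is rejected here, and an existing directory that already ends in .gdb is treated as the output itself.

```python
import posixpath

# Assumption: an illustrative set; the source names only .gpkg as an example.
OTHER_GEO_EXTENSIONS = (".gpkg", ".geojson", ".geojsonl", ".shp")


def complete_extension(name: str, zip_output: bool = False) -> str:
    """Apply extension auto-completion to an output name."""
    lower = name.lower()
    if lower.endswith(OTHER_GEO_EXTENSIONS):
        raise ValueError(f"{name!r} ends in a different geo extension")
    if lower.endswith(".gdb.zip"):
        if not zip_output:
            # Assumption: this combination is not described in the source.
            raise ValueError(f"{name!r} ends in .gdb.zip but zip is false")
        return name
    if lower.endswith(".gdb"):
        return name + ".zip" if zip_output else name
    return name + (".gdb.zip" if zip_output else ".gdb")


def resolve_output(path, file_name=None, path_is_existing_dir=False, zip_output=False):
    """Resolve the final output location, evaluating the rules in order."""
    path = path.rstrip("/")
    if file_name is not None:
        # Rule 1: fileName given, so path is the parent directory.
        return posixpath.join(path, complete_extension(file_name, zip_output))
    base = posixpath.basename(path)
    if path_is_existing_dir and not base.lower().endswith(".gdb"):
        # Rule 2: existing directory, so reuse its name for the output.
        return posixpath.join(path, complete_extension(base, zip_output))
    # Rule 3: path is a stem (or already carries the extension).
    return posixpath.join(posixpath.dirname(path), complete_extension(base, zip_output))
```

For example, `resolve_output("/out/exports", file_name="parcels")` gives `/out/exports/parcels.gdb`, matching the first row of the table.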

Example

# Hybrid GeoDatabase writer (encodes via native GDAL, osgeo.ogr; OGR driver preset to "OpenFileGDB")
from databricks.labs.gbx.ds.register import register
register(spark)
src = f"{SAMPLE_DATA_BASE}/nyc/filegdb/NYC_Sample.gdb.zip"
df = spark.read.format("file_gdb_gbx").load(src)
out = "/tmp/NYC_Sample.gdb"  # FileGDB output path must end in .gdb
df.write.format("file_gdb_gbx").mode("overwrite").save(out)
back = spark.read.format("file_gdb_gbx").load(out)
assert back.count() == df.count()

Typical pipeline: export a table to a single file

Export a table of vector data to a File Geodatabase — no coalesce needed (the writer merges partitions into a single file on commit). The output path must end in .gdb. Requires the native GDAL init script (see the hybrid warning above):

df = spark.table("main.geo.parcels")  # vector data in a (Delta) table
df.write.format("file_gdb_gbx").mode("overwrite").save("/Volumes/main/geo/exports/parcels.gdb")

Each partition is written concurrently, then merged into one output file. See Benchmarking for light-vs-heavy export figures.
