File GeoDatabase Reader
Both tiers produce the same (attrs..., geom_0, geom_0_srid, geom_0_srid_proj) schema — see Choosing an Execution Tier.
The lightweight (*_gbx) and heavyweight readers emit the same schema, but your
compute usually decides the tier: the lightweight tier needs no JAR or init script
and is the only option on Serverless, standard (shared), and ARM clusters. The
heavyweight tier requires a classic x86 cluster (JAR + GDAL init script); where it is
available it uses native GDAL on the JVM and tends to pull ahead on large workloads. See
the Benchmarking page for light-vs-heavy timings and methodology.
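The tier rule above can be sketched as a small decision helper. This is only an illustration: the function name, its parameters, and the architecture strings are assumptions rather than part of the library, and preferring the heavyweight tier wherever it is available is a design choice that should be checked against the Benchmarking figures for your workload.

```python
def choose_file_gdb_format(serverless: bool, shared_access: bool, arch: str,
                           prefer_heavyweight: bool = True) -> str:
    """Pick a File Geodatabase reader format name for the given compute.

    Hypothetical helper: the arguments describe the cluster and are supplied
    by the caller; nothing here inspects a real Databricks cluster.
    """
    # The heavyweight tier needs a classic (non-Serverless, non-shared) x86 cluster.
    classic_x86 = (
        not serverless
        and not shared_access
        and arch.lower() in {"x86_64", "amd64"}
    )
    if classic_x86 and prefer_heavyweight:
        return "file_gdb_ogr"  # heavyweight: JAR + GDAL init script must be installed
    return "file_gdb_gbx"      # lightweight: no JAR or init script


print(choose_file_gdb_format(serverless=True, shared_access=False, arch="x86_64"))
# file_gdb_gbx
```

Because both tiers emit the same schema, the returned name can be passed straight to spark.read.format(...) without changing downstream code.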
Options
File Geodatabases contain multiple feature classes (layers); both tiers (lightweight file_gdb_gbx, heavyweight file_gdb_ogr) preset the OpenFileGDB driver and expose a feature-class selector. They take the same options.
| Option | Default | Description |
|---|---|---|
| chunkSize | "10000" | Records per read batch. This is Arrow in-memory batching on the single per-file read, not partition splitting. |
| layerNumber / layerN | "0" | Feature-class index to read (0-based; 0 is the first feature class). Named layerNumber on the lightweight tier and layerN on the heavyweight tier. |
| layerName | "" (first feature class) | Feature-class (layer) name to read; takes precedence over the index when set. |
| asWKB | "true" | Output geometry as WKB (binary) vs WKT (text). |
All other OGR reader options (driverName, …) are also available.
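Since the two tiers differ only in the name of the index option, a thin wrapper can hide that difference. The sketch below is a hypothetical convenience layer (layer_options and read_feature_class are not library functions); it assumes option values are passed as strings, as the defaults in the table suggest, and that register(spark) has already been called when the lightweight tier is used.

```python
def layer_options(tier, layer_name=None, layer_index=0, chunk_size=10000, as_wkb=True):
    """Build the reader options dict for one tier (hypothetical helper)."""
    if tier not in ("lightweight", "heavyweight"):
        raise ValueError(f"unknown tier: {tier!r}")
    # Same option, different spelling per tier.
    index_key = "layerNumber" if tier == "lightweight" else "layerN"
    opts = {
        "chunkSize": str(chunk_size),
        "asWKB": str(as_wkb).lower(),
        index_key: str(layer_index),
    }
    if layer_name:
        # When set, the name takes precedence over the index.
        opts["layerName"] = layer_name
    return opts


def read_feature_class(spark, path, tier="lightweight", **kwargs):
    """Read one feature class with the format that matches the tier."""
    fmt = "file_gdb_gbx" if tier == "lightweight" else "file_gdb_ogr"
    return spark.read.format(fmt).options(**layer_options(tier, **kwargs)).load(path)
```

For example, read_feature_class(spark, path, tier="heavyweight", layer_name="NYC_Boroughs") would issue the same read as the by-name example on this page, with the defaults spelled out explicitly.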
Zipped input: a .gdb.zip (or a plain .zip wrapping a single .gdb) is read
transparently — GDAL opens it via /vsizip/, so just point .load() at the archive (or a
directory of them); no option is required. This is the layout the
file_gdb_gbx writer produces with zip=true.
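Before pointing the reader at an archive of uncertain origin, it can help to confirm that it really wraps a single .gdb. The helper below is a standard-library sketch, not part of the reader; gdb_roots_in_zip is a hypothetical name, and it only covers the layout where the .gdb directory sits at the top level of the zip.

```python
import zipfile


def gdb_roots_in_zip(archive):
    """Return the sorted top-level *.gdb directory names inside a zip archive.

    `archive` is a filesystem path or a binary file-like object.
    """
    roots = set()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Zip member names always use "/" as the separator.
            top_level = name.split("/", 1)[0]
            if top_level.lower().endswith(".gdb"):
                roots.add(top_level)
    return sorted(roots)
```

An archive for which this returns exactly one name matches the "single .gdb" layout described above; an empty or multi-entry result is a hint to inspect the archive before loading it.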
Example — reading a specific feature class by name:
# Read specific feature class (sample-data Volumes path)
df = spark.read.format("file_gdb_ogr") \
.option("layerName", "NYC_Boroughs") \
.load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/filegdb/NYC_Sample.gdb.zip")
df.show()
+--------------------+--------------+---------+
|SHAPE |SHAPE_srid |BoroName |
+--------------------+--------------+---------+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+--------------+---------+
- Lightweight · file_gdb_gbx
- Heavyweight · file_gdb_ogr
file_gdb_gbx is the lightweight File Geodatabase reader (pyogrio-backed, no JAR). It reads ESRI File Geodatabases and emits the same schema as the heavyweight file_gdb_ogr reader.
# Lightweight File Geodatabase reader (pyogrio; no JAR)
from databricks.labs.gbx.ds.register import register
register(spark)
df = spark.read.format("file_gdb_gbx").load(SAMPLE) # (attrs..., <geom>, <geom>_srid, <geom>_srid_proj)
df.show()
file_gdb_gbx supports Python and SQL bindings, but not Scala.
Typical pipeline: ingest into a table
The common pattern is to land File Geodatabase directories in a table for downstream analytics — on Databricks a managed table is Delta:
df = spark.read.format("file_gdb_gbx").load("/Volumes/main/geo/raw/") # a folder of .gdb directories
df.write.mode("overwrite").saveAsTable("main.geo.parcels") # Delta table on Databricks
Reading a folder fans the files across the cluster (one partition per file), so ingest scales with the data — unlike a single-node pyogrio.read_* that parses one file on one machine. See Benchmarking for light-vs-heavy ingest figures.
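When a geodatabase holds several feature classes, the same pattern can be repeated per layer. The following is a minimal sketch that reuses only the calls shown on this page; ingest_feature_classes is a hypothetical helper, and the table names in the usage comment are placeholders.

```python
def ingest_feature_classes(spark, source_path, layer_to_table, fmt="file_gdb_gbx"):
    """Write each named feature class to its own table and return the table names.

    layer_to_table maps a feature-class name to a target table name.
    For the lightweight format, register(spark) must have been called first.
    """
    written = []
    for layer_name, table_name in layer_to_table.items():
        df = (
            spark.read.format(fmt)
            .option("layerName", layer_name)  # select the feature class by name
            .load(source_path)
        )
        df.write.mode("overwrite").saveAsTable(table_name)
        written.append(table_name)
    return written


# Usage (placeholder table name):
# ingest_feature_classes(spark, "/Volumes/main/geo/raw/",
#                        {"NYC_Boroughs": "main.geo.boroughs"})
```

Each layer triggers a separate read of the source, so this suits a handful of feature classes rather than hundreds.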
Reads the ESRI File Geodatabase (.gdb) format, the standard multi-layer geospatial format used in ArcGIS.
Format Name
file_gdb_ogr
Overview
This is a named OGR Reader that uses the OpenFileGDB driver. A File Geodatabase is a directory-based format that can contain multiple feature classes (layers), which makes it well suited to complex geospatial datasets.
Basic Usage
Python
# Read File Geodatabase (sample-data Volumes path)
df = spark.read.format("file_gdb_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/filegdb/NYC_Sample.gdb.zip")
df.show()
+--------------------+--------------+---------+
|SHAPE |SHAPE_srid |BoroName |
+--------------------+--------------+---------+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+--------------+---------+
Scala
// Read File Geodatabase (.zip; sample data is distributed as NYC_Sample.gdb.zip)
val df = spark.read.format("file_gdb_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/filegdb/NYC_Sample.gdb.zip")
df.show()
+--------------------+--------------+---------+
|SHAPE |SHAPE_srid |BoroName |
+--------------------+--------------+---------+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+--------------+---------+
SQL
-- Read File Geodatabase in SQL (sample-data Volumes path)
SELECT * FROM file_gdb_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/filegdb/NYC_Sample.gdb.zip` LIMIT 10;
+--------------------+--------------+---------+
|SHAPE |SHAPE_srid |BoroName |
+--------------------+--------------+---------+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+--------------+---------+
Output Schema
File Geodatabases typically use SHAPE as the geometry column name:
root
|-- SHAPE: binary (geometry in WKB format)
|-- SHAPE_srid: integer (spatial reference ID)
|-- SHAPE_srid_proj: string (projection definition)
|-- <attribute_columns>: various types
Column names in File Geodatabases are case-insensitive.
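With asWKB left at its default, the SHAPE column carries raw WKB bytes. As a rough illustration of what those bytes contain, the sketch below decodes only the WKB header (byte order and geometry type) with the standard library. wkb_geometry_type is a hypothetical helper, and masking the high flag bits and the ISO 1000-offsets for Z/M variants is an assumption about which WKB flavor is emitted.

```python
import struct

# Base geometry type codes shared by the common WKB variants.
_BASE_TYPES = {
    1: "Point", 2: "LineString", 3: "Polygon", 4: "MultiPoint",
    5: "MultiLineString", 6: "MultiPolygon", 7: "GeometryCollection",
}


def wkb_geometry_type(wkb):
    """Return the base geometry type name encoded in a WKB header."""
    if len(wkb) < 5:
        raise ValueError("WKB needs at least 5 bytes (byte order + type code)")
    if wkb[0] not in (0, 1):
        raise ValueError(f"invalid WKB byte-order marker: {wkb[0]}")
    # First byte: 1 = little-endian, 0 = big-endian.
    order = "<" if wkb[0] == 1 else ">"
    (code,) = struct.unpack_from(order + "I", wkb, 1)
    # Drop high flag bits and ISO Z/M offsets (1000, 2000, 3000).
    base = (code & 0x0FFFFFFF) % 1000
    return _BASE_TYPES.get(base, f"Unknown({code})")


point = struct.pack("<BI2d", 1, 1, 1.0, 2.0)  # little-endian POINT(1 2)
print(wkb_geometry_type(point))
# Point
```

For real geometry work a full WKB parser is the better choice; this header check is mainly useful for quick sanity checks on a collected row.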
Key Features
- Multi-Layer: Contains multiple feature classes (layers)
- Rich Attributes: Full attribute support with domains and subtypes
- Topology: Can include topology rules (read-only)
- ArcGIS Native: The standard format for ESRI ArcGIS
Use Cases
- ArcGIS Migration: Moving from ArcGIS workflows to Databricks
- Enterprise Geodata: Reading complex organizational geospatial datasets
- Multi-Layer Datasets: Working with related feature classes in one file
- Attribute-Rich Data: Preserving domains, subtypes, and relationships
Limitations
This reader provides read-only access through the OpenFileGDB driver. You cannot create, modify, or write File Geodatabases using this reader.
- Supports File Geodatabase versions 9.x and later
- Topology and relationship rules are read-only
- Some advanced ArcGIS features may not be fully supported