GeoPackage Reader
Both tiers produce the same (attrs..., geom_0, geom_0_srid, geom_0_srid_proj) schema — see Choosing an Execution Tier.
The lightweight (*_gbx) and heavyweight readers emit the same schema, but your
compute usually decides the tier: the lightweight tier needs no JAR or init script
and is the only option on Serverless, standard (shared), and ARM clusters. The
heavyweight tier requires a classic x86 cluster (JAR + GDAL init script); where it is
available it uses native GDAL on the JVM and tends to pull ahead on large workloads. See
the Benchmarking page for light-vs-heavy timings and methodology.
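The tier rules above can be sketched as a small decision helper. This is only an illustration: `pick_gpkg_format` and its parameters (`compute`, `arch`, `heavy_installed`) are hypothetical names, not GeoBrix settings, and it simply prefers the heavyweight reader whenever the prerequisites described above are met.

```python
def pick_gpkg_format(compute: str, arch: str = "x86_64", heavy_installed: bool = False) -> str:
    """Return the Spark format name to use for a GeoPackage read.

    All three inputs are illustrative labels supplied by the caller;
    nothing here inspects the actual cluster.
    """
    heavy_possible = (
        compute == "classic"      # not Serverless or standard (shared)
        and arch == "x86_64"      # not ARM
        and heavy_installed       # JAR + GDAL init script are in place
    )
    return "gpkg_ogr" if heavy_possible else "gpkg_gbx"


print(pick_gpkg_format("serverless"))                               # gpkg_gbx
print(pick_gpkg_format("classic", "x86_64", heavy_installed=True))  # gpkg_ogr
```

Because both tiers emit the same schema, the returned name can be passed straight to `spark.read.format(...)` without changing downstream code.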
Options
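GeoPackage files can contain multiple layers; both tiers (lightweight gpkg_gbx, heavyweight gpkg_ogr) preset the GPKG driver and expose a layer selector. They take the same options, except that the layer-index option is named layerNumber on the lightweight tier and layerN on the heavyweight tier.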
| Option | Default | Description |
|---|---|---|
| chunkSize | "10000" | Records per read batch. This is Arrow in-memory batching on the single per-file read, not partition splitting. |
| layerNumber / layerN | "0" | Layer index to read (0-based): layerNumber (lightweight) / layerN (heavyweight). |
| layerName | "" | Layer name to read; takes precedence over the layer index when set. |
| asWKB | "true" | Output geometry as WKB (binary) vs WKT (text). |
All other OGR reader options (driverName, …) are also available.
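Since the layer-index option is spelled differently per tier, a small helper can keep calling code tier-agnostic. The sketch below is a hypothetical convenience function (`layer_options` and the `"light"`/`"heavy"` labels are not part of GeoBrix); it only builds the option dictionary from the names listed in the table, using string values to match the documented defaults.

```python
def layer_options(tier: str, index: int = 0, name: str = "") -> dict:
    """Build layer-selection options for the given tier.

    tier is "light" (gpkg_gbx) or "heavy" (gpkg_ogr); both labels are
    illustrative. layerName is only added when a name is given, in which
    case the reader lets it take precedence over the index.
    """
    index_key = {"light": "layerNumber", "heavy": "layerN"}[tier]
    opts = {index_key: str(index)}
    if name:
        opts["layerName"] = name
    return opts


print(layer_options("light", index=2))          # {'layerNumber': '2'}
print(layer_options("heavy", name="boroughs"))  # {'layerN': '0', 'layerName': 'boroughs'}
```

The resulting dictionary can be applied with `spark.read.format(...).options(**opts)` before calling `load`.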
Example — selecting a specific layer in a multi-layer GeoPackage:
# Read specific layer (sample-data Volumes path)
boroughs = spark.read.format("gpkg_ogr") \
.option("layerName", "boroughs") \
.load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg")
boroughs.show()
+--------------------+--------------+---------+
|shape |shape_srid |BoroName |
+--------------------+--------------+---------+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+--------------+---------+
- Lightweight · gpkg_gbx
- Heavyweight · gpkg_ogr
gpkg_gbx is the lightweight GeoPackage reader (pyogrio-backed, no JAR). It reads OGC GeoPackage files and emits the same schema as the heavyweight gpkg_ogr reader.
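# Lightweight GeoPackage reader (pyogrio; no JAR)
from databricks.labs.gbx.ds.register import register

register(spark)

# Sample-data Volumes path
SAMPLE = "/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg"
df = spark.read.format("gpkg_gbx").load(SAMPLE)  # (attrs..., <geom>, <geom>_srid, <geom>_srid_proj)
df.show()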
gpkg_gbx supports Python and SQL bindings, but not Scala.
Typical pipeline: ingest into a table
The common pattern is to land GeoPackage files in a table for downstream analytics — on Databricks a managed table is Delta:
df = spark.read.format("gpkg_gbx").load("/Volumes/main/geo/raw/") # a folder of .gpkg files
df.write.mode("overwrite").saveAsTable("main.geo.districts") # Delta table on Databricks
Reading a folder fans the files across the cluster, one partition per file, so ingest parallelism scales with the number of files. A single-node pyogrio.read_* call, by contrast, parses one file on one machine. See Benchmarking for light-vs-heavy ingest figures.
The heavyweight gpkg_ogr reader reads the OGC GeoPackage format, a modern SQLite-based geospatial format.
Format Name
gpkg_ogr
Overview
This is a named OGR Reader that uses the GPKG driver. GeoPackage is an open, standards-based, platform-independent, portable, self-describing format for transferring geospatial information.
Key Features
- Self-contained: Single-file SQLite database
- Multi-layer: Can contain multiple vector layers and raster tiles
- Attributes: Full attribute support with data types
- Spatial Index: Built-in spatial indexing
- Portable: Cross-platform compatibility
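Because a GeoPackage is an ordinary SQLite database, its layers can be listed with nothing but Python's standard library, which helps when choosing a value for layerName. The sketch below queries the standard gpkg_contents and gpkg_geometry_columns metadata tables. To stay runnable without a real file, it builds a tiny in-memory stand-in that contains only the columns the query needs; the table contents are made up for illustration, and `list_feature_layers` is a hypothetical helper.

```python
import sqlite3


def list_feature_layers(conn: sqlite3.Connection) -> list:
    """Return (layer, geometry column, geometry type, srs_id) per vector layer."""
    return conn.execute(
        """
        SELECT c.table_name, g.column_name, g.geometry_type_name, g.srs_id
        FROM gpkg_contents AS c
        JOIN gpkg_geometry_columns AS g ON g.table_name = c.table_name
        WHERE c.data_type = 'features'   -- skip raster tile layers
        ORDER BY c.table_name
        """
    ).fetchall()


# In-memory stand-in for a .gpkg file (subset of the real metadata columns).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gpkg_contents (table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL, srs_id INTEGER);
    CREATE TABLE gpkg_geometry_columns (table_name TEXT, column_name TEXT, geometry_type_name TEXT, srs_id INTEGER);
    INSERT INTO gpkg_contents VALUES ('boroughs', 'features', 4326), ('basemap', 'tiles', 3857);
    INSERT INTO gpkg_geometry_columns VALUES ('boroughs', 'shape', 'MULTIPOLYGON', 4326);
    """
)

layers = list_feature_layers(conn)
print(layers)  # [('boroughs', 'shape', 'MULTIPOLYGON', 4326)]
```

For a real file, replace the in-memory connection with `sqlite3.connect(path_to_gpkg)` on a locally readable path. The ordering returned here is alphabetical and is not guaranteed to match the layer index the readers use, so prefer layerName when selecting from this listing.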
Basic Usage
Python
# Read GeoPackage (sample-data Volumes path)
df = spark.read.format("gpkg_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg")
df.show()
+--------------------+--------------+---------+
|shape |shape_srid |BoroName |
+--------------------+--------------+---------+
|[BINARY] |4326 |Manhattan|
|... |... |... |
+--------------------+--------------+---------+
Scala
val df = spark.read.format("gpkg_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg")
df.show()
+--------------------+--------------+---------+
|shape |shape_srid |BoroName |
+--------------------+--------------+---------+
|[BINARY] |4326 |Manhattan|
|... |... |... |
+--------------------+--------------+---------+
SQL
-- Read GeoPackage in SQL (sample-data Volumes path)
SELECT * FROM gpkg_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg`;
+--------------------+--------------+---------+
|shape |shape_srid |BoroName |
+--------------------+--------------+---------+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+--------------+---------+
Output Schema
The output keeps the attribute columns and adds geometry columns. The geometry column name comes from the layer itself; in the sample data it is shape:
root
|-- shape: binary (geometry in WKB format)
|-- shape_srid: integer (spatial reference ID)
|-- shape_srid_proj: string (projection definition)
|-- <attribute_columns>: various types
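To make the binary geometry column less opaque, the following sketch decodes the header of a WKB value and, for the simplest case of a 2D point, its coordinates. It uses only the standard library and a hand-built sample value rather than real reader output; `wkb_header` and `parse_wkb_point` are hypothetical helper names. Z/M variants and other geometry types are deliberately not handled, and in practice a library such as Shapely (`shapely.wkb.loads`) is the usual choice.

```python
import struct


def wkb_header(wkb: bytes) -> tuple:
    """Return (struct byte-order prefix, geometry type code) of a WKB value."""
    byte_order = "<" if wkb[0] == 1 else ">"   # 1 = little-endian, 0 = big-endian
    (type_code,) = struct.unpack_from(byte_order + "I", wkb, 1)
    return byte_order, type_code


def parse_wkb_point(wkb: bytes) -> tuple:
    """Decode a 2D WKB point (type code 1) into (x, y)."""
    byte_order, type_code = wkb_header(wkb)
    if type_code != 1:
        raise ValueError(f"not a 2D point, WKB type code {type_code}")
    return struct.unpack_from(byte_order + "dd", wkb, 5)


# Hand-built little-endian WKB point used as sample input.
sample = struct.pack("<BIdd", 1, 1, -73.97, 40.78)
print(len(sample))              # 21
print(parse_wkb_point(sample))  # (-73.97, 40.78)
```

The same header check is a cheap way to count geometry types in a column before handing the values to a full geometry library.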