GeoPackage Reader
The GeoPackage reader reads files in the OGC GeoPackage format, a SQLite-based geospatial format.
Format Name
gpkg_ogr
Overview
This is a named OGR Reader that uses the GPKG driver. GeoPackage is an open, standards-based, platform-independent, portable, self-describing format for transferring geospatial information.
Key Features
- Self-contained: Single-file SQLite database
- Multi-layer: Can contain multiple vector layers and raster tiles
- Attributes: Full attribute support with data types
- Spatial Index: Built-in spatial indexing
- Portable: Cross-platform compatibility
Basic Usage
Python
# Read GeoPackage (sample-data Volumes path)
df = spark.read.format("gpkg_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg")
df.show()
Example output
+--------------------+--------------+---------+
|shape               |shape_srid    |BoroName |
+--------------------+--------------+---------+
|[BINARY]            |4326          |Manhattan|
|...                 |...           |...      |
+--------------------+--------------+---------+
Scala
val df = spark.read.format("gpkg_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg")
Example output
+--------------------+--------------+---------+
|shape               |shape_srid    |BoroName |
+--------------------+--------------+---------+
|[BINARY]            |4326          |Manhattan|
|...                 |...           |...      |
+--------------------+--------------+---------+
SQL
-- Read GeoPackage in SQL (sample-data Volumes path)
SELECT * FROM gpkg_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg`;
Example output
+--------------------+--------------+---------+
|shape               |shape_srid    |BoroName |
+--------------------+--------------+---------+
|[BINARY]            |4326          |...      |
|...                 |...           |...      |
+--------------------+--------------+---------+
Output Schema
The output maintains attribute columns and adds geometry columns. Note that GeoPackage typically uses shape as the geometry column name:
root
 |-- shape: binary (geometry in WKB format)
 |-- shape_srid: integer (spatial reference ID)
 |-- shape_srid_proj: string (projection definition)
 |-- <attribute_columns>: various types
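Because shape holds WKB bytes, a value collected to the driver can be inspected without a geospatial library. The following is a minimal sketch that decodes only a 2D WKB Point using Python's struct module. The helper name decode_wkb_point is hypothetical and the sample coordinates are made up; real layers such as boroughs hold polygons, for which a full WKB parser (for example Shapely's wkb.loads) would likely be a better fit.

```python
import struct
from typing import Tuple

def decode_wkb_point(wkb: bytes) -> Tuple[float, float]:
    """Decode a 2D WKB Point (geometry type 1) into (x, y)."""
    # Byte 0 is the byte-order flag: 0 = big-endian, 1 = little-endian.
    order = "<" if wkb[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type as an unsigned 32-bit integer.
    (geom_type,) = struct.unpack_from(order + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"expected WKB Point (type 1), got type {geom_type}")
    # Bytes 5-20 hold the x and y coordinates as two 64-bit floats.
    x, y = struct.unpack_from(order + "dd", wkb, 5)
    return x, y

# Build a little-endian WKB point for illustration (made-up coordinates).
sample = struct.pack("<BIdd", 1, 1, -73.97, 40.78)
print(decode_wkb_point(sample))
```

The same header layout (byte order, then geometry type) starts every standard WKB value, so the first five bytes are enough to tell which geometry type a row contains.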
Options
Multi-Layer Support
GeoPackage files can contain multiple layers. Use the layerName option to specify which layer to read:
# Read specific layer (sample-data Volumes path)
boroughs = spark.read.format("gpkg_ogr") \
    .option("layerName", "boroughs") \
    .load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg")
boroughs.show()
Example output
+--------------------+--------------+---------+
|shape               |shape_srid    |BoroName |
+--------------------+--------------+---------+
|[BINARY]            |4326          |...      |
|...                 |...           |...      |
+--------------------+--------------+---------+
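Since a GeoPackage is a SQLite database, the layer names accepted by layerName can be discovered before reading. The sketch below queries the standard gpkg_contents table with Python's built-in sqlite3 module. The helper name list_gpkg_layers is hypothetical, and the sketch assumes the file is reachable through an ordinary file path on the driver.

```python
import sqlite3
from typing import List

def list_gpkg_layers(path: str) -> List[str]:
    """Return the names of the vector (feature) layers in a GeoPackage."""
    conn = sqlite3.connect(path)
    try:
        # gpkg_contents lists every table in the package; vector layers
        # are the rows whose data_type is 'features'.
        rows = conn.execute(
            "SELECT table_name FROM gpkg_contents "
            "WHERE data_type = 'features' ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_gpkg_layers on the sample file path used above should return the names that can then be passed to the layerName option.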
Other Options
All OGR reader options are available, e.g.:
- chunkSize: Records per chunk (default: "10000")
- asWKB: Output as WKB vs WKT (default: "true")
- layerName: Specific layer to read
- layerN: Layer index to read (0-based)
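These options can be combined on a single reader. The sketch below wraps that in a small helper. The name read_gpkg and its parameter names are hypothetical, and it assumes an active SparkSession is passed in as spark with the gpkg_ogr format available.

```python
def read_gpkg(spark, path, layer_name=None, chunk_size=10000, as_wkb=True):
    """Read a GeoPackage with the gpkg_ogr format, setting the listed options."""
    reader = (
        spark.read.format("gpkg_ogr")
        # Option values are passed as strings, matching the documented defaults.
        .option("chunkSize", str(chunk_size))
        .option("asWKB", str(as_wkb).lower())
    )
    if layer_name is not None:
        # Only set layerName when a specific layer is requested.
        reader = reader.option("layerName", layer_name)
    return reader.load(path)
```

For example, read_gpkg(spark, path, layer_name="boroughs", chunk_size=5000) would read the boroughs layer in chunks of 5000 records, leaving asWKB at its default.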