GeoPackage Reader

The GeoPackage reader reads the OGC GeoPackage format, a modern SQLite-based geospatial format.

Format Name

gpkg_ogr

Overview

This is a named OGR Reader that uses the GPKG driver. GeoPackage is an open, standards-based, platform-independent, portable, self-describing format for transferring geospatial information.

Key Features

Self-contained: Single-file SQLite database
Multi-layer: Can contain multiple vector layers and raster tiles
Attributes: Full attribute support with data types
Spatial Index: Built-in spatial indexing
Portable: Cross-platform compatibility
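
Because a GeoPackage is a single SQLite file, its identity can be checked from the file header before handing the path to Spark. The following is a minimal sketch that uses only the Python standard library. It relies on the 16-byte SQLite magic string and on the application_id field at byte offset 68, which the GeoPackage specification sets to "GPKG" (or "GP10" / "GP11" in older versions). The function names looks_like_geopackage and is_geopackage_file are hypothetical helpers, not part of the reader.

```python
# Every SQLite database file starts with this 16-byte magic string.
SQLITE_MAGIC = b"SQLite format 3\x00"

# application_id values used by GeoPackage: "GPKG" (1.2 and later),
# "GP10" (1.0) and "GP11" (1.1). Stored at byte offset 68 of the header.
GPKG_APPLICATION_IDS = {b"GPKG", b"GP10", b"GP11"}


def looks_like_geopackage(header: bytes) -> bool:
    """Return True if the given header bytes look like a GeoPackage file."""
    if len(header) < 72 or header[:16] != SQLITE_MAGIC:
        return False
    return header[68:72] in GPKG_APPLICATION_IDS


def is_geopackage_file(path: str) -> bool:
    """Read the first 72 bytes of a file and check them (hypothetical helper)."""
    with open(path, "rb") as f:
        return looks_like_geopackage(f.read(72))
```

This only inspects the header; it does not validate the tables inside the file.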

Basic Usage

Python

# Read GeoPackage (sample-data Volumes path)
df = spark.read.format("gpkg_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg")
df.show()

Example output
+--------------------+--------------+---------+
|shape               |shape_srid    |BoroName |
+--------------------+--------------+---------+
|[BINARY]            |4326          |Manhattan|
|...                 |...           |...      |
+--------------------+--------------+---------+

Scala

// Read GeoPackage (sample-data Volumes path)
val df = spark.read.format("gpkg_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg")
df.show()

Example output
+--------------------+--------------+---------+
|shape               |shape_srid    |BoroName |
+--------------------+--------------+---------+
|[BINARY]            |4326          |Manhattan|
|...                 |...           |...      |
+--------------------+--------------+---------+

SQL

-- Read GeoPackage in SQL (sample-data Volumes path)
SELECT * FROM gpkg_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg`;

Example output
+--------------------+--------------+---------+
|shape               |shape_srid    |BoroName |
+--------------------+--------------+---------+
|[BINARY]            |4326          |...      |
|...                 |...           |...      |
+--------------------+--------------+---------+

Output Schema

The output preserves the source attribute columns and adds geometry columns. Note that GeoPackage typically uses shape as the geometry column name:

root
 |-- shape: binary (geometry in WKB format)
 |-- shape_srid: integer (spatial reference ID)
 |-- shape_srid_proj: string (projection definition)
 |-- <attribute_columns>: various types
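
To illustrate what the binary shape column holds, here is a minimal sketch that reads the geometry type from a WKB header with the Python standard library. It assumes the bytes follow the standard ISO WKB layout (one byte-order byte, then a 32-bit geometry type code) and does not handle EWKB flag bits. The helper name wkb_geometry_type is hypothetical and not part of the reader.

```python
import struct

# Geometry type codes from the OGC Simple Features WKB encoding.
WKB_TYPES = {
    1: "Point",
    2: "LineString",
    3: "Polygon",
    4: "MultiPoint",
    5: "MultiLineString",
    6: "MultiPolygon",
    7: "GeometryCollection",
}


def wkb_geometry_type(wkb: bytes) -> str:
    """Return the geometry type name encoded in a WKB header."""
    if len(wkb) < 5:
        raise ValueError("WKB needs at least 5 bytes (byte order + type code)")
    # First byte: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if wkb[0] == 1 else ">"
    (code,) = struct.unpack(byte_order + "I", wkb[1:5])
    # ISO WKB adds 1000 (Z), 2000 (M) or 3000 (ZM) to the base type code.
    return WKB_TYPES.get(code % 1000, f"Unknown({code})")
```

On a collected row, a call such as wkb_geometry_type(bytes(row["shape"])) would apply, assuming asWKB is left at its default.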

Options

Multi-Layer Support

GeoPackage files can contain multiple layers. Use the layerName option to specify which layer to read:

# Read specific layer (sample-data Volumes path)
boroughs = spark.read.format("gpkg_ogr") \
    .option("layerName", "boroughs") \
    .load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/geopackage/nyc_complete.gpkg")
boroughs.show()

Example output
+--------------------+--------------+---------+
|shape               |shape_srid    |BoroName |
+--------------------+--------------+---------+
|[BINARY]            |4326          |...      |
|...                 |...           |...      |
+--------------------+--------------+---------+
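
When the layer names are not known in advance, they can be looked up directly, since a GeoPackage is a SQLite database whose gpkg_contents table lists its contents. The sketch below uses Python's built-in sqlite3 module and assumes the file is reachable as a local path from the driver, and that the names in gpkg_contents match what layerName expects. The helper name list_feature_layers is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_feature_layers(gpkg_path: str) -> list:
    """Return the names of the vector (feature) layers in a GeoPackage."""
    # Open read-only through a file: URI so the file is never modified.
    uri = Path(gpkg_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT table_name FROM gpkg_contents "
            "WHERE data_type = 'features' ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Rows with data_type 'tiles' are raster tile sets and are skipped here, because this reader targets vector layers.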

Other Options

All OGR reader options are available, e.g.:

chunkSize - Number of records per chunk (default: "10000")
asWKB - Output geometry as WKB ("true") or WKT ("false") (default: "true")
layerName - Name of the layer to read
layerN - Index of the layer to read (0-based)
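
The options above can be combined in one read. The following is a minimal sketch that builds the option map as strings and passes it to the reader. The option names come from the list above; the helpers gpkg_reader_options and read_gpkg are hypothetical, and rejecting layerName together with layerN is an assumption made here, since the precedence between the two is not documented in this section.

```python
def gpkg_reader_options(layer_name=None, layer_index=None,
                        chunk_size=10000, as_wkb=True):
    """Build a string-valued option map for the gpkg_ogr reader."""
    # Assumption: treat the two layer selectors as mutually exclusive.
    if layer_name is not None and layer_index is not None:
        raise ValueError("pass layer_name or layer_index, not both")
    options = {
        "chunkSize": str(chunk_size),
        "asWKB": str(as_wkb).lower(),  # "true" or "false"
    }
    if layer_name is not None:
        options["layerName"] = layer_name
    if layer_index is not None:
        options["layerN"] = str(layer_index)  # 0-based index
    return options


def read_gpkg(spark, path, **kwargs):
    """Read a GeoPackage with the options built above (needs a SparkSession)."""
    options = gpkg_reader_options(**kwargs)
    return spark.read.format("gpkg_ogr").options(**options).load(path)
```

For example, read_gpkg(spark, path, layer_index=0, chunk_size=5000) would read the first layer in smaller chunks.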
