GeoJSON Reader
The GeoJSON reader provides support for reading GeoJSON and GeoJSONSeq (newline-delimited GeoJSON) formats.
Format Name
geojson_ogr
Overview
This is a named OGR reader that selects either the GeoJSON driver or the GeoJSONSeq driver according to the multi option.
Supported Formats
- GeoJSON (.geojson, .json) - Standard GeoJSON FeatureCollection
- GeoJSONSeq (.geojsonl, .geojsons) - Newline-delimited GeoJSON (default)
Basic Usage
Python
# Read standard GeoJSON (sample-data Volumes path)
df = spark.read.format("geojson_ogr") \
    .option("multi", "false") \
    .load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson")
df.show()
Example output
+--------------------+-----------+---------+
|geom_0              |geom_0_srid|BoroName |
+--------------------+-----------+---------+
|[BINARY]            |4326       |Manhattan|
|...                 |...        |...      |
+--------------------+-----------+---------+
Scala
// Read standard GeoJSON (sample-data Volumes path)
val df = spark.read.format("geojson_ogr")
  .option("multi", "false")
  .load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson")
df.show()
Example output
+--------------------+-----------+---------+
|geom_0              |geom_0_srid|BoroName |
+--------------------+-----------+---------+
|[BINARY]            |4326       |Manhattan|
|...                 |...        |...      |
+--------------------+-----------+---------+
SQL
-- Read GeoJSON in SQL (sample-data Volumes path)
SELECT * FROM geojson_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson`;
Example output
+--------------------+-----------+---------+
|geom_0              |geom_0_srid|BoroName |
+--------------------+-----------+---------+
|[BINARY]            |4326       |...      |
|...                 |...        |...      |
+--------------------+-----------+---------+
Options
multi
Default: "true"
Controls which GeoJSON driver to use:
- "true" → Uses the GeoJSONSeq driver (newline-delimited, better for large files)
- "false" → Uses the GeoJSON driver (standard FeatureCollection format)
# Read GeoJSONSeq (newline-delimited, sample-data path)
df = spark.read.format("geojson_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojsonl")
# Or explicitly: .option("multi", "true")
df.show()
Example output
+--------------------+-----------+---------+
|geom_0              |geom_0_srid|BoroName |
+--------------------+-----------+---------+
|[BINARY]            |4326       |...      |
|...                 |...        |...      |
+--------------------+-----------+---------+
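Because the driver is chosen by the multi option rather than by inspecting the file, it can help to derive the option value from the file extension. The following is a minimal sketch, not part of the reader itself: `multi_option_for` is a hypothetical helper, and the extension-to-driver mapping is assumed from the Supported Formats list.

```python
from pathlib import PurePosixPath

# Extensions as listed under Supported Formats (assumed mapping).
_STANDARD_EXTENSIONS = {".geojson", ".json"}
_SEQ_EXTENSIONS = {".geojsonl", ".geojsons"}


def multi_option_for(path: str) -> str:
    """Hypothetical helper: guess the value of `multi` from a file extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in _STANDARD_EXTENSIONS:
        return "false"  # standard FeatureCollection -> GeoJSON driver
    if suffix in _SEQ_EXTENSIONS:
        return "true"   # newline-delimited -> GeoJSONSeq driver
    raise ValueError(f"Unrecognized GeoJSON extension: {suffix!r}")
```

The returned string can be passed directly as `.option("multi", multi_option_for(path))`. An extension is only a hint, so a mislabeled file would still need the option set by hand.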
Other Options
All OGR reader options are available, e.g.:
- chunkSize - Records per chunk (default: "10000")
- asWKB - Output as WKB vs WKT (default: "true")
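As an illustration of combining these options with multi, the sketch below wraps the reader call in a small function. `read_geojson` and its parameter names are hypothetical; only the format name and the option keys (multi, chunkSize, asWKB) come from this page, and option values are passed as strings to match the defaults shown above.

```python
def read_geojson(spark, path, multi=True, chunk_size=10000, as_wkb=True):
    """Hypothetical wrapper around the geojson_ogr reader.

    `spark` is an existing SparkSession; defaults mirror the documented ones.
    """
    def as_flag(value: bool) -> str:
        # The documented option values are the strings "true" / "false".
        return "true" if value else "false"

    return (
        spark.read.format("geojson_ogr")
        .option("multi", as_flag(multi))
        .option("chunkSize", str(chunk_size))
        .option("asWKB", as_flag(as_wkb))
        .load(path)
    )
```

For example, `read_geojson(spark, path, multi=False, chunk_size=5000)` would read a standard FeatureCollection in smaller chunks while keeping WKB output.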
Output Schema
The output maintains attribute columns and adds three columns for geometry:
root
|-- geom_0: binary (geometry in WKB format)
|-- geom_0_srid: integer (spatial reference ID)
|-- geom_0_srid_proj: string (projection definition)
|-- <properties>: various types (GeoJSON properties)
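Since geom_0 holds raw WKB bytes, a quick way to sanity-check a row is to read the WKB header. The sketch below uses only the Python standard library and the published WKB layout (one byte-order byte followed by a 32-bit geometry type code). `wkb_geometry_type` is a hypothetical helper, and the handling of Z/M/SRID variants is a simplifying assumption rather than a statement about what this reader emits.

```python
import struct

_WKB_TYPES = {
    1: "Point",
    2: "LineString",
    3: "Polygon",
    4: "MultiPoint",
    5: "MultiLineString",
    6: "MultiPolygon",
    7: "GeometryCollection",
}


def wkb_geometry_type(wkb: bytes) -> str:
    """Hypothetical helper: return the geometry type named in a WKB header."""
    if len(wkb) < 5:
        raise ValueError("WKB value is too short to contain a header")
    byte_order = wkb[0]  # 0 = big-endian, 1 = little-endian
    if byte_order not in (0, 1):
        raise ValueError(f"Invalid WKB byte-order marker: {byte_order}")
    fmt = "<I" if byte_order == 1 else ">I"
    (code,) = struct.unpack_from(fmt, wkb, 1)
    # Drop EWKB flag bits, then fold ISO Z/M codes (1001, 2001, ...) to the base type.
    base = (code & 0x0FFFFFFF) % 1000
    return _WKB_TYPES.get(base, f"Unknown({code})")
```

On the driver, this could be applied to a collected row, for instance `wkb_geometry_type(bytes(row["geom_0"]))`, assuming the row was read with asWKB left at its default.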
GeoJSON vs GeoJSONSeq
Standard GeoJSON (FeatureCollection)
Format:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": { "type": "Point", "coordinates": [0, 0] },
"properties": { "name": "Feature 1" }
}
]
}
When to use: Small to medium files, API responses, standard GeoJSON files
Read with: option("multi", "false")
GeoJSONSeq (Newline-Delimited)
Format:
{"type":"Feature","geometry":{"type":"Point","coordinates":[0,0]},"properties":{"name":"Feature 1"}}
{"type":"Feature","geometry":{"type":"Point","coordinates":[1,1]},"properties":{"name":"Feature 2"}}
When to use: Large files, streaming data, parallel processing, better Spark performance
Read with: option("multi", "true") (default) or omit the option
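To make the relationship between the two layouts concrete, here is a minimal sketch that rewrites a FeatureCollection as newline-delimited features using only the standard library. `feature_collection_to_seq` is a hypothetical helper for illustration; it loads the whole document into memory, so it suits small inputs only.

```python
import json


def feature_collection_to_seq(text: str) -> str:
    """Hypothetical helper: convert FeatureCollection text to one feature per line."""
    doc = json.loads(text)
    if doc.get("type") != "FeatureCollection":
        raise ValueError("Expected a GeoJSON FeatureCollection")
    # Compact separators keep each feature on a single line without extra spaces.
    lines = [json.dumps(feature, separators=(",", ":")) for feature in doc.get("features", [])]
    return "\n".join(lines) + ("\n" if lines else "")
```

Applied to the FeatureCollection example above, this yields the first line of the GeoJSONSeq example. The result can be saved with a .geojsonl extension and read with the default multi setting.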