OGR Reader
The OGR reader provides generic support for reading vector data formats through the OGR library. This is the base reader that powers all vector format readers in GeoBrix.
Format Name
ogr
Overview
The OGR reader is a generic vector data reader that can handle any format supported by OGR/GDAL. While GeoBrix provides named readers for common formats (Shapefile, GeoJSON, GeoPackage, etc.), you can use the OGR reader directly for any available format.
Available Formats
The OGR reader can work with many OGR vector drivers, including:
- ESRI Shapefile (.shp)
- GeoJSON (.geojson, .json)
- GeoPackage (.gpkg)
- File Geodatabase (.gdb)
- KML (.kml)
- GML (.gml)
- CSV with geometry (.csv)
- PostgreSQL/PostGIS
- And 80+ more formats
Support quality varies across GDAL formats. Not all formats are available by default; some require additional packages or drivers to be installed in your environment.
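Because driver availability depends on the environment, it can help to list the installed OGR drivers before choosing a format. The sketch below is not part of GeoBrix: it assumes the GDAL Python bindings (osgeo) are importable on the driver node, and the helper name list_ogr_drivers is hypothetical. The GDAL build that GeoBrix itself uses may differ from the one Python sees, so treat the result as a hint rather than a guarantee.

```python
def list_ogr_drivers():
    """Return sorted OGR driver names, or [] if the GDAL Python bindings are missing."""
    try:
        from osgeo import ogr  # GDAL Python bindings; may not be installed
    except ImportError:
        return []
    return sorted(ogr.GetDriver(i).GetName() for i in range(ogr.GetDriverCount()))


drivers = list_ogr_drivers()
# Report on a few commonly used OGR driver names.
for name in ("ESRI Shapefile", "GeoJSON", "GPKG", "OpenFileGDB"):
    status = "available" if name in drivers else "not found"
    print(f"{name}: {status}")
```

An empty list means only that the Python bindings could not be imported, not that the formats are unreadable through GeoBrix.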
Basic Usage
Python
# OGR reader (sample-data Volumes path)
df = spark.read.format("ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson")
df.show()
+--------------------+-----------+-----+
|geom_0 |geom_0_srid|... |
+--------------------+-----------+-----+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+-----------+-----+
Scala
val df = spark.read.format("ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson")
df.show()
+--------------------+-----------+-----+
|geom_0 |geom_0_srid|... |
+--------------------+-----------+-----+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+-----------+-----+
SQL
-- Read with OGR in SQL (sample-data Volumes path)
SELECT * FROM ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson`;
+--------------------+-----------+-----+
|geom_0 |geom_0_srid|... |
+--------------------+-----------+-----+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+-----------+-----+
Options
driverName
Default: Auto-detected from file extension if not specified
Explicitly specify the OGR driver to use (regardless of extension).
# Explicit driver (sample-data Volumes path)
df = spark.read.format("ogr") \
.option("driverName", "GeoJSON") \
.load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson")
df.show()
+--------------------+-----------+-----+
|geom_0 |geom_0_srid|... |
+--------------------+-----------+-----+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+-----------+-----+
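When the extension is ambiguous or missing, it can be useful to decide on a driver name in your own code and pass it explicitly. The following sketch illustrates extension-based selection; it is not GeoBrix's actual detection logic. The mapping, guess_driver, and ogr_options are hypothetical names introduced here, and the driver names are the usual OGR short names.

```python
from pathlib import PurePosixPath

# Illustrative mapping only; GeoBrix's real auto-detection may differ.
EXTENSION_TO_DRIVER = {
    ".shp": "ESRI Shapefile",
    ".geojson": "GeoJSON",
    ".json": "GeoJSON",
    ".gpkg": "GPKG",
    ".gdb": "OpenFileGDB",
    ".kml": "KML",
    ".gml": "GML",
    ".csv": "CSV",
}


def guess_driver(path):
    """Return an OGR driver name for the path's extension, or None if unknown."""
    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    # Ignore a trailing archive suffix such as ".zip" (e.g. "nyc_subway.shp.zip").
    if suffixes and suffixes[-1] == ".zip":
        suffixes = suffixes[:-1]
    return EXTENSION_TO_DRIVER.get(suffixes[-1]) if suffixes else None


def ogr_options(path):
    """Build a reader-options dict, setting driverName only when a driver is known."""
    driver = guess_driver(path)
    return {"driverName": driver} if driver else {}
```

The resulting dict could be passed to the reader with spark.read.format("ogr").options(**ogr_options(path)).load(path); when no driver is guessed, the dict is empty and the reader falls back to its own auto-detection.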
Other Options
| Option | Default | Description |
|---|---|---|
| chunkSize | "10000" | Number of records per chunk for parallel reading |
| layerN | "0" | Layer index for multi-layer formats (0-based) |
| layerName | "" | Layer name for multi-layer formats (overrides layerN) |
| asWKB | "true" | Output geometry as WKB (binary) vs WKT (text) |
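Several of these options are often set together, for example when reading one layer of a multi-layer file. The helper below is a minimal sketch, not a GeoBrix API: read_ogr is a hypothetical wrapper that forwards keyword arguments as reader options, converting values to the string form shown in the table. The path and layer name in the usage comment are placeholders.

```python
def read_ogr(spark, path, **options):
    """Read `path` with the "ogr" format, passing each keyword as a reader option."""
    reader = spark.read.format("ogr")
    for key, value in options.items():
        # Option values are strings; render booleans as "true"/"false".
        text = str(value).lower() if isinstance(value, bool) else str(value)
        reader = reader.option(key, text)
    return reader.load(path)


# Usage (placeholder path and layer name; requires an active Spark session):
# df = read_ogr(spark, "/path/to/data.gpkg", layerName="parcels", chunkSize=5000, asWKB=False)
```

Passing layerName rather than layerN keeps the call stable if the layer order in the file changes, since the table notes that layerName takes precedence.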
Output Schema
root
|-- geom_0: binary (geometry in WKB format)
|-- geom_0_srid: integer (spatial reference ID)
|-- geom_0_srid_proj: string (projection definition)
|-- <attribute_1>: <type> (feature attributes...)
|-- <attribute_2>: <type>
|-- ...
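Since geom_0 holds raw WKB bytes, a quick way to sanity-check a read is to decode the WKB header, which stores a byte-order flag followed by a 32-bit geometry type code. The sketch below uses only the Python standard library; wkb_geometry_type is a hypothetical helper, and the sample bytes are constructed by hand rather than taken from a real file.

```python
import struct

WKB_TYPES = {
    1: "Point", 2: "LineString", 3: "Polygon", 4: "MultiPoint",
    5: "MultiLineString", 6: "MultiPolygon", 7: "GeometryCollection",
}


def wkb_geometry_type(wkb):
    """Return the geometry type name encoded in a WKB header."""
    # First byte is the byte order: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if wkb[0] == 1 else ">"
    (type_code,) = struct.unpack(byte_order + "I", wkb[1:5])
    # Drop high-nibble flag bits (EWKB / 2.5D) and ISO Z/M offsets (multiples of 1000).
    base = (type_code & 0x0FFFFFFF) % 1000
    return WKB_TYPES.get(base, f"Unknown({type_code})")


# Hand-built little-endian WKB for a 2D point.
sample_point = struct.pack("<BIdd", 1, 1, -73.97, 40.78)
print(wkb_geometry_type(sample_point))
```

On a DataFrame, the same helper could be applied to a single value, for example wkb_geometry_type(df.select("geom_0").first()[0]), assuming the reader was run with asWKB left at its default.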
Databricks Integration
OGR and the named vector readers output geometry in WKB format. To use it with Databricks spatial functions, convert it to the GEOMETRY type with st_geomfromwkb. The examples below use the Shapefile reader and a sample-data Volumes path; the same pattern applies to any OGR-based reader.
Convert to GEOMETRY
# Convert WKB to Databricks GEOMETRY type
from pyspark.sql.functions import expr

df = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
# Keep all original columns and add a GEOMETRY column parsed from the WKB bytes
df_with_geom = df.select("*", expr("st_geomfromwkb(geom_0)").alias("geometry"))
SQL Example
-- Read shapefile and convert to GEOMETRY in SQL
CREATE OR REPLACE TEMP VIEW stations AS
SELECT *, st_geomfromwkb(geom_0) as geometry
FROM shapefile_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip`;
SELECT name, geometry FROM stations LIMIT 10;
Named Readers vs OGR
For common formats, GeoBrix provides named readers for convenience (sample-data Volumes path):
# Named reader (recommended for common formats)
df = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
# OGR with explicit driver (same result)
df = spark.read.format("ogr").option("driverName", "ESRI Shapefile").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
When to use each:
- Named readers (shapefile_ogr, geojson, ogr_gpkg, file_gdb): Better for common formats, with cleaner syntax
- OGR: Useful for less common formats or when you need OGR-specific options