OGR Reader

The OGR reader provides generic support for reading vector data formats through the OGR library. This is the base reader that powers all vector format readers in GeoBrix.

Format Name

ogr

Overview

The OGR reader is a generic vector data reader that can handle any format supported by OGR/GDAL. While GeoBrix provides named readers for common formats (Shapefile, GeoJSON, GeoPackage, etc.), you can use the OGR reader directly for any available format.

Available Formats

The OGR reader can work with many OGR vector drivers, including:

  • ESRI Shapefile (.shp)
  • GeoJSON (.geojson, .json)
  • GeoPackage (.gpkg)
  • File Geodatabase (.gdb)
  • KML (.kml)
  • GML (.gml)
  • CSV with geometry (.csv)
  • PostgreSQL/PostGIS
  • And 80+ more formats

Format Availability

Behavior and level of support vary across GDAL formats. Not all formats are available by default; some require additional packages or drivers to be installed in your environment.

Basic Usage

Python

# OGR reader (sample-data Volumes path)
df = spark.read.format("ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson")
df.show()
Example output
+--------------------+-----------+-----+
|geom_0              |geom_0_srid|...  |
+--------------------+-----------+-----+
|[BINARY]            |4326       |...  |
|...                 |...        |...  |
+--------------------+-----------+-----+

Scala

val df = spark.read.format("ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson")
df.show()
Example output
+--------------------+-----------+-----+
|geom_0              |geom_0_srid|...  |
+--------------------+-----------+-----+
|[BINARY]            |4326       |...  |
|...                 |...        |...  |
+--------------------+-----------+-----+

SQL

-- Read with OGR in SQL (sample-data Volumes path)
SELECT * FROM ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson`;
Example output
+--------------------+-----------+-----+
|geom_0              |geom_0_srid|...  |
+--------------------+-----------+-----+
|[BINARY]            |4326       |...  |
|...                 |...        |...  |
+--------------------+-----------+-----+

Options

driverName

Default: Auto-detected from file extension if not specified

Explicitly specify the OGR driver to use (regardless of extension).

# Explicit driver (sample-data Volumes path)
df = spark.read.format("ogr") \
    .option("driverName", "GeoJSON") \
    .load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/boroughs/nyc_boroughs.geojson")
df.show()
Example output
+--------------------+-----------+-----+
|geom_0              |geom_0_srid|...  |
+--------------------+-----------+-----+
|[BINARY]            |4326       |...  |
|...                 |...        |...  |
+--------------------+-----------+-----+
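
When files arrive with inconsistent or ambiguous extensions, it can help to keep the extension-to-driver choice explicit in your own code rather than rely on auto-detection. The following is a minimal sketch, not GeoBrix's detection logic: `driver_name_for` and `DRIVER_BY_EXT` are hypothetical names, and the values are standard OGR driver short names that are assumed to be accepted by the driverName option in the same way as "GeoJSON" and "ESRI Shapefile".

```python
import os

# OGR driver short names keyed by file extension (illustrative subset, not exhaustive)
DRIVER_BY_EXT = {
    ".shp": "ESRI Shapefile",
    ".geojson": "GeoJSON",
    ".json": "GeoJSON",
    ".gpkg": "GPKG",
    ".kml": "KML",
    ".gml": "GML",
    ".csv": "CSV",
}


def driver_name_for(path):
    """Hypothetical helper: return an OGR driver name for path, or None if unknown."""
    # Only the final extension is inspected, so "x.shp.zip" is treated as unknown
    ext = os.path.splitext(path)[1].lower()
    return DRIVER_BY_EXT.get(ext)


print(driver_name_for("nyc_boroughs.geojson"))  # GeoJSON
```

When the helper returns a name, pass it with .option("driverName", name); when it returns None, omit the option and let the reader auto-detect the driver.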

Other Options

Option      Default    Description
chunkSize   "10000"    Number of records per chunk for parallel reading
layerN      "0"        Layer index for multi-layer formats (0-based)
layerName   ""         Layer name for multi-layer formats (overrides layerN)
asWKB       "true"     Output geometry as WKB (binary) vs WKT (text)
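
The options above can be combined on a single read. The sketch below wraps them in a small helper to show how they fit together. `read_ogr_layer` is a hypothetical name, not part of GeoBrix, and it assumes an active `spark` session with GeoBrix installed. Option values are passed as strings, matching the defaults listed above.

```python
def read_ogr_layer(spark, path, layer_name=None, layer_n=0,
                   chunk_size=10000, as_wkb=True):
    """Hypothetical helper (not part of GeoBrix): read one layer with the OGR reader."""
    reader = (
        spark.read.format("ogr")
        .option("chunkSize", str(chunk_size))
        .option("asWKB", str(as_wkb).lower())  # "true" or "false"
    )
    # layerName overrides layerN, so only one of the two is set
    if layer_name:
        reader = reader.option("layerName", layer_name)
    else:
        reader = reader.option("layerN", str(layer_n))
    return reader.load(path)
```

For example, `read_ogr_layer(spark, path, layer_name="roads")` would select a layer by name, where `path` points at a multi-layer file and "roads" is a placeholder layer name.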

Output Schema

root
|-- geom_0: binary (geometry in WKB format)
|-- geom_0_srid: integer (spatial reference ID)
|-- geom_0_srid_proj: string (projection definition)
|-- <attribute_1>: <type> (feature attributes...)
|-- <attribute_2>: <type>
|-- ...
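
Because geom_0 holds WKB bytes, the geometry type can be read from the header without a geometry library. The following is a minimal sketch, assuming the bytes follow the OGC/ISO WKB layout (a one-byte byte-order flag followed by a 32-bit type code). `wkb_geometry_type` is an illustrative helper, not a GeoBrix function, and the sample value is built by hand rather than read from a file.

```python
import struct

# Base geometry type codes defined by the OGC WKB encoding
WKB_TYPES = {
    1: "Point", 2: "LineString", 3: "Polygon", 4: "MultiPoint",
    5: "MultiLineString", 6: "MultiPolygon", 7: "GeometryCollection",
}


def wkb_geometry_type(wkb):
    """Return the base geometry type name encoded in a WKB header."""
    if len(wkb) < 5:
        raise ValueError("WKB needs at least 5 bytes")
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian
    fmt = "<I" if wkb[0] == 1 else ">I"
    (code,) = struct.unpack_from(fmt, wkb, 1)
    # Assumption: drop legacy high-bit dimension flags, then the ISO
    # Z/M/ZM offsets (1000/2000/3000), to recover the base type
    base = (code & 0x0FFFFFFF) % 1000
    return WKB_TYPES.get(base, f"Unknown({code})")


sample = struct.pack("<BIdd", 1, 1, -73.97, 40.78)  # hand-built 2D point
print(wkb_geometry_type(sample))  # Point
```

In PySpark a binary column value arrives as a bytearray, which the function accepts as well as bytes.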

Databricks Integration

The OGR reader and the named vector readers output geometry in WKB format. To use it with Databricks spatial functions, convert it to the GEOMETRY type. The example below uses the Shapefile reader with a sample-data Volumes path; the same pattern applies to any OGR-based reader.

Requires Databricks Runtime

These examples use st_geomfromwkb to convert GeoBrix WKB to Databricks GEOMETRY type.

Convert to GEOMETRY

# Convert WKB to Databricks GEOMETRY type
from pyspark.sql.functions import expr

df = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
df_with_geom = df.select("*", expr("st_geomfromwkb(geom_0)").alias("geometry"))

SQL Example

-- Read shapefile and convert to GEOMETRY in SQL
CREATE OR REPLACE TEMP VIEW stations AS
SELECT *, st_geomfromwkb(geom_0) as geometry
FROM shapefile_ogr.`/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip`;

SELECT name, geometry FROM stations LIMIT 10;

Named Readers vs OGR

For common formats, GeoBrix provides named readers for convenience (sample-data Volumes path):

# Named reader (recommended for common formats)
df = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
# OGR with explicit driver (same result)
df = spark.read.format("ogr").option("driverName", "ESRI Shapefile").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")

When to use each:

  • Named readers (shapefile_ogr, geojson, ogr_gpkg, file_gdb): Better for common formats, cleaner syntax
  • OGR: Useful for less common formats or when you need OGR-specific options

Next Steps