
Shapefile Reader

Read files in the ESRI Shapefile format with the shapefile_ogr reader.

Format Name

shapefile_ogr

Supported Files

  • .shp - Standard shapefile (requires the companion .shx and .dbf files alongside it)
  • .zip - ZIP files containing shapefiles
  • Directories containing multiple shapefiles
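The file requirements above can be checked before a path is handed to the reader. The sketch below is a hypothetical pre-flight helper, not part of the shapefile_ogr reader: list_shapefile_members and missing_companions are illustrative names, and the check covers only the .shx and .dbf companions listed above (real shapefiles often carry others, such as .prj). It uses only the Python standard library, so it works on paths the local file system can see; how the reader itself resolves ZIPs and directories may differ.

```python
import os
import zipfile

# Hypothetical helper, not part of the reader: the two companion files this
# page says a .shp needs. Real shapefiles often carry more (.prj, .cpg, ...).
REQUIRED_COMPANIONS = (".shx", ".dbf")


def missing_companions(names):
    """Map each .shp base name in `names` to the required companions it lacks."""
    lowered = {n.lower() for n in names}
    result = {}
    for name in names:
        base, ext = os.path.splitext(name)
        if ext.lower() != ".shp":
            continue  # only .shp entries start a shapefile
        result[base] = [c for c in REQUIRED_COMPANIONS
                        if (base + c).lower() not in lowered]
    return result


def list_shapefile_members(path):
    """Collect candidate file names from a .zip, a directory, or a single .shp."""
    if path.lower().endswith(".zip"):
        # Names inside the archive, e.g. "nyc_subway.shp", "nyc_subway.dbf".
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    if os.path.isdir(path):
        return [os.path.join(path, n) for n in os.listdir(path)]
    # Single .shp: gather sibling files that share its base name.
    folder = os.path.dirname(path) or "."
    stem = os.path.splitext(os.path.basename(path))[0]
    return [os.path.join(folder, n) for n in os.listdir(folder)
            if os.path.splitext(n)[0] == stem]
```

For example, missing_companions(["roads.shp", "roads.shx", "roads.dbf", "rivers.shp", "rivers.dbf"]) returns {'roads': [], 'rivers': ['.shx']}, flagging that rivers lacks its index file.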

Basic Usage

Python

# Read shapefile (sample-data Volumes path)
df = spark.read.format("shapefile_ogr").load("{SAMPLE_SHAPEFILE_PATH}")
df.show()
Example output
+--------------------+-----------+----+
|geom_0              |geom_0_srid|name|
+--------------------+-----------+----+
|[BINARY]            |4326       |... |
|...                 |...        |... |
+--------------------+-----------+----+

Scala

// Read shapefile (sample-data Volumes path)
val df = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
df.show()
Example output
+--------------------+-----------+----+
|geom_0              |geom_0_srid|name|
+--------------------+-----------+----+
|[BINARY]            |4326       |... |
|...                 |...        |... |
+--------------------+-----------+----+

SQL

-- Read shapefile in SQL (sample-data Volumes path)
SELECT * FROM shapefile_ogr.`{SAMPLE_SHAPEFILE_PATH}`;
Example output
+--------------------+-----------+----+
|geom_0              |geom_0_srid|name|
+--------------------+-----------+----+
|[BINARY]            |4326       |... |
|...                 |...        |... |
+--------------------+-----------+----+
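The geom_0 column appears as [BINARY] in the outputs above. Assuming it holds plain OGC/ISO WKB (this page does not state the encoding, and the asWKB option listed under Options suggests it is configurable), a minimal pure-Python decoder for a 2D Point shows what those bytes contain. parse_wkb_point is a hypothetical helper for illustration only; for real work, a geometry library such as Shapely is the better choice.

```python
import struct


def parse_wkb_point(wkb):
    """Decode a 2D WKB Point (bytes or bytearray) into an (x, y) tuple."""
    byte_order = wkb[0]  # first byte: 0 = big-endian (XDR), 1 = little-endian (NDR)
    if byte_order not in (0, 1):
        raise ValueError(f"invalid WKB byte-order flag: {byte_order}")
    fmt = "<" if byte_order == 1 else ">"
    # Next 4 bytes: unsigned geometry type code; 1 means a 2D Point.
    (geom_type,) = struct.unpack_from(fmt + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"expected a 2D Point (type 1), got type {geom_type}")
    # Then two 8-byte doubles: x and y.
    x, y = struct.unpack_from(fmt + "dd", wkb, 5)
    return x, y
```

In PySpark a BinaryType value comes back as a bytearray, which this function accepts directly, so a value taken from df.first()["geom_0"] can be passed in as-is if the column really is WKB Points. Any other geometry type raises ValueError rather than being misread.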

Options

For the full list of options (chunkSize, driverName, layerN, asWKB, etc.), see the OGR reader options.

Chunk Size

Set how many records are read per chunk to tune performance:

# Adjust chunk size (sample-data Volumes path)
df = spark.read.format("shapefile_ogr") \
    .option("chunkSize", "50000") \
    .load("{SAMPLE_SHAPEFILE_PATH}")
df.show()
Example output
+--------------------+-----------+----+
|geom_0              |geom_0_srid|name|
+--------------------+-----------+----+
|[BINARY]            |4326       |... |
|...                 |...        |... |
+--------------------+-----------+----+
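To illustrate what the chunkSize setting controls, the sketch below splits a layer's record indices into fixed-size, half-open ranges. It assumes chunkSize is a plain count of records per chunk, as described above, and that the last chunk holds the remainder; how the reader maps chunks onto Spark tasks or partitions is not covered on this page, so chunk_ranges is a hypothetical illustration rather than the reader's implementation.

```python
def chunk_ranges(n_records, chunk_size):
    """Split record indices [0, n_records) into half-open [start, end) ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Each range covers chunk_size records; the final one holds the remainder.
    return [(start, min(start + chunk_size, n_records))
            for start in range(0, n_records, chunk_size)]
```

Under this assumption, chunk_ranges(120000, 50000) returns [(0, 50000), (50000, 100000), (100000, 120000)]: two full chunks and one of 20,000 records. A larger chunkSize yields fewer, bigger units of work; a smaller one yields more, smaller ones.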

Next Steps