Shapefile Reader
Read ESRI Shapefile data into a Spark DataFrame using the shapefile_ogr reader.
Format Name
shapefile_ogr
Supported Files
- .shp: Standard shapefile (requires .shx and .dbf files)
- .zip: ZIP files containing shapefiles
- Directories with multiple shapefiles
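Because a .shp file is only usable alongside its .shx and .dbf companions, it can help to check for them before calling the reader. The following is a minimal sketch using only the Python standard library. missing_sidecars is a hypothetical helper name, not part of the reader, and it assumes the path is reachable through the local file system and that the companions share the .shp file's base name with lowercase extensions.

```python
from pathlib import Path

# Companion files named above; other sidecars (for example .prj) are not checked.
REQUIRED_SIDECARS = (".shx", ".dbf")


def missing_sidecars(shp_path):
    """Return the required companion extensions that are absent next to shp_path."""
    shp = Path(shp_path)
    # Swap the .shp suffix for each required extension and test for existence.
    return [ext for ext in REQUIRED_SIDECARS if not shp.with_suffix(ext).exists()]
```

An empty list means both companions were found next to the .shp file.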
Basic Usage
Python
# Read shapefile (sample-data Volumes path)
df = spark.read.format("shapefile_ogr").load("{SAMPLE_SHAPEFILE_PATH}")
df.show()
Example output
+--------------------+-----------+----+
|geom_0 |geom_0_srid|name|
+--------------------+-----------+----+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+-----------+----+
Scala
val df = spark.read.format("shapefile_ogr").load("/Volumes/main/default/geobrix_samples/geobrix-examples/nyc/subway/nyc_subway.shp.zip")
df.show()
Example output
+--------------------+-----------+----+
|geom_0 |geom_0_srid|name|
+--------------------+-----------+----+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+-----------+----+
SQL
-- Read shapefile in SQL (sample-data Volumes path)
SELECT * FROM shapefile_ogr.`{SAMPLE_SHAPEFILE_PATH}`;
Example output
+--------------------+-----------+----+
|geom_0 |geom_0_srid|name|
+--------------------+-----------+----+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+-----------+----+
Options
For the full list of options (chunkSize, driverName, layerN, asWKB, etc.), see OGR reader options.
Chunk Size
Set the chunkSize option to control how many records are read per chunk; tuning it can affect read performance:
# Adjust chunk size (sample-data Volumes path)
df = spark.read.format("shapefile_ogr") \
    .option("chunkSize", "50000") \
    .load("{SAMPLE_SHAPEFILE_PATH}")
df.show()
Example output
+--------------------+-----------+----+
|geom_0 |geom_0_srid|name|
+--------------------+-----------+----+
|[BINARY] |4326 |... |
|... |... |... |
+--------------------+-----------+----+
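When several options are set at once, they can be collected as keyword arguments and applied in a single call. The following is a minimal sketch, assuming a live spark session. read_shapefile is a hypothetical wrapper, not part of the reader, and chunkSize is the only option whose value is taken from this page.

```python
def read_shapefile(spark, path, **options):
    """Load a shapefile with the shapefile_ogr format, applying any reader options."""
    # Convert values to strings, mirroring the chunkSize example above.
    string_options = {key: str(value) for key, value in options.items()}
    # DataFrameReader.options applies all key/value pairs before load is called.
    return spark.read.format("shapefile_ogr").options(**string_options).load(path)
```

Under these assumptions, read_shapefile(spark, "{SAMPLE_SHAPEFILE_PATH}", chunkSize=50000) would issue the same read as the chunk size example above.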