GeoJSONL Writer (multi-file)

Emit a directory of newline-delimited GeoJSON (GeoJSONL) shards (OGR driver GeoJSONSeq, one Feature per line), one shard per partition, with no driver merge. Unlike the single-file GeoJSON writer, which merges every partition into one FeatureCollection file on the driver, the GeoJSONL writer leaves the shards in place as the dataset, so write throughput scales with partitions. An optional maxRecordsPerFile splits a partition into several shards.

Both tiers take the same input shape and produce a directory of part-<uuid>.geojsonl shards that round-trips with the GeoJSON reader's directory mode:

Lightweight geojsonl_gbx — pure-Python, Serverless-safe (pyogrio); read back via geojson_gbx with option("multi","true").
Heavyweight geojsonl_ogr — DataSource V2 writer on native GDAL/OGR (JVM); read back via geojson_ogr with option("multi","true").

Compute & scale

Large datasets: GeoJSONL is the recommended writer for large data — one shard per partition, no driver assembly, write throughput scales with partitions. See Choosing a writer for large datasets for a comparison with the single-file writers.
Tier & compute: the lightweight geojsonl_gbx writer needs no JAR or init script and is the only option on Serverless, standard (shared), and ARM clusters. The heavyweight geojsonl_ogr writer requires a classic x86 cluster with the GeoBrix JAR + GDAL init script; it encodes each shard with native GDAL/OGR on the JVM. Your compute usually decides the tier. See Benchmarking for timings and methodology.

Before you write

Register first: call register(spark) once before using any *_gbx format (see the Writers Overview).
Both tiers: geojsonl_gbx (lightweight) and geojsonl_ogr (heavyweight) take the same input shape and produce the same output layout; the format name is the only difference.
Input geometry: provide geometry as WKB (binary) or WKT (text), with the CRS in the companion *_srid / *_srid_proj columns. EWKB, EWKT, and GeoJSON-encoded geometry are not accepted. Writing from a Databricks GEOMETRY / GEOGRAPHY column? Export it to WKB first with ST_AsBinary (or to WKT with ST_AsText), as described in the Vector Writer schema, and avoid ST_GeomAsEWKB / ST_AsEWKT / ST_AsGeoJSON.
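Because EWKB, EWKT, and GeoJSON-encoded geometry are rejected, a quick pre-flight check on a sample of values can catch the wrong encoding before a write job starts. The sketch below is a hypothetical helper (classify_geometry_value is not part of GeoBrix). It assumes PostGIS-style EWKB, which marks Z/M/SRID with high bits of the geometry-type word, and it only inspects the leading bytes or characters rather than validating the whole geometry.

```python
import struct

# PostGIS-style EWKB flag bits, packed into the high bits of the geometry-type word.
EWKB_Z_FLAG = 0x80000000
EWKB_M_FLAG = 0x40000000
EWKB_SRID_FLAG = 0x20000000


def classify_geometry_value(value):
    """Best-effort guess at one geometry value's encoding (hypothetical helper).

    Returns "wkt", "wkb", "ewkt", "ewkb", or "geojson". Heuristic only: it looks
    at the first bytes/characters and does not parse the full geometry.
    """
    if isinstance(value, str):
        text = value.lstrip()
        if text.startswith("{"):
            return "geojson"              # GeoJSON-encoded geometry (not accepted)
        if text.upper().startswith("SRID="):
            return "ewkt"                 # EWKT "SRID=...;" prefix (not accepted)
        return "wkt"
    if not isinstance(value, (bytes, bytearray, memoryview)):
        raise TypeError("expected str (WKT) or bytes (WKB)")
    data = bytes(value)
    if len(data) < 5 or data[0] not in (0, 1):
        raise ValueError("not a WKB/EWKB byte string")
    # Byte 0 is the byte order: 0 = big-endian (XDR), 1 = little-endian (NDR).
    fmt = "<I" if data[0] == 1 else ">I"
    (type_word,) = struct.unpack_from(fmt, data, 1)
    if type_word & (EWKB_Z_FLAG | EWKB_M_FLAG | EWKB_SRID_FLAG):
        return "ewkb"                     # extended WKB (not accepted)
    return "wkb"
```

A plain little-endian WKB point is struct.pack("<BIdd", 1, 1, x, y); its EWKB counterpart with an embedded SRID sets EWKB_SRID_FLAG in the type word and inserts the SRID before the coordinates, which is exactly what the check flags.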

Options

Option	Default	Behavior
`maxRecordsPerFile`	unset (one shard per partition)	If set, split each partition into multiple shards of at most this many features.
`mode`	`overwrite`	`overwrite` only; `append` is rejected. The target directory is cleared once before the shards land.
`geometryType`	inferred from the data	Override the OGR geometry type.
`layerName`	driver default	Output layer name where supported.
`geomCol`	auto-detected from the `*_srid` column	Override the geometry column name. Locates the geometry and its SRID companion (`<geomCol>_srid`). See Named Vector Formats. Available in both lightweight and heavyweight tiers.
`sridCol`	`<geomCol>_srid`	Override the SRID column name. Required — supplies the CRS authority code (e.g. `"4326"`, or `"0"` if unknown). Available in both tiers.
`projCol`	`<geomCol>_srid_proj`	Override the PROJ4 column name (optional fallback CRS when `sridCol` is `"0"`). Available in both tiers.
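The number of shards a write produces follows from the table: one shard per non-empty partition, or, with maxRecordsPerFile, enough shards to keep each one at or under the cap. A back-of-the-envelope sketch, assuming the writer fills each shard up to the cap before starting the next one (expected_shards is a hypothetical helper, not a GeoBrix API):

```python
import math


def expected_shards(partition_sizes, max_records_per_file=None):
    """Estimate how many part-<uuid>.geojsonl shards a write produces.

    Assumes one shard per non-empty partition and, when max_records_per_file
    is set, ceil(rows / max_records_per_file) shards for each partition.
    """
    total = 0
    for rows in partition_sizes:
        if rows == 0:
            continue                      # empty partitions emit no shard
        if max_records_per_file is None:
            total += 1                    # default: one shard per partition
        else:
            total += math.ceil(rows / max_records_per_file)
    return total


print(expected_shards([120_000, 0, 30_000]))          # without a cap
print(expected_shards([120_000, 0, 30_000], 50_000))  # with maxRecordsPerFile=50000
```

With the cap, the 120,000-row partition splits into three shards and the 30,000-row partition stays as one, so a single oversized partition no longer produces one oversized file.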

Schema

geojsonl_gbx / geojsonl_ogr use the shared vector column contract — a geometry column plus its *_srid (required) / *_srid_proj (optional) CRS companions; every other column becomes a feature attribute. Point the writer at your columns with geomCol / sridCol / projCol, or coerce to the geom_0 convention.
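The column contract can be summarized as a small resolution step. The sketch below is a hypothetical re-statement of the documented defaults (geomCol auto-detected from its *_srid companion, sridCol required, projCol optional, everything else an attribute); resolve_vector_columns is not a GeoBrix function, and the real writer's detection rules may differ, for instance when several columns have *_srid companions.

```python
def resolve_vector_columns(columns, geom_col=None, srid_col=None, proj_col=None):
    """Apply the geomCol / sridCol / projCol defaults to a list of column names.

    Returns (geom_col, srid_col, proj_col_or_None, attribute_columns).
    """
    cols = list(columns)
    if geom_col is None:
        # Auto-detect: the geometry column is the one with a "<name>_srid" companion.
        candidates = [c for c in cols if f"{c}_srid" in cols]
        if len(candidates) != 1:
            raise ValueError(f"cannot auto-detect geometry column: {candidates}")
        geom_col = candidates[0]
    srid_col = srid_col or f"{geom_col}_srid"
    if srid_col not in cols:
        raise ValueError(f"missing required SRID column {srid_col!r}")
    if proj_col is None:
        default_proj = f"{geom_col}_srid_proj"
        proj_col = default_proj if default_proj in cols else None  # optional
    elif proj_col not in cols:
        raise ValueError(f"PROJ4 column {proj_col!r} not found")
    reserved = {geom_col, srid_col, proj_col}
    attributes = [c for c in cols if c not in reserved]  # become feature properties
    return geom_col, srid_col, proj_col, attributes


print(resolve_vector_columns(["id", "geom_0", "geom_0_srid", "name"]))
print(resolve_vector_columns(["the_geom", "epsg", "name"], geom_col="the_geom", srid_col="epsg"))
```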

Output: a directory (out/) containing one part-<uuid>.geojsonl shard per non-empty partition (plus an advisory _SUCCESS marker). Each shard is a sequence of newline-delimited GeoJSON Feature objects; attributes are written as feature properties.
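Concretely, a shard is plain text with one complete JSON Feature object per line, so a consumer can parse it line by line without loading a whole document. The snippet below builds a small, made-up shard with the standard json module and parses it back; the property names and values are placeholders, and the exact key order and coordinate formatting that GDAL's GeoJSONSeq driver emits will differ.

```python
import json

# Two illustrative features: one GeoJSON Feature per line, attributes under "properties".
shard_text = "\n".join(
    json.dumps({
        "type": "Feature",
        "properties": {"id": i, "name": name},
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
    })
    for i, (name, lon, lat) in enumerate([("a", 10.0, 20.0), ("b", 11.5, 21.5)])
) + "\n"


def iter_features(text):
    """Parse newline-delimited GeoJSON: every non-blank line is one Feature."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


features = list(iter_features(shard_text))
print(len(features), features[1]["properties"])
```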

Write it

Lightweight · geojsonl_gbx

Name your geometry/CRS columns with options (or coerce to the geom_0 convention; the Vector Writer schema covers that and how to export Databricks GEOMETRY / GEOGRAPHY columns), then write:

(df  # geometry + CRS under your own column names
 .write.format("geojsonl_gbx")
 .option("geomCol", "the_geom")   # WKB or WKT
 .option("sridCol", "epsg")       # REQUIRED — CRS authority code
 .mode("overwrite").save("/Volumes/cat/sch/vol/out"))

Read it back with the GeoJSON reader's directory mode; multi=true enumerates the .geojsonl shards and parses each as a GeoJSONSeq sequence:

back = spark.read.format("geojson_gbx").option("multi", "true").load("/Volumes/cat/sch/vol/out")

Heavyweight · geojsonl_ogr

Same input shape and output layout; only the format name changes to geojsonl_ogr, and the output reads back through the heavyweight GeoJSON reader (geojson_ogr). Requires a classic x86 cluster with the GeoBrix JAR and GDAL init script. The same column options apply (see the Vector Writer schema):

(df  # geometry + CRS under your own column names
 .write.format("geojsonl_ogr")
 .option("geomCol", "the_geom")   # WKB or WKT
 .option("sridCol", "epsg")       # REQUIRED — CRS authority code
 .mode("overwrite").save("/Volumes/cat/sch/vol/out"))

Read it back with the heavyweight GeoJSON reader's directory mode:

back = spark.read.format("geojson_ogr").option("multi", "true").load("/Volumes/cat/sch/vol/out")

Each shard is encoded with native GDAL/OGR (GeoJSONSeq driver) into worker-local temporary storage, then sequentially copied into the output directory (rename-free, so safe on FUSE-mounted Unity Catalog Volumes / DBFS).
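The write-locally-then-copy pattern can be sketched with the standard library alone. This illustrates the pattern, not GeoBrix's code: write_shard_rename_free is a hypothetical name, and the real writers encode features with GDAL/OGR (heavyweight) or pyogrio (lightweight) instead of writing pre-serialized lines. The point is that the final part-<uuid>.geojsonl file is only ever produced by a plain byte copy, never by renaming a temporary file inside the target directory.

```python
import os
import shutil
import tempfile
import uuid


def write_shard_rename_free(lines, out_dir):
    """Stage a shard in local scratch space, then copy it into out_dir.

    The shard is fully written locally first; the target only sees a single
    copy to its final name, so the target filesystem needs no rename support.
    """
    final_path = os.path.join(out_dir, f"part-{uuid.uuid4()}.geojsonl")
    with tempfile.TemporaryDirectory() as scratch:
        local_path = os.path.join(scratch, "shard.geojsonl")
        with open(local_path, "w", encoding="utf-8") as fh:
            for line in lines:
                fh.write(line + "\n")     # one Feature per line
        shutil.copyfile(local_path, final_path)  # byte copy, no rename
    return final_path                     # scratch copy is deleted on exit
```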

How it scales

The single-file GeoJSON writer writes one merged file: each partition writes a fragment, then the driver concatenates all fragments into a single FeatureCollection. That driver-side merge is sequential — fine for "one file for the table", but a single-node bottleneck at scale.

The GeoJSONL writer writes one shard per partition with no driver merge. Each executor encodes its partition to a worker-local GeoJSONL shard and sequentially copies it into the output directory (rename-free, so safe on FUSE-mounted Unity Catalog Volumes / DBFS). Because GeoJSONL is splittable and concatenable, the directory of shards is the dataset — there is nothing to assemble on the driver, so write throughput scales with the number of partitions. Use df.repartition(n, "<col>") to set shard granularity, or maxRecordsPerFile to cap features per shard (splitting a large partition into several shards). This holds for both tiers — the lightweight writer parallelizes per partition with pyogrio, the heavyweight writer with native GDAL/OGR on the JVM.
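The "concatenable" property is what removes the merge step. A stdlib-only illustration: joining two GeoJSONL shards end to end still yields valid GeoJSONL, whereas joining two FeatureCollection documents does not yield valid JSON, so a single FeatureCollection needs an assembly step. The assembly shown at the end (parse and re-wrap) is one way to do it, included for contrast; GeoBrix's driver-side merge may work differently. Feature contents are placeholders.

```python
import json

# Placeholder feature; real shards carry geometry and properties.
feature = {"type": "Feature", "properties": {"id": 1}, "geometry": None}

# GeoJSONL: two shards joined end to end are still one valid GeoJSONL stream.
shard_a = json.dumps(feature) + "\n"
shard_b = json.dumps(feature) + "\n" + json.dumps(feature) + "\n"
combined = shard_a + shard_b
combined_features = [json.loads(line) for line in combined.splitlines() if line]

# FeatureCollection: two documents joined end to end are not valid JSON.
fc_a = json.dumps({"type": "FeatureCollection", "features": [feature]})
fc_b = json.dumps({"type": "FeatureCollection", "features": [feature, feature]})
try:
    json.loads(fc_a + fc_b)
    concatenated_fc_is_valid = True
except json.JSONDecodeError:
    concatenated_fc_is_valid = False

# An assembly step: parse every fragment and build one new collection.
merged = {
    "type": "FeatureCollection",
    "features": [f for fc in (fc_a, fc_b) for f in json.loads(fc)["features"]],
}
print(len(combined_features), concatenated_fc_is_valid, len(merged["features"]))
```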

Serverless: repartition by a column

On Serverless, hash-partition by a column, df.repartition(n, "<key>"), to control parallelism. A number-only df.repartition(n) (round-robin) is coalesced by AQE back toward a single partition on small data, which means a serial write, and AQE can't be disabled there. A hash repartition on a high-cardinality column (such as an id or key) is respected. On classic clusters either form works.
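Why the key's cardinality matters: hash partitioning sends every row with the same key value to the same partition, so a key with k distinct values can fill at most k partitions, and therefore at most k shards. The sketch below counts non-empty partitions for an id-like key and for a three-value key. It uses a SHA-256-based stand-in for Spark's Murmur3 hash, so the partition numbers will not match Spark's; only the pattern carries over. partition_of and non_empty_partitions are hypothetical helpers.

```python
import hashlib


def partition_of(key, num_partitions):
    """Hash the key, then take it modulo the partition count (stand-in for Murmur3)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def non_empty_partitions(keys, num_partitions):
    """Count partitions receiving at least one row (one shard each, without a cap)."""
    return len({partition_of(k, num_partitions) for k in keys})


ids = range(10_000)                            # high-cardinality key, e.g. an id
regions = ["north", "south", "east"] * 3_000   # low-cardinality key

print(non_empty_partitions(ids, 64))      # all or nearly all of the 64 partitions
print(non_empty_partitions(regions, 64))  # at most 3: one per distinct value
```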

Choose the single-file GeoJSON writer when you want one file; choose GeoJSONL for large, streaming, or highly parallel writes.

Typical pipeline: export a table to a sharded directory

Export a table of vector data to a directory of GeoJSONL shards — repartition to control write parallelism (one shard per partition):

Step	Code (lightweight)
1. Load a table	`df = spark.table("main.geo.parcels")`
2. Set parallelism	`df = df.repartition(64, "id")` (by a column — see Serverless note above)
3. Write sharded	`df.write.format("geojsonl_gbx").mode("overwrite").save("/Volumes/main/geo/exports/parcels")`
4. Read back	`spark.read.format("geojson_gbx").option("multi","true").load("/Volumes/main/geo/exports/parcels")`

df = spark.table("main.geo.parcels")          # vector data in a (Delta) table; geometry found via its *_srid companion
# Hash-partition by a column (Serverless-safe; see note above), then write one shard per partition.
df.repartition(64, "id").write.format("geojsonl_gbx").mode("overwrite").save(
    "/Volumes/main/geo/exports/parcels"
)
# -> a directory of up to 64 part-<uuid>.geojsonl shards (one per non-empty partition),
#    written in parallel with no driver merge

On a classic x86 cluster, swap the format names for the heavyweight tier (geojsonl_ogr to write, geojson_ogr with multi=true to read back).

To cap features per shard regardless of partition size:

df.write.format("geojsonl_gbx").mode("overwrite").option(
    "maxRecordsPerFile", "50000"
).save("/Volumes/main/geo/exports/parcels")

Next Steps

GeoJSON Writer — the single-file (FeatureCollection) writer.
Vector Writer — the generic OGR writer (any driver).
Writers Overview — all writers, register-first, and benchmarks.
Readers Overview — the corresponding read paths.
