GeoJSON Writer

geojson_gbx is a pyogrio-backed, pure-Python DataSource V2 writer with the OGR driver preset to GeoJSON. It round-trips with the matching geojson_gbx reader; the schema (attrs..., geom_0, geom_0_srid, geom_0_srid_proj) is shared across all lightweight vector readers and writers.

Compute & scale

Large datasets: this writer emits a single merged GeoJSON file assembled on the driver; for very large outputs, use a classic cluster (high-memory Serverless does not help here) or switch to the scalable geojsonl_gbx writer. See Choosing a writer for large datasets.
Tier & compute: the lightweight (*_gbx) writers need no JAR or init script and are the only option on Serverless, standard (shared), and ARM clusters. The heavyweight raster/PMTiles writers require a classic x86 cluster (JAR + GDAL init script). Compute type usually decides the tier; data scale is the next consideration. See Benchmarking for timings and methodology.

Before you write

Register first — call register(spark) once before using any *_gbx format (see the Writers Overview).
Lightweight-only — this single-file writer isn't implemented in the heavyweight tier; for output in both tiers, use the sharded GeoJSONL writer (geojsonl_gbx lightweight / geojsonl heavyweight).
Input geometry — provide geometry as WKB (binary) or WKT (text), with the CRS in the companion *_srid / *_srid_proj columns. EWKB, EWKT, and GeoJSON-encoded geometry are not accepted. Writing from a Databricks GEOMETRY / GEOGRAPHY column? Export to WKB first with ST_AsBinary (or ST_AsText for WKT) — see the Schema section below — and avoid ST_GeomAsEWKB / ST_AsEWKT / ST_AsGeoJSON.
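Because EWKB is not accepted, it can help to spot-check a few geometry values before writing. The sketch below is a hypothetical pre-flight helper, not part of geojson_gbx: it relies only on the PostGIS EWKB convention that an embedded SRID sets the 0x20000000 bit in the geometry type word. The function names are made up for illustration, and the check cannot flag 2D EWKB without an SRID, which is byte-identical to plain WKB.

```python
import struct

# PostGIS EWKB sets this bit in the type word when an SRID is embedded.
EWKB_SRID_FLAG = 0x20000000


def wkb_type_word(wkb: bytes) -> int:
    """Return the 32-bit geometry type word from a WKB/EWKB header."""
    if len(wkb) < 5:
        raise ValueError("too short to be WKB")
    byte_order = wkb[0]  # 0 = big-endian, 1 = little-endian
    if byte_order not in (0, 1):
        raise ValueError("invalid WKB byte-order marker")
    fmt = "<I" if byte_order == 1 else ">I"
    return struct.unpack_from(fmt, wkb, 1)[0]


def has_embedded_srid(wkb: bytes) -> bool:
    """True when the header carries the EWKB SRID flag."""
    return bool(wkb_type_word(wkb) & EWKB_SRID_FLAG)


# Little-endian POINT(-73.97 40.78): plain WKB, then EWKB with SRID 4326.
plain_point = struct.pack("<BIdd", 1, 1, -73.97, 40.78)
ewkb_point = struct.pack("<BIIdd", 1, 1 | EWKB_SRID_FLAG, 4326, -73.97, 40.78)

print(has_embedded_srid(plain_point))  # plain WKB: no SRID flag
print(has_embedded_srid(ewkb_point))   # EWKB: SRID flag set
```

If a value is flagged, re-export the column as plain WKB and carry the CRS code in the *_srid companion column instead.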

Options

Option	Default	Behavior
`driverName`	`GeoJSON` (preset)	OGR driver. Preset by this named writer; override only if needed.
`mode`	`overwrite`	`overwrite` only; `append` is rejected.
`fileName`	(none)	Name the output unit explicitly. When set, `.save(path)` treats `path` as the parent directory (created if missing) and writes `path/<fileName>.geojson` (extension auto-completed). See Output naming.
`geometryType`	inferred from the data	Override the OGR geometry type.
`layerName`	driver default	Output layer name where supported.
`geomCol`	auto-detected from the `*_srid` column	Override the geometry column name. Locates the geometry and its SRID companion (`<geomCol>_srid`). See Named Vector Formats.
`sridCol`	`<geomCol>_srid`	Override the SRID column name. Required — supplies the CRS authority code (e.g. `"4326"`, or `"0"` if unknown).
`projCol`	`<geomCol>_srid_proj`	Override the PROJ4 column name (optional fallback CRS when `sridCol` is `"0"`).

Schema

geojson_gbx uses the shared vector column contract — a geometry column plus its *_srid (required) / *_srid_proj (optional) CRS companions; every other column becomes an attribute. Point the writer at your columns with geomCol / sridCol / projCol, or coerce to the geom_0 convention.

(df  # geometry + CRS under your own column names
 .write.format("geojson_gbx")
 .option("geomCol", "the_geom")   # WKB or WKT
 .option("sridCol", "epsg")       # REQUIRED — CRS authority code
 .mode("overwrite").save("/Volumes/cat/sch/vol/boroughs.geojson"))

For coercing an arbitrary frame to this shape and for exporting Databricks GEOMETRY / GEOGRAPHY columns, see the Vector Writer schema.

Output: a single GeoJSON .geojson text file (a FeatureCollection); the geometry is emitted as the standard GeoJSON geometry member (not a named attribute field), and all other columns become feature properties.
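To confirm that shape after a write, the output can be inspected with the standard library alone. This is a minimal sketch: summarize_feature_collection is a hypothetical helper, and the sample feature with its property names (boro_name, boro_code) is made up for illustration rather than taken from real writer output.

```python
import json


def summarize_feature_collection(text: str) -> dict:
    """Summarize a GeoJSON FeatureCollection given as a JSON string."""
    doc = json.loads(text)
    if doc.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    geometry_types = set()
    property_names = set()
    for feature in doc["features"]:
        geom = feature.get("geometry")  # the standard geometry member
        if geom is not None:
            geometry_types.add(geom["type"])
        # every non-geometry column is expected under "properties"
        property_names.update((feature.get("properties") or {}).keys())
    return {
        "features": len(doc["features"]),
        "geometry_types": sorted(geometry_types),
        "property_names": sorted(property_names),
    }


# Illustrative document only; a real file comes from the writer.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-73.97, 40.78]},
        "properties": {"boro_name": "Manhattan", "boro_code": 1},
    }],
})
summary = summarize_feature_collection(sample)
print(summary)
```

For a written file, pass its contents instead of the sample, for example the result of open(path, encoding="utf-8").read().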

How it scales

Unlike a single-node pyogrio.write_* call that serializes one file on one machine, the geojson_gbx writer runs as a Spark DataSource V2 two-phase write: each partition is written concurrently by its executor to a scratch fragment, then the driver merges the fragments into one output file. The merge is sequential and rename-free, so it is safe on FUSE-mounted cloud storage (Unity Catalog Volumes, DBFS). Repartition the input to control write parallelism.
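The two-phase pattern can be illustrated with a standard-library sketch. This is not the writer's actual implementation: the scratch fragment format (one JSON feature per line), the function names, and the thread pool standing in for Spark executors are all assumptions chosen to keep the example small.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_fragment(scratch_dir, part_id, features):
    """Phase 1: each task writes its partition to its own scratch fragment."""
    path = os.path.join(scratch_dir, f"part-{part_id:05d}.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for feature in features:
            f.write(json.dumps(feature) + "\n")
    return path


def merge_fragments(fragment_paths, out_path):
    """Phase 2: stream fragments, in order, straight into the final file."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:  # no temp file, no rename
        out.write('{"type": "FeatureCollection", "features": [\n')
        for path in sorted(fragment_paths):
            with open(path, encoding="utf-8") as fragment:
                for line in fragment:
                    line = line.strip()
                    if not line:
                        continue
                    out.write((",\n" if count else "") + line)
                    count += 1
        out.write("\n]}\n")
    return count


def point(x, y, name):
    return {"type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": {"name": name}}


# Two "partitions" written concurrently, then merged sequentially.
partitions = [[point(0, 0, "a"), point(1, 1, "b")], [point(2, 2, "c")]]
with tempfile.TemporaryDirectory() as scratch:
    with ThreadPoolExecutor() as pool:
        fragments = list(pool.map(lambda item: write_fragment(scratch, *item),
                                  enumerate(partitions)))
    out_path = os.path.join(scratch, "merged.geojson")
    merged_count = merge_fragments(fragments, out_path)
    with open(out_path, encoding="utf-8") as f:
        merged = json.load(f)

print(merged_count, merged["type"])
```

The sketch shows why the merge step is serial: the final file is one JSON document, so fragments have to be appended one after another, while the per-partition writes stay independent.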

Output naming

geojson_gbx applies the standard single-file output naming contract. The canonical extension is .geojson. Rules evaluated in order:

Case	`.save(path)` / `fileName`	Resolved output
`fileName` given	`.option("fileName","boroughs").save("/out/exports")`	`/out/exports/boroughs.geojson`
No `fileName`; `path` is an existing directory	`.save("/out/exports")`	`/out/exports/exports.geojson`
No `fileName`; `path` is a stem	`.save("/out/exports/boroughs")`	`/out/exports/boroughs.geojson`

Extension auto-completion: boroughs → boroughs.geojson; boroughs.geojson → unchanged. Passing a name ending in a different recognized geo extension (e.g. .gpkg) raises a clear error.
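The resolution rules can be restated as a small pure-Python sketch. It mirrors the table and the extension rule as described here and is not the library's code: resolve_output and complete_extension are hypothetical names, and the set of "other recognized geo extensions" is an illustrative assumption.

```python
import os
import posixpath

CANONICAL_EXT = ".geojson"
# Assumption: an illustrative subset of other recognized geo extensions.
OTHER_GEO_EXTS = {".gpkg", ".shp", ".geojsonl"}


def complete_extension(name: str) -> str:
    """boroughs -> boroughs.geojson; boroughs.geojson stays unchanged."""
    ext = posixpath.splitext(name)[1].lower()
    if ext == CANONICAL_EXT:
        return name
    if ext in OTHER_GEO_EXTS:
        raise ValueError(f"{name!r} ends in {ext}, expected {CANONICAL_EXT}")
    return name + CANONICAL_EXT


def resolve_output(path: str, file_name=None, is_dir=os.path.isdir) -> str:
    """Apply the three naming rules in order."""
    path = path.rstrip("/")
    if file_name is not None:
        # Rule 1: fileName given, so path is the parent directory.
        return posixpath.join(path, complete_extension(file_name))
    if is_dir(path):
        # Rule 2: existing directory, so the file is named after it.
        return posixpath.join(path, complete_extension(posixpath.basename(path)))
    # Rule 3: path is a stem (or already a full .geojson file name).
    return complete_extension(path)


# is_dir is injectable so the rules can be exercised without a filesystem.
print(resolve_output("/out/exports", "boroughs", is_dir=lambda p: False))
print(resolve_output("/out/exports", is_dir=lambda p: True))
print(resolve_output("/out/exports/boroughs", is_dir=lambda p: False))
```

The three calls correspond to the three table rows above, in the same order.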

Example

# Lightweight GeoJSON writer (pyogrio; OGR driver preset to "GeoJSON")
from databricks.labs.gbx.ds.register import register
register(spark)
src = f"{SAMPLE_DATA_BASE}/nyc/boroughs/nyc_boroughs.geojson"
df = spark.read.format("geojson_gbx").load(src)
out = "/tmp/nyc_boroughs.geojson"  # a single named output file
df.write.format("geojson_gbx").mode("overwrite").save(out)
back = spark.read.format("geojson_gbx").load(out)
assert back.count() == df.count()

Typical pipeline: export a table to a single file

Export a table of vector data to one GeoJSON file — no coalesce needed (the writer merges partitions into a single file on commit):

df = spark.table("main.geo.boroughs")  # vector data in a (Delta) table
df.write.format("geojson_gbx").mode("overwrite").save("/Volumes/main/geo/exports/boroughs.geojson")

Each partition is written concurrently, then merged into one output file. See Benchmarking for light-vs-heavy export figures.
