Skip to main content

Databricks Spatial Types and Functions

Databricks provides built-in spatial types and ST (spatial) and H3 (grid) functions in Databricks SQL and Databricks Runtime 17.1+. This page gives a concise overview with runnable examples.

GeoBrix is built to complement these native capabilities and bring more data onto them: use GeoBrix readers and writers to move vector/raster data in and out of Databricks GEOMETRY/GEOGRAPHY, then lean on the built-in ST and H3 functions — alongside GeoBrix RasterX and GridX (e.g. BNG) — for the analysis at scale.

Databricks Runtime required

Examples on this page are validated under docs/tests-dbr/ and require Databricks Runtime 17.1+ (or Databricks SQL with ST/H3 support). They will skip when run on open-source Spark.

References (cloud-specific Databricks docs):

  • GEOMETRY type — Euclidean X,Y (and Z); SRID or ANY · AWS | Azure | GCP
  • GEOGRAPHY type — Lat/lon; SRID 4326 only · AWS | Azure | GCP
  • ST geospatial functions — Import, export, measurements, predicates, overlay · AWS | Azure | GCP
  • H3 geospatial functions — Grid indexing with hexagons · AWS | Azure | GCP

GEOMETRY type

Databricks reference: AWS | Azure | GCP

Represents a geometric object in a coordinate reference system described by its SRID. X,Y (and optionally Z) units depend on the SRID. Use for projected coordinates or when SRID varies.

Create from WKT, GeoJSON, or WKB:

-- Create GEOMETRY from WKT (SRID 0 or specify)
SELECT st_geomfromtext('POINT(1 2)') AS geom;
SELECT st_geomfromtext('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))', 4326) AS geom;
-- Create GEOMETRY from GeoJSON (default SRID 4326)
SELECT st_geomfromgeojson('{"type":"Point","coordinates":[[1,2]]}') AS geom;
SELECT to_geometry('{"type":"Point","coordinates":[[1,2]]}') AS geom;

Export to WKB, GeoJSON, WKT:

-- Export GEOMETRY to WKB, GeoJSON, WKT
SELECT hex(st_asbinary(st_geomfromtext('POINT(1 2)'))) AS wkb_hex;
SELECT st_asgeojson(st_geomfromtext('POINT(1 2)')) AS geojson;
SELECT st_astext(st_geomfromtext('POINT(1 2)')) AS wkt;
note

About 11000 SRID values are supported.


GEOGRAPHY type

Databricks reference: AWS | Azure | GCP

Represents a geographic object; coordinates are longitude ([-180, 180]) and latitude ([-90, 90]). Only SRID 4326 (WGS 84) is allowed.

Create and export:

-- Create GEOGRAPHY from WKT/GeoJSON (SRID 4326)
SELECT st_geogfromtext('POINT(1 2)') AS geog;
SELECT st_geogfromgeojson('{"type":"Point","coordinates":[[1,2]]}') AS geog;
SELECT to_geography('{"type":"Point","coordinates":[[1,2]]}') AS geog;
-- Export GEOGRAPHY to GeoJSON, WKT
SELECT st_asgeojson(st_geogfromtext('POINT(1 2)')) AS geojson;
SELECT st_astext(st_geogfromtext('POINT(1 2)')) AS wkt;

ST geospatial functions

Databricks reference: AWS | Azure | GCP

Databricks provides ~100 ST spatial SQL functions. They operate on GEOMETRY and/or GEOGRAPHY: construct values, export to WKT/GeoJSON/WKB, measure (area, length, distance), access components, test predicates (intersects, contains), overlay (buffer, convex hull, intersection), and transform (affine, reproject by SRID).

In Python notebooks (Databricks Runtime):

# In Databricks Runtime notebooks: import ST functions
from pyspark.databricks.sql import functions as dbf

# Use dbf.st_* for GEOMETRY/GEOGRAPHY
# Example: dbf.st_area(col("geometry")), dbf.st_intersects(...)

Measurements (area, centroid, distance):

-- Area, length, centroid, distance
SELECT st_area(st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))')) AS area;
SELECT st_centroid(st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))')) AS center;
SELECT st_distance(
st_geomfromtext('POINT(0 0)'),
st_geomfromtext('POINT(3 4)')
) AS dist;

Predicates (intersects, contains):

-- Spatial predicates: intersects, contains, within
SELECT st_intersects(
st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
st_geomfromtext('POINT(1 1)')
) AS intersects;
SELECT st_contains(
st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
st_geomfromtext('POINT(1 1)')
) AS contains;

Constructors (point, buffer, convex hull):

-- Construct point, buffer, convex hull
SELECT st_point(1, 2) AS pt;
SELECT st_buffer(st_geomfromtext('POINT(0 0)'), 0.5) AS buffered;
SELECT st_convexhull(st_geomfromtext('MULTIPOINT(0 0, 1 0, 1 1)')) AS hull;

H3 geospatial functions

Databricks reference: AWS | Azure | GCP

Databricks provides 35+ H3 functions. H3 is a global hexagonal grid with 16 resolutions. Use it to index points and polygons to cell IDs, then aggregate or join on cells (semantic spatial join without geometry predicates).

In Python notebooks:

# In Databricks Runtime notebooks: import H3 functions
from pyspark.databricks.sql import functions as dbf

# Use dbf.h3_* for H3 cell IDs (BIGINT or STRING)
# Example: dbf.h3_longlatash3(lon, lat, resolution)

Index a point (e.g. NYC; resolution 7 ≈ 5 km²):

-- Index a point to H3 cell (resolution 7 ~5 km²)
SELECT h3_longlatash3(-73.99, 40.71, 7) AS h3_cell;
SELECT h3_longlatash3string(-73.99, 40.71, 7) AS h3_str;

Polyfill (cells contained by a polygon):

-- Polyfill: get H3 cells contained by a polygon
SELECT h3_polyfillash3(
st_geogfromgeojson('{"type":"Polygon","coordinates":[[[-74.02,40.70],[-73.98,40.70],[-73.98,40.74],[-74.02,40.74],[-74.02,40.70]]]}'),
7
) AS cells;

Neighbors (k-ring, hex ring):

-- K-ring: cells within distance k of a cell
SELECT h3_kring(h3_longlatash3(-73.99, 40.71, 7), 1) AS ring;
SELECT h3_hexring(h3_longlatash3(-73.99, 40.71, 7), 1) AS hex_ring;

Cell boundary and center (WKT/GeoJSON):

-- Get H3 cell boundary as WKT/GeoJSON
SELECT h3_boundaryaswkt(h3_longlatash3(-73.99, 40.71, 7)) AS boundary_wkt;
SELECT h3_centerasgeojson(h3_longlatash3(-73.99, 40.71, 7)) AS center_geojson;
note

H3 expressions require Photon-enabled clusters or Databricks SQL (Pro/Serverless). The H3 library is included in Databricks Runtime 11.2+.


Combining with GeoBrix

  • Readers: Use GeoBrix readers (GDAL, GeoJSON, Shapefile, etc.) to load vector/raster data; then use st_geomfromwkb(geom_0) or similar to get Databricks GEOMETRY/GEOGRAPHY for ST or H3.
  • Writers: Use GeoBrix writers to export back out — the light *_gbx vector writers (GeoJSON, Shapefile, GeoPackage, GeoDatabase) plus raster and PMTiles. To write a Databricks GEOMETRY/GEOGRAPHY column, coerce it to the writer's WKB + SRID input first with st_aswkb(geom) and st_srid(geom) (see the Vector Writer schema).
  • GridX: Discrete global-grid indexing beyond H3 — GridX functions cover British National Grid (bng_polyfill, bng_tessellate, bng_pointascell, k-ring/k-loop neighbors and explode variants, cell area/distance, cell set ops with _agg aggregates), plus Quadbin and custom user-defined grids (point→cell, polyfill, tessellate, k-ring, cell geometry/centroid). Use these for BNG or bespoke grids; for a global hex grid use Databricks H3 — and GeoBrix raster→grid aggregation targets both H3 and Quadbin.
  • RasterX: A full raster toolkit (100+ functions) for the raster workloads Databricks-native spatial doesn't cover — see Raster Functions: I/O and tiling (GDAL/GeoTIFF, maketiles/retile/tooverlappingtiles, overviews, resample/warp/clip), band math (NDVI, frombands, combine, merge), terrain (slope, hillshade, aspect), focal filters and viewshed, and vector↔raster (rasterize, polygonize, contour, DTM-from-geometries, grid-from-points). Aggregate rasters onto grid cells with rst_h3_rastertogrid* / rst_quadbin_rastertogrid*, then combine with native ST/H3 for zonal stats in SQL.
  • VectorX: Augments native ST — vector-tile (MVT) encoding (st_asmvt, st_asmvt_pyramid) for serving web tiles, TIN surface modeling and elevation interpolation (st_triangulate, st_interpolateelevationgeom), and legacy-geometry migration (st_legacyaswkb). Pair with Databricks GEOMETRY and the PMTiles writer to publish maps.