Databricks Spatial Types and Functions
Databricks provides built-in spatial types and ST (spatial) and H3 (grid) functions in Databricks SQL and Databricks Runtime 17.1+. This page gives a concise overview with runnable examples.
GeoBrix is built to complement these native capabilities and bring more data onto them: use GeoBrix readers and writers to move vector/raster data in and out of Databricks GEOMETRY/GEOGRAPHY, then lean on the built-in ST and H3 functions — alongside GeoBrix RasterX and GridX (e.g. BNG) — for the analysis at scale.
Examples on this page are validated under docs/tests-dbr/ and require Databricks Runtime 17.1+ (or Databricks SQL with ST/H3 support). They will skip when run on open-source Spark.
References (cloud-specific Databricks docs):
- GEOMETRY type — Euclidean X,Y (and Z); SRID or ANY · AWS | Azure | GCP
- GEOGRAPHY type — Lat/lon; SRID 4326 only · AWS | Azure | GCP
- ST geospatial functions — Import, export, measurements, predicates, overlay · AWS | Azure | GCP
- H3 geospatial functions — Grid indexing with hexagons · AWS | Azure | GCP
GEOMETRY type
Databricks reference: AWS | Azure | GCP
Represents a geometric object in a coordinate reference system described by its SRID. X,Y (and optionally Z) units depend on the SRID. Use for projected coordinates or when SRID varies.
Create from WKT, GeoJSON, or WKB:
-- Create GEOMETRY from WKT (SRID 0 or specify)
SELECT st_geomfromtext('POINT(1 2)') AS geom;
SELECT st_geomfromtext('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))', 4326) AS geom;
-- Create GEOMETRY from GeoJSON (default SRID 4326)
SELECT st_geomfromgeojson('{"type":"Point","coordinates":[[1,2]]}') AS geom;
SELECT to_geometry('{"type":"Point","coordinates":[[1,2]]}') AS geom;
Export to WKB, GeoJSON, WKT:
-- Export GEOMETRY to WKB, GeoJSON, WKT
SELECT hex(st_asbinary(st_geomfromtext('POINT(1 2)'))) AS wkb_hex;
SELECT st_asgeojson(st_geomfromtext('POINT(1 2)')) AS geojson;
SELECT st_astext(st_geomfromtext('POINT(1 2)')) AS wkt;
About 11000 SRID values are supported.
GEOGRAPHY type
Databricks reference: AWS | Azure | GCP
Represents a geographic object; coordinates are longitude ([-180, 180]) and latitude ([-90, 90]). Only SRID 4326 (WGS 84) is allowed.
Create and export:
-- Create GEOGRAPHY from WKT/GeoJSON (SRID 4326)
SELECT st_geogfromtext('POINT(1 2)') AS geog;
SELECT st_geogfromgeojson('{"type":"Point","coordinates":[[1,2]]}') AS geog;
SELECT to_geography('{"type":"Point","coordinates":[[1,2]]}') AS geog;
-- Export GEOGRAPHY to GeoJSON, WKT
SELECT st_asgeojson(st_geogfromtext('POINT(1 2)')) AS geojson;
SELECT st_astext(st_geogfromtext('POINT(1 2)')) AS wkt;
ST geospatial functions
Databricks reference: AWS | Azure | GCP
Databricks provides ~100 ST spatial SQL functions. They operate on GEOMETRY and/or GEOGRAPHY: construct values, export to WKT/GeoJSON/WKB, measure (area, length, distance), access components, test predicates (intersects, contains), overlay (buffer, convex hull, intersection), and transform (affine, reproject by SRID).
In Python notebooks (Databricks Runtime):
# In Databricks Runtime notebooks: import ST functions
from pyspark.databricks.sql import functions as dbf
# Use dbf.st_* for GEOMETRY/GEOGRAPHY
# Example: dbf.st_area(col("geometry")), dbf.st_intersects(...)
Measurements (area, centroid, distance):
-- Area, length, centroid, distance
SELECT st_area(st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))')) AS area;
SELECT st_centroid(st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))')) AS center;
SELECT st_distance(
st_geomfromtext('POINT(0 0)'),
st_geomfromtext('POINT(3 4)')
) AS dist;
Predicates (intersects, contains):
-- Spatial predicates: intersects, contains, within
SELECT st_intersects(
st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
st_geomfromtext('POINT(1 1)')
) AS intersects;
SELECT st_contains(
st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
st_geomfromtext('POINT(1 1)')
) AS contains;
Constructors (point, buffer, convex hull):
-- Construct point, buffer, convex hull
SELECT st_point(1, 2) AS pt;
SELECT st_buffer(st_geomfromtext('POINT(0 0)'), 0.5) AS buffered;
SELECT st_convexhull(st_geomfromtext('MULTIPOINT(0 0, 1 0, 1 1)')) AS hull;
H3 geospatial functions
Databricks reference: AWS | Azure | GCP
Databricks provides 35+ H3 functions. H3 is a global hexagonal grid with 16 resolutions. Use it to index points and polygons to cell IDs, then aggregate or join on cells (semantic spatial join without geometry predicates).
In Python notebooks:
# In Databricks Runtime notebooks: import H3 functions
from pyspark.databricks.sql import functions as dbf
# Use dbf.h3_* for H3 cell IDs (BIGINT or STRING)
# Example: dbf.h3_longlatash3(lon, lat, resolution)
Index a point (e.g. NYC; resolution 7 ≈ 5 km²):
-- Index a point to H3 cell (resolution 7 ~5 km²)
SELECT h3_longlatash3(-73.99, 40.71, 7) AS h3_cell;
SELECT h3_longlatash3string(-73.99, 40.71, 7) AS h3_str;
Polyfill (cells contained by a polygon):
-- Polyfill: get H3 cells contained by a polygon
SELECT h3_polyfillash3(
st_geogfromgeojson('{"type":"Polygon","coordinates":[[[-74.02,40.70],[-73.98,40.70],[-73.98,40.74],[-74.02,40.74],[-74.02,40.70]]]}'),
7
) AS cells;
Neighbors (k-ring, hex ring):
-- K-ring: cells within distance k of a cell
SELECT h3_kring(h3_longlatash3(-73.99, 40.71, 7), 1) AS ring;
SELECT h3_hexring(h3_longlatash3(-73.99, 40.71, 7), 1) AS hex_ring;
Cell boundary and center (WKT/GeoJSON):
-- Get H3 cell boundary as WKT/GeoJSON
SELECT h3_boundaryaswkt(h3_longlatash3(-73.99, 40.71, 7)) AS boundary_wkt;
SELECT h3_centerasgeojson(h3_longlatash3(-73.99, 40.71, 7)) AS center_geojson;
H3 expressions require Photon-enabled clusters or Databricks SQL (Pro/Serverless). The H3 library is included in Databricks Runtime 11.2+.
Combining with GeoBrix
- Readers: Use GeoBrix readers (GDAL, GeoJSON, Shapefile, etc.) to load vector/raster data; then use
st_geomfromwkb(geom_0)or similar to get DatabricksGEOMETRY/GEOGRAPHYfor ST or H3. - Writers: Use GeoBrix writers to export back out — the light
*_gbxvector writers (GeoJSON, Shapefile, GeoPackage, GeoDatabase) plus raster and PMTiles. To write a DatabricksGEOMETRY/GEOGRAPHYcolumn, coerce it to the writer's WKB + SRID input first withst_aswkb(geom)andst_srid(geom)(see the Vector Writer schema). - GridX: Discrete global-grid indexing beyond H3 — GridX functions cover British National Grid (
bng_polyfill,bng_tessellate,bng_pointascell, k-ring/k-loop neighbors and explode variants, cell area/distance, cell set ops with_aggaggregates), plus Quadbin and custom user-defined grids (point→cell, polyfill, tessellate, k-ring, cell geometry/centroid). Use these for BNG or bespoke grids; for a global hex grid use Databricks H3 — and GeoBrix raster→grid aggregation targets both H3 and Quadbin. - RasterX: A full raster toolkit (100+ functions) for the raster workloads Databricks-native spatial doesn't cover — see Raster Functions: I/O and tiling (GDAL/GeoTIFF,
maketiles/retile/tooverlappingtiles, overviews,resample/warp/clip), band math (NDVI,frombands,combine,merge), terrain (slope,hillshade,aspect), focal filters andviewshed, and vector↔raster (rasterize,polygonize,contour, DTM-from-geometries, grid-from-points). Aggregate rasters onto grid cells withrst_h3_rastertogrid*/rst_quadbin_rastertogrid*, then combine with native ST/H3 for zonal stats in SQL. - VectorX: Augments native ST — vector-tile (MVT) encoding (
st_asmvt,st_asmvt_pyramid) for serving web tiles, TIN surface modeling and elevation interpolation (st_triangulate,st_interpolateelevationgeom), and legacy-geometry migration (st_legacyaswkb). Pair with DatabricksGEOMETRYand the PMTiles writer to publish maps.