Databricks Spatial Types and Functions
Databricks provides built-in spatial types and ST (spatial) and H3 (grid) functions in Databricks SQL and Databricks Runtime 17.1+. This page gives a concise overview with runnable examples. Use these alongside GeoBrix for workflows that combine native Databricks geometry with GeoBrix raster/vector readers and GridX (e.g. BNG).
Examples on this page are validated under docs/tests-dbr/ and require Databricks Runtime 17.1+ (or Databricks SQL with ST/H3 support). They will skip when run on open-source Spark.
References:
- GEOMETRY type — Euclidean X,Y (and Z); SRID or ANY
- GEOGRAPHY type — Lat/lon; SRID 4326 only
- ST geospatial functions — Import, export, measurements, predicates, overlay
- H3 geospatial functions — Grid indexing with hexagons
GEOMETRY type
Represents a geometric object in a coordinate reference system described by its SRID. X,Y (and optionally Z) units depend on the SRID. Use for projected coordinates or when SRID varies.
Create from WKT, GeoJSON, or WKB:
-- Create GEOMETRY from WKT (SRID 0 or specify)
SELECT st_geomfromtext('POINT(1 2)') AS geom;
SELECT st_geomfromtext('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))', 4326) AS geom;
-- Create GEOMETRY from GeoJSON (default SRID 4326)
SELECT st_geomfromgeojson('{"type":"Point","coordinates":[[1,2]]}') AS geom;
SELECT to_geometry('{"type":"Point","coordinates":[[1,2]]}') AS geom;
Export to WKB, GeoJSON, WKT:
-- Export GEOMETRY to WKB, GeoJSON, WKT
SELECT hex(st_asbinary(st_geomfromtext('POINT(1 2)'))) AS wkb_hex;
SELECT st_asgeojson(st_geomfromtext('POINT(1 2)')) AS geojson;
SELECT st_astext(st_geomfromtext('POINT(1 2)')) AS wkt;
About 11000 SRID values are supported.
GEOGRAPHY type
Represents a geographic object; coordinates are longitude ([-180, 180]) and latitude ([-90, 90]). Only SRID 4326 (WGS 84) is allowed.
Create and export:
-- Create GEOGRAPHY from WKT/GeoJSON (SRID 4326)
SELECT st_geogfromtext('POINT(1 2)') AS geog;
SELECT st_geogfromgeojson('{"type":"Point","coordinates":[[1,2]]}') AS geog;
SELECT to_geography('{"type":"Point","coordinates":[[1,2]]}') AS geog;
-- Export GEOGRAPHY to GeoJSON, WKT
SELECT st_asgeojson(st_geogfromtext('POINT(1 2)')) AS geojson;
SELECT st_astext(st_geogfromtext('POINT(1 2)')) AS wkt;
ST geospatial functions
ST functions operate on GEOMETRY and/or GEOGRAPHY: construct values, export to WKT/GeoJSON/WKB, measure (area, length, distance), test predicates (intersects, contains), and overlay (buffer, convex hull, intersection).
In Python notebooks (Databricks Runtime):
# In Databricks Runtime notebooks: import ST functions
from pyspark.databricks.sql import functions as dbf
# Use dbf.st_* for GEOMETRY/GEOGRAPHY
# Example: dbf.st_area(col("geometry")), dbf.st_intersects(...)
Measurements (area, centroid, distance):
-- Area, length, centroid, distance
SELECT st_area(st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))')) AS area;
SELECT st_centroid(st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))')) AS center;
SELECT st_distance(
st_geomfromtext('POINT(0 0)'),
st_geomfromtext('POINT(3 4)')
) AS dist;
Predicates (intersects, contains):
-- Spatial predicates: intersects, contains, within
SELECT st_intersects(
st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
st_geomfromtext('POINT(1 1)')
) AS intersects;
SELECT st_contains(
st_geomfromtext('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
st_geomfromtext('POINT(1 1)')
) AS contains;
Constructors (point, buffer, convex hull):
-- Construct point, buffer, convex hull
SELECT st_point(1, 2) AS pt;
SELECT st_buffer(st_geomfromtext('POINT(0 0)'), 0.5) AS buffered;
SELECT st_convexhull(st_geomfromtext('MULTIPOINT(0 0, 1 0, 1 1)')) AS hull;
H3 geospatial functions
H3 is a global hexagonal grid with 16 resolutions. Use it to index points and polygons to cell IDs, then aggregate or join on cells (semantic spatial join without geometry predicates).
In Python notebooks:
# In Databricks Runtime notebooks: import H3 functions
from pyspark.databricks.sql import functions as dbf
# Use dbf.h3_* for H3 cell IDs (BIGINT or STRING)
# Example: dbf.h3_longlatash3(lon, lat, resolution)
Index a point (e.g. NYC; resolution 7 ≈ 5 km²):
-- Index a point to H3 cell (resolution 7 ~5 km²)
SELECT h3_longlatash3(-73.99, 40.71, 7) AS h3_cell;
SELECT h3_longlatash3string(-73.99, 40.71, 7) AS h3_str;
Polyfill (cells contained by a polygon):
-- Polyfill: get H3 cells contained by a polygon
SELECT h3_polyfillash3(
st_geogfromgeojson('{"type":"Polygon","coordinates":[[[-74.02,40.70],[-73.98,40.70],[-73.98,40.74],[-74.02,40.74],[-74.02,40.70]]]}'),
7
) AS cells;
Neighbors (k-ring, hex ring):
-- K-ring: cells within distance k of a cell
SELECT h3_kring(h3_longlatash3(-73.99, 40.71, 7), 1) AS ring;
SELECT h3_hexring(h3_longlatash3(-73.99, 40.71, 7), 1) AS hex_ring;
Cell boundary and center (WKT/GeoJSON):
-- Get H3 cell boundary as WKT/GeoJSON
SELECT h3_boundaryaswkt(h3_longlatash3(-73.99, 40.71, 7)) AS boundary_wkt;
SELECT h3_centerasgeojson(h3_longlatash3(-73.99, 40.71, 7)) AS center_geojson;
H3 expressions require Photon-enabled clusters or Databricks SQL (Pro/Serverless). The H3 library is included in Databricks Runtime 11.2+.
Combining with GeoBrix
- Readers: Use GeoBrix readers (GDAL, GeoJSON, Shapefile, etc.) to load vector/raster data; then use
st_geomfromwkb(geom_0)or similar to get DatabricksGEOMETRY/GEOGRAPHYfor ST or H3. - GridX (BNG): For British National Grid, use GridX BNG functions (e.g.
bng_polyfill,bng_tessellate). For a global hex grid, use Databricks H3. - RasterX: Use RasterX for raster I/O and operations; combine with ST/H3 for zonal stats or grid aggregation in SQL.