Spatial functions

flatten_polygons

flatten_polygons(col)

Explodes a MultiPolygon geometry into one row per constituent Polygon.

Parameters:

col (Column) – MultiPolygon Geometry

Return type:

Column: StringType

Example:

df = spark.createDataFrame([
        {'wkt': 'MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))'}
    ])
df.select(flatten_polygons('wkt')).show(2, False)
+------------------------------------------+
|element                                   |
+------------------------------------------+
|POLYGON ((30 20, 45 40, 10 40, 30 20))    |
|POLYGON ((15 5, 40 10, 10 20, 5 10, 15 5))|
+------------------------------------------+

st_area

st_area(col)

Compute the area of a geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_area('wkt')).show()
+------------+
|st_area(wkt)|
+------------+
|       550.0|
+------------+

Note

Results of this function are always expressed in the original units of the input geometry.
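
For a simple polygon in planar coordinates, the 550.0 above can be reproduced with the shoelace formula. The plain-Python sketch below is only an illustration and not Mosaic's implementation. shoelace_area is a hypothetical helper, it assumes a single closed exterior ring with no holes, and it returns squared units of the input coordinates.

```python
def shoelace_area(ring):
    """Area of a simple polygon ring given as closed (x, y) tuples (no holes)."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Signed cross product of consecutive vertices
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Exterior ring of the example polygon (first point repeated at the end, as in WKT)
ring = [(30, 10), (40, 40), (20, 40), (10, 20), (30, 10)]
print(shoelace_area(ring))  # 550.0
```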

st_buffer

st_buffer(col, radius)

Buffers the input geometry by the distance radius and returns a new, buffered geometry.

Parameters:
  • col (Column) – Geometry

  • radius (Column (DoubleType)) – Double

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_buffer('wkt', lit(2.))).show()
+--------------------+
| st_buffer(wkt, 2.0)|
+--------------------+
|POLYGON ((29.1055...|
+--------------------+

st_bufferloop

st_bufferloop(col, innerRadius, outerRadius)

Returns the difference between st_buffer(col, outerRadius) and st_buffer(col, innerRadius). The resulting geometry is a loop with a width of outerRadius - innerRadius.

Parameters:
  • col (Column) – Geometry

  • innerRadius (Column (DoubleType)) – Radius of the resulting geometry hole.

  • outerRadius (Column (DoubleType)) – Radius of the resulting geometry.

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_bufferloop('wkt', lit(2.), lit(2.1))).show()
+----------------------------+
|st_bufferloop(wkt, 2.0, 2.1)|
+----------------------------+
|        POLYGON ((29.1055...|
+----------------------------+

Fig 1. ST_BufferLoop(geom, 0.02, 0.04)

st_centroid2D [Deprecated]

st_centroid2D(col)

Returns the x and y coordinates representing the centroid of the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: StructType[x: DoubleType, y: DoubleType]

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_centroid2D('wkt')).show()
+---------------------------------------+
|st_centroid(wkt)                       |
+---------------------------------------+
|{25.454545454545453, 26.96969696969697}|
+---------------------------------------+

st_centroid

st_centroid(col)

Returns the POINT geometry representing the centroid of the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_centroid('wkt')).show()
+---------------------------------------------+
|st_centroid(wkt)                             |
+---------------------------------------------+
|POINT (25.454545454545453 26.96969696969697) |
+---------------------------------------------+
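
The coordinates reported by st_centroid and st_centroid2D match the area-weighted centroid formula for a polygon. The sketch below reproduces them in plain Python as an illustration only; polygon_centroid is a hypothetical helper (not part of Mosaic) and assumes a single closed ring without holes.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a closed, simple polygon ring (no holes)."""
    a2 = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        cross = x1 * y2 - x2 * y1
        a2 += cross                 # accumulates twice the signed area
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Centroid = sum / (6 * area), and 6 * area == 3 * a2
    return cx / (3.0 * a2), cy / (3.0 * a2)

ring = [(30, 10), (40, 40), (20, 40), (10, 20), (30, 10)]
print(polygon_centroid(ring))
```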

st_convexhull

st_convexhull(col)

Compute the convex hull of a geometry or multi-geometry object.

Parameters:

col (Column) – Geometry

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'MULTIPOINT ((10 40), (40 30), (20 20), (30 10))'}])
df.select(st_convexhull('wkt')).show(1, False)
+---------------------------------------------+
|st_convexhull(wkt)                           |
+---------------------------------------------+
|POLYGON ((10 40, 20 20, 30 10, 40 30, 10 40))|
+---------------------------------------------+
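
One common way to compute a convex hull is Andrew's monotone chain algorithm. The plain-Python sketch below assumes that algorithm purely for illustration; Mosaic delegates the operation to its geometry API, which may use a different method. convex_hull and cross are hypothetical helpers. For the example points, all four vertices lie on the hull, and the sketch happens to list them in the same order as the output above.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices counter-clockwise, ring not closed."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Pop vertices that would make a clockwise (or collinear) turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain
    return lower[:-1] + upper[:-1]

hull = convex_hull([(10, 40), (40, 30), (20, 20), (30, 10)])
print(hull)  # [(10, 40), (20, 20), (30, 10), (40, 30)]
```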

st_difference

st_difference(left_geom, right_geom)

Returns the point set difference of the left and right geometry.

Parameters:
  • left_geom (Column) – Geometry

  • right_geom (Column) – Geometry

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'left': 'POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))', 'right': 'POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))'}])
df.select(st_difference(col('left'), col('right'))).show()
+-----------------------------------------------------------+
| st_difference(left, right)                                |
+-----------------------------------------------------------+
|POLYGON ((10 10, 20 10, 20 15, 15 15, 15 20, 10 20, 10 10))|
+-----------------------------------------------------------+

st_dimension

st_dimension(col)

Compute the dimension of the geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: IntegerType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_dimension('wkt')).show()
+-----------------+
|st_dimension(wkt)|
+-----------------+
|                2|
+-----------------+

st_distance

st_distance(geom1, geom2)

Compute the Euclidean distance between geom1 and geom2.

Parameters:
  • geom1 (Column) – Geometry

  • geom2 (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'point': 'POINT (5 5)', 'poly': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_distance('poly', 'point')).show()
+------------------------+
|st_distance(poly, point)|
+------------------------+
|      15.652475842498529|
+------------------------+

Note

Results of this Euclidean distance function are always expressed in the original units of the input geometries, e.g. for WGS84 (SRID 4326) the units are degrees.
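
For a point and a polygon, the Euclidean distance is zero when the point lies inside the polygon and otherwise the shortest distance to any boundary segment. The plain-Python sketch below illustrates that definition and reproduces the example value; it is not Mosaic's implementation, the helper names are hypothetical, and it assumes a single closed ring without holes.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Orthogonal projection of p onto the line, clamped to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_in_ring(p, ring):
    """Ray-casting test; boundary points give distance 0 either way below."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def point_polygon_distance(p, ring):
    if point_in_ring(p, ring):
        return 0.0
    return min(point_segment_distance(p, a, b) for a, b in zip(ring, ring[1:]))

ring = [(30, 10), (40, 40), (20, 40), (10, 20), (30, 10)]
print(point_polygon_distance((5, 5), ring))
```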

st_dump

st_dump(col)

Explodes a multi-geometry into one row per constituent geometry.

Parameters:

col (Column) – The input multi-geometry

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'MULTIPOINT ((10 40), (40 30), (20 20), (30 10))'}])
df.select(st_dump('wkt')).show(5, False)
+-------------+
|element      |
+-------------+
|POINT (10 40)|
|POINT (40 30)|
|POINT (20 20)|
|POINT (30 10)|
+-------------+

st_envelope

st_envelope(col)

Returns the minimum bounding box of the input geometry, as a geometry. This bounding box is defined by the rectangular polygon with corner points (x_min, y_min), (x_max, y_min), (x_min, y_max), (x_max, y_max).

Parameters:

col (Column) – Geometry

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((10 10, 20 10, 15 20, 10 10))'}])
df.select(st_envelope('wkt')).show()
+-----------------------------------------------+
| st_envelope(wkt)                              |
+-----------------------------------------------+
| POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10)) |
+-----------------------------------------------+
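
The envelope only needs the minimum and maximum x and y over all coordinates (the same values that st_xmin, st_xmax, st_ymin and st_ymax return individually). The plain-Python sketch below illustrates this with a hypothetical envelope helper working on (x, y) tuples; it is not Mosaic's implementation.

```python
def envelope(coords):
    """Axis-aligned bounding box of (x, y) tuples, as a closed WKT-style ring."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    # Same corner order as the example output above
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max), (x_min, y_min)]

print(envelope([(10, 10), (20, 10), (15, 20), (10, 10)]))
# [(10, 10), (20, 10), (20, 20), (10, 20), (10, 10)]
```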

st_geometrytype

st_geometrytype(col)

Returns the type of the input geometry (“POINT”, “LINESTRING”, “POLYGON” etc.).

Parameters:

col (Column) – Geometry

Return type:

Column: StringType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_geometrytype('wkt')).show()
+--------------------+
|st_geometrytype(wkt)|
+--------------------+
|             POLYGON|
+--------------------+

st_haversine

st_haversine(lat1, lng1, lat2, lng2)

Compute the haversine distance between lat1/lng1 and lat2/lng2.

Parameters:
  • lat1 (Column) – DoubleType

  • lng1 (Column) – DoubleType

  • lat2 (Column) – DoubleType

  • lng2 (Column) – DoubleType

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'lat1': 0.0, 'lng1': 90.0, 'lat2': 0.0, 'lng2': 0.0}])
df.select(st_haversine('lat1', 'lng1', 'lat2', 'lng2')).show()
+------------------------------------+
|st_haversine(lat1, lng1, lat2, lng2)|
+------------------------------------+
|                   10007.55722101796|
+------------------------------------+

Note

Results of this function are always expressed in km, while the input lat/lng pairs are expected to be in degrees. The radius used (in km) is 6371.0088.
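
The example value can be checked with the standard haversine formula and the radius given in the note. The plain-Python sketch below is an illustration only (haversine_km is a hypothetical helper, not the Mosaic function); two points on the equator 90 degrees of longitude apart are a quarter of a great circle, i.e. 6371.0088 × π/2 km.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # radius stated in the note above

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) pairs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp guards against a slightly exceeding 1 through rounding
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

print(haversine_km(0.0, 90.0, 0.0, 0.0))
```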

st_hasvalidcoordinates

st_hasvalidcoordinates(geom, crs, which)

Checks if all points in geom are valid with respect to crs bounds. CRS bounds can be provided either as bounds or as reprojected_bounds.

Parameters:
  • geom (Column) – Geometry

  • crs (Column) – CRS name (EPSG ID), e.g. “EPSG:2192”

  • which (Column) – Which bounds to check against: the geographic “bounds” or the projected “reprojected_bounds”.

Return type:

Column: BooleanType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON((5.84 45.64, 5.92 45.64, 5.89 45.81, 5.79 45.81, 5.84 45.64))'}])
df.select(st_hasvalidcoordinates(col('wkt'), lit('EPSG:2192'), lit('bounds'))).show()
+----------------------------------------------+
|st_hasvalidcoordinates(wkt, EPSG:2192, bounds)|
+----------------------------------------------+
|                                          true|
+----------------------------------------------+

st_intersection

st_intersection(geom1, geom2)

Returns a geometry representing the intersection of geom1 and geom2.

Parameters:
  • geom1 (Column) – Geometry

  • geom2 (Column) – Geometry

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'p1': 'POLYGON ((0 0, 0 3, 3 3, 3 0, 0 0))', 'p2': 'POLYGON ((2 2, 2 4, 4 4, 4 2, 2 2))'}])
df.select(st_intersection(col('p1'), col('p2'))).show(1, False)
+-----------------------------------+
|st_intersection(p1, p2)            |
+-----------------------------------+
|POLYGON ((2 2, 3 2, 3 3, 2 3, 2 2))|
+-----------------------------------+

st_isvalid

st_isvalid(col)

Returns true if the geometry is valid.

Parameters:

col (Column) – Geometry

Return type:

Column: BooleanType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_isvalid('wkt')).show()
+---------------+
|st_isvalid(wkt)|
+---------------+
|           true|
+---------------+

df = spark.createDataFrame([{
    'wkt': 'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0), (15 15, 15 20, 20 20, 20 15, 15 15))'
    }])
df.select(st_isvalid('wkt')).show()
+---------------+
|st_isvalid(wkt)|
+---------------+
|          false|
+---------------+

Note

Validity assertions depend on the chosen geometry API (JTS is the default). The assertions used in the ESRI geometry API follow the definitions in the “Simple feature access - Part 1” document (OGC 06-103r4) for each geometry type.

st_length

st_length(col)

Compute the length of a geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_length('wkt')).show()
+-----------------+
|   st_length(wkt)|
+-----------------+
|96.34413615167959|
+-----------------+

Note

Results of this function are always expressed in the original units of the input geometry.

Note

Alias for st_perimeter.
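
For a polygon, the length is the sum of the Euclidean lengths of its boundary segments, in the input units. The plain-Python sketch below reproduces the example value as an illustration only; ring_length is a hypothetical helper that assumes a single closed ring (math.dist requires Python 3.8 or later).

```python
import math

def ring_length(ring):
    """Sum of Euclidean segment lengths along a closed ring, in input units."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))

ring = [(30, 10), (40, 40), (20, 40), (10, 20), (30, 10)]
# sqrt(1000) + 20 + sqrt(500) + sqrt(500)
print(ring_length(ring))
```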

st_numpoints

st_numpoints(col)

Returns the number of points in the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: IntegerType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_numpoints('wkt')).show()
+-----------------+
|st_numpoints(wkt)|
+-----------------+
|                5|
+-----------------+

st_perimeter

st_perimeter(col)

Compute the perimeter length of a geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_perimeter('wkt')).show()
+-----------------+
|st_perimeter(wkt)|
+-----------------+
|96.34413615167959|
+-----------------+

Note

Results of this function are always expressed in the original units of the input geometry.

Note

Alias for st_length.

st_rotate

st_rotate(geom, td)

Rotates geom by the angle td, expressed in radians.

Parameters:
  • geom (Column) – Geometry

  • td (Column (DoubleType)) – Rotation (in radians)

Return type:

Column: Geometry

Example:

from math import pi
df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_rotate('wkt', lit(pi))).show(1, False)
+-------------------------------------------------------+
|st_rotate(wkt, 3.141592653589793)                      |
+-------------------------------------------------------+
|POLYGON ((-30 -10, -40 -40, -20 -40, -10 -20, -30 -10))|
+-------------------------------------------------------+
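
The example output maps every (x, y) to (-x, -y), which is consistent with a rotation by pi about the origin (0, 0). The plain-Python sketch below assumes rotation about the origin and, as an additional assumption, a counter-clockwise direction for positive angles (a rotation by pi cannot tell the two directions apart). rotate is a hypothetical helper; results are rounded because cos(pi) and sin(pi) carry floating-point noise.

```python
import math

def rotate(coords, theta):
    """Rotate (x, y) pairs by theta radians about the origin (assumed counter-clockwise)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in coords]

rotated = rotate([(30, 10), (40, 40), (20, 40), (10, 20), (30, 10)], math.pi)
# Round away floating-point noise such as -9.999999999999996
print([(round(x, 9), round(y, 9)) for x, y in rotated])
```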

st_scale

st_scale(geom, xd, yd)

Scales geom using the scaling factors xd and yd.

Parameters:
  • geom (Column) – Geometry

  • xd (Column (DoubleType)) – Scale factor in the x-direction

  • yd (Column (DoubleType)) – Scale factor in the y-direction

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_scale('wkt', lit(0.5), lit(2))).show(1, False)
+--------------------------------------------+
|st_scale(wkt, 0.5, 2)                       |
+--------------------------------------------+
|POLYGON ((15 20, 20 80, 10 80, 5 40, 15 20))|
+--------------------------------------------+
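
The example output corresponds to multiplying every x coordinate by xd and every y coordinate by yd, i.e. scaling about the origin. The plain-Python sketch below illustrates this with a hypothetical scale helper; it is not Mosaic's implementation.

```python
def scale(coords, xd, yd):
    """Scale (x, y) pairs about the origin by xd in x and yd in y."""
    return [(x * xd, y * yd) for x, y in coords]

scaled = scale([(30, 10), (40, 40), (20, 40), (10, 20), (30, 10)], 0.5, 2)
print(scaled)
# [(15.0, 20), (20.0, 80), (10.0, 80), (5.0, 40), (15.0, 20)]
```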

st_setsrid

st_setsrid(geom, srid)

Sets the Coordinate Reference System well-known identifier (SRID) for geom.

Parameters:
  • geom (Column) – Geometry

  • srid (Column (IntegerType)) – The spatial reference identifier of geom, expressed as an integer, e.g. 4326 for EPSG:4326 / WGS84

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'MULTIPOINT ((10 40), (40 30), (20 20), (30 10))'}])
df.select(st_setsrid(st_geomfromwkt('wkt'), lit(4326))).show(1)
+---------------------------------+
|st_setsrid(convert_to(wkt), 4326)|
+---------------------------------+
|             {2, 4326, [[[10.0...|
+---------------------------------+

Note

ST_SetSRID does not transform the coordinates of geom, rather it tells Mosaic the SRID in which the current coordinates are expressed. ST_SetSRID can only operate on geometries encoded in GeoJSON or the Mosaic internal format.

st_simplify

st_simplify(geom, tol)

Returns the simplified geometry.

Parameters:
  • geom (Column) – Geometry

  • tol (Column (DoubleType)) – Tolerance

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'LINESTRING (0 1, 1 2, 2 1, 3 0)'}])
df.select(st_simplify('wkt', lit(1.0))).show()
+----------------------------+
| st_simplify(wkt, 1.0)      |
+----------------------------+
| LINESTRING (0 1, 1 2, 3 0) |
+----------------------------+

Note

The specified tolerance will be ignored by the ESRI geometry API.
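
A widely used tolerance-based simplification is the Douglas-Peucker algorithm: keep the endpoints, find the interior vertex farthest from the chord between them, and recurse on both halves only if that distance exceeds the tolerance. The plain-Python sketch below assumes this algorithm for illustration; whether the configured geometry API uses exactly this variant is an assumption, and the helper names are hypothetical. It reproduces the example: (1 2) is about 1.26 from the chord and is kept, while (2 1) lies on the line from (1 2) to (3 0) and is dropped.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / norm

def douglas_peucker(points, tol):
    """Simplify a polyline, keeping vertices farther than tol from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Locate the interior vertex farthest from the chord first-last
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tol:
        return [first, last]
    left = douglas_peucker(points[: index + 1], tol)
    right = douglas_peucker(points[index:], tol)
    return left[:-1] + right  # do not duplicate the split vertex

print(douglas_peucker([(0, 1), (1, 2), (2, 1), (3, 0)], 1.0))
# [(0, 1), (1, 2), (3, 0)]
```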

st_srid

st_srid(geom)

Looks up the Coordinate Reference System well-known identifier (SRID) for geom.

Parameters:

geom (Column) – Geometry

Return type:

Column: IntegerType

Example:

json_geom = '{"type":"MultiPoint","coordinates":[[10,40],[40,30],[20,20],[30,10]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}'
df = spark.createDataFrame([{'json': json_geom}])
df.select(st_srid(as_json('json'))).show(1)
+----------------------+
|st_srid(as_json(json))|
+----------------------+
|                  4326|
+----------------------+

Note

ST_SRID can only operate on geometries encoded in GeoJSON or the Mosaic internal format.

st_transform

st_transform(geom, srid)

Transforms the horizontal (XY) coordinates of geom from the current reference system to that described by srid.

Parameters:
  • geom (Column) – Geometry

  • srid (Column (IntegerType)) – Target spatial reference system for geom, expressed as an integer, e.g. 3857 for EPSG:3857 / Pseudo-Mercator

Return type:

Column: Geometry

Example:

df = (
  spark.createDataFrame([{'wkt': 'MULTIPOINT ((10 40), (40 30), (20 20), (30 10))'}])
  .withColumn('geom', st_setsrid(st_geomfromwkt('wkt'), lit(4326)))
)
df.select(st_astext(st_transform('geom', lit(3857)))).show(1, False)
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|convert_to(st_transform(geom, 3857))                                                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|MULTIPOINT ((1113194.9079327357 4865942.279503176), (4452779.631730943 3503549.843504374), (2226389.8158654715 2273030.926987689), (3339584.723798207 1118889.9748579597))|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Note

If geom does not have an associated SRID, use ST_SetSRID to set this before calling ST_Transform.
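
The values in the example match the spherical Mercator formulas that define EPSG:3857 (x = R·λ, y = R·ln(tan(π/4 + φ/2)) with R = 6378137 m). The plain-Python sketch below only illustrates that forward projection; it is not how Mosaic performs the transform, which is delegated to a CRS library and supports arbitrary SRIDs. wgs84_to_web_mercator is a hypothetical helper taking longitude and latitude in degrees.

```python
import math

R = 6378137.0  # WGS84 semi-major axis in metres, used as the sphere radius by EPSG:3857

def wgs84_to_web_mercator(lon, lat):
    """Forward spherical Mercator: EPSG:4326 degrees -> EPSG:3857 metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# WKT coordinate order is (x, y) = (longitude, latitude)
for lon, lat in [(10, 40), (40, 30), (20, 20), (30, 10)]:
    print(wgs84_to_web_mercator(lon, lat))
```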

st_translate

st_translate(geom, xd, yd)

Translates geom to a new location using the distance parameters xd and yd.

Parameters:
  • geom (Column) – Geometry

  • xd (Column (DoubleType)) – Offset in the x-direction

  • yd (Column (DoubleType)) – Offset in the y-direction

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'MULTIPOINT ((10 40), (40 30), (20 20), (30 10))'}])
df.select(st_translate('wkt', lit(10), lit(-5))).show(1, False)
+----------------------------------------------+
|st_translate(wkt, 10, -5)                     |
+----------------------------------------------+
|MULTIPOINT ((20 35), (50 25), (30 15), (40 5))|
+----------------------------------------------+
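
The example output is a plain coordinate shift: xd is added to every x and yd to every y. The plain-Python sketch below illustrates this with a hypothetical translate helper; it is not Mosaic's implementation.

```python
def translate(coords, xd, yd):
    """Shift every (x, y) pair by (xd, yd)."""
    return [(x + xd, y + yd) for x, y in coords]

print(translate([(10, 40), (40, 30), (20, 20), (30, 10)], 10, -5))
# [(20, 35), (50, 25), (30, 15), (40, 5)]
```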

st_union

st_union(left_geom, right_geom)

Returns the point set union of the input geometries.

Parameters:
  • left_geom (Column) – Geometry

  • right_geom (Column) – Geometry

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'left': 'POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))', 'right': 'POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))'}])
df.select(st_union(col('left'), col('right'))).show()
+-------------------------------------------------------------------------+
| st_union(left, right)                                                   |
+-------------------------------------------------------------------------+
|POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))|
+-------------------------------------------------------------------------+

st_unaryunion

st_unaryunion(col)

Returns a geometry that represents the point set union of the given geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: Geometry

Example:

df = spark.createDataFrame([{'wkt': 'MULTIPOLYGON (((10 10, 20 10, 20 20, 10 20, 10 10)), ((15 15, 25 15, 25 25, 15 25, 15 15)))'}])
df.select(st_unaryunion('wkt')).show()
+-------------------------------------------------------------------------+
| st_unaryunion(wkt)                                                      |
+-------------------------------------------------------------------------+
|POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))|
+-------------------------------------------------------------------------+

st_x

st_x(col)

Returns the x coordinate of the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POINT (30 10)'}])
df.select(st_x('wkt')).show()
+-----------------+
|st_x(wkt)        |
+-----------------+
|             30.0|
+-----------------+

st_xmax

st_xmax(col)

Returns the largest x coordinate in the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_xmax('wkt')).show()
+-----------------+
|st_minmaxxyz(wkt)|
+-----------------+
|             40.0|
+-----------------+

st_xmin

st_xmin(col)

Returns the smallest x coordinate in the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_xmin('wkt')).show()
+-----------------+
|st_minmaxxyz(wkt)|
+-----------------+
|             10.0|
+-----------------+

st_y

st_y(col)

Returns the y coordinate of the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POINT (30 10)'}])
df.select(st_y('wkt')).show()
+-----------------+
|st_y(wkt)        |
+-----------------+
|             10.0|
+-----------------+

st_ymax

st_ymax(col)

Returns the largest y coordinate in the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_ymax('wkt')).show()
+-----------------+
|st_minmaxxyz(wkt)|
+-----------------+
|             40.0|
+-----------------+

st_ymin

st_ymin(col)

Returns the smallest y coordinate in the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
df.select(st_ymin('wkt')).show()
+-----------------+
|st_minmaxxyz(wkt)|
+-----------------+
|             10.0|
+-----------------+

st_zmax

st_zmax(col)

Returns the largest z coordinate in the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POINT (30 10 20)'}])
df.select(st_zmax('wkt')).show()
+-----------------+
|st_minmaxxyz(wkt)|
+-----------------+
|             20.0|
+-----------------+

st_zmin

st_zmin(col)

Returns the smallest z coordinate in the input geometry.

Parameters:

col (Column) – Geometry

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'wkt': 'POINT (30 10 20)'}])
df.select(st_zmin('wkt')).show()
+-----------------+
|st_minmaxxyz(wkt)|
+-----------------+
|             20.0|
+-----------------+