Spatial grid indexing

Spatial grid indexing is the process of mapping a geometry (or a point) to one or more cells (cell IDs) of the selected spatial grid.

The grid system is selected with the Spark configuration spark.databricks.labs.mosaic.index.system, which must be set before enabling Mosaic.

The valid values are:

  • H3 - Good all-rounder for any location on Earth

  • BNG - Local grid system for Great Britain (EPSG:27700)

  • CUSTOM(minX,maxX,minY,maxY,splits,rootCellSizeX,rootCellSizeY) - Can be used with any local or global CRS
    • minX, maxX, minY, maxY can be positive or negative integers defining the grid bounds

    • splits defines into how many parts each cell is split, along each axis, when the resolution increases by one step (usually 2 or 10)

    • rootCellSizeX, rootCellSizeY define the size of the cells at resolution 0

Example

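spark.conf.set("spark.databricks.labs.mosaic.index.system", "H3") # Default
# spark.conf.set("spark.databricks.labs.mosaic.index.system", "BNG")
# spark.conf.set("spark.databricks.labs.mosaic.index.system", "CUSTOM(-180,180,-90,90,2,30,30)")

import mosaic as mos
mos.enable_mosaic(spark, dbutils)

# Imports used by the examples below; the Mosaic functions are called without the mos. prefix
from pyspark.sql.functions import lit
from mosaic import *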
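To make the CUSTOM parameters concrete, the sketch below locates the grid cell that contains a point, assuming that each resolution step divides the cell width and height by splits. The function name custom_cell and its (column, row) result are hypothetical illustrations; Mosaic encodes each cell as a single cell ID, which this sketch does not reproduce.

```python
from typing import Tuple


def custom_cell(x: float, y: float, resolution: int,
                min_x: float, max_x: float, min_y: float, max_y: float,
                splits: int, root_x: float, root_y: float) -> Tuple[int, int]:
    """Return the hypothetical (column, row) of the cell containing (x, y)."""
    if not (min_x <= x <= max_x and min_y <= y <= max_y):
        raise ValueError("point is outside the grid bounds")
    # Assumption: each resolution step divides the cell width and height by `splits`
    size_x = root_x / splits ** resolution
    size_y = root_y / splits ** resolution
    col = int((x - min_x) // size_x)
    row = int((y - min_y) // size_y)
    # A point on the max edge belongs to the last column/row, not to a cell outside the grid
    n_cols = round((max_x - min_x) / size_x)
    n_rows = round((max_y - min_y) / size_y)
    return min(col, n_cols - 1), min(row, n_rows - 1)


# CUSTOM(-180,180,-90,90,2,30,30) from the configuration example above
print(custom_cell(30.0, 10.0, 0, -180, 180, -90, 90, 2, 30, 30))  # (7, 3)
print(custom_cell(30.0, 10.0, 1, -180, 180, -90, 90, 2, 30, 30))  # (14, 6)
```

With 30-unit root cells starting at -180 and -90, the point (30, 10) lands in column 7, row 3 at resolution 0; one step finer, the 15-unit cells put it in column 14, row 6.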

grid_longlatascellid

grid_longlatascellid(lon, lat, resolution)

Returns the ID of the grid cell, at the given resolution, that contains the input lon and lat coordinates.

Parameters:
  • lon (Column: DoubleType) – Longitude

  • lat (Column: DoubleType) – Latitude

  • resolution (Column: Integer) – Index resolution

Return type:

Column: LongType

Example:

df = spark.createDataFrame([{'lon': 30., 'lat': 10.}])
df.select(grid_longlatascellid('lon', 'lat', lit(10))).show(1, False)
+----------------------------------+
|grid_longlatascellid(lon, lat, 10)|
+----------------------------------+
|623385352048508927                |
+----------------------------------+
../_images/h31.png

Fig 1. Point to grid cell in H3(9)

../_images/bng.png

Fig 2. Point to grid cell in BNG(4)

grid_pointascellid

grid_pointascellid(geometry, resolution)

Returns the ID of the grid cell, at the given resolution, that contains the input point geometry.

Parameters:
  • geometry (Column) – Geometry

  • resolution (Column: Integer) – Index resolution

Return type:

Column: LongType

Example:

df = spark.createDataFrame([{'lon': 30., 'lat': 10.}])
df.select(grid_pointascellid(st_point('lon', 'lat'), lit(10))).show(1, False)
+------------------------------------------+
|grid_pointascellid(st_point(lon, lat), 10)|
+------------------------------------------+
|623385352048508927                        |
+------------------------------------------+
../_images/h31.png

Fig 1. Point to grid cell in H3(9)

../_images/bng.png

Fig 2. Point to grid cell in BNG(4)

grid_polyfill

grid_polyfill(geometry, resolution)

Returns the set of grid cell IDs, at the given resolution, whose cell centroids are contained in the input geometry.

When using the H3 index system, this is equivalent to the H3 polyfill method.

Parameters:
  • geometry (Column) – Geometry

  • resolution (Column: Integer) – Index resolution

Return type:

Column: ArrayType[LongType]

Example:

df = spark.createDataFrame([{
    'wkt': 'MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))'
    }])
df.select(grid_polyfill('wkt', lit(0))).show(1, False)
+------------------------------------------------------------+
|grid_polyfill(wkt, 0)                                       |
+------------------------------------------------------------+
|[577586652210266111, 578360708396220415, 577269992861466623]|
+------------------------------------------------------------+
../_images/h32.png

Fig 1. Polyfill of a polygon in H3(8)

../_images/bng1.png

Fig 2. Polyfill of a polygon in BNG(4)
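The centroid rule that grid_polyfill applies can be illustrated on a plain square grid. The sketch below is not Mosaic's implementation: it assumes a hypothetical grid of square cells anchored at (0, 0), identifies cells by (column, row), and tests centroids against a single polygon ring with even-odd ray casting.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, ring: List[Point]) -> bool:
    """Even-odd ray casting test against a single ring."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polyfill(ring: List[Point], cell_size: float) -> Set[Tuple[int, int]]:
    """Cells of a square grid (origin 0, 0) whose centroid lies inside `ring`."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    cells = set()
    # Only cells inside the ring's bounding box can qualify
    for col in range(int(min(xs) // cell_size), int(max(xs) // cell_size) + 1):
        for row in range(int(min(ys) // cell_size), int(max(ys) // cell_size) + 1):
            centroid = ((col + 0.5) * cell_size, (row + 0.5) * cell_size)
            if point_in_polygon(centroid, ring):
                cells.add((col, row))
    return cells


print(sorted(polyfill([(0, 0), (24, 0), (0, 24)], 10.0)))  # [(0, 0), (0, 1), (1, 0)]
```

Cell (1, 1) overlaps this triangle but is left out because its centroid (15, 15) lies outside it; grid_tessellate, by contrast, keeps such partially covered cells.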

grid_boundaryaswkb

grid_boundaryaswkb(cellid)

Returns the boundary of the grid cell as WKB.

Parameters:

cellid (Column: Union(LongType, StringType)) – Grid cell ID

Return type:

Column: BinaryType

Example:

df = spark.createDataFrame([{'cellid': 613177664827555839}])
df.select(grid_boundaryaswkb("cellid")).show(1, False)
+--------------------------+
|grid_boundaryaswkb(cellid)|
+--------------------------+
|[01 03 00 00 00 00 00 00..|
+--------------------------+
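The 01 03 00 00 00 prefix in the output above is the standard OGC WKB header: the byte 01 marks little-endian byte order and the 32-bit value 3 marks a Polygon. The sketch below hand-encodes a single-ring 2D polygon to show that layout; polygon_to_wkb is a hypothetical helper, not part of Mosaic.

```python
import struct
from typing import List, Tuple


def polygon_to_wkb(ring: List[Tuple[float, float]]) -> bytes:
    """Encode a single-ring 2D polygon as little-endian OGC WKB."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # WKB rings must be explicitly closed
    out = struct.pack("<BI", 1, 3)       # byte order 1 = little endian, geometry type 3 = Polygon
    out += struct.pack("<I", 1)          # number of rings
    out += struct.pack("<I", len(ring))  # number of points in the ring
    for x, y in ring:
        out += struct.pack("<dd", float(x), float(y))  # each point is two 8-byte doubles
    return out


wkb = polygon_to_wkb([(0, 0), (1, 0), (1, 1), (0, 1)])
print(wkb.hex()[:10])  # 0103000000
print(len(wkb))        # 93
```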

grid_boundary

grid_boundary(cellid, format)

Returns the boundary of the grid cell as a geometry in the specified format.

Parameters:
  • cellid (Column: Union(LongType, StringType)) – Grid cell ID

  • format (Column: StringType) – Geometry format, e.g. "WKT"

Example:

df = spark.createDataFrame([{'cellid': 613177664827555839}])
df.select(grid_boundary("cellid", lit("WKT"))).show(1, False)
+--------------------------+
|grid_boundary(cellid, WKT)|
+--------------------------+
|          "POLYGON (( ..."|
+--------------------------+
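For a CUSTOM grid, a cell's boundary follows directly from the grid parameters. The sketch below derives the corners of the cell at (column, row) and renders them as WKT, assuming that each resolution step divides the cell width and height by splits. The helpers cell_corners and to_wkt are hypothetical, and Mosaic's exact coordinate formatting may differ.

```python
from typing import List, Tuple


def cell_corners(col: int, row: int, resolution: int, min_x: float, min_y: float,
                 splits: int, root_x: float, root_y: float) -> List[Tuple[float, float]]:
    """Closed, counter-clockwise ring of a hypothetical CUSTOM grid cell."""
    size_x = root_x / splits ** resolution
    size_y = root_y / splits ** resolution
    x0 = min_x + col * size_x  # lower-left corner
    y0 = min_y + row * size_y
    return [(x0, y0), (x0 + size_x, y0), (x0 + size_x, y0 + size_y),
            (x0, y0 + size_y), (x0, y0)]


def to_wkt(ring: List[Tuple[float, float]]) -> str:
    coords = ", ".join(f"{x:g} {y:g}" for x, y in ring)
    return f"POLYGON (({coords}))"


# Cell (7, 3) at resolution 0 of CUSTOM(-180,180,-90,90,2,30,30)
print(to_wkt(cell_corners(7, 3, 0, -180, -90, 2, 30, 30)))
# POLYGON ((30 0, 60 0, 60 30, 30 30, 30 0))
```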

grid_tessellate

grid_tessellate(geometry, resolution, <keep_core_geometries>)

Cuts the original geometry into several pieces along the grid index borders at the specified resolution.

Returns an array of Mosaic chips covering the input geometry at the given resolution.

A Mosaic chip is a struct type composed of:

  • is_core: Whether the grid cell is fully contained within the geometry, in which case the chip is the whole cell: Boolean

  • index_id: ID of the grid cell in the configured index system (default H3): Long

  • wkb: Geometry in WKB format equal to the intersection of the grid cell and the original geometry: Binary

In contrast to grid_tessellateexplode, grid_tessellate returns all chips of a geometry in a single array instead of exploding them into separate rows.

In contrast to grid_polyfill, grid_tessellate fully covers the original geometry, including cells whose centroids fall outside of it. This also makes it suitable for indexing lines.

Parameters:
  • geometry (Column) – Geometry

  • resolution (Column (IntegerType)) – Index resolution

  • keep_core_geometries (Column (BooleanType)) – Whether to keep the core geometries or set them to null; defaults to true

Return type:

Column: MosaicType (a struct whose chips field is an array of chips)

Example:

df = spark.createDataFrame([{'wkt': 'MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))'}])
df.select(grid_tessellate('wkt', lit(0))).printSchema()
root
 |-- grid_tessellate(wkt, 0): mosaic (nullable = true)
 |    |-- chips: array (nullable = true)
 |    |    |-- element: mosaic_chip (containsNull = true)
 |    |    |    |-- is_core: boolean (nullable = true)
 |    |    |    |-- index_id: long (nullable = true)
 |    |    |    |-- wkb: binary (nullable = true)


df.select(grid_tessellate('wkt', lit(0))).show()
+-----------------------+
|grid_tessellate(wkt, 0)|
+-----------------------+
|   {[{false, 5774810...|
+-----------------------+
../_images/h33.png

Fig 1. Tessellation of a polygon in H3(8)

../_images/bng2.png

Fig 2. Tessellation of a polygon in BNG(4)
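The core/boundary split can be illustrated without a geometry library by tessellating an axis-aligned rectangle on a square grid. In this simplified sketch, which is not how Mosaic tessellates arbitrary geometries, a (column, row) pair stands in for index_id and a clipped bounding box stands in for the chip's WKB; the Chip type and tessellate_box function are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


class Chip(NamedTuple):
    is_core: bool
    index_id: Tuple[int, int]  # (column, row) stands in for a real cell ID
    geom: Box                  # clipped box stands in for the chip's WKB


def tessellate_box(box: Box, cell_size: float) -> List[Chip]:
    min_x, min_y, max_x, max_y = box
    chips = []
    # -(-a // b) is ceiling division, so partially covered edge cells are included
    for col in range(int(min_x // cell_size), int(-(-max_x // cell_size))):
        for row in range(int(min_y // cell_size), int(-(-max_y // cell_size))):
            cell = (col * cell_size, row * cell_size, (col + 1) * cell_size, (row + 1) * cell_size)
            clipped = (max(cell[0], min_x), max(cell[1], min_y),
                       min(cell[2], max_x), min(cell[3], max_y))
            if clipped[0] >= clipped[2] or clipped[1] >= clipped[3]:
                continue  # no overlap with positive area
            # A chip is core when clipping left the whole cell intact
            chips.append(Chip(clipped == cell, (col, row), clipped))
    return chips


for chip in tessellate_box((0.0, 0.0, 25.0, 20.0), 10.0):
    print(chip)
```

Tessellating the rectangle from (0, 0) to (25, 20) with a cell size of 10 yields six chips: four core chips that are whole cells and two boundary chips for the strip between x = 20 and x = 25. Their areas add up to the rectangle's 500 square units, so the input is fully covered.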

grid_tessellateexplode

grid_tessellateexplode(geometry, resolution, <keep_core_geometries>)

Cuts the original geometry into several pieces along the grid index borders at the specified resolution.

Returns the set of Mosaic chips covering the input geometry at the given resolution.

A Mosaic chip is a struct type composed of:

  • is_core: Whether the grid cell is fully contained within the geometry, in which case the chip is the whole cell: Boolean

  • index_id: ID of the grid cell in the configured index system (default H3): Long

  • wkb: Geometry in WKB format equal to the intersection of the grid cell and the original geometry: Binary

In contrast to grid_tessellate, grid_tessellateexplode generates one result row per chip.

In contrast to grid_polyfill, grid_tessellateexplode fully covers the original geometry, including cells whose centroids fall outside of it. This also makes it suitable for indexing lines.

Parameters:
  • geometry (Column) – Geometry

  • resolution (Column (IntegerType)) – Index resolution

  • keep_core_geometries (Column (BooleanType)) – Whether to keep the core geometries or set them to null; defaults to true

Return type:

Column: MosaicType

Example:

df = spark.createDataFrame([{'wkt': 'MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))'}])
df.select(grid_tessellateexplode('wkt', lit(0))).show()
+-------+------------------+--------------------+
|is_core|          index_id|                 wkb|
+-------+------------------+--------------------+
|  false|577481099093999615|[01 03 00 00 00 0...|
|  false|578044049047420927|[01 03 00 00 00 0...|
|  false|578782920861286399|[01 03 00 00 00 0...|
|  false|577023702256844799|[01 03 00 00 00 0...|
|  false|577938495931154431|[01 03 00 00 00 0...|
|  false|577586652210266111|[01 06 00 00 00 0...|
|  false|577269992861466623|[01 03 00 00 00 0...|
|  false|578360708396220415|[01 03 00 00 00 0...|
+-------+------------------+--------------------+
../_images/h33.png

Fig 1. Tessellation of a polygon in H3(8)

../_images/bng2.png

Fig 2. Tessellation of a polygon in BNG(4)
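The difference between the two tessellation functions, and the effect of keep_core_geometries, can be mimicked in plain Python. The sketch below flattens per-geometry chip arrays into one record per chip and, when keep_core_geometries is false, nulls the geometry of core chips; since a core chip is the whole cell, its shape remains recoverable from the cell ID. The record layout and the function name explode_chips are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def explode_chips(rows: List[Tuple[str, List[Dict]]],
                  keep_core_geometries: bool = True) -> List[Dict]:
    """Turn (geometry_id, chips) pairs into one record per chip."""
    records = []
    for geometry_id, chips in rows:
        for chip in chips:
            wkb: Optional[bytes] = chip["wkb"]
            if chip["is_core"] and not keep_core_geometries:
                # A core chip is the whole cell, so its shape can be rebuilt from index_id
                wkb = None
            records.append({"geometry_id": geometry_id, "is_core": chip["is_core"],
                            "index_id": chip["index_id"], "wkb": wkb})
    return records


rows = [("a", [{"is_core": True, "index_id": 1, "wkb": b"\x01"},
               {"is_core": False, "index_id": 2, "wkb": b"\x02"}]),
        ("b", [{"is_core": False, "index_id": 2, "wkb": b"\x03"}])]
for record in explode_chips(rows, keep_core_geometries=False):
    print(record)
```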

grid_cellarea

grid_cellarea(cellid)

Returns the area of a given cell in km^2.

Parameters:

cellid (Column: Long) – Grid cell ID

Return type:

Column: DoubleType

Example:

df = spark.createDataFrame([{'grid_cellid': 613177664827555839}])
df.withColumn("area", grid_cellarea('grid_cellid')).show()
+--------------------+---------------+
|         grid_cellid|           area|
+--------------------+---------------+
|  613177664827555839|     0.78595419|
+--------------------+---------------+
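For a CUSTOM grid defined in a projected CRS whose units are meters, every cell at a given resolution has the same area, which the sketch below computes from the grid parameters. This holds only under that assumption: H3 cells, and cells of grids in geographic (degree-based) coordinates, vary in area and must be measured on the sphere, cell by cell. The function name custom_cell_area_km2 is hypothetical.

```python
def custom_cell_area_km2(resolution: int, splits: int,
                         root_cell_size_x_m: float, root_cell_size_y_m: float) -> float:
    """Area in km^2 of one cell of a meter-based CUSTOM grid at `resolution`."""
    # Assumption: each resolution step divides the cell width and height by `splits`
    width_m = root_cell_size_x_m / splits ** resolution
    height_m = root_cell_size_y_m / splits ** resolution
    return width_m * height_m / 1_000_000  # m^2 -> km^2


print(custom_cell_area_km2(2, 10, 100_000, 100_000))  # 1.0
```

With 100 km root cells and splits = 10, a resolution 2 cell measures 1 km by 1 km, hence 1 km^2.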

grid_cellkring

grid_cellkring(cellid, k)

Returns the k-ring of a given cell.

Parameters:
  • cellid (Column: Long) – Grid cell ID

  • k (Column: Integer) – K-ring size

Return type:

Column: ArrayType(Long)

Example:

df = spark.createDataFrame([{'grid_cellid': 613177664827555839}])
df.select('grid_cellid', grid_cellkring('grid_cellid', lit(2)).alias("kring")).show()
+--------------------+----------------------------------------------+
|         grid_cellid|                                         kring|
+--------------------+----------------------------------------------+
|  613177664827555839|[613177664827555839, 613177664825458687, ....]|
+--------------------+----------------------------------------------+
../_images/h34.png

Fig 1. Cell based kring(2) in H3(8)

../_images/bng3.png

Fig 2. Cell based kring(2) in BNG(4)
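On a square grid, a k-ring is every cell within k steps of the origin cell. The sketch below assumes 8-connected neighbours (diagonals count as one step) and (column, row) cell coordinates; it illustrates the idea rather than Mosaic's BNG implementation, and the function name square_kring is hypothetical. For comparison, an H3 k-ring around a hexagon, away from pentagons, contains 3k(k + 1) + 1 cells.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (column, row)


def square_kring(cell: Cell, k: int) -> Set[Cell]:
    """Cells within Chebyshev distance k of `cell`, including `cell` itself."""
    col, row = cell
    return {(col + dc, row + dr)
            for dc in range(-k, k + 1)
            for dr in range(-k, k + 1)}


print(len(square_kring((0, 0), 2)))  # 25
```

For k = 2 the result is the 5 by 5 block of cells centred on the origin, origin included.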

grid_cellkringexplode

grid_cellkringexplode(cellid, k)

Returns the k-ring of a given cell, exploded into one row per cell ID.

Parameters:
  • cellid (Column: Long) – Grid cell ID

  • k (Column: Integer) – K-ring size

Return type:

Column: Long

Example:

df = spark.createDataFrame([{'grid_cellid': 613177664827555839}])
df.select(grid_cellkringexplode('grid_cellid', lit(2)).alias("kring")).show()
+------------------+
|             kring|
+------------------+
|613177664827555839|
|613177664825458687|
|613177664831750143|
|613177664884178943|
|               ...|
+------------------+
../_images/h34.png

Fig 1. Cell based kring(2) in H3(8)

../_images/bng3.png

Fig 2. Cell based kring(2) in BNG(4)

grid_cell_intersection

grid_cell_intersection(left_chip, right_chip)

Returns the chip representing the intersection of two chips based on the same grid cell. See also the grid_cell_intersection_agg function.

Parameters:
  • left_chip (Column: ChipType(LongType)) – Chip

  • right_chip (Column: ChipType(LongType)) – Chip

Return type:

Column: ChipType(LongType)

Example:

df = spark.createDataFrame([{"chip": {"is_core": False, "index_id": 590418571381702655, "wkb": ...}}])
df.select(grid_cell_intersection("chip", "chip").alias("intersection")).show()
+--------------------------------------------------------+
|                                            intersection|
+--------------------------------------------------------+
|{is_core: false, index_id: 590418571381702655, wkb: ...}|
+--------------------------------------------------------+
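The role of is_core when combining chips can be sketched as follows. A core chip covers its whole cell, so intersecting it with another chip of the same cell simply returns the other chip; only two boundary chips need an actual geometry intersection. In this hypothetical sketch a chip's geometry is approximated by a set of small tiles inside the cell instead of WKB, and the names Chip and chip_intersection are illustrative, not Mosaic's implementation.

```python
from typing import FrozenSet, NamedTuple, Tuple

Tile = Tuple[int, int]


class Chip(NamedTuple):
    is_core: bool
    index_id: int
    tiles: FrozenSet[Tile]  # stand-in for the chip geometry (WKB in Mosaic)


def chip_intersection(left: Chip, right: Chip) -> Chip:
    if left.index_id != right.index_id:
        raise ValueError("chips must belong to the same grid cell")
    if left.is_core:
        return right  # the whole cell clips nothing
    if right.is_core:
        return left
    # Two boundary chips: intersect the geometries; the result cannot be the whole cell
    return Chip(False, left.index_id, left.tiles & right.tiles)


cell = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
core = Chip(True, 590418571381702655, cell)
a = Chip(False, 590418571381702655, frozenset({(0, 0), (0, 1)}))
b = Chip(False, 590418571381702655, frozenset({(0, 1), (1, 1)}))
print(chip_intersection(core, a) == a)  # True
print(chip_intersection(a, b).tiles)    # frozenset({(0, 1)})
```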

grid_cell_union

grid_cell_union(left_chip, right_chip)

Returns the chip representing the union of two chips based on the same grid cell. See also the grid_cell_union_agg function.

Parameters:
  • left_chip (Column: ChipType(LongType)) – Chip

  • right_chip (Column: ChipType(LongType)) – Chip

Return type:

Column: ChipType(LongType)

Example:

df = spark.createDataFrame([{"chip": {"is_core": False, "index_id": 590418571381702655, "wkb": ...}}])
df.select(grid_cell_union("chip", "chip").alias("union")).show()
+--------------------------------------------------------+
|                                                   union|
+--------------------------------------------------------+
|{is_core: false, index_id: 590418571381702655, wkb: ...}|
+--------------------------------------------------------+
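For a union, is_core works the other way round: if either chip is core the result is the whole cell, and the union of two boundary chips becomes core only if together they fill the cell. This hypothetical sketch approximates chip geometries by sets of small tiles inside the cell; Chip and chip_union are illustrative names, not Mosaic's implementation.

```python
from typing import FrozenSet, NamedTuple, Tuple

Tile = Tuple[int, int]


class Chip(NamedTuple):
    is_core: bool
    index_id: int
    tiles: FrozenSet[Tile]  # stand-in for the chip geometry (WKB in Mosaic)


def chip_union(left: Chip, right: Chip, cell_tiles: FrozenSet[Tile]) -> Chip:
    if left.index_id != right.index_id:
        raise ValueError("chips must belong to the same grid cell")
    if left.is_core:
        return left  # already the whole cell
    if right.is_core:
        return right
    tiles = left.tiles | right.tiles
    # Two boundary chips may together cover the whole cell
    return Chip(tiles == cell_tiles, left.index_id, tiles)


cell = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
west = Chip(False, 590418571381702655, frozenset({(0, 0), (0, 1)}))
east = Chip(False, 590418571381702655, frozenset({(1, 0), (1, 1)}))
print(chip_union(west, east, cell).is_core)  # True
```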

grid_cellkloop

grid_cellkloop(cellid, k)

Returns the k-loop (hollow ring) of a given cell.

Parameters:
  • cellid (Column: Long) – Grid cell ID

  • k (Column: Integer) – K-loop size

Return type:

Column: ArrayType(Long)

Example:

df = spark.createDataFrame([{'grid_cellid': 613177664827555839}])
df.select('grid_cellid', grid_cellkloop('grid_cellid', lit(2)).alias("kloop")).show()
+--------------------+----------------------------------------------+
|         grid_cellid|                                         kloop|
+--------------------+----------------------------------------------+
|  613177664827555839|[613177664827555839, 613177664825458687, ....]|
+--------------------+----------------------------------------------+
../_images/h35.png

Fig 1. Cell based kloop(2) in H3(8)

../_images/bng4.png

Fig 2. Cell based kloop(2) in BNG(4)
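A k-loop is the k-ring minus the (k - 1)-ring, that is, only the cells exactly k steps away; for k of 1 or more it does not contain the origin cell. The sketch below shows this on a square grid with 8-connected neighbours, an assumption made for illustration; square_kring and square_kloop are hypothetical names. For comparison, an H3 k-loop around a hexagon, away from pentagons, has 6k cells.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (column, row)


def square_kring(cell: Cell, k: int) -> Set[Cell]:
    """Cells within Chebyshev distance k of `cell`, including `cell` itself."""
    col, row = cell
    return {(col + dc, row + dr)
            for dc in range(-k, k + 1)
            for dr in range(-k, k + 1)}


def square_kloop(cell: Cell, k: int) -> Set[Cell]:
    """Cells at Chebyshev distance exactly k from `cell` (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return square_kring(cell, k) - square_kring(cell, k - 1)


print(len(square_kloop((0, 0), 2)))  # 16
```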

grid_cellkloopexplode

grid_cellkloopexplode(cellid, k)

Returns the k-loop (hollow ring) of a given cell, exploded into one row per cell ID.

Parameters:
  • cellid (Column: Long) – Grid cell ID

  • k (Column: Integer) – K-loop size

Return type:

Column: Long

Example:

df = spark.createDataFrame([{'grid_cellid': 613177664827555839}])
df.select(grid_cellkloopexplode('grid_cellid', lit(2)).alias("kloop")).show()
+------------------+
|             kloop|
+------------------+
|613177664827555839|
|613177664825458687|
|613177664831750143|
|613177664884178943|
|               ...|
+------------------+
../_images/h35.png

Fig 1. Cell based kloop(2) in H3(8)

../_images/bng4.png

Fig 2. Cell based kloop(2) in BNG(4)

grid_geometrykring

grid_geometrykring(geometry, resolution, k)

Returns the k-ring of a given geometry respecting the boundary shape.

Parameters:
  • geometry (Column) – Geometry to be used

  • resolution (Column: Integer) – Resolution of the index used to calculate the k-ring

  • k (Column: Integer) – K-ring size

Return type:

Column: ArrayType(Long)

Example:

df = spark.createDataFrame([{'geometry': "MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))"}])
df.select('geometry', grid_geometrykring('geometry', lit(8), lit(1)).alias("kring")).show()
+--------------------+----------------------------------------------+
|            geometry|                                         kring|
+--------------------+----------------------------------------------+
|  "MULTIPOLYGON(..."|[613177664827555839, 613177664825458687, ....]|
+--------------------+----------------------------------------------+
../_images/h36.png

Fig 1. Geometry based kring(1) in H3(8)

../_images/bng5.png

Fig 2. Geometry based kring(1) in BNG(4)
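One way to read "respecting the boundary shape" is that the k-ring is grown from every cell covering the geometry, rather than from a single cell or a bounding box, so the result follows the outline of the input. The sketch below illustrates that reading on a square grid with 8-connected neighbours; the covering cells are given directly, and both the approach and the names square_kring and geometry_kring are assumptions, not Mosaic's implementation.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]  # (column, row)


def square_kring(cell: Cell, k: int) -> Set[Cell]:
    """Cells within Chebyshev distance k of `cell`, including `cell` itself."""
    col, row = cell
    return {(col + dc, row + dr)
            for dc in range(-k, k + 1)
            for dr in range(-k, k + 1)}


def geometry_kring(covering: Iterable[Cell], k: int) -> Set[Cell]:
    """Union of the k-rings of all cells covering a geometry."""
    result: Set[Cell] = set()
    for cell in covering:
        result |= square_kring(cell, k)
    return result


l_shape = {(0, 0), (1, 0), (0, 1)}
print(len(geometry_kring(l_shape, 1)))  # 15
```

For this L-shaped covering and k = 1, the result is the 4 by 4 block from (-1, -1) to (2, 2) without the corner (2, 2), which is more than one step away from every covering cell.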

grid_geometrykringexplode

grid_geometrykringexplode(geometry, resolution, k)

Returns the k-ring of a given geometry, exploded into one row per cell ID.

Parameters:
  • geometry (Column) – Geometry to be used

  • resolution (Column: Integer) – Resolution of the index used to calculate the k-ring

  • k (Column: Integer) – K-ring size

Return type:

Column: Long

Example:

df = spark.createDataFrame([{'geometry': "MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))"}])
df.select(grid_geometrykringexplode('geometry', lit(8), lit(2)).alias("kring")).show()
+------------------+
|             kring|
+------------------+
|613177664827555839|
|613177664825458687|
|613177664831750143|
|613177664884178943|
|               ...|
+------------------+
../_images/h36.png

Fig 1. Geometry based kring(1) in H3(8)

../_images/bng5.png

Fig 2. Geometry based kring(1) in BNG(4)

grid_geometrykloop

grid_geometrykloop(geometry, resolution, k)

Returns the k-loop (hollow ring) of a given geometry.

Parameters:
  • geometry (Column) – Geometry to be used

  • resolution (Column: Integer) – Resolution of the index used to calculate the k-loop

  • k (Column: Integer) – K-loop size

Return type:

Column: ArrayType(Long)

Example:

df = spark.createDataFrame([{'geometry': "MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))"}])
df.select('geometry', grid_geometrykloop('geometry', lit(8), lit(2)).alias("kloop")).show()
+--------------------+----------------------------------------------+
|            geometry|                                         kloop|
+--------------------+----------------------------------------------+
|  MULTIPOLYGON ((...|[613177664827555839, 613177664825458687, ....]|
+--------------------+----------------------------------------------+
../_images/h37.png

Fig 1. Geometry based kloop(2) in H3(8)

../_images/bng6.png

Fig 2. Geometry based kloop(2) in BNG(4)
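A geometry k-loop can be read as the geometry k-ring minus the geometry (k - 1)-ring, where the 0-ring is the set of cells covering the geometry; what remains is a band of cells exactly k steps from the covering. The sketch below illustrates that reading on a square grid with 8-connected neighbours. The covering cells are given directly, and the approach and the names square_kring, geometry_kring and geometry_kloop are assumptions rather than Mosaic's implementation.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]  # (column, row)


def square_kring(cell: Cell, k: int) -> Set[Cell]:
    """Cells within Chebyshev distance k of `cell`, including `cell` itself."""
    col, row = cell
    return {(col + dc, row + dr)
            for dc in range(-k, k + 1)
            for dr in range(-k, k + 1)}


def geometry_kring(covering: Iterable[Cell], k: int) -> Set[Cell]:
    """Union of the k-rings of all covering cells (k = 0 gives the covering itself)."""
    result: Set[Cell] = set()
    for cell in covering:
        result |= square_kring(cell, k)
    return result


def geometry_kloop(covering: Iterable[Cell], k: int) -> Set[Cell]:
    """Cells exactly k steps away from the covering (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    covering = set(covering)
    return geometry_kring(covering, k) - geometry_kring(covering, k - 1)


print(len(geometry_kloop({(0, 0), (1, 0), (0, 1)}, 1)))  # 12
```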

grid_geometrykloopexplode

grid_geometrykloopexplode(geometry, resolution, k)

Returns the k-loop (hollow ring) of a given geometry, exploded into one row per cell ID.

Parameters:
  • geometry (Column) – Geometry to be used

  • resolution (Column: Integer) – Resolution of the index used to calculate the k-loop

  • k (Column: Integer) – K-loop size

Return type:

Column: Long

Example:

df = spark.createDataFrame([{'geometry': "MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))"}])
df.select(grid_geometrykloopexplode('geometry', lit(8), lit(2)).alias("kloop")).show()
+------------------+
|             kloop|
+------------------+
|613177664827555839|
|613177664825458687|
|613177664831750143|
|613177664884178943|
|               ...|
+------------------+
../_images/h37.png

Fig 1. Geometry based kloop(2) in H3(8)

../_images/bng6.png

Fig 2. Geometry based kloop(2) in BNG(4)