Raster functions
Intro
Raster functions are available in mosaic if you have installed the optional dependency GDAL. Please see Install and Enable GDAL with Mosaic for installation instructions.
Mosaic provides several unique raster functions that are not available in other Spark packages. Mainly raster to grid functions, which are useful for reprojecting the raster data into a standard grid index system. This is useful for performing spatial joins between raster data and vector data.
Mosaic also provides a scalable retiling function that can be used to retile raster data in case of bottlenecking due to large files.
All raster functions respect the
rst_
prefix naming convention.
Tile objects
Mosaic raster functions perform operations on “raster tile” objects. These can be created explicitly using functions
such as rst_fromfile or rst_fromcontent or implicitly when using Mosaic’s GDAL datasource reader
e.g. spark.read.format("gdal")
- Important changes to tile objects
The Mosaic raster tile schema changed in v0.4.1 to the following:
<tile:struct<index_id:bigint, tile:binary, metadata:map<string, string>>
. All APIs that use tiles now follow this schema.Mosaic can write rasters from a DataFrame to a target directory in DBFS using the function rst_write
Note
For mosaic versions > 0.4.0 you can use the revamped setup_gdal function or new setup_fuse_install. These functions will configure an init script in your preferred Workspace, Volume, or DBFS location to install GDAL on your cluster. See Install and Enable GDAL with Mosaic for more details.
Note
For complex operations and / or working with large rasters, Mosaic offers the option option of employing checkpointing to write intermediate results to disk. Follow the instructions in Checkpointing to enable this feature.
Functions
rst_avg
- rst_avg(tile)
Returns an array containing mean values for each band.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: ArrayType(DoubleType)
- Example:
df.selectExpr(mos.rst_avg("tile")).limit(1).display()
+---------------+
| rst_avg(tile) |
+---------------+
| [42.0] |
+---------------+
df.select(rst_avg(col("tile"))).limit(1).show
+---------------+
| rst_avg(tile) |
+---------------+
| [42.0] |
+---------------+
SELECT rst_avg(tile) FROM table LIMIT 1
+---------------+
| rst_avg(tile) |
+---------------+
| [42.0] |
+---------------+
rst_bandmetadata
- rst_bandmetadata(tile, band)
Extract the metadata describing the raster band. Metadata is return as a map of key value pairs.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
band (Column (IntegerType)) – The band number to extract metadata for.
- Return type:
Column: MapType(StringType, StringType)
- Example:
df.select(mos.rst_bandmetadata("tile", F.lit(1))).limit(1).display()
+--------------------------------------------------------------------------------------+
| rst_bandmetadata(tile, 1) |
+--------------------------------------------------------------------------------------+
| {"_FillValue": "251", "NETCDF_DIM_time": "1294315200", "long_name": "bleaching alert |
| area 7-day maximum composite", "grid_mapping": "crs", "NETCDF_VARNAME": |
| "bleaching_alert_area", "coverage_content_type": "thematicClassification", |
| "standard_name": "N/A", "comment": "Bleaching Alert Area (BAA) values are coral |
| bleaching heat stress levels: 0 - No Stress; 1 - Bleaching Watch; 2 - Bleaching |
| Warning; 3 - Bleaching Alert Level 1; 4 - Bleaching Alert Level 2. Product |
| description is provided at https://coralreefwatch.noaa.gov/product/5km/index.php.", |
| "valid_min": "0", "units": "stress_level", "valid_max": "4", "scale_factor": "1"} |
+--------------------------------------------------------------------------------------+
df.select(rst_bandmetadata(col("tile"), lit(1))).limit(1).show
+--------------------------------------------------------------------------------------+
| rst_bandmetadata(tile, 1) |
+--------------------------------------------------------------------------------------+
| {"_FillValue": "251", "NETCDF_DIM_time": "1294315200", "long_name": "bleaching alert |
| area 7-day maximum composite", "grid_mapping": "crs", "NETCDF_VARNAME": |
| "bleaching_alert_area", "coverage_content_type": "thematicClassification", |
| "standard_name": "N/A", "comment": "Bleaching Alert Area (BAA) values are coral |
| bleaching heat stress levels: 0 - No Stress; 1 - Bleaching Watch; 2 - Bleaching |
| Warning; 3 - Bleaching Alert Level 1; 4 - Bleaching Alert Level 2. Product |
| description is provided at https://coralreefwatch.noaa.gov/product/5km/index.php.", |
| "valid_min": "0", "units": "stress_level", "valid_max": "4", "scale_factor": "1"} |
+--------------------------------------------------------------------------------------+
SELECT rst_bandmetadata(tile, 1) FROM table LIMIT 1
+--------------------------------------------------------------------------------------+
| rst_bandmetadata(tile, 1) |
+--------------------------------------------------------------------------------------+
| {"_FillValue": "251", "NETCDF_DIM_time": "1294315200", "long_name": "bleaching alert |
| area 7-day maximum composite", "grid_mapping": "crs", "NETCDF_VARNAME": |
| "bleaching_alert_area", "coverage_content_type": "thematicClassification", |
| "standard_name": "N/A", "comment": "Bleaching Alert Area (BAA) values are coral |
| bleaching heat stress levels: 0 - No Stress; 1 - Bleaching Watch; 2 - Bleaching |
| Warning; 3 - Bleaching Alert Level 1; 4 - Bleaching Alert Level 2. Product |
| description is provided at https://coralreefwatch.noaa.gov/product/5km/index.php.", |
| "valid_min": "0", "units": "stress_level", "valid_max": "4", "scale_factor": "1"} |
+--------------------------------------------------------------------------------------+
rst_boundingbox
- rst_boundingbox(tile)
Returns the bounding box of the raster as a polygon geometry.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: StructType(DoubleType, DoubleType, DoubleType, DoubleType)
- Example:
df.select(mos.rst_boundingbox("tile")).limit(1).display()
+------------------------------------------------------------------+
| rst_boundingbox(tile) |
+------------------------------------------------------------------+
| [00 00 ... 00] // WKB representation of the polygon bounding box |
+------------------------------------------------------------------+
df.select(rst_boundingbox(col("tile"))).limit(1).show
+------------------------------------------------------------------+
| rst_boundingbox(tile) |
+------------------------------------------------------------------+
| [00 00 ... 00] // WKB representation of the polygon bounding box |
+------------------------------------------------------------------+
SELECT rst_boundingbox(tile) FROM table LIMIT 1
+------------------------------------------------------------------+
| rst_boundingbox(tile) |
+------------------------------------------------------------------+
| [00 00 ... 00] // WKB representation of the polygon bounding box |
+------------------------------------------------------------------+
rst_clip
- rst_clip(tile, geometry, cutline_all_touched)
Clips
tile
withgeometry
, provided in a supported encoding (WKB, WKT or GeoJSON).- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
geometry (Column (GeometryType)) – A column containing the geometry to clip the raster to.
cutline_all_touched (Column (BooleanType)) – A column to specify pixels boundary behavior.
- Return type:
Column: RasterTileType
Note
- Notes
- Geometry input
The
geometry
parameter is expected to be a polygon or a multipolygon.- Cutline handling
The
cutline_all_touched
parameter:Optional: default is true. This is a GDAL warp config for the operation.
If set to true, the pixels touching the geometry are included in the clip, regardless of half-in or half-out; this is important for tessellation behaviors.
If set to false, only at least half-in pixels are included in the clip.
More information can be found here
The actual GDAL command employed to perform the clipping operation is as follows:
"gdalwarp -wo CUTLINE_ALL_TOUCHED=<TRUE|FALSE> -cutline <GEOMETRY> -crop_to_cutline"
- Output
Output raster tiles will have:
the same extent as the input geometry.
the same number of bands as the input raster.
the same pixel data type as the input raster.
the same pixel size as the input raster.
the same coordinate reference system as the input raster.
- example:
df.select(mos.rst_clip("tile", F.lit("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))"))).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
df.select(rst_clip(col("tile"), lit("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) |
+-----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+-----------------------------------------------------------------------------------------------------------------+
SELECT rst_clip(tile, "POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))") FROM table LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_combineavg
- rst_combineavg(tiles)
Combines a collection of raster tiles by averaging the pixel values.
- Parameters:
tiles (Column (ArrayType(RasterTileType))) – A column containing an array of raster tiles.
- Return type:
Column: RasterTileType
Note
- Notes
Each tile in
tiles
must have the same extent, number of bands, pixel data type, pixel size and coordinate reference system.The output raster will have the same extent, number of bands, pixel data type, pixel size and coordinate reference system as the input tiles.
Also, see rst_combineavg_agg function.
- example:
df\
.select(F.array("tile1","tile2","tile3")).alias("tiles"))\
.select(mos.rst_combineavg("tiles")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavg(tiles) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
df
.select(F.array("tile1","tile2","tile3")).as("tiles"))
.select(rst_combineavg(col("tiles"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavg(tiles) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
SELECT rst_combineavg(array(tile1,tile2,tile3)) FROM table LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavg(array(tile1,tile2,tile3)) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_convolve
- rst_convolve(tile, kernel)
Applies a convolution filter to the raster. The result is Mosaic raster tile representing the filtered input
tile
.- Parameters:
tile (Column (RasterTileType)) – A column containing raster tile.
kernel (Column (ArrayType(ArrayType(DoubleType)))) – The kernel to apply to the raster.
- Return type:
Column: RasterTileType
Note
- Notes
The
kernel
can be Array of Array of either Double, Integer, or Decimal but will be cast to Double.This method assumes the kernel is square and has an odd number of rows and columns.
Kernel uses the configured GDAL
blockSize
with a stride beingkernelSize/2
.
- example:
df\
.withColumn("convolve_arr", array(
array(lit(1.0), lit(2.0), lit(3.0))
array(lit(3.0), lit(2.0), lit(1.0)),
array(lit(1.0), lit(3.0), lit(2.0)))\
.select(rst_convolve("tile", "convolve_arr").display()
+---------------------------------------------------------------------------+
| rst_convolve(tile,convolve_arr) |
+---------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)", |
| "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} |
+---------------------------------------------------------------------------+
df
.withColumn("convolve_arr", array(
array(lit(1.0), lit(2.0), lit(3.0)),
array(lit(3.0), lit(2.0), lit(1.0)),
array(lit(1.0), lit(3.0), lit(2.0)))
)
.select(rst_convolve(col("tile"), col("convolve_arr")).show
+---------------------------------------------------------------------------+
| rst_convolve(tile,convolve_arr) |
+---------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)", |
| "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} |
+---------------------------------------------------------------------------+
SELECT rst_convolve(tile, convolve_arr) FROM table LIMIT 1
+---------------------------------------------------------------------------+
| rst_convolve(tile,convolve_arr) |
+---------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)", |
| "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} |
+---------------------------------------------------------------------------+
For clarity, this is ultimately the execution of the kernel.
def convolveAt(x: Int, y: Int, kernel: Array[Array[Double]]): Double = { val kernelWidth = kernel.head.length val kernelHeight = kernel.length val kernelCenterX = kernelWidth / 2 val kernelCenterY = kernelHeight / 2 var sum = 0.0 for (i <- 0 until kernelHeight) { for (j <- 0 until kernelWidth) { val xIndex = x + (j - kernelCenterX) val yIndex = y + (i - kernelCenterY) if (xIndex >= 0 && xIndex < width && yIndex >= 0 && yIndex < height) { val maskValue = maskAt(xIndex, yIndex) val value = elementAt(xIndex, yIndex) if (maskValue != 0.0 && num.toDouble(value) != noDataValue) { sum += num.toDouble(value) * kernel(i)(j) } } } } sum }
rst_derivedband
- rst_derivedband(tiles, python_func, func_name)
Combine an array of raster tiles using provided python function.
- Parameters:
tiles (Column (ArrayType(RasterTileType))) – A column containing an array of raster tiles.
python_func (Column (StringType)) – A function to evaluate in python.
func_name (Column (StringType)) – name of the function to evaluate in python.
- Return type:
Column: RasterTileType
Note
- Notes
Input raster tiles in
tiles
must have the same extent, number of bands, pixel data type, pixel size and coordinate reference system.The output raster will have the same the same extent, number of bands, pixel data type, pixel size and coordinate reference system as the input raster tiles.
See also: rst_derivedband_agg function.
- example:
df\
.select(
F.array("tile1","tile2","tile3")).alias("tiles"),
F.lit(
"""
import numpy as np
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
""").alias("py_func1"),
F.lit("average").alias("func1_name")
)\
.select(mos.rst_deriveband("tiles","py_func1","func1_name")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedband(tiles,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
df
.select(
array("tile1","tile2","tile3")).alias("tiles"),
lit(
"""
|import numpy as np
|def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
| out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
|""".stripMargin).as("py_func1"),
lit("average").as("func1_name")
)
.select(mos.rst_deriveband("tiles","py_func1","func1_name")).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedband(tiles,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
SELECT
rst_derivedband(array(tile1,tile2,tile3)) as tiles,
"""
import numpy as np
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
""" as py_func1,
"average" as funct1_name
FROM table LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedband(tiles,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_dtmfromgeoms
- rst_dtmfromgeoms(pointsArray, linesArray, mergeTolerance, snapTolerance, splitPointFinder, origin, xWidth, yWidth, xSize, ySize, noData)
Generate a raster with interpolated elevations across a grid of points described by:
origin
: a point geometry describing the bottom-left corner of the grid,xWidth
andyWidth
: the number of points in the grid in x and y directions,xSize
andySize
: the space between grid points in the x and y directions.
- Note:
To generate a grid from a “top-left”
origin
, use a negative value forySize
.
The underlying algorithm first creates a surface mesh by triangulating
pointsArray
(includinglinesArray
as a set of constraint lines) then determines where each point in the grid would lie on the surface mesh. Finally, it interpolates the elevation of that point based on the surrounding triangle’s vertices.As with
st_triangulate
, there are two ‘tolerance’ parameters for the algorithm:mergeTolerance
sets the point merging tolerance of the triangulation algorithm, i.e. before the initial triangulation is performed, nearby points inpointsArray
can be merged in order to speed up the triangulation process. A value of zero means all points are considered for triangulation.snapTolerance
sets the tolerance for post-processing the results of the triangulation, i.e. matching the vertices of the output triangles to input points / lines. This is necessary as the algorithm often returns null height / Z values. Setting this to a large value may result in the incorrect Z values being assigned to the output triangle vertices (especially whenlinesArray
contains very densely spaced segments). Setting this value to zero may result in the output triangle vertices being assigned a null Z value.
Both tolerance parameters are expressed in the same units as the projection of the input point geometries.
Additionally, you have control over the algorithm used to find split points on the constraint lines. The recommended default option here is the “NONENCROACHING” algorithm. You can also use the “MIDPOINT” algorithm if you find the constraint fitting process fails to converge. For full details of these options see the JTS reference here.
The
noData
value of the output raster can be set using thenoData
parameter.This is a generator expression and the resulting DataFrame will contain one row per point of the grid.
- Parameters:
pointsArray (Column (ArrayType(Geometry))) – Array of geometries respresenting the points to be triangulated
linesArray (Column (ArrayType(Geometry))) – Array of geometries respresenting the lines to be used as constraints
mergeTolerance (Column (DoubleType)) – A tolerance used to coalesce points in close proximity to each other before performing triangulation.
snapTolerance (Column (DoubleType)) – A snapping tolerance used to relate created points to their corresponding lines for elevation interpolation.
splitPointFinder (Column (StringType)) – Algorithm used for finding split points on constraint lines. Options are “NONENCROACHING” and “MIDPOINT”.
origin (Column (Geometry)) – A point geometry describing the bottom-left corner of the grid.
xWidth (Column (IntegerType)) – The number of points in the grid in x direction.
yWidth (Column (IntegerType)) – The number of points in the grid in y direction.
xSize (Column (DoubleType)) – The spacing between each point on the grid’s x-axis.
ySize (Column (DoubleType)) – The spacing between each point on the grid’s y-axis.
noData (Column (DoubleType)) – The no-data value of the output raster.
- Return type:
Column (RasterTileType)
- Example:
df = (
spark.createDataFrame(
[
["POINT Z (2 1 0)"],
["POINT Z (3 2 1)"],
["POINT Z (1 3 3)"],
["POINT Z (0 2 2)"],
],
["wkt"],
)
.groupBy()
.agg(collect_list("wkt").alias("masspoints"))
.withColumn("breaklines", array(lit("LINESTRING EMPTY")))
.withColumn("origin", st_geomfromwkt(lit("POINT (0.6 1.8)")))
.withColumn("xWidth", lit(12))
.withColumn("yWidth", lit(6))
.withColumn("xSize", lit(0.1))
.withColumn("ySize", lit(0.1))
)
df.select(
rst_dtmfromgeoms(
"masspoints", "breaklines", lit(0.0), lit(0.01),
"origin", "xWidth", "yWidth", "xSize", "ySize",
split_point_finder="NONENCROACHING", no_data_value=-9999.0
)
).show(truncate=False)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|rst_dtmfromgeoms(masspoints, breaklines, 0.0, 0.01, origin, xWidth, yWidth, xSize, ySize) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{NULL, /dbfs/tmp/mosaic/raster/checkpoint/raster_d4ab419f_9829_4004_99a3_aaa597a69938.GTiff, {path -> /dbfs/tmp/mosaic/raster/checkpoint/raster_d4ab419f_9829_4004_99a3_aaa597a69938.GTiff, last_error -> , all_parents -> , driver -> GTiff, parentPath -> /tmp/mosaic_tmp/mosaic5678582907307109410/raster_d4ab419f_9829_4004_99a3_aaa597a69938.GTiff, last_command -> gdal_rasterize ATTRIBUTE=VALUES -of GTiff -co TILED=YES -co COMPRESS=DEFLATE}}|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
val df = Seq(
Seq(
"POINT Z (2 1 0)", "POINT Z (3 2 1)",
"POINT Z (1 3 3)", "POINT Z (0 2 2)"
)
)
.toDF("masspoints")
.withColumn("breaklines", array().cast(ArrayType(StringType)))
.withColumn("origin", st_geomfromwkt(lit("POINT (0.6 1.8)")))
.withColumn("xWidth", lit(12))
.withColumn("yWidth", lit(6))
.withColumn("xSize", lit(0.1))
.withColumn("ySize", lit(0.1))
df.select(
rst_dtmfromgeoms(
$"masspoints", $"breaklines",
lit(0.0), lit(0.01), lit("NONENCROACHING")
$"origin", $"xWidth", $"yWidth",
$"xSize", $"ySize", lit(-9999.0)
)
).show(1, false)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|rst_dtmfromgeoms(masspoints, breaklines, 0.0, 0.01, origin, xWidth, yWidth, xSize, ySize) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{NULL, /dbfs/tmp/mosaic/raster/checkpoint/raster_d4ab419f_9829_4004_99a3_aaa597a69938.GTiff, {path -> /dbfs/tmp/mosaic/raster/checkpoint/raster_d4ab419f_9829_4004_99a3_aaa597a69938.GTiff, last_error -> , all_parents -> , driver -> GTiff, parentPath -> /tmp/mosaic_tmp/mosaic5678582907307109410/raster_d4ab419f_9829_4004_99a3_aaa597a69938.GTiff, last_command -> gdal_rasterize ATTRIBUTE=VALUES -of GTiff -co TILED=YES -co COMPRESS=DEFLATE}}|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
SELECT
RST_DTMFROMGEOMS(
ARRAY(
"POINT Z (2 1 0)",
"POINT Z (3 2 1)",
"POINT Z (1 3 3)",
"POINT Z (0 2 2)"
),
ARRAY("LINESTRING EMPTY"),
DOUBLE(0.0), DOUBLE(0.01), "NONENCROACHING",
"POINT (0.6 1.8)", 12, 6,
DOUBLE(0.1), DOUBLE(0.1), DOUBLE(-9999.0)
) AS tile
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|rst_dtmfromgeoms(masspoints, breaklines, 0.0, 0.01, origin, xWidth, yWidth, xSize, ySize) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{NULL, /dbfs/tmp/mosaic/raster/checkpoint/raster_d4ab419f_9829_4004_99a3_aaa597a69938.GTiff, {path -> /dbfs/tmp/mosaic/raster/checkpoint/raster_d4ab419f_9829_4004_99a3_aaa597a69938.GTiff, last_error -> , all_parents -> , driver -> GTiff, parentPath -> /tmp/mosaic_tmp/mosaic5678582907307109410/raster_d4ab419f_9829_4004_99a3_aaa597a69938.GTiff, last_command -> gdal_rasterize ATTRIBUTE=VALUES -of GTiff -co TILED=YES -co COMPRESS=DEFLATE}}|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
sdf <- createDataFrame(
data.frame(
points = c(
"POINT Z (3 2 1)", "POINT Z (2 1 0)",
"POINT Z (1 3 3)", "POINT Z (0 2 2)"
)
)
)
sdf <- agg(groupBy(sdf), masspoints = collect_list(column("points")))
sdf <- withColumn(sdf, "breaklines", expr("array('LINESTRING EMPTY')"))
sdf <- select(sdf, rst_dtmfromgeoms(
column("masspoints"), column("breaklines"),
lit(0.0), lit(0.01), lit("NONENCROACHING"),
lit("POINT (0.6 1.8)"), lit(12L), lit(6L),
lit(0.1), lit(0.1), lit(-9999.0)
)
)
showDF(sdf, n=1, truncate=F)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|rst_dtmfromgeoms(masspoints, breaklines, 0.0, 0.01, POINT (0.6 1.8), 12, 6, 0.1, 0.1) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{NULL, /dbfs/tmp/mosaic/raster/checkpoint/raster_ab03a97f_9bc3_410c_80e1_adf6f75f46e2.GTiff, {path -> /dbfs/tmp/mosaic/raster/checkpoint/raster_ab03a97f_9bc3_410c_80e1_adf6f75f46e2.GTiff, last_error -> , all_parents -> , driver -> GTiff, parentPath -> /tmp/mosaic_tmp/mosaic8840676907961488874/raster_ab03a97f_9bc3_410c_80e1_adf6f75f46e2.GTiff, last_command -> gdal_rasterize ATTRIBUTE=VALUES -of GTiff -co TILED=YES -co COMPRESS=DEFLATE}}|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
rst_filter
- rst_filter(tile, kernel_size, operation)
Applies a filter to the raster. Returns a new raster tile with the filter applied.
kernel_size
is the number of pixels to compare; it must be odd.operation
is the op to apply, e.g. ‘avg’, ‘median’, ‘mode’, ‘max’, ‘min’.- Parameters:
tile (Column (RasterTileType)) – Mosaic raster tile struct column.
kernel_size (Column (IntegerType)) – The size of the kernel. Has to be odd.
operation (Column (StringType)) – The operation to apply to the kernel.
- Return type:
Column (RasterTileType)
- Example:
df.select(rst_filter('tile', lit(3), lit("mode"))).limit(1).display()
+-----------------------------------------------------------------------------------------------------------------------------+
| rst_filter(tile,3,mode) |
+-----------------------------------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} |
+-----------------------------------------------------------------------------------------------------------------------------+
df.select(rst_filter(col("tile"), lit(3), lit("mode"))).limit(1).show
+-----------------------------------------------------------------------------------------------------------------------------+
| rst_filter(tile,3,mode) |
+-----------------------------------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} |
+-----------------------------------------------------------------------------------------------------------------------------+
SELECT rst_filter(tile,3,"mode") FROM table LIMIT 1
+-----------------------------------------------------------------------------------------------------------------------------+
| rst_filter(tile,3,mode) |
+-----------------------------------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} |
+-----------------------------------------------------------------------------------------------------------------------------+
rst_frombands
- rst_frombands(tiles)
Combines a collection of raster tiles of different bands into a single raster.
- Parameters:
tiles (Column (ArrayType(RasterTileType))) – A column containing an array of raster tiles.
- Return type:
Column: RasterTileType
Note
- Notes
All raster tiles must have the same extent.
The tiles must have the same pixel coordinate reference system.
The output tile will have the same extent as the input tiles.
The output tile will have the a number of bands equivalent to the number of input tiles.
The output tile will have the same pixel type as the input tiles.
The output tile will have the same pixel size as the highest resolution input tile.
The output tile will have the same coordinate reference system as the input tiles.
- example:
df.select(F.array("tile1", "tile2", "tile3").as("tiles"))\
.select(mos.rst_frombands("tiles")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_frombands(tiles) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
df
.select(array("tile1", "tile2", "tile3").as("tiles"))
.select(rst_frombands(col("tiles"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_frombands(tiles) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
SELECT rst_frombands(array(tile1,tile2,tile3)) FROM table LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_frombands(array(tile1,tile2,tile3)) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_fromcontent
- rst_fromcontent(raster_bin, driver, <size_in_MB>)
Returns a tile from raster data.
- Parameters:
raster_bin (Column (BinaryType)) – A column containing the raster data.
driver (Column(StringType)) – GDAL driver to use to open the raster.
size_in_MB (Column (IntegerType)) – Optional parameter to specify the size of the raster tile in MB. Default is not to split the input.
- Return type:
Column: RasterTileType
Note
- Notes
The input raster must be a byte array in a BinaryType column.
The driver required to read the raster must be one supplied with GDAL.
If the size_in_MB parameter is specified, the raster will be split into tiles of the specified size.
If the size_in_MB parameter is not specified or if the size_in_Mb < 0, the raster will only be split if it exceeds Integer.MAX_VALUE. The split will be at a threshold of 64MB in this case.
- example:
# binary is python bytearray data type
df = spark.read.format("binaryFile")\
.load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")\
df.select(mos.rst_fromcontent("content")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_fromcontent(content) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
//binary is scala/java Array(Byte) data type
val df = spark.read
.format("binaryFile")
.load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")
df.select(rst_fromcontent(col("content"))).limit(1).show(false)
+----------------------------------------------------------------------------------------------------------------+
| rst_fromcontent(content) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
CREATE TABLE IF NOT EXISTS TABLE coral_netcdf
USING binaryFile
OPTIONS (path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")
SELECT rst_fromcontent(content) FROM coral_netcdf LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_fromcontent(content) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_fromfile
- rst_fromfile(path, <size_in_MB>)
Returns a raster tile from a file path.
- Parameters:
path (Column (StringType)) – A column containing the path to a raster file.
size_in_MB (Column (IntegerType)) – Optional parameter to specify the size of the raster tile in MB. Default is not to split the input.
- Return type:
Column: RasterTileType
Note
- Notes
The file path must be a string.
The file path must be a valid path to a raster file.
The file path must be a path to a file that GDAL can read.
If the size_in_MB parameter is specified, the raster will be split into tiles of the specified size.
If the size_in_MB parameter is not specified or if the size_in_Mb < 0, the raster will only be split if it exceeds Integer.MAX_VALUE. The split will be at a threshold of 64MB in this case.
- example:
df = spark.read.format("binaryFile")\
.load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")\
.drop("content")
df.select(mos.rst_fromfile("path")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_fromfile(path) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
val df = spark.read
.format("binaryFile")
.load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")
.drop("content")
df.select(rst_fromfile(col("path"))).limit(1).show(false)
+----------------------------------------------------------------------------------------------------------------+
| rst_fromfile(path) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
CREATE TABLE IF NOT EXISTS TABLE coral_netcdf
USING binaryFile
OPTIONS (path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")
SELECT rst_fromfile(path) FROM coral_netcdf LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_fromfile(path) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_georeference
- rst_georeference(raster_tile)
Returns GeoTransform of the raster tile as a GT array of doubles. The output takes the form of a MapType with the following keys:
GT(0)
x-coordinate of the upper-left corner of the upper-left pixel.GT(1)
w-e pixel resolution / pixel width.GT(2)
row rotation (typically zero).GT(3)
y-coordinate of the upper-left corner of the upper-left pixel.GT(4)
column rotation (typically zero).GT(5)
n-s pixel resolution / pixel height (negative value for a north-up image).
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: MapType(StringType, DoubleType)
- Example:
df.select(mos.rst_georeference("tile")).limit(1).display()
+--------------------------------------------------------------------------------------------+
| rst_georeference(tile) |
+--------------------------------------------------------------------------------------------+
| {"scaleY": -0.049999999152053956, "skewX": 0, "skewY": 0, "upperLeftY": 89.99999847369712, |
| "upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} |
+--------------------------------------------------------------------------------------------+
df.select(rst_georeference(col("tile"))).limit(1).show
+--------------------------------------------------------------------------------------------+
| rst_georeference(tile) |
+--------------------------------------------------------------------------------------------+
| {"scaleY": -0.049999999152053956, "skewX": 0, "skewY": 0, "upperLeftY": 89.99999847369712, |
| "upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} |
+--------------------------------------------------------------------------------------------+
SELECT rst_georeference(tile) FROM table LIMIT 1
+--------------------------------------------------------------------------------------------+
| rst_georeference(tile) |
+--------------------------------------------------------------------------------------------+
| {"scaleY": -0.049999999152053956, "skewX": 0, "skewY": 0, "upperLeftY": 89.99999847369712, |
| "upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} |
+--------------------------------------------------------------------------------------------+
rst_getnodata
- rst_getnodata(tile)
Returns the nodata value of the raster tile bands.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: ArrayType(DoubleType)
- Example:
df.select(mos.rst_getnodata("tile")).limit(1).display()
+---------------------+
| rst_getnodata(tile) |
+---------------------+
| [0.0, -9999.0, ...] |
+---------------------+
df.select(rst_getnodata(col("tile"))).limit(1).show
+---------------------+
| rst_getnodata(tile) |
+---------------------+
| [0.0, -9999.0, ...] |
+---------------------+
SELECT rst_getnodata(tile) FROM table LIMIT 1
+---------------------+
| rst_getnodata(tile) |
+---------------------+
| [0.0, -9999.0, ...] |
+---------------------+
rst_getsubdataset
- rst_getsubdataset(tile, name)
Returns the subdataset of the raster tile with a given name.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
name (Column (StringType)) – A column containing the name of the subdataset to return.
- Return type:
Column: RasterTileType
Note
- Notes
name
should be the last identifier in the standard GDAL subdataset path:DRIVER:PATH:NAME
.name
must be a valid subdataset name for the raster, i.e. it must exist within the raster.
- example:
df.select(mos.rst_getsubdataset("tile", "sst")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_getsubdataset(tile, sst) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
df.select(rst_getsubdataset(col("tile"), lit("sst"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_getsubdataset(tile, sst) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
SELECT rst_getsubdataset(tile, "sst") FROM table LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_getsubdataset(tile, sst) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_height
- rst_height(tile)
Returns the height of the raster tile in pixels.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: IntegerType
- Example:
df.select(mos.rst_height('tile')).display()
+--------------------+
| rst_height(tile) |
+--------------------+
| 3600 |
| 3600 |
+--------------------+
df.select(rst_height(col("tile"))).show
+--------------------+
| rst_height(tile) |
+--------------------+
|3600 |
|3600 |
+--------------------+
SELECT rst_height(tile) FROM table
+--------------------+
| rst_height(tile) |
+--------------------+
|3600 |
|3600 |
+--------------------+
rst_initnodata
- rst_initnodata(tile)
Initializes the nodata value of the raster tile bands.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: RasterTileType
Note
- Notes
The nodata value will be set to a default sentinel values according to the pixel data type of the raster bands.
The output raster will have the same extent as the input raster.
Data Type
Scala representation
Value
ByteType
0
UnsignedShortType
UShort.MaxValue
65535
ShortType
Short.MinValue
-32768
UnsignedIntegerType
Int.MaxValue
4.294967294E9
IntegerType
Int.MinValue
-2147483648
FloatType
Float.MinValue
-3.4028234663852886E38
DoubleType
Double.MinValue
-1.7976931348623157E308
- example:
df.select(mos.rst_initnodata("tile")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_initnodata(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
df.select(rst_initnodata(col("tile"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_initnodata(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
SELECT rst_initnodata(tile) FROM table LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_initnodata(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_isempty
- rst_isempty(tile)
Returns true if the raster tile is empty.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: BooleanType
- Example:
df.select(mos.rst_isempty('tile')).display()
+--------------------+
| rst_height(tile) |
+--------------------+
|false |
|false |
+--------------------+
df.select(rst_isempty(col("tile"))).show
+--------------------+
| rst_height(tile) |
+--------------------+
|false |
|false |
+--------------------+
SELECT rst_isempty(tile) FROM table
+--------------------+
| rst_height(tile) |
+--------------------+
|false |
|false |
+--------------------+
rst_maketiles
- rst_maketiles(input, driver, size, withCheckpoint)
Tiles the raster into tiles of the given size, optionally writing them to disk in the process.
- Parameters:
input (Column) – path (StringType) or content (BinaryType)
driver (Column(StringType)) – The driver to use for reading the raster.
size_in_mb (Column(IntegerType)) – The size of the tiles in MB.
with_checkpoint (Column(BooleanType)) – whether to use configured checkpoint location.
- Return type:
Column: RasterTileType
Note
Notes:
input
If the raster is stored on disk,
input
should be the path to the raster, similar to rst_fromfile.If the raster is stored in memory,
input
should be the byte array representation of the raster, similar to rst_fromcontent.
driver
If not specified,
driver
is inferred from the file extensionIf the input is a byte array, the driver must be explicitly specified.
size
If
size
is set to -1, the file is loaded and returned as a single tileIf set to 0, the file is loaded and subdivided into tiles of size 64MB
If set to a positive value, the file is loaded and subdivided into tiles of the specified size
If the file is too big to fit in memory, it is subdivided into tiles of size 64MB.
with_checkpoint
If
with_checkpoint
set to true, the tiles are written to the checkpoint directoryIf set to false, the tiles are returned as in-memory byte arrays.
Once enabled, checkpointing will remain enabled for tiles originating from this function,
meaning follow-on calls will also use checkpointing. To switch away from checkpointing down the line,
you could call rst_fromfile using the checkpointed locations as the path
input.
- example:
spark.read.format("binaryFile").load(dbfs_dir)\
.select(rst_maketiles("path")).limit(1).display()
+------------------------------------------------------------------------+
| tile |
+------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAMAAA (truncated)","metadata":{ |
| "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} |
+------------------------------------------------------------------------+
spark.read.format("binaryFile").load(dbfs_dir)
.select(rst_maketiles(col("path"))).limit(1).show
+------------------------------------------------------------------------+
| tile |
+------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAMAAA (truncated)","metadata":{ |
| "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} |
+------------------------------------------------------------------------+
SELECT rst_maketiles(path) FROM table LIMIT 1
+------------------------------------------------------------------------+
| tile |
+------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAMAAA (truncated)","metadata":{ |
| "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} |
+------------------------------------------------------------------------+
rst_mapalgebra
- rst_mapalgebra(tile, json_spec)
Performs map algebra on the raster tile.
Employs the
gdal_calc
command line raster calculator with standard numpy syntax. Use any basic arithmetic supported by numpy arrays (such as +, -, *, and /) along with logical operators (such as >, <, =).For this distributed implementation, all rasters must have the same dimensions and no projection checking is performed.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
json_spec (Column (StringType)) – A column containing the map algebra operation specification.
- Return type:
Column: RasterTileType
Note
- The
json_spec
parameter
Input rasters to the algebra function are referencable as variables with names
A
throughZ
.Bands from the input
tile
are referencable using ordinal 0..n values.Examples of valid
json_spec
(1) '{"calc": "A+B/C"}' (2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}' (3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}'
In these examples:
demonstrates default indexing (i.e. the first three bands in
tile
are assigned A, B and C respectively)demonstrates reusing an index (B and C represent the same band); and
shows band indexing.
- example:
df.select(mos.rst_mapalgebra("tile", "{calc: 'A+B', A_index: 0, B_index: 1}").alias("tile").limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| tile |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
df.select(mos.rst_mapalgebra("tile", "{calc: 'A+B', A_index: 0, B_index: 1}").as("tile")).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| tile |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
SELECT rst_mapalgebra(tile, "{calc: 'A+B', A_index: 0, B_index: 1}") as tile FROM table LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| tile |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_max
- rst_max(tile)
Returns an array containing maximum values for each band.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: ArrayType(DoubleType)
- Example:
df.selectExpr(mos.rst_max("tile")).limit(1).display()
+---------------+
| rst_max(tile) |
+---------------+
| [42.0] |
+---------------+
df.select(rst_max(col("tile"))).limit(1).show
+---------------+
| rst_max(tile) |
+---------------+
| [42.0] |
+---------------+
SELECT rst_max(tile) FROM table LIMIT 1
+---------------+
| rst_max(tile) |
+---------------+
| [42.0] |
+---------------+
rst_median
- rst_median(tile)
Returns an array containing median values for each band.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: ArrayType(DoubleType)
- Example:
df.selectExpr(mos.rst_median("tile")).limit(1).display()
+---------------+
| rst_median(tile) |
+---------------+
| [42.0] |
+---------------+
df.select(rst_median(col("tile"))).limit(1).show
+---------------+
| rst_median(tile) |
+---------------+
| [42.0] |
+---------------+
SELECT rst_median(tile) FROM table LIMIT 1
+---------------+
| rst_median(tile) |
+---------------+
| [42.0] |
+---------------+
rst_memsize
- rst_memsize(tile)
Returns size of the raster tile in bytes.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: LongType
- Example:
df.select(mos.rst_memsize('tile')).display()
+--------------------+
| rst_height(tile) |
+--------------------+
|730260 |
|730260 |
+--------------------+
df.select(rst_memsize(col("tile"))).show
+--------------------+
| rst_height(tile) |
+--------------------+
|730260 |
|730260 |
+--------------------+
SELECT rst_memsize(tile) FROM table
+--------------------+
| rst_height(tile) |
+--------------------+
|730260 |
|730260 |
+--------------------+
rst_merge
- rst_merge(tiles)
Combines a collection of raster tiles into a single raster.
- Parameters:
tiles (Column (ArrayType(RasterTileType))) – A column containing an array of raster tiles.
- Return type:
Column: RasterTileType
Note
Notes
- Input tiles supplied in
tiles
: are not required to have the same extent.
must have the same coordinate reference system.
must have the same pixel data type.
will be combined using the
gdalwarp
command.require a
noData
value to have been initialised (if this is not the case, the non valid pixels may introduce artifacts in the output raster).will be stacked in the order they are provided.
- The resulting output raster will have:
an extent that covers all of the input tiles;
the same number of bands as the input tiles;
the same pixel type as the input tiles;
the same pixel size as the highest resolution input tiles; and
the same coordinate reference system as the input tiles.
See also rst_merge_agg function.
- example:
df.select(F.array("tile1", "tile2", "tile3").alias("tiles"))\
.select(mos.rst_merge("tiles")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_merge(tiles) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
df.select(array("tile1", "tile2", "tile3").as("tiles"))
.select(rst_merge(col("tiles"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_merge(tiles) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
SELECT rst_merge(array(tile1,tile2,tile3)) FROM table LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_merge(array(tile1,tile2,tile3)) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_metadata
- rst_metadata(tile)
Extract the metadata describing the raster tile. Metadata is return as a map of key value pairs.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: MapType(StringType, StringType)
- Example:
df.select(mos.rst_metadata('tile')).display()
+--------------------------------------------------------------------------------------------------------------------+
| rst_metadata(tile) |
+--------------------------------------------------------------------------------------------------------------------+
| {"NC_GLOBAL#publisher_url": "https://coralreefwatch.noaa.gov", "NC_GLOBAL#geospatial_lat_units": "degrees_north", |
| "NC_GLOBAL#platform_vocabulary": "NOAA NODC Ocean Archive System Platforms", "NC_GLOBAL#creator_type": "group", |
| "NC_GLOBAL#geospatial_lon_units": "degrees_east", "NC_GLOBAL#geospatial_bounds": "POLYGON((-90.0 180.0, 90.0 |
| 180.0, 90.0 -180.0, -90.0 -180.0, -90.0 180.0))", "NC_GLOBAL#keywords": "Oceans > Ocean Temperature > Sea Surface |
| Temperature, Oceans > Ocean Temperature > Water Temperature, Spectral/Engineering > Infrared Wavelengths > Thermal |
| Infrared, Oceans > Ocean Temperature > Bleaching Alert Area", "NC_GLOBAL#geospatial_lat_max": "89.974998", |
| .... (truncated).... "NC_GLOBAL#history": "This is a product data file of the NOAA Coral Reef Watch Daily Global |
| 5km Satellite Coral Bleaching Heat Stress Monitoring Product Suite Version 3.1 (v3.1) in its NetCDF Version 1.0 |
| (v1.0).", "NC_GLOBAL#publisher_institution": "NOAA/NESDIS/STAR Coral Reef Watch Program", |
| "NC_GLOBAL#cdm_data_type": "Grid"} |
+--------------------------------------------------------------------------------------------------------------------+
df.select(rst_metadata(col("tile"))).show
+--------------------------------------------------------------------------------------------------------------------+
| rst_metadata(tile) |
+--------------------------------------------------------------------------------------------------------------------+
| {"NC_GLOBAL#publisher_url": "https://coralreefwatch.noaa.gov", "NC_GLOBAL#geospatial_lat_units": "degrees_north", |
| "NC_GLOBAL#platform_vocabulary": "NOAA NODC Ocean Archive System Platforms", "NC_GLOBAL#creator_type": "group", |
| "NC_GLOBAL#geospatial_lon_units": "degrees_east", "NC_GLOBAL#geospatial_bounds": "POLYGON((-90.0 180.0, 90.0 |
| 180.0, 90.0 -180.0, -90.0 -180.0, -90.0 180.0))", "NC_GLOBAL#keywords": "Oceans > Ocean Temperature > Sea Surface |
| Temperature, Oceans > Ocean Temperature > Water Temperature, Spectral/Engineering > Infrared Wavelengths > Thermal |
| Infrared, Oceans > Ocean Temperature > Bleaching Alert Area", "NC_GLOBAL#geospatial_lat_max": "89.974998", |
| .... (truncated).... "NC_GLOBAL#history": "This is a product data file of the NOAA Coral Reef Watch Daily Global |
| 5km Satellite Coral Bleaching Heat Stress Monitoring Product Suite Version 3.1 (v3.1) in its NetCDF Version 1.0 |
| (v1.0).", "NC_GLOBAL#publisher_institution": "NOAA/NESDIS/STAR Coral Reef Watch Program", |
| "NC_GLOBAL#cdm_data_type": "Grid"} |
+--------------------------------------------------------------------------------------------------------------------+
SELECT rst_metadata(tile) FROM table LIMIT 1
+--------------------------------------------------------------------------------------------------------------------+
| rst_metadata(tile) |
+--------------------------------------------------------------------------------------------------------------------+
| {"NC_GLOBAL#publisher_url": "https://coralreefwatch.noaa.gov", "NC_GLOBAL#geospatial_lat_units": "degrees_north", |
| "NC_GLOBAL#platform_vocabulary": "NOAA NODC Ocean Archive System Platforms", "NC_GLOBAL#creator_type": "group", |
| "NC_GLOBAL#geospatial_lon_units": "degrees_east", "NC_GLOBAL#geospatial_bounds": "POLYGON((-90.0 180.0, 90.0 |
| 180.0, 90.0 -180.0, -90.0 -180.0, -90.0 180.0))", "NC_GLOBAL#keywords": "Oceans > Ocean Temperature > Sea Surface |
| Temperature, Oceans > Ocean Temperature > Water Temperature, Spectral/Engineering > Infrared Wavelengths > Thermal |
| Infrared, Oceans > Ocean Temperature > Bleaching Alert Area", "NC_GLOBAL#geospatial_lat_max": "89.974998", |
| .... (truncated).... "NC_GLOBAL#history": "This is a product data file of the NOAA Coral Reef Watch Daily Global |
| 5km Satellite Coral Bleaching Heat Stress Monitoring Product Suite Version 3.1 (v3.1) in its NetCDF Version 1.0 |
| (v1.0).", "NC_GLOBAL#publisher_institution": "NOAA/NESDIS/STAR Coral Reef Watch Program", |
| "NC_GLOBAL#cdm_data_type": "Grid"} |
+--------------------------------------------------------------------------------------------------------------------+
rst_min
- rst_min(tile)
Returns an array containing minimum values for each band.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: ArrayType(DoubleType)
- Example:
df.selectExpr(mos.rst_min("tile")).limit(1).display()
+---------------+
| rst_min(tile) |
+---------------+
| [42.0] |
+---------------+
df.select(rst_min(col("tile"))).limit(1).show
+---------------+
| rst_min(tile) |
+---------------+
| [42.0] |
+---------------+
SELECT rst_min(tile) FROM table LIMIT 1
+---------------+
| rst_min(tile) |
+---------------+
| [42.0] |
+---------------+
rst_ndvi
- rst_ndvi(tile, red_band_num, nir_band_num)
Calculates the Normalized Difference Vegetation Index (NDVI) for a raster.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
red_band_num (Column (IntegerType)) – A column containing the band number of the red band.
nir_band_num (Column (IntegerType)) – A column containing the band number of the near infrared band.
- Return type:
Column: RasterTileType
Note
NDVI is calculated using the formula: (NIR - RED) / (NIR + RED).
- The output raster tiles will have:
the same extent as the input raster.
a single band.
a pixel data type of float64.
the same coordinate reference system as the input raster.
- example:
df.select(mos.rst_ndvi("tile", 1, 2)).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_ndvi(tile, 1, 2) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
df.select(rst_ndvi(col("tile"), lit(1), lit(2))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_ndvi(tile, 1, 2) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
SELECT rst_ndvi(tile, 1, 2) FROM table LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_ndvi(tile, 1, 2) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
rst_numbands
- rst_numbands(tile)
Returns number of bands in the raster tile.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: IntegerType
- Example:
df.select(mos.rst_numbands('tile')).display()
+---------------------+
| rst_numbands(tile) |
+---------------------+
| 1 |
| 1 |
+---------------------+
df.select(rst_metadata(col("tile"))).show
+---------------------+
| rst_numbands(tile) |
+---------------------+
| 1 |
| 1 |
+---------------------+
SELECT rst_metadata(tile) FROM table
+---------------------+
| rst_numbands(tile) |
+---------------------+
| 1 |
| 1 |
+---------------------+
rst_pixelcount
- rst_pixelcount(tile, count_nodata, count_all)
Returns an array containing pixel count values for each band; default excludes mask and nodata pixels.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
count_nodata (Column (BooleanType)) – A column to specify whether to count nodata pixels.
count_all (Column (BooleanType)) – A column to specify whether to count all pixels.
- Return type:
Column: ArrayType(LongType)
Note
Notes:
If pixel value is noData or mask value is 0.0, the pixel is not counted by default.
count_nodata
This is an optional param.
if specified as true, include the noData (not mask) pixels in the count (default is false).
count_all
This is an optional param; as a positional arg, must also pass
count_nodata
(value ofcount_nodata
is ignored).if specified as true, simply return bandX * bandY in the count (default is false).
- example:
df.select(mos.rst_pixelcount('tile')).display()
+----------------------+
| rst_pixelcount(tile) |
+----------------------+
| [120560172] |
+----------------------+
df.select(rst_pixelcount(col("tile"))).show
+----------------------+
| rst_pixelcount(tile) |
+----------------------+
| [120560172] |
+----------------------+
SELECT rst_pixelcount(tile) FROM table
+----------------------+
| rst_pixelcount(tile) |
+----------------------+
| [120560172] |
+----------------------+
rst_pixelheight
- rst_pixelheight(tile)
Returns the height of the pixel in the raster tile derived via GeoTransform.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_pixelheight('tile')).display()
+-----------------------+
| rst_pixelheight(tile) |
+-----------------------+
| 1 |
| 1 |
+-----------------------+
df.select(rst_pixelheight(col("tile"))).show
+-----------------------+
| rst_pixelheight(tile) |
+-----------------------+
| 1 |
| 1 |
+-----------------------+
SELECT rst_pixelheight(tile) FROM table
+-----------------------+
| rst_pixelheight(tile) |
+-----------------------+
| 1 |
| 1 |
+-----------------------+
rst_pixelwidth
- rst_pixelwidth(tile)
Returns the width of the pixel in the raster tile derived via GeoTransform.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_pixelwidth('tile')).display()
+---------------------+
| rst_pixelwidth(tile)|
+---------------------+
| 1 |
| 1 |
+---------------------+
df.select(rst_pixelwidth(col("tile"))).show
+---------------------+
| rst_pixelwidth(tile)|
+---------------------+
| 1 |
| 1 |
+---------------------+
SELECT rst_pixelwidth(tile) FROM table
+---------------------+
| rst_pixelwidth(tile)|
+---------------------+
| 1 |
| 1 |
+---------------------+
rst_rastertogridavg
- rst_rastertogridavg(tile, resolution)
Compute the gridwise mean of the pixel values in
tile
.The result is a 2D array of cells, where each cell is a struct of (
cellID
,value
).- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
resolution (Column (IntegerType)) – A resolution of the grid index system.
- Return type:
Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType)))
Note
- Notes
To obtain cellID->value pairs, use the Spark SQL explode() function twice.
CellID can be LongType or StringType depending on the configuration of MosaicContext.
The value/measure for each cell is the average of the pixel values in the cell.
- example:
df.select(mos.rst_rastertogridavg('tile', F.lit(3))).display()
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridavg(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
df.select(rst_rastertogridavg(col("tile"), lit(3))).show
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridavg(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
SELECT rst_rastertogridavg(tile, 3) FROM table
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridavg(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
rst_rastertogridcount
- rst_rastertogridcount(tile, resolution)
Compute the gridwise count of the pixels in
tile
.The result is a 2D array of cells, where each cell is a struct of (
cellID
,value
).- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
resolution (Column (IntegerType)) – A resolution of the grid index system.
- Return type:
Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType)))
Note
- Notes
To obtain cellID->value pairs, use the Spark SQL explode() function twice.
CellID can be LongType or StringType depending on the configuration of MosaicContext.
The value/measure for each cell is the count of the pixel values in the cell.
- example:
df.select(mos.rst_rastertogridcount('tile', F.lit(3))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridcount(tile, 3) |
+------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 3}, |
| {"cellID": "593785619583336447", "measure": 3}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_rastertogridcount(col("tile"), lit(3))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridcount(tile, 3) |
+------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 3}, |
| {"cellID": "593785619583336447", "measure": 3}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_rastertogridcount(tile, 3) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridcount(tile, 3) |
+------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 3}, |
| {"cellID": "593785619583336447", "measure": 3}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+------------------------------------------------------------------------------------------------------------------+
rst_rastertogridmax
- rst_rastertogridmax(tile, resolution)
Compute the gridwise maximum of the pixels in
tile
.The result is a 2D array of cells, where each cell is a struct of (
cellID
,value
).- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
resolution (Column (IntegerType)) – A resolution of the grid index system.
- Return type:
Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType)))
Note
- Notes
To obtain cellID->value pairs, use the Spark SQL explode() function twice.
CellID can be LongType or StringType depending on the configuration of MosaicContext.
The value/measure for each cell is the maximum of the pixel values in the cell.
- example:
df.select(mos.rst_rastertogridmax('tile', F.lit(3))).display()
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridmax(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
df.select(rst_rastertogridmax(col("tile"), lit(3))).show
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridmax(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
SELECT rst_rastertogridmax(tile, 3) FROM table
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridmax(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
rst_rastertogridmedian
- rst_rastertogridmedian(tile, resolution)
Compute the gridwise median value of the pixels in
tile
.The result is a 2D array of cells, where each cell is a struct of (
cellID
,value
).- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
resolution (Column (IntegerType)) – A resolution of the grid index system.
- Return type:
Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType)))
Note
- Notes
To obtain cellID->value pairs, use the Spark SQL explode() function twice.
CellID can be LongType or StringType depending on the configuration of MosaicContext.
The value/measure for each cell is the median of the pixel values in the cell.
- example:
df.select(mos.rst_rastertogridmedian('tile', F.lit(3))).display()
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridmedian(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
df.select(rst_rastertogridmedian(col("tile"), lit(3))).show
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridmedian(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
SELECT rst_rastertogridmedian(tile, 3) FROM table
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridmedian(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
rst_rastertogridmin
- rst_rastertogridmin(tile, resolution)
Compute the gridwise minimum of the pixel values in
tile
.The result is a 2D array of cells, where each cell is a struct of (
cellID
,value
).- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
resolution (Column (IntegerType)) – A resolution of the grid index system.
- Return type:
Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType)))
Note
- Notes
To obtain cellID->value pairs, use the Spark SQL explode() function twice.
CellID can be LongType or StringType depending on the configuration of MosaicContext.
The value/measure for each cell is the minimum of the pixel values in the cell.
- example:
df.select(mos.rst_rastertogridmin('tile', F.lit(3))).display()
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridmin(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
df.select(rst_rastertogridmin(col("tile"), lit(3))).show
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridmin(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
SELECT rst_rastertogridmin(tile, 3) FROM table
+--------------------------------------------------------------------------------------------------------------------+
| rst_rastertogridmin(tile, 3) |
+--------------------------------------------------------------------------------------------------------------------+
| [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, |
| {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, |
| {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, |
| {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, |
| {"cellID": "593472602366803967", "measure": 0.3963963963963964}, |
| {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, |
| {"cellID": "592336738135834623", "measure": 1}, ....]] |
+--------------------------------------------------------------------------------------------------------------------+
rst_rastertoworldcoord
- rst_rastertoworldcoord(tile, x, y)
Computes the world coordinates of the raster tile at the given x and y pixel coordinates.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
x (Column (IntegerType)) – x coordinate of the pixel.
y (Column (IntegerType)) – y coordinate of the pixel.
- Return type:
Column: StringType
Note
- Notes
The result is a WKT point geometry.
The coordinates are computed using the GeoTransform of the raster to respect the projection.
- example:
df.select(mos.rst_rastertoworldcoord('tile', F.lit(3), F.lit(3))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertoworldcoord(tile, 3, 3) |
+------------------------------------------------------------------------------------------------------------------+
|POINT (-179.85000609927647 89.84999847624096) |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_rastertoworldcoord(col("tile"), lit(3), lit(3))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertoworldcoord(tile, 3, 3) |
+------------------------------------------------------------------------------------------------------------------+
|POINT (-179.85000609927647 89.84999847624096) |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_rastertoworldcoord(tile, 3, 3) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertoworldcoord(tile, 3, 3) |
+------------------------------------------------------------------------------------------------------------------+
|POINT (-179.85000609927647 89.84999847624096) |
+------------------------------------------------------------------------------------------------------------------+
rst_rastertoworldcoordx
- rst_rastertoworldcoordx(tile, x, y)
Computes the world coordinates of the raster tile at the given x and y pixel coordinates.
The result is the X coordinate of the point after applying the GeoTransform of the raster.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
x (Column (IntegerType)) – x coordinate of the pixel.
y (Column (IntegerType)) – y coordinate of the pixel.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_rastertoworldcoordx('tile', F.lit(3), F.lit(3))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertoworldcoordx(tile, 3, 3) |
+------------------------------------------------------------------------------------------------------------------+
| -179.85000609927647 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_rastertoworldcoordx(col("tile"), lit(3), lit(3))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertoworldcoordx(tile, 3, 3) |
+------------------------------------------------------------------------------------------------------------------+
| -179.85000609927647 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_rastertoworldcoordx(tile, 3, 3) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertoworldcoordx(tile, 3, 3) |
+------------------------------------------------------------------------------------------------------------------+
| -179.85000609927647 |
+------------------------------------------------------------------------------------------------------------------+
rst_rastertoworldcoordy
- rst_rastertoworldcoordy(tile, x, y)
Computes the world coordinates of the raster tile at the given x and y pixel coordinates.
The result is the Y coordinate of the point after applying the GeoTransform of the raster.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
x (Column (IntegerType)) – x coordinate of the pixel.
y (Column (IntegerType)) – y coordinate of the pixel.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_rastertoworldcoordy('tile', F.lit(3), F.lit(3))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertoworldcoordy(tile, 3, 3) |
+------------------------------------------------------------------------------------------------------------------+
| 89.84999847624096 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_rastertoworldcoordy(col("tile"), lit(3), lit(3))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertoworldcoordy(tile, 3, 3) |
+------------------------------------------------------------------------------------------------------------------+
| 89.84999847624096 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_rastertoworldcoordy(tile, 3, 3) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_rastertoworldcoordy(tile, 3, 3) |
+------------------------------------------------------------------------------------------------------------------+
| 89.84999847624096 |
+------------------------------------------------------------------------------------------------------------------+
rst_retile
- rst_retile(tile, width, height)
Retiles the raster tile to the given size. The result is a collection of new raster tiles.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
width (Column (IntegerType)) – The width of the tiles.
height (Column (IntegerType)) – The height of the tiles.
- Return type:
Column: (RasterTileType)
- Example:
df.select(mos.rst_retile('tile', F.lit(300), F.lit(300))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_retile(tile, 300, 300) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_retile(col("tile"), lit(300), lit(300))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_retile(tile, 300, 300) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_retile(tile, 300, 300) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_retile(tile, 300, 300) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+------------------------------------------------------------------------------------------------------------------+
rst_rotation
- rst_rotation(tile)
Computes the angle of rotation between the X axis of the raster tile and geographic North in degrees using the GeoTransform of the raster.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_rotation('tile').display()
+------------------------------------------------------------------------------------------------------------------+
| rst_rotation(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
| 21.2 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_rotation(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_rotation(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
| 21.2 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_rotation(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_rotation(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
| 21.2 |
+------------------------------------------------------------------------------------------------------------------+
rst_scalex
- rst_scalex(tile)
Computes the scale of the raster tile in the X direction.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_scalex('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_scalex(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_scalex(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_scalex(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_scalex(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_scalex(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
rst_scaley
- rst_scaley(tile)
Computes the scale of the raster tile in the Y direction.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_scaley('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_scaley(path) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_scaley(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_scaley(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_scaley(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_scaley(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
rst_separatebands
- rst_separatebands(tile)
Returns a set of new single-band rasters, one for each band in the input raster. The result set will contain one row per input band for each
tile
provided.- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: (RasterTileType)
Note
️⚠️ Before performing this operation, you may want to add an identifier column to the dataframe to trace each band back to its original parent raster.
- example:
df.select(mos.rst_separatebands('tile')).display()
+--------------------------------------------------------------------------------------------------------------------------------+
| tile |
+--------------------------------------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)", |
| "metadata":{"path":"....tif","last_error":"","all_parents":"no_path","driver":"GTiff","bandIndex":"1","parentPath":"no_path", |
| "last_command":"gdal_translate -of GTiff -b 1 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} |
+--------------------------------------------------------------------------------------------------------------------------------+
df.select(rst_separatebands(col("tile"))).show
+--------------------------------------------------------------------------------------------------------------------------------+
| tile |
+--------------------------------------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)", |
| "metadata":{"path":"....tif","last_error":"","all_parents":"no_path","driver":"GTiff","bandIndex":"1","parentPath":"no_path", |
| "last_command":"gdal_translate -of GTiff -b 1 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} |
+--------------------------------------------------------------------------------------------------------------------------------+
SELECT rst_separatebands(tile) FROM table
+--------------------------------------------------------------------------------------------------------------------------------+
| tile |
+--------------------------------------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)", |
| "metadata":{"path":"....tif","last_error":"","all_parents":"no_path","driver":"GTiff","bandIndex":"1","parentPath":"no_path", |
| "last_command":"gdal_translate -of GTiff -b 1 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} |
+--------------------------------------------------------------------------------------------------------------------------------+
rst_setnodata
- rst_setnodata(tile, nodata)
Returns a new raster tile with the nodata value set to
nodata
.- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
nodata (Column (DoubleType) / ArrayType(DoubleType)) – The nodata value to set.
- Return type:
Column: (RasterTileType)
Note
- Notes
If a single
nodata
value is passed, the same nodata value is set for all bands oftile
.If an array of values is passed, the respective
nodata
value is set for each band oftile
.
- example:
df.select(mos.rst_setnodata('tile', F.lit(0))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_setnodata(tile, 0) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_setnodata(col("tile"), lit(0))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_setnodata(tile, 0) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_setnodata(tile, 0) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_setnodata(tile, 0) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
rst_setsrid
- rst_setsrid(tile, srid)
Set the SRID of the raster tile as an EPSG code.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
srid (Column (IntegerType)) – The SRID to set
- Return type:
Column: (RasterTileType)
- Example:
df.select(mos.rst_setsrid('tile', F.lit(9122))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_setsrid(tile, 9122) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, tile: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_setsrid(col("tile"), lit(9122))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_setsrid(tile, 9122) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, tile: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_setsrid(tile, 9122) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_setsrid(tile, 9122) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, tile: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
rst_skewx
- rst_skewx(tile)
Computes the skew of the raster tile in the X direction.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_skewx('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_skewx(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_skewx(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_skewx(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_skewx(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_skewx(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
rst_skewy
- rst_skewy(tile)
Computes the skew of the raster tile in the Y direction.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_skewy('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_skewy(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_skewy(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_skewy(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_skewy(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_skewy(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 1.2 |
+------------------------------------------------------------------------------------------------------------------+
rst_srid
- rst_srid(tile)
Returns the SRID of the raster tile as an EPSG code.
Note
For complex CRS definition the EPSG code may default to 0.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_srid('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_srid(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 9122 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_srid(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_srid(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 9122 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_srid(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_srid(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 9122 |
+------------------------------------------------------------------------------------------------------------------+
rst_subdatasets
- rst_subdatasets(tile)
Returns the subdatasets of the raster tile as a set of paths in the standard GDAL format.
The result is a map of the subdataset path to the subdatasets and the description of the subdatasets.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: MapType(StringType, StringType)
- Example:
df.select(mos.rst_subdatasets('tile')).display()
+--------------------------------------------------------------------------------------------------------------------+
| rst_subdatasets(tile) |
+--------------------------------------------------------------------------------------------------------------------+
| {"NETCDF:\"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_2022010 |
| 6-1.nc\":bleaching_alert_area": "[1x3600x7200] N/A (8-bit unsigned integer)", "NETCDF:\"/dbfs/FileStore/geospatial |
| /mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc\":mask": "[1x3600x7200] mask (8 |
| -bit unsigned integer)"} |
+--------------------------------------------------------------------------------------------------------------------+
df.select(rst_subdatasets(col("tile"))).show
+--------------------------------------------------------------------------------------------------------------------+
| rst_subdatasets(tile) |
+--------------------------------------------------------------------------------------------------------------------+
| {"NETCDF:\"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_2022010 |
| 6-1.nc\":bleaching_alert_area": "[1x3600x7200] N/A (8-bit unsigned integer)", "NETCDF:\"/dbfs/FileStore/geospatial |
| /mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc\":mask": "[1x3600x7200] mask (8 |
| -bit unsigned integer)"} |
+--------------------------------------------------------------------------------------------------------------------+
SELECT rst_subdatasets(tile) FROM table
+--------------------------------------------------------------------------------------------------------------------+
| rst_subdatasets(tile) |
+--------------------------------------------------------------------------------------------------------------------+
| {"NETCDF:\"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_2022010 |
| 6-1.nc\":bleaching_alert_area": "[1x3600x7200] N/A (8-bit unsigned integer)", "NETCDF:\"/dbfs/FileStore/geospatial |
| /mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc\":mask": "[1x3600x7200] mask (8 |
| -bit unsigned integer)"} |
+--------------------------------------------------------------------------------------------------------------------+
rst_subdivide
- rst_subdivide(tile, sizeInMB)
Subdivides the raster tile to the given tile size in MB. The result is a collection of new raster tiles.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
size_in_MB (Column (IntegerType)) – The size of the tiles in MB.
Note
- Notes
Each
tile
will be recursively split along two orthogonal axes until the expected size of the last child tile is <size_in_MB
.The aspect ratio of the tiles is preserved.
The result set is automatically exploded.
The size of the resulting tiles is approximate. Due to compression and other effects we cannot guarantee the size of the tiles in MB.
- example:
df.select(mos.rst_subdivide('tile', F.lit(10))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_subdivide(tile, 10) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_subdivide(col("tile"), lit(10))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_subdivide(tile, 10) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_subdivide(tile, 10) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_subdivide(tile, 10) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
rst_summary
- rst_summary(tile)
Returns a summary description of the raster tile including metadata and statistics in JSON format.
Values returned here are produced by the
gdalinfo
procedure.- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: MapType(StringType, StringType)
- Example:
df.select(mos.rst_summary('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_summary(tile) |
+------------------------------------------------------------------------------------------------------------------+
| { "description":"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1|
|_20220106-1.nc", "driverShortName":"netCDF", "driverLongName":"Network Common Data Format", "files":[ |
|"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc" |
|], "size":[ 512, 512 ], "metadata":{ "":{ "NC_GLOBAL#acknowledgement":"NOAA Coral Reef |
|Watch Program", "NC_GLOBAL#cdm_data_type":"Gr... |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_summary(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_summary(tile) |
+------------------------------------------------------------------------------------------------------------------+
| { "description":"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1|
|_20220106-1.nc", "driverShortName":"netCDF", "driverLongName":"Network Common Data Format", "files":[ |
|"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc" |
|], "size":[ 512, 512 ], "metadata":{ "":{ "NC_GLOBAL#acknowledgement":"NOAA Coral Reef |
|Watch Program", "NC_GLOBAL#cdm_data_type":"Gr... |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_summary(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_summary(tile) |
+------------------------------------------------------------------------------------------------------------------+
| { "description":"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1|
|_20220106-1.nc", "driverShortName":"netCDF", "driverLongName":"Network Common Data Format", "files":[ |
|"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc" |
|], "size":[ 512, 512 ], "metadata":{ "":{ "NC_GLOBAL#acknowledgement":"NOAA Coral Reef |
|Watch Program", "NC_GLOBAL#cdm_data_type":"Gr... |
+------------------------------------------------------------------------------------------------------------------+
rst_tessellate
- rst_tessellate(tile, resolution)
Divides the raster tile into tessellating chips for the given resolution of the supported grid (H3, BNG, Custom). The result is a collection of new raster tiles.
Each tile in the tile set corresponds to an index cell intersecting the bounding box of
tile
.- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
resolution (Column (IntegerType)) – The resolution of the supported grid.
Note
- Notes
The result set is automatically exploded into a row-per-index-cell.
If rst_merge is called on output tile set, the original raster will be reconstructed.
Each output tile chip will have the same number of bands as its parent
tile
.
- example:
df.select(mos.rst_tessellate('tile', F.lit(10))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_tessellate(tile, 10) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_tessellate(col("tile"), lit(10))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_tessellate(tile, 10) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_tessellate(tile, 10) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_tessellate(tile, 10) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
rst_tooverlappingtiles
- rst_tooverlappingtiles(tile, width, height, overlap)
Splits each
tile
into a collection of new raster tiles of the given width and height, with an overlap ofoverlap
percent.The result set is automatically exploded into a row-per-subtile.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
width (Column (IntegerType)) – The width of the tiles in pixels.
height (Column (IntegerType)) – The height of the tiles in pixels.
overlap (Column (IntegerType)) – The overlap of the tiles in percentage.
Note
- Notes
If rst_merge is called on the tile set the original raster will be reconstructed.
Each output tile chip will have the same number of bands as its parent
tile
.
- example:
df.select(mos.rst_tooverlappingtiles('tile', F.lit(10), F.lit(10), F.lit(10))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_tooverlappingtiles(tile, 10, 10, 10) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_tooverlappingtiles(col("tile"), lit(10), lit(10), lit(10))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_tooverlappingtiles(tile, 10, 10, 10) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_tooverlappingtiles(tile, 10, 10, 10) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_tooverlappingtiles(tile, 10, 10, 10) |
+------------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
| {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } |
+------------------------------------------------------------------------------------------------------------------+
rst_transform
- rst_transform(tile, srid)
Transforms the raster to the given SRID.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
srid (Column (IntegerType)) – EPSG authority code for the file’s projection.
- Return type:
Column: (RasterTileType)
- Example:
df.select(mos.rst_transform('tile', lit(4326))).display()
+----------------------------------------------------------------------------------------------------+
| rst_transform(tile,4326) |
+----------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","last_error":"", |
| "all_parents":"no_path","driver":"GTiff","parentPath":"no_path", |
| "last_command":"gdalwarp -t_srs EPSG:4326 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} |
+----------------------------------------------------------------------------------------------------+
df.select(rst_transform(col("tile"), lit(4326))).show
+----------------------------------------------------------------------------------------------------+
| rst_transform(tile,4326) |
+----------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","last_error":"", |
| "all_parents":"no_path","driver":"GTiff","parentPath":"no_path", |
| "last_command":"gdalwarp -t_srs EPSG:4326 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} |
+----------------------------------------------------------------------------------------------------+
SELECT rst_transform(tile,4326) FROM table
+----------------------------------------------------------------------------------------------------+
| rst_transform(tile,4326) |
+----------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","last_error":"", |
| "all_parents":"no_path","driver":"GTiff","parentPath":"no_path", |
| "last_command":"gdalwarp -t_srs EPSG:4326 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} |
+----------------------------------------------------------------------------------------------------+
rst_tryopen
- rst_tryopen(tile)
Tries to open the raster tile. If the raster cannot be opened the result is false and if the raster can be opened the result is true.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: BooleanType
- Example:
df.select(mos.rst_tryopen('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_tryopen(tile) |
+------------------------------------------------------------------------------------------------------------------+
| true |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_tryopen(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_tryopen(tile) |
+------------------------------------------------------------------------------------------------------------------+
| true |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_tryopen(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_tryopen(tile) |
+------------------------------------------------------------------------------------------------------------------+
| true |
+------------------------------------------------------------------------------------------------------------------+
rst_type
- rst_type(tile)
Returns the data type of the raster’s bands.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: StringType
- Example:
df.select(mos.rst_type('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_type(tile) |
+------------------------------------------------------------------------------------------------------------------+
| [Int16] |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_type(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_type(tile) |
+------------------------------------------------------------------------------------------------------------------+
| [Int16] |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_type(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_type(tile) |
+------------------------------------------------------------------------------------------------------------------+
| [Int16] |
+------------------------------------------------------------------------------------------------------------------+
rst_updatetype
- rst_updatetype(tile, newType)
Translates the raster to a new data type.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
newType (Column (StringType)) – Data type to translate the raster to.
- Return type:
Column: (RasterTileType)
- Example:
df.select(mos.rst_updatetype('tile', lit('Float32'))).display()
+----------------------------------------------------------------------------------------------------+
| rst_updatetype(tile,Float32) |
+----------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","last_error":"", |
| "all_parents":"no_path","driver":"GTiff","parentPath":"no_path", |
| "last_command":"gdaltranslate -ot Float32 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} |
+----------------------------------------------------------------------------------------------------+
df.select(rst_updatetype(col("tile"), lit("Float32"))).show
+----------------------------------------------------------------------------------------------------+
| rst_updatetype(tile,Float32) |
+----------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","last_error":"", |
| "all_parents":"no_path","driver":"GTiff","parentPath":"no_path", |
| "last_command":"gdaltranslate -ot Float32 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} |
+----------------------------------------------------------------------------------------------------+
SELECT rst_updatetype(tile, 'Float32') FROM table
+----------------------------------------------------------------------------------------------------+
| rst_updatetype(tile,Float32) |
+----------------------------------------------------------------------------------------------------+
| {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","last_error":"", |
| "all_parents":"no_path","driver":"GTiff","parentPath":"no_path", |
| "last_command":"gdaltranslate -ot Float32 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} |
+----------------------------------------------------------------------------------------------------+
rst_upperleftx
- rst_upperleftx(tile)
Computes the upper left X coordinate of
tile
based its GeoTransform.- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_upperleftx('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_upperleftx(tile) |
+------------------------------------------------------------------------------------------------------------------+
| -180.00000610436345 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_upperleftx(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_upperleftx(tile) |
+------------------------------------------------------------------------------------------------------------------+
| -180.00000610436345 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_upperleftx(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_upperleftx(tile) |
+------------------------------------------------------------------------------------------------------------------+
| -180.00000610436345 |
+------------------------------------------------------------------------------------------------------------------+
rst_upperlefty
- rst_upperlefty(tile)
Computes the upper left Y coordinate of
tile
based its GeoTransform.- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: DoubleType
- Example:
df.select(mos.rst_upperlefty('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_upperlefty(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 89.99999847369712 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_upperlefty(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_upperlefty(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 89.99999847369712 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_upperlefty(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_upperlefty(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 89.99999847369712 |
+------------------------------------------------------------------------------------------------------------------+
rst_width
- rst_width(tile)
Computes the width of the raster tile in pixels.
- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
- Return type:
Column: IntegerType
- Example:
df.select(mos.rst_width('tile')).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_width(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 600 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_width(col("tile"))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_width(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 600 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_width(tile) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_width(tile) |
+------------------------------------------------------------------------------------------------------------------+
| 600 |
+------------------------------------------------------------------------------------------------------------------+
rst_worldtorastercoord
- rst_worldtorastercoord(tile, xworld, yworld)
Computes the (j, i) pixel coordinates of
xworld
andyworld
withintile
using the CRS oftile
.- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
xworld (Column (DoubleType)) – X world coordinate.
yworld (Column (DoubleType)) – Y world coordinate.
- Return type:
Column: StructType(IntegerType, IntegerType)
- Example:
df.select(mos.rst_worldtorastercoord('tile', F.lit(-160.1), F.lit(40.0))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_worldtorastercoord(tile, -160.1, 40.0) |
+------------------------------------------------------------------------------------------------------------------+
| {"x": 398, "y": 997} |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_worldtorastercoord(col("tile"), lit(-160.1), lit(40.0))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_worldtorastercoord(tile, -160.1, 40.0) |
+------------------------------------------------------------------------------------------------------------------+
| {"x": 398, "y": 997} |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_worldtorastercoord(tile, -160.1, 40.0) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_worldtorastercoord(tile, -160.1, 40.0) |
+------------------------------------------------------------------------------------------------------------------+
| {"x": 398, "y": 997} |
+------------------------------------------------------------------------------------------------------------------+
rst_worldtorastercoordx
- rst_worldtorastercoordx(tile, xworld, yworld)
Computes the j pixel coordinate of
xworld
andyworld
withintile
using the CRS oftile
.- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
xworld (Column (DoubleType)) – X world coordinate.
yworld (Column (DoubleType)) – Y world coordinate.
- Return type:
Column: IntegerType
- Example:
df.select(mos.rst_worldtorastercoord('tile', F.lit(-160.1), F.lit(40.0))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_worldtorastercoordx(tile, -160.1, 40.0) |
+------------------------------------------------------------------------------------------------------------------+
| 398 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_worldtorastercoordx(col("tile"), lit(-160.1), lit(40.0))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_worldtorastercoordx(tile, -160.1, 40.0) |
+------------------------------------------------------------------------------------------------------------------+
| 398 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_worldtorastercoordx(tile, -160.1, 40.0) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_worldtorastercoordx(tile, -160.1, 40.0) |
+------------------------------------------------------------------------------------------------------------------+
| 398 |
+------------------------------------------------------------------------------------------------------------------+
rst_worldtorastercoordy
- rst_worldtorastercoordy(tile, xworld, yworld)
Computes the i pixel coordinate of
xworld
andyworld
withintile
using the CRS oftile
.- Parameters:
tile (Column (RasterTileType)) – A column containing the raster tile.
xworld (Column (DoubleType)) – X world coordinate.
yworld (Column (DoubleType)) – Y world coordinate.
- Return type:
Column: IntegerType
- Example:
df.select(mos.rst_worldtorastercoordy('tile', F.lit(-160.1), F.lit(40.0))).display()
+------------------------------------------------------------------------------------------------------------------+
| rst_worldtorastercoordy(tile, -160.1, 40.0) |
+------------------------------------------------------------------------------------------------------------------+
| 997 |
+------------------------------------------------------------------------------------------------------------------+
df.select(rst_worldtorastercoordy(col("tile"), lit(-160.1), lit(40.0))).show
+------------------------------------------------------------------------------------------------------------------+
| rst_worldtorastercoordy(tile, -160.1, 40.0) |
+------------------------------------------------------------------------------------------------------------------+
| 997 |
+------------------------------------------------------------------------------------------------------------------+
SELECT rst_worldtorastercoordy(tile, -160.1, 40.0) FROM table
+------------------------------------------------------------------------------------------------------------------+
| rst_worldtorastercoordy(tile, -160.1, 40.0) |
+------------------------------------------------------------------------------------------------------------------+
| 997 |
+------------------------------------------------------------------------------------------------------------------+
rst_write
- rst_write(input, dir)
Writes raster tiles from the input column to a specified directory.
- Parameters:
input (Column) – A column containing the raster tile.
dir (Column(StringType)) – The directory, e.g. fuse, to write the tile’s raster as file.
- Return type:
Column: RasterTileType
Note
- Notes
Use
RST_Write
to save a ‘tile’ column to a specified directory (e.g. fuse) location using its already populated GDAL driver and tile information.Useful for formalizing the tile ‘path’ when writing a Lakehouse table. An example might be to turn on checkpointing for internal data pipeline phase operations in which multiple interim tiles are populated, but at the end of the phase use this function to set the final path to be used in the phase’s persisted table. Then, you are free to delete the internal tiles that accumulated in the configured checkpointing directory.
- example:
df.select(rst_write("tile", <write_dir>).alias("tile")).limit(1).display()
+------------------------------------------------------------------------+
| tile |
+------------------------------------------------------------------------+
| {"index_id":null,"tile":"<write_path>","metadata":{ |
| "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} |
+------------------------------------------------------------------------+
df.select(rst_write(col("tile", <write_dir>)).as("tile)).limit(1).show
+------------------------------------------------------------------------+
| tile |
+------------------------------------------------------------------------+
| {"index_id":null,"tile":"<write_path>","metadata":{ |
| "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} |
+------------------------------------------------------------------------+
SELECT rst_write(tile, <write_dir>) as tile FROM table LIMIT 1
+------------------------------------------------------------------------+
| tile |
+------------------------------------------------------------------------+
| {"index_id":null,"tile":"<write_path>","metadata":{ |
| "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} |
+------------------------------------------------------------------------+