Installation guide
Supported platforms
Warning

From versions after 0.3.x, Mosaic will require either:

- Databricks Runtime 11.2+ with Photon enabled, or
- Databricks Runtime for ML 11.2+

The Mosaic 0.3 series does not yet support DBR 13 (support coming soon); DBR 10 is no longer supported.
We recommend using Databricks Runtime 11.3 LTS or 12.2 LTS with Photon enabled; this will leverage the Databricks H3 expressions when using the H3 grid system. As of the 0.3.11 release, Mosaic issues the following warning when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML [ADB | AWS | GCP]:
DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster after v0.3.x. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).
If you are receiving this warning in v0.3.11+, you should begin planning a move to a supported runtime. We are making this change to streamline Mosaic internals and align them with future product APIs, which are powered by Photon. As part of this change, Mosaic is standardizing on JTS as its default and supported vector geometry provider.
If you have cluster creation permissions in your Databricks workspace, you can create a cluster using the instructions here.
You will also need "Can Manage" permissions on this cluster in order to attach the Mosaic library to it. A workspace administrator can grant these permissions; more information about cluster permissions can be found in our documentation here.
Package installation
Installation from PyPI
Python users can install the library directly from PyPI
using the instructions here
or from within a Databricks notebook using the %pip
magic command, e.g.
%pip install databricks-mosaic
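After running `%pip install`, you may want to confirm which version of the package is active on the driver. The snippet below is a small, generic sketch using only the Python standard library; it is not Mosaic-specific, and the helper name `installed_version` is our own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# After `%pip install databricks-mosaic`, this prints the installed version
# (or None if the install did not take effect in the current session):
print(installed_version("databricks-mosaic"))
```

If this prints `None` in a Databricks notebook after a successful install, detach and reattach the notebook so the new environment is picked up.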
Installation from release artifacts
Alternatively, you can access the latest release artifacts here and manually attach the appropriate library to your cluster.
Which artifact you choose to attach will depend on the language API you intend to use.
For Python API users, choose the Python .whl file.
For Scala users, choose the Scala JAR (packaged with all necessary dependencies).
For R users, download the Scala JAR and the R bindings library [see the sparkR readme](R/sparkR-mosaic/README.md).
Instructions for how to attach libraries to a Databricks cluster can be found here.
Automated SQL registration
If you would like to use Mosaic’s functions in pure SQL (in a SQL notebook, from a business intelligence tool, or via a middleware layer such as Geoserver, perhaps) then you can configure “Automatic SQL Registration” using the instructions here.
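As a sketch of what that registration amounts to: Mosaic ships a Spark session extension that registers its SQL functions when the session starts, wired up through the cluster's Spark configuration. The class name below is an assumption based on the Mosaic project layout; check the Automatic SQL Registration instructions for the exact value for your release.

```
spark.sql.extensions com.databricks.labs.mosaic.sql.extensions.MosaicSQL
```

With this set on the cluster, SQL-only clients (BI tools, middleware) can call the Mosaic functions without any notebook-side setup.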
Enabling the Mosaic functions
The mechanism for enabling the Mosaic functions varies by language:

Python:

from mosaic import enable_mosaic
enable_mosaic(spark, dbutils)

Scala:

import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.H3
import com.databricks.labs.mosaic.JTS

val mosaicContext = MosaicContext.build(H3, JTS)
import mosaicContext.functions._

R:

library(sparkrMosaic)
enableMosaic()
SQL usage
If you have not employed Automatic SQL registration, you will need to register the Mosaic SQL functions in your SparkSession from a Scala notebook cell:
import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.H3
import com.databricks.labs.mosaic.JTS
val mosaicContext = MosaicContext.build(H3, JTS)
mosaicContext.register(spark)
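Once the functions are registered, they can be called directly from SQL. A minimal illustrative query, assuming the `st_area` function accepts a WKT geometry string as in the Mosaic API:

```sql
-- Compute the area of a unit-square polygon supplied as WKT
SELECT st_area('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS area;
```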