
Mosaic is an extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.

Warning

You are viewing a historic version (0.3.14) of the documentation!

For the latest version, please go to the latest docs.

Warning

Versions of Mosaic after the 0.3.x series will require either:
  • Databricks Runtime 11.2+ with Photon enabled

  • Databricks Runtime for ML 11.2+

The Mosaic 0.3 series does not yet support DBR 13 (coming soon), and DBR 10 is no longer supported.

We currently recommend using Databricks Runtime 11.3 LTS or 12.2 LTS with Photon enabled; this leverages the Databricks H3 expressions when using the H3 grid system.

Mosaic provides:
  • easy conversion between common spatial data encodings (WKT, WKB and GeoJSON);

  • constructors to easily generate new geometries from Spark native data types;

  • many of the OGC SQL standard ST_ functions implemented as Spark Expressions for transforming, aggregating and joining spatial datasets;

  • high performance through implementation of Spark code generation within the core Mosaic functions;

  • optimisations for performing point-in-polygon joins using an approach we co-developed with Ordnance Survey (blog post); and

  • the choice of a Scala, SQL and Python API.
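To give a feel for why a grid index can speed up point-in-polygon joins, here is a minimal pure-Python sketch. It is not Mosaic's API or implementation: it assumes a toy square grid in place of H3, handles convex polygons only, and every name in it (`cell_of`, `tessellate`, `grid_join`) is hypothetical. The idea it illustrates is that each polygon is covered with grid cells, points are matched to polygons by cell ID, and the exact geometric test runs only for points in cells that are not fully inside the polygon.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Exact ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cell_of(x, y, size):
    """Map a coordinate to the ID of the square grid cell containing it."""
    return (int(x // size), int(y // size))


def tessellate(polygon, size):
    """Return {cell: is_core} for every cell in the polygon's bounding box.

    A cell is 'core' when all four corners lie inside the polygon, which
    (for a convex polygon, as assumed here) means the whole cell is inside.
    All other cells are treated as boundary cells needing an exact test.
    """
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    cx0, cy0 = cell_of(min(xs), min(ys), size)
    cx1, cy1 = cell_of(max(xs), max(ys), size)
    cells = {}
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            corners = [
                (cx * size, cy * size),
                ((cx + 1) * size, cy * size),
                (cx * size, (cy + 1) * size),
                ((cx + 1) * size, (cy + 1) * size),
            ]
            cells[(cx, cy)] = all(
                point_in_polygon(px, py, polygon) for px, py in corners
            )
    return cells


def grid_join(points, polygons, size):
    """Join points to polygons on cell ID, refining only in boundary cells."""
    index = defaultdict(list)  # cell -> [(polygon_id, is_core)]
    for poly_id, poly in polygons.items():
        for cell, is_core in tessellate(poly, size).items():
            index[cell].append((poly_id, is_core))

    matches = []
    for pt_id, (x, y) in points.items():
        for poly_id, is_core in index.get(cell_of(x, y, size), []):
            # Core cells match immediately; boundary cells need the exact test.
            if is_core or point_in_polygon(x, y, polygons[poly_id]):
                matches.append((pt_id, poly_id))
    return sorted(matches)


# Illustrative data (made up for this sketch).
polygons = {
    "square": [(0.5, 0.5), (9.5, 0.5), (9.5, 9.5), (0.5, 9.5)],
    "triangle": [(12.5, 0.5), (18.5, 0.5), (12.5, 6.5)],
}
points = {
    "a": (5.0, 5.0),
    "b": (0.7, 0.7),
    "c": (0.2, 0.2),
    "d": (20.0, 20.0),
    "e": (13.5, 1.5),
    "f": (17.5, 5.5),
}
matches = grid_join(points, polygons, size=1.0)
print(matches)
```

In a distributed setting the cell ID becomes an ordinary equi-join key, so most point and polygon pairs are resolved without any geometry computation; only the boundary-cell pairs pay for the exact test.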

Project Support

Please note that all projects in the databrickslabs GitHub space are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the repository. They will be reviewed as time permits, but there are no formal SLAs for support.