PMTiles Function Reference

Most PMTiles functionality (the pmtiles DataSource writer and its options) is heavyweight-only. The gbx_pmtiles_agg aggregate and the lightweight pmtiles_gbx DataSource writer are available in both tiers (see the per-function badge below and the PMTiles Writer page). See Choosing an Execution Tier for the lightweight vs heavyweight comparison.

GeoBrix encodes tile pyramids (raster or vector) into the PMTiles v3 single-file archive format. PMTiles replaces the "directory of tiles" pattern with one compact, hash-deduplicated, range-readable file servable directly from cloud object storage. Tile content bytes (PNG / JPEG / WebP / MVT) pass through verbatim — PMTiles is container-only.

Import path

databricks.labs.gbx.pmtiles (Python) or com.databricks.labs.gbx.pmtiles (Scala). PMTiles is a peer of RasterX / VectorX / GridX, not a dependency.

Two entry points

Pick based on pyramid size:

Entry point	When to use	Limit
`gbx_pmtiles_agg` UDAF (this page)	The full pyramid fits in a single Spark cell. Returns a `BINARY` column. Convenient for one-shot bundle generation.	~100 MiB of tile payload by default; hard ceiling at the 2 GiB Spark cell limit.
PMTiles Writer (`.write.format("pmtiles")`)	Larger pyramids; streaming partitioned commit writes one `.pmtiles` file with no in-memory consolidation.	Bound only by available disk on the driver during commit.

Both paths share the same native-Scala PMTiles v3 encoder — bytes they emit are byte-compatible.
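
As a rough illustration of the choice above, the following sketch picks an entry point from an estimated total payload size. The function name and constants are hypothetical; the thresholds are copied from the table and should be treated as approximate. Anything above the default budget is routed to the writer, because this page does not describe how to raise the UDAF default.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Thresholds copied from the table above (approximate, not read from GeoBrix).
UDAF_DEFAULT_LIMIT = 100 * MIB   # default tile-payload budget for gbx_pmtiles_agg
SPARK_CELL_CEILING = 2 * GIB     # hard ceiling for a single BINARY cell


def choose_entry_point(total_payload_bytes: int) -> str:
    """Return 'udaf' or 'writer' for a pyramid of the given total payload size.

    Hypothetical helper: sizes above the default UDAF budget go to the writer.
    """
    if total_payload_bytes < 0:
        raise ValueError("payload size must be non-negative")
    if total_payload_bytes <= UDAF_DEFAULT_LIMIT:
        return "udaf"
    return "writer"


print(choose_entry_point(10 * MIB))    # small pyramid
print(choose_entry_point(500 * MIB))   # larger pyramid
```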

Registration

# Python
from databricks.labs.gbx.pmtiles import functions as px
px.register(spark)

// Scala
import com.databricks.labs.gbx.pmtiles.functions
functions.register(spark)

The DataSource writer (.write.format("pmtiles")) does NOT need registration — it is wired through META-INF/services as soon as the GeoBrix JAR is on the Spark classpath.

On the lightweight tier, gbx_pmtiles_agg is installed automatically by pyrx.register(spark) and pyvx.register(spark) (PMTiles is format-agnostic, so it belongs to both). To install only the aggregate — without the rest of a tier — call the standalone helper:

from databricks.labs.gbx.pmtiles import register_pmtiles_agg
register_pmtiles_agg(spark)   # installs gbx_pmtiles_agg (Serverless-safe)

Quick start

UDAF: aggregate to a single blob

# Python
from pyspark.sql import functions as f
from databricks.labs.gbx.pmtiles import functions as px

# tiles_df: (z: int, x: int, y: int, bytes: binary)
pmt = (
    tiles_df.agg(
        px.pmtiles_agg(
            f.col("bytes"), f.col("z"), f.col("x"), f.col("y"),
            # metadataJson is declared as a Column, so wrap the literal in f.lit
            f.lit('{"name":"my_tileset","attribution":"contoso"}'),
        ).alias("pmt")
    )
    .collect()[0]["pmt"]  # single row, single BINARY cell
)

with open("/tmp/out.pmtiles", "wb") as fh:
    fh.write(pmt)

-- SQL
SELECT gbx_pmtiles_agg(bytes, z, x, y, '{"name":"my_tileset"}') AS pmt
FROM tiles_z2;
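
The metadata argument in the examples above is a hand-written JSON string. A minimal sketch of building it with the standard library instead, so that quotes and special characters are escaped correctly, is shown here. The helper name is hypothetical, and the two keys are the ones used in the quick start; this page does not say which keys GeoBrix expects.

```python
import json


def build_metadata_json(name: str, attribution: str = "") -> str:
    """Serialize tileset metadata to a compact JSON string (hypothetical helper)."""
    meta = {"name": name}
    if attribution:
        meta["attribution"] = attribution
    # Compact separators reproduce the style used in the quick start.
    return json.dumps(meta, separators=(",", ":"))


metadata_json = build_metadata_json("my_tileset", "contoso")
print(metadata_json)  # {"name":"my_tileset","attribution":"contoso"}
```

The resulting string can be passed wherever the examples pass a metadata JSON literal.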

DataSource: stream to a single `.pmtiles` file

# Python
(
    tiles_df
    .write
    .format("pmtiles")
    .option("metadataJson", '{"name":"my_tileset"}')
    .mode("overwrite")
    .save("/tmp/out.pmtiles")
)

// Scala
tilesDf.write
    .format("pmtiles")
    .option("metadataJson", "{\"name\":\"my_tileset\"}")
    .mode("overwrite")
    .save("/tmp/out.pmtiles")

The output path is the final file, not a directory: scratch _part_*.tdata and _part_*.entries files are written alongside it during the commit phase and deleted on success.
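
Because scratch files are deleted only on success, a failed write may leave them behind. The sketch below lists any leftovers next to an output path. It assumes the output lives on a local filesystem, and the helper name is hypothetical; the two glob patterns are the ones named above.

```python
from pathlib import Path


def find_scratch_files(output_path: str) -> list[Path]:
    """List _part_*.tdata and _part_*.entries files next to the output file."""
    parent = Path(output_path).parent
    leftovers: list[Path] = []
    for pattern in ("_part_*.tdata", "_part_*.entries"):
        leftovers.extend(parent.glob(pattern))
    return sorted(leftovers)
```

An empty result after a write suggests the commit phase cleaned up as described.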

Save mode

Always pass .mode("overwrite"). The default save mode, ErrorIfExists, is not supported: the write fails immediately with an error that tells you to use .mode("overwrite").

Schema contract

The DataSource writer enforces an exact write schema:

z      INT       — tile zoom level (0..31)
x      INT       — tile x within the zoom
y      INT       — tile y within the zoom
bytes  BINARY    — tile payload (PNG / JPEG / WebP / MVT)

Missing columns, extra columns, or wrong types all raise a single IllegalArgumentException that names the canonical schema. The UDAF is more relaxed: z/x/y accept either INT or LONG. PySpark's createDataFrame infers Python ints as LongType by default, and the UDAF coerces those values in its update step.
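
To catch schema mismatches before the writer does, a pre-flight check along these lines can help. It is only a sketch: it takes the list of (name, type) pairs that PySpark's `DataFrame.dtypes` returns, the helper name is hypothetical, and it ignores column order, which is an assumption because this page does not say whether order matters.

```python
# Canonical write schema from this page, using Spark's simple type names.
CANONICAL = {"z": "int", "x": "int", "y": "int", "bytes": "binary"}


def schema_problems(dtypes):
    """Return human-readable problems for a list of (column, type) pairs."""
    actual = dict(dtypes)
    problems = []
    for name, expected in CANONICAL.items():
        if name not in actual:
            problems.append(f"missing column {name!r}")
        elif actual[name] != expected:
            problems.append(f"column {name!r} is {actual[name]}, expected {expected}")
    for name in actual:
        if name not in CANONICAL:
            problems.append(f"unexpected column {name!r}")
    return problems


# A DataFrame built from Python ints typically reports bigint for z/x/y.
print(schema_problems([("z", "bigint"), ("x", "int"), ("y", "int"), ("bytes", "binary")]))
```

With a real DataFrame the call would be `schema_problems(tiles_df.dtypes)`; casting the offending columns to `int` before writing resolves the type mismatches.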

Tile-type detection

The encoder reads the first 12 bytes of the first non-empty tile payload and sets the PMTiles header's tile_type byte:

Magic bytes	tile_type	Meaning
`89 50 4E 47`	2 (PNG)	PNG raster
`FF D8`	3 (JPEG)	JPEG raster
`RIFF????WEBP`	4 (WebP)	WebP raster
anything else	1 (MVT)	Mapbox Vector Tile (protobuf)

Override auto-detection via .option("tileType", "<byte>") (e.g. "2" for PNG when emitting tiles via a custom encoder that doesn't carry standard magic bytes).
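
The detection rule in the table can be sketched in a few lines of Python. This illustrates the magic-byte checks only and is not the GeoBrix encoder itself; the function name is hypothetical.

```python
def detect_tile_type(payload: bytes) -> int:
    """Map a tile payload to a PMTiles tile_type byte using its leading bytes."""
    head = payload[:12]
    if head[:4] == b"\x89PNG":                          # 89 50 4E 47
        return 2                                        # PNG
    if head[:2] == b"\xff\xd8":                         # FF D8
        return 3                                        # JPEG
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":   # RIFF????WEBP
        return 4                                        # WebP
    return 1                                            # anything else: MVT


print(detect_tile_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # 2
print(detect_tile_type(b"\x1a\x05hello"))                    # 1
```

A payload from a custom encoder that lacks these prefixes falls through to MVT, which is the case the tileType override addresses.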

Tile compression

GeoBrix passes tile bytes through unchanged. If your tiles are already compressed (e.g. gzipped MVTs), set .option("tileCompression", "<byte>") so the PMTiles header advertises the correct compression to downstream readers:

Byte	Compression (spec § 3.3)
`1`	None (default)
`2`	gzip
`3`	brotli
`4`	zstd

The internal compression (root directory + metadata) is always none in v0.4.0; support for the spec's compressed root directory is planned for a future release.
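
Since the writer does not inspect compression, it can be useful to sniff a sample tile before choosing the option value. The sketch below recognises the gzip and zstd magic numbers and returns the matching byte from the table as a string. The helper name is hypothetical, and brotli has no magic number, so it cannot be detected this way.

```python
import gzip


def guess_tile_compression(payload: bytes) -> str:
    """Guess the tileCompression option value from a payload's leading bytes."""
    if payload[:2] == b"\x1f\x8b":           # gzip magic number
        return "2"
    if payload[:4] == b"\x28\xb5\x2f\xfd":   # zstd frame magic number
        return "4"
    return "1"                               # assume uncompressed


sample = gzip.compress(b"example tile payload")
print(guess_tile_compression(sample))  # 2
```

The returned string is in the form the examples pass to `.option("tileCompression", ...)`.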

SQL examples

Examples below use SQL. PMTiles functions are prefixed with gbx_ (e.g. gbx_pmtiles_agg). For language-specific usage, see Language Bindings.

pmtiles_agg

Lightweight | Heavyweight | Grouped-agg UDF

Lightweight tier (pyrx/pyvx)

Powered by the pmtiles package. Grouped aggregate — groupBy(...).agg(px.pmtiles_agg("bytes", "z", "x", "y")) folds a group's (bytes, z, x, y) map tiles into one PMTiles v3 archive (BINARY). Registered by both pyrx.register and pyvx.register — accepts raster tiles (PNG / JPEG / WebP) or vector tiles (MVT) in either tier.

Aggregate a per-tile (z, x, y, bytes) row set into a single PMTiles v3 archive blob.

Signature: pmtiles_agg(bytes: Column, z: Column, x: Column, y: Column, metadataJson: Column): Column

Parameters:

bytes — Tile payload (BINARY). PNG / JPEG / WebP magic bytes are auto-detected; everything else is treated as MVT.
z, x, y — Tile coordinates (INT or BIGINT — the UDAF coerces LongType inputs).
metadataJson — Optional JSON metadata string written into the PMTiles archive. Pass '{}' (or omit it, using the 4-argument form) for no metadata.

Returns:

Binary blob containing the full PMTiles v3 archive.

SQL:

-- Build a 9-tile PMTiles pyramid from an existing `tiles_z2(z, x, y, bytes)` table.
-- The result column `pmt` is a BINARY blob containing the full PMTiles v3 archive.
SELECT gbx_pmtiles_agg(bytes, z, x, y, '{"name":"my_tileset"}') AS pmt
FROM tiles_z2;

The 4-argument form omits the metadata JSON (defaults to '{}'):

-- 4-arg form: metadata defaults to '{}'. Result is still a valid PMTiles v3 blob.
SELECT gbx_pmtiles_agg(bytes, z, x, y) AS pmt
FROM tiles_z2;
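
One way to sanity-check the returned blob is to parse its fixed-size header. The sketch below follows the PMTiles v3 header layout as I understand it (127 bytes, little-endian, magic `PMTiles`, version byte, eleven 8-byte fields, then single-byte flags); verify the offsets against the spec before relying on them. The function name and dictionary keys are hypothetical.

```python
import struct

HEADER_LEN = 127  # fixed PMTiles v3 header size


def read_pmtiles_header(blob: bytes) -> dict:
    """Parse selected fields from the first 127 bytes of a PMTiles v3 archive."""
    if len(blob) < HEADER_LEN:
        raise ValueError("blob is shorter than a PMTiles header")
    if blob[:7] != b"PMTiles":
        raise ValueError("missing PMTiles magic bytes")
    # Eleven little-endian uint64 fields start at byte 8.
    (root_off, root_len, meta_off, meta_len, leaf_off, leaf_len,
     data_off, data_len, n_addressed, n_entries, n_contents) = struct.unpack_from("<11Q", blob, 8)
    # Six single-byte fields start at byte 96.
    (clustered, internal_comp, tile_comp,
     tile_type, min_zoom, max_zoom) = struct.unpack_from("<6B", blob, 96)
    return {
        "version": blob[7],
        "root_dir": (root_off, root_len),
        "metadata": (meta_off, meta_len),
        "tile_data": (data_off, data_len),
        "num_addressed_tiles": n_addressed,
        "internal_compression": internal_comp,
        "tile_compression": tile_comp,
        "tile_type": tile_type,
        "min_zoom": min_zoom,
        "max_zoom": max_zoom,
    }
```

For a blob collected from the aggregate, `read_pmtiles_header(pmt)["tile_type"]` should match the tile-type table earlier on this page.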

Duplicate tile coordinates

When multiple rows share the same tile coordinates (z, x, y):

Vector (MVT) tiles — features from all matching rows are combined into one multi-feature tile. Features from different layers are kept in their respective layers; attributes are preserved per feature.
Raster tiles (PNG, JPEG, WebP) — the first non-null tile is used; subsequent tiles for the same coordinates are ignored (raster images cannot be meaningfully combined).

Tile type is detected automatically from the content of the first non-null payload (see Tile-type detection above).
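
The raster rule above (first non-null tile wins) can be pictured with a small pure-Python sketch. It uses plain input order and does not model how Spark orders rows within a group, and it does not attempt the MVT feature merge. The function name is hypothetical.

```python
def first_non_null_per_tile(rows):
    """Keep the first non-null payload for each (z, x, y) coordinate.

    rows: iterable of (z, x, y, payload_or_None) tuples.
    """
    kept = {}
    for z, x, y, payload in rows:
        if payload is None:
            continue  # null payloads never win
        kept.setdefault((z, x, y), payload)  # later duplicates are ignored
    return kept


rows = [
    (1, 0, 0, None),
    (1, 0, 0, b"first"),
    (1, 0, 0, b"second"),
    (1, 1, 0, b"other"),
]
print(first_non_null_per_tile(rows))
```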

Typical pipelines

Raster pyramid: gbx_rst_xyzpyramid(tile, minZoom, maxZoom) produces per-tile rows of PNG bytes — pipe straight into gbx_pmtiles_agg.
Vector pyramid: gbx_st_asmvt_pyramid(geom_wkb, attrs, minZoom, maxZoom, layer) produces per-tile MVT bytes — pipe straight into gbx_pmtiles_agg. Because gbx_st_asmvt_pyramid emits one row per feature, the aggregate merges all features that share a tile coordinate into a single multi-feature tile.

For pyramids that exceed the Spark cell ceiling, use the PMTiles Writer instead.
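
To judge in advance whether a pyramid is likely to exceed the cell ceiling, a back-of-the-envelope bound helps: a full-coverage pyramid has 4^z tiles at zoom z. The sketch below computes that upper bound and multiplies it by an assumed average tile size. Both helper names and the 20 KiB figure are illustrative assumptions; a pyramid over a small area of interest will contain far fewer tiles.

```python
def max_tiles_in_pyramid(min_zoom: int, max_zoom: int) -> int:
    """Upper bound on tile count: every tile present at every zoom level."""
    if min_zoom < 0 or max_zoom < min_zoom:
        raise ValueError("require 0 <= min_zoom <= max_zoom")
    return sum(4 ** z for z in range(min_zoom, max_zoom + 1))


def estimated_payload_bytes(min_zoom: int, max_zoom: int, avg_tile_bytes: int) -> int:
    """Worst-case payload size for an assumed average tile size."""
    return max_tiles_in_pyramid(min_zoom, max_zoom) * avg_tile_bytes


print(max_tiles_in_pyramid(0, 2))                  # 21
print(estimated_payload_bytes(0, 8, 20 * 1024))    # worst case at 20 KiB per tile
```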

See the Helios notebooks for a worked end-to-end example: gbx_pmtiles_agg packages vector MVT tiles (NB01), a raster XYZ pyramid (NB02), and a hillshade pyramid (NB03) into separate PMTiles archives over a San Francisco AOI.

Serving from object storage

PMTiles is designed to be served as a single static file via HTTP Range requests. After uploading the output .pmtiles to S3 / ABFS / GCS:

CORS: enable GET, HEAD, OPTIONS for your map host; allow Range and If-Match headers.
Content-Type: serve as application/vnd.pmtiles.
Browse: drop the URL into pmtiles.io for a visual sanity check.
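
A quick way to confirm that the bucket honours Range requests is to ask for only the 127-byte header. The sketch below builds such a request with the standard library and does not send it. The helper name is hypothetical, and the URL is the placeholder used elsewhere on this page.

```python
import urllib.request


def header_range_request(url: str, length: int = 127) -> urllib.request.Request:
    """Build a GET request for the first `length` bytes of a remote archive."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("Range", f"bytes=0-{length - 1}")  # byte ranges are inclusive
    return req


req = header_range_request("https://my-bucket/out.pmtiles")
print(req.get_header("Range"))  # bytes=0-126
```

Sending it with `urllib.request.urlopen(req)` against a correctly configured host should return status 206 (Partial Content) and a body starting with `PMTiles`.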

Embed in MapLibre (pin to a specific version and add integrity/crossorigin SRI attributes for production use):

<script src="https://unpkg.com/pmtiles@3/dist/pmtiles.js"></script>
<script>
  const protocol = new pmtiles.Protocol();
  maplibregl.addProtocol("pmtiles", protocol.tile);
  const map = new maplibregl.Map({
    container: "map",
    style: {
      version: 8,
      sources: { my: { type: "vector", url: "pmtiles://https://my-bucket/out.pmtiles" } },
      layers: [/* ... */]
    }
  });
</script>

Limits in v0.4.0

No leaf directories. If the global root directory would exceed 16,257 bytes (spec § 4: the 16,384-byte initial read window minus the 127-byte header), the encoder errors out and asks you to split your input. In practice this only happens with very large pyramids (tens of millions of tiles); the limit will be relaxed in a future release.
No read path. spark.read.format("pmtiles") raises a friendly "Reading PMTiles archives is not supported in GeoBrix 0.4.0" error — use one of the JS / Python pmtiles client libraries for read access.
No cross-task dedup in the DataSource. Identical tiles across partitions are stored multiple times in the final file. The UDAF path does per-blob SHA-256 dedup, so for known-redundant pyramids prefer the UDAF if your data fits.
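
To gauge how much the missing cross-task dedup might cost, one can hash a sample of payloads and compare total versus unique counts. The sketch below uses SHA-256, as the UDAF path is described as doing, but it is only an estimate over whatever sample it is given; the helper name is hypothetical.

```python
import hashlib


def dedup_stats(payloads):
    """Return (total, unique) counts for an iterable of tile payloads."""
    total = 0
    seen = set()
    for payload in payloads:
        total += 1
        seen.add(hashlib.sha256(payload).digest())
    return total, len(seen)


total, unique = dedup_stats([b"ocean", b"land", b"ocean", b"ocean"])
print(total, unique)  # 4 2
```

A low unique-to-total ratio indicates a redundant pyramid, where the UDAF path is the better fit if the data is small enough.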

Next Steps

PMTiles Writer — DataSource for streaming large pyramids to disk.
Raster Functions — Generate tile bytes with gbx_rst_xyzpyramid.
Helios notebooks — worked end-to-end example using gbx_pmtiles_agg for all three data modalities (vector MVT, raster XYZ, hillshade).
VectorX Function Reference — Generate MVT tiles with gbx_st_asmvt_pyramid.
