VectorX Function Reference

VectorX augments the product's native ST_* functions with vector-tile encoding, TIN surface modeling, and legacy-geometry migration helpers. As of v0.4.0 it covers:

  • Vector tile encoding — gbx_st_asmvt aggregator + gbx_st_asmvt_pyramid generator for publishing Mapbox Vector Tile (MVT) layers, available in both the lightweight (pyvx) and heavyweight (vectorx) tiers
  • TIN surface modeling — gbx_st_triangulate, gbx_st_interpolateelevationbbox, and gbx_st_interpolateelevationgeom for Delaunay triangulation and grid elevation interpolation from Z-valued points, in both tiers (with a constrained/conforming mode selector)
  • OGR-based vector readers — Shapefile, GeoJSON, GeoPackage, FileGDB (heavyweight only)
  • Legacy Mosaic conversion — gbx_st_legacyaswkb for migrating geometries written by DBLabs Mosaic, in both tiers

Using these functions

  • Import paths — vector tile encoding: databricks.labs.gbx.pyvx (lightweight) · databricks.labs.gbx.vectorx Python / com.databricks.labs.gbx.vectorx Scala (heavyweight). Legacy Mosaic conversion: databricks.labs.gbx.vectorx.jts.legacy (Python) / com.databricks.labs.gbx.vectorx.jts.legacy (Scala).
  • SQL examples — examples use SQL (and Python where shown); in SQL, VectorX functions are prefixed with gbx_ (e.g. gbx_st_legacyaswkb). For SQL, Python, and Scala usage patterns, see Language Bindings.
  • Geometry input encodings — every gbx_st_* geometry input accepts WKB, EWKB, WKT, and EWKT interchangeably. WKB/WKT carry no SRID; EWKB/EWKT carry one. Pass whichever encoding your upstream produces — no separate conversion step is required.
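
To make the WKB versus EWKB distinction concrete, the standalone sketch below builds a 2D point in both encodings with Python's struct module and checks for an embedded SRID. It assumes the PostGIS-style EWKB layout (flag 0x20000000 in the geometry type word, followed by a 4-byte SRID). The helper names are hypothetical and are not part of VectorX.

```python
import struct

SRID_FLAG = 0x20000000  # assumption: PostGIS-style EWKB marker in the type word
WKB_POINT = 1


def point_wkb(x: float, y: float) -> bytes:
    """Plain WKB: byte order, geometry type, coordinates. Carries no SRID."""
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def point_ewkb(x: float, y: float, srid: int) -> bytes:
    """EWKB: same layout with the SRID flag set and the SRID after the type."""
    return struct.pack("<BIIdd", 1, WKB_POINT | SRID_FLAG, srid, x, y)


def read_srid(blob: bytes):
    """Return the embedded SRID, or None when the blob is plain WKB."""
    order = "<" if blob[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", blob, 1)
    if geom_type & SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", blob, 5)
        return srid
    return None
```

For example, read_srid(point_wkb(30.0, 10.0)) returns None, while read_srid(point_ewkb(530000.0, 180000.0, 27700)) returns 27700.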

Tier availability

| Function | Lightweight (pyvx) | Heavyweight (vectorx) |
|---|---|---|
| st_asmvt | Supported | Supported |
| st_asmvt_pyramid | Supported | Supported |
| st_triangulate | Supported (constrained) | Supported (constrained + conforming) |
| st_interpolateelevationbbox | Supported (constrained) | Supported (constrained + conforming) |
| st_interpolateelevationgeom | Supported (constrained) | Supported (constrained + conforming) |
| st_legacyaswkb | Supported | Supported |

Vector tile output

Encode features into Mapbox Vector Tile (MVT) protobufs. Pair the per-tile MVT bytes with gbx_pmtiles_agg or the PMTiles writer to publish a vector pyramid as a single .pmtiles archive targeting MapLibre, deck.gl, Mapbox GL JS, or Felt.

Both tiers expose identical st_asmvt / st_asmvt_pyramid names and identical output schemas. st_asmvt is swap-compatible across both the SQL and Python DataFrame APIs. st_asmvt_pyramid is interchangeable at the SQL level (invoked via LATERAL); its Python DataFrame Column form is heavy-only, because the lightweight pyramid is a Python UDTF, which SQL LATERAL calls but the DataFrame API does not expose. The one-line swap:

# Lightweight (pyvx) — Serverless-safe, no JAR
from databricks.labs.gbx.pyvx import functions as vx

# Heavyweight (vectorx) — classic x86 cluster, JAR required
from databricks.labs.gbx.vectorx import functions as vx

# Everything below is identical in both tiers:
vx.register(spark)

Options

st_asmvt takes a layer_name argument (plain string or Column). st_asmvt_pyramid accepts layer_name and extent, both optional:

| Argument | Default | Description |
|---|---|---|
| layer_name | "layer" | MVT layer name embedded in the protobuf. |
| extent | 4096 | MVT tile extent in pixels (MVT v2 standard). |

Compute compatibility

| Aspect | Lightweight (pyvx) | Heavyweight (vectorx) |
|---|---|---|
| Install | Volume-staged [light] wheel (install) | Init script + JAR |
| Serverless / shared / ARM | Supported | Not supported |
| Lakeflow declarative pipelines | Supported | Not supported |
| Execution model | Python UDTF / pandas UDF | JVM (Scala + Spark columnar) |
| JVM access | None — spark.udf.register only | Required |

Native attribute typing

Both tiers encode MVT feature attributes with native protobuf value types:

  • Integer / Long → int64 value
  • Float / Double → double value
  • Boolean → bool value
  • String (and anything else) → string value

This means downstream clients (MapLibre GL JS, Mapbox GL JS, deck.gl) receive numbers as numbers and booleans as booleans — enabling numeric data-driven styles, filter expressions, and arithmetic without a client-side parseFloat call.
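
The type mapping above can be sketched as a small classifier. This is an illustration of the rule as listed, not the encoder VectorX ships; the function names are hypothetical. One detail worth noting for Python callers: bool is a subclass of int, so the boolean check has to come first.

```python
def mvt_value_kind(value) -> str:
    """Return the MVT value kind the list above assigns to a Python value."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    # Strings and anything else fall back to a string value.
    return "string"


def classify_attrs(attrs: dict) -> dict:
    """Map each attribute name to the value kind it would be encoded as."""
    return {name: mvt_value_kind(value) for name, value in attrs.items()}
```

With attributes such as {"name": "A1", "id": 7, "width": 3.5, "oneway": True}, the sketch reports string, int64, double, and bool respectively.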


st_asmvt

Lightweight · Grouped-agg UDF

Aggregator that encodes a group of features into a single MVT protobuf blob for one (z, x, y) tile. Each call to groupBy(z, x, y).agg(vx.st_asmvt(...)) produces the MVT bytes for exactly one tile.

Signature: st_asmvt(geom_wkb, attrs, layer_name) → BINARY

Parameters:

  • geom_wkb (BINARY) — Feature geometry as WKB in tile-local coordinates (pixel space, 0..extent). Clip and project each feature to the tile coordinate system upstream before calling this aggregator.
  • attrs (STRUCT<...>) — Per-feature attribute struct. Integer, float, boolean, and string fields are encoded with native MVT protobuf value types.
  • layer_name (STRING or str) — MVT layer name. Pass a plain Python string or a Column.

Returns: BINARY — the MVT protobuf for one tile layer. Feed directly into gbx_pmtiles_agg or the PMTiles Writer.
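
Because st_asmvt expects tile-local coordinates, each vertex has to be projected upstream. The sketch below shows one way to do that for a single lon/lat vertex, assuming the standard Web Mercator XYZ tiling scheme with the tile origin at the top-left corner. The function name is hypothetical, and clipping to the tile boundary is left out.

```python
import math


def lonlat_to_tile_local(lon, lat, z, x, y, extent=4096):
    """Project an EPSG:4326 vertex into the pixel space of tile (z, x, y).

    Assumes Web Mercator XYZ tiling; returns floats, which a caller would
    typically round to integers before encoding.
    """
    n = 2 ** z
    # Fractional tile coordinates across the whole world at zoom z.
    fx = (lon + 180.0) / 360.0 * n
    fy = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    # Offset by the tile index and scale to the tile extent.
    return (fx - x) * extent, (fy - y) * extent
```

A vertex at lon 0, lat 0 lands in the center of the single zoom-0 tile, at (2048.0, 2048.0) for the default extent. Values outside 0..extent indicate a vertex that falls outside the tile and would need clipping.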

PySpark:

from databricks.labs.gbx.pyvx import functions as vx
from pyspark.sql import functions as F

vx.register(spark)

# features_df: (z, x, y, geom_wkb BINARY in tile-local coords, name STRING, id LONG)
tiles_df = (
    features_df
    .groupBy("z", "x", "y")
    .agg(
        vx.st_asmvt(
            F.col("geom_wkb"),
            F.struct(F.col("name"), F.col("id")),
            "roads",  # layer name: a plain string becomes a literal Column
        ).alias("mvt_bytes")
    )
)
# tiles_df: (z INT, x INT, y INT, mvt_bytes BINARY)

SQL (after vx.register(spark) in the same session):

SELECT
  z, x, y,
  gbx_st_asmvt(geom_wkb, struct(name, id), 'roads') AS mvt_bytes
FROM features_with_tile_coords
GROUP BY z, x, y

st_asmvt_pyramid

Lightweight · Streaming UDTF

Generator (Python UDTF) that explodes one input feature into one row per intersecting (z, x, y) tile across a zoom range, with MVT bytes already encoded in each row. The per-tile clip and coordinate transform happen inside the function — no upstream ST_Intersection is needed.

The output schema (z, x, y, mvt_bytes) is identical to the heavyweight generator, so SQL pipelines built against either tier are interchangeable. In the lightweight tier this generator is invoked only via SQL LATERAL; it has no Python DataFrame Column form (that form is heavy-only).

Signature (SQL): gbx_st_asmvt_pyramid(geom_wkb, attrs, min_z, max_z, layer_name, extent)

Parameters:

  • geom_wkb (BINARY) — Feature geometry as WKB in EPSG:4326 lon/lat. The UDTF clips each feature to every intersecting tile and transforms to tile-local coordinates internally.
  • attrs (STRUCT<...>) — Per-feature attributes. Same native-typed encoding as st_asmvt.
  • min_z, max_z (INT) — Inclusive zoom range (0..20).
  • layer_name (STRING, optional) — MVT layer name; defaults to "layer".
  • extent (INT, optional) — MVT tile extent in pixels; defaults to 4096.

Returns: One row per intersecting tile — schema (z INT, x INT, y INT, mvt_bytes BINARY).

Caps: max_z ≤ 20; total tile count across the zoom range capped at 10⁶.
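
To get a feel for how quickly the tile cap can be reached, the sketch below counts the tiles a bounding box touches across a zoom range. It assumes Web Mercator XYZ tiling and that the cap is compared against a count of this kind; the source does not spell out exactly how the 10⁶ limit is evaluated, so treat this as an estimate. The helper names are hypothetical.

```python
import math

MAX_LAT = 85.05112878  # Web Mercator latitude limit


def tile_index(lon, lat, z):
    """XYZ tile index containing a lon/lat position at zoom z."""
    n = 1 << z
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so positions on the far edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, min_z, max_z):
    """Count tiles intersecting a bounding box over an inclusive zoom range."""
    total = 0
    for z in range(min_z, max_z + 1):
        x0, y0 = tile_index(west, north, z)  # top-left tile
        x1, y1 = tile_index(east, south, z)  # bottom-right tile
        total += (x1 - x0 + 1) * (y1 - y0 + 1)
    return total
```

Under these assumptions a world-spanning bounding box yields 349,525 tiles for zooms 0 to 9 and 1,398,101 tiles for zooms 0 to 10, so very large features are the ones most likely to approach the cap.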

SQL — LATERAL table function:

-- After vx.register(spark):
SELECT t.*
FROM features,
  LATERAL gbx_st_asmvt_pyramid(
    geom_wkb, struct(name, id), 0, 12, 'roads', 4096
  ) t

LATERAL materializes one (z, x, y, mvt_bytes) row per tile the feature intersects. Each executor processes its partition's features in parallel — tile fan-out is distributed across the cluster, not serialized on the driver.

Full pipeline — vector pyramid to PMTiles:

from databricks.labs.gbx.pyvx    import functions as vx
from databricks.labs.gbx.pmtiles import functions as px
from pyspark.sql import functions as F

vx.register(spark)
px.register(spark)

# Step 1: explode features → per-tile MVT rows (distributed; LATERAL in SQL)
tiles_df = spark.sql("""
SELECT t.*
FROM features,
LATERAL gbx_st_asmvt_pyramid(geom_wkb, struct(name, id), 0, 10, 'roads', 4096) t
""")

# Step 2: aggregate the per-tile MVT bytes → single PMTiles archive
pmt_bytes = (
    tiles_df.agg(
        px.pmtiles_agg(
            F.col("mvt_bytes"), F.col("z"), F.col("x"), F.col("y"),
            '{"name":"roads","attribution":"© My Data"}',
        ).alias("pmt")
    )
    .collect()[0]["pmt"]
)

with open("/tmp/roads.pmtiles", "wb") as fh:
    fh.write(pmt_bytes)

For larger pyramids that exceed the Spark cell limit, use the PMTiles Writer (pmtiles_gbx for the lightweight tier) instead of pmtiles_agg.


Triangulation and elevation

These generators build a Delaunay triangulated irregular network (TIN) from Z-valued mass points and optional breaklines, then either expose the triangles directly or sample the surface on a regular grid to produce elevation points. Useful for surface modeling, DTM/DEM derivation, and elevation sampling from survey point clouds. All three are available in both tiers; breaklines are honored in both.
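
As background for how a TIN yields an elevation at an arbitrary location, the sketch below interpolates Z inside one triangle using barycentric weights. It assumes planar interpolation within each triangle, which is the usual way a TIN surface is sampled; the source does not state the exact method VectorX uses, and the function name is hypothetical.

```python
def interpolate_z(p, a, b, c):
    """Interpolate Z at 2D point p inside triangle a, b, c (each x, y, z).

    Returns None when p is outside the triangle or the triangle is degenerate.
    """
    px, py = p
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        return None  # zero-area triangle
    # Barycentric weights of p with respect to vertices a, b, c.
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    wc = 1.0 - wa - wb
    if min(wa, wb, wc) < 0:
        return None  # p lies outside the triangle
    return wa * az + wb * bz + wc * cz
```

For the triangle (0, 0, 0), (10, 0, 10), (0, 10, 20), the point (2, 2) has weights 0.6, 0.2, and 0.2, giving an interpolated elevation of 6.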

Triangulation modes: constrained vs conforming

Each TIN function takes a trailing mode argument (default 'constrained'):

| Mode | Tiers | Behavior |
|---|---|---|
| constrained (default) | Lightweight and heavyweight | Constrained Delaunay triangulation. Breaklines are honored as forced edges with no Steiner points — the output vertex set is exactly your input mass points plus breakline vertices. Identical algorithm in both tiers, so the result is a seamless cross-tier swap. |
| conforming | Heavyweight only | JTS conforming-Delaunay triangulation: the mesh may insert additional Steiner points along breakline segments to satisfy the Delaunay property near constraints. Produces a smoother mesh around dense breaklines at the cost of extra vertices. |

The lightweight tier raises NotImplementedError on mode='conforming' — it has no Steiner-point refinement. This is a documented, intentional divergence, analogous to the H3 covering note elsewhere in these docs: the two tiers agree exactly in the default (constrained) mode, and the heavyweight tier offers conforming as a deliberate opt-in superset. If you need a result that is byte-identical across tiers, stay on constrained (the default).
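
To illustrate why conforming mode adds vertices, the sketch below shows a textbook refinement step: a breakline segment is considered encroached when a point lies inside its diametral circle, and a midpoint split inserts a Steiner point to resolve it. This is a generic illustration of the idea, not the JTS or VectorX implementation, and the helper names are hypothetical.

```python
def encroaches(point, seg_start, seg_end) -> bool:
    """True if point lies strictly inside the segment's diametral circle."""
    mid_x = (seg_start[0] + seg_end[0]) / 2.0
    mid_y = (seg_start[1] + seg_end[1]) / 2.0
    radius_sq = (
        (seg_end[0] - seg_start[0]) ** 2 + (seg_end[1] - seg_start[1]) ** 2
    ) / 4.0
    dist_sq = (point[0] - mid_x) ** 2 + (point[1] - mid_y) ** 2
    return dist_sq < radius_sq


def midpoint_split(seg_start, seg_end):
    """Split a segment at its midpoint, returning the two halves."""
    mid = ((seg_start[0] + seg_end[0]) / 2.0, (seg_start[1] + seg_end[1]) / 2.0)
    return (seg_start, mid), (mid, seg_end)
```

For the segment (0, 0) to (10, 0), the point (5, 1) encroaches and (5, 6) does not. Splitting at the midpoint adds the vertex (5, 0), which is exactly the kind of extra vertex that constrained mode never introduces.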

Invocation surface

In the lightweight tier these TIN generators are PySpark UDTFs with no Python DataFrame Column form — invoke them via SQL LATERAL (e.g. ... , LATERAL gbx_st_triangulate(...) t), the same pattern as gbx_st_asmvt_pyramid. In the heavyweight tier they are exposed as generator Columns (usable in select(...)) and via SQL LATERAL VIEW. SQL LATERAL works for both tiers; the Python DataFrame Column form is heavyweight-only for these generators.

st_triangulate

Lightweight · Heavyweight · Streaming UDTF

Builds a Delaunay TIN from mass-point geometries (with Z values) and optional breakline geometries, emitting one triangle polygon per row. Use this when you need the raw triangulation — e.g., to inspect mesh quality, clip triangles to an area of interest, or feed a custom sampler.

Signature: gbx_st_triangulate(points, breaklines, mergeTolerance, snapTolerance, splitPointFinder, mode)

Parameters:

  • points — Array column of point geometries with Z values (the mass points that define the surface). Accepts WKB/EWKB/WKT/EWKT.
  • breaklines — Array column of linestring geometries that the mesh must honor as edges (e.g., ridge lines, drainage channels). Pass an empty array if no breaklines are needed.
  • mergeTolerance (DOUBLE) — Distance below which coincident points are merged before triangulation.
  • snapTolerance (DOUBLE) — Distance within which points are snapped to breakline vertices.
  • splitPointFinder (STRING) — Conforming-mesh refinement strategy. Use 'NONENCROACHING' for a mesh that avoids encroaching on breakline segments; 'MIDPOINT' is also valid (heavyweight conforming mode).
  • mode (STRING, optional) — 'constrained' (default, both tiers) or 'conforming' (heavyweight only). See Triangulation modes above.

Generator: Emits one row per output triangle. Use with SQL LATERAL to materialize the triangles; the output schema column is triangle (BINARY WKB polygon).

SQL (works in both tiers after vx.register(spark)):

SELECT t.triangle FROM survey, LATERAL gbx_st_triangulate(masspoints, breaklines, 0.01, 0.01, 'NONENCROACHING', 'constrained') t;
Example output
+--------+
|triangle|
+--------+
|[BINARY]|
+--------+

PySpark (heavyweight DataFrame Column form):

from databricks.labs.gbx.vectorx import functions as vx
from pyspark.sql import functions as F

df.select(
    vx.st_triangulate(
        F.col("masspoints"), F.col("breaklines"), 0.01, 0.01, "NONENCROACHING"
    ).alias("t")
).select("t.triangle")

In the lightweight tier, call the registered UDTF via SQL LATERAL (shown above) — there is no vx.st_triangulate(...) Column form.


st_interpolateelevationbbox

Lightweight · Heavyweight · Streaming UDTF

Builds a TIN from mass points and breaklines, then samples elevation on a regular pixel grid covering an explicit bounding box. Use this when you already know the output extent in absolute coordinates — for example, when snapping to a fixed tile extent or aligning with a raster grid.

Signature: gbx_st_interpolateelevationbbox(points, breaklines, mergeTolerance, snapTolerance, splitPointFinder, xmin, ymin, xmax, ymax, widthPx, heightPx, srid, mode)

Parameters:

  • points — Mass-point geometries with Z values. Accepts WKB/EWKB/WKT/EWKT.
  • breaklines — Breakline geometries (or an empty array).
  • mergeTolerance (DOUBLE) — Merge distance for coincident points.
  • snapTolerance (DOUBLE) — Snap distance to breakline vertices.
  • splitPointFinder (STRING) — Conforming-mesh strategy (e.g. 'NONENCROACHING').
  • xmin, ymin, xmax, ymax (DOUBLE) — Bounding box corners in the coordinate reference system given by srid.
  • widthPx, heightPx (INT) — Number of grid columns and rows. Together with the bbox dimensions these determine the cell size.
  • srid (INT) — EPSG code of the bounding box coordinates (e.g. 27700 for British National Grid).
  • mode (STRING, optional) — 'constrained' (default, both tiers) or 'conforming' (heavyweight only).

Generator: Emits one row per in-hull grid cell (cells whose centers fall outside the TIN convex hull are dropped). The output schema column is elevation_point (BINARY WKB POINT Z). Use with SQL LATERAL to materialize the grid.
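
The relationship between the bounding box, widthPx/heightPx, and cell size can be sketched as follows. The cell-center positions and the row-major, top-left-first ordering are assumptions made for illustration; the source only says that cells are kept or dropped based on their centers. The function name is hypothetical.

```python
def bbox_grid(xmin, ymin, xmax, ymax, width_px, height_px):
    """Derive cell sizes and cell-center coordinates for a bbox grid."""
    cell_x = (xmax - xmin) / width_px
    cell_y = (ymax - ymin) / height_px
    # Assumed ordering: rows from the top edge downward, columns left to right.
    centers = [
        (xmin + (col + 0.5) * cell_x, ymax - (row + 0.5) * cell_y)
        for row in range(height_px)
        for col in range(width_px)
    ]
    return cell_x, cell_y, centers
```

For the bounding box 530000, 180000, 531000, 181000 with a 100 by 100 grid, each cell is 10 units on a side and the first center under this ordering is (530005.0, 180995.0). Cells whose centers fall outside the TIN hull would then be dropped.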

SQL (works in both tiers):

SELECT t.elevation_point AS elev_point FROM survey, LATERAL gbx_st_interpolateelevationbbox(masspoints, breaklines, 0.0, 0.01, 'NONENCROACHING', 530000, 180000, 531000, 181000, 100, 100, 27700, 'constrained') t;
Example output
+----------+
|elev_point|
+----------+
|[BINARY] |
+----------+

st_interpolateelevationgeom

Lightweight · Heavyweight · Streaming UDTF

Builds a TIN from mass points and breaklines, then samples elevation on a regular grid anchored to a geometry origin with explicit cell sizes. Use this when the grid must be defined relative to a known point — for example, when the grid origin comes from data (a survey control point) or when different rows need different grid placements.

Signature: gbx_st_interpolateelevationgeom(points, breaklines, mergeTolerance, snapTolerance, splitPointFinder, gridOrigin, gridCols, gridRows, cellSizeX, cellSizeY, mode)

Parameters:

  • points — Mass-point geometries with Z values. Accepts WKB/EWKB/WKT/EWKT.
  • breaklines — Breakline geometries (or an empty array).
  • mergeTolerance (DOUBLE) — Merge distance for coincident points.
  • snapTolerance (DOUBLE) — Snap distance to breakline vertices.
  • splitPointFinder (STRING) — Conforming-mesh strategy (e.g. 'NONENCROACHING').
  • gridOrigin — POINT geometry anchoring the top-left corner of the output grid. The output SRID is inherited from this geometry (encode as EWKB/EWKT to carry a non-zero SRID) — no separate srid argument.
  • gridCols, gridRows (INT) — Number of grid columns and rows.
  • cellSizeX (DOUBLE) — Horizontal cell size in the geometry's units (positive steps right).
  • cellSizeY (DOUBLE) — Vertical cell size in the geometry's units. Pass a negative value to step downward (standard raster convention, e.g. -10.0 for 10-unit cells stepping south).
  • mode (STRING, optional) — 'constrained' (default, both tiers) or 'conforming' (heavyweight only).

Generator: Emits one row per in-hull grid cell. The output schema column is elevation_point (BINARY WKB POINT Z). Use with SQL LATERAL to materialize the grid.
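
To check that an origin-anchored grid covers the area you expect, the sketch below derives the grid's extent from the origin, the column and row counts, and the signed cell sizes. It only computes the outer extent and makes no assumption about where within each cell the elevation is sampled. The function name is hypothetical.

```python
def grid_extent(origin_x, origin_y, cols, rows, cell_size_x, cell_size_y):
    """Return (xmin, ymin, xmax, ymax) covered by an origin-anchored grid.

    A negative cell_size_y steps downward from the origin, so the origin
    becomes the top edge of the extent.
    """
    far_x = origin_x + cols * cell_size_x
    far_y = origin_y + rows * cell_size_y
    return (
        min(origin_x, far_x),
        min(origin_y, far_y),
        max(origin_x, far_x),
        max(origin_y, far_y),
    )
```

An origin of (530000, 181000) with 100 columns, 100 rows, and cell sizes 10.0 and -10.0 covers 530000, 180000, 531000, 181000, the same area as an explicit bounding box with those corners.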

SQL (works in both tiers):

SELECT t.elevation_point AS elev_point FROM survey, LATERAL gbx_st_interpolateelevationgeom(masspoints, breaklines, 0.0, 0.01, 'NONENCROACHING', ST_Point(530000, 181000), 100, 100, 10.0, -10.0, 'constrained') t;
Example output
+----------+
|elev_point|
+----------+
|[BINARY] |
+----------+

Legacy Mosaic conversion

st_legacyaswkb

Lightweight · Heavyweight

Migrates a legacy DBLabs Mosaic geometry value to standard Well-Known Binary (WKB). Pass the raw legacy geometry column through st_legacyaswkb to obtain a WKB binary that all downstream ST_* functions accept — the practical first step when moving a Mosaic-era table onto the product's native GEOMETRY/GEOGRAPHY types.

A scalar function in both tiers (same registered name, same output bytes), so the migration query is a one-line tier swap.

Parameters: legacyGeometry — Column containing a legacy Mosaic geometry struct with fields typeId, srid, boundaries, and holes (e.g. {1, 0, [[[x, y]]], []}).

Returns: BINARY — standard WKB.

Migration notes:

  • Z values are preserved. 3D legacy geometries round-trip through st_legacyaswkb with their Z coordinate intact.
  • Polygon holes (interior rings) are preserved.
  • SRID is applied separately at ingestion. The output is plain WKB and carries no SRID; assign the CRS when you read it back, e.g. ST_GeomFromWKB(gbx_st_legacyaswkb(geom_legacy), 27700).
  • M (measure) values are out of scope for this conversion.
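
To show what the conversion amounts to for the simplest case, the sketch below turns a legacy POINT struct (typeId, srid, boundaries, holes) into plain little-endian WKB in pure Python. It is an illustration under the struct layout used in the examples on this page, not the library's implementation; it handles 2D points only, and the function name is hypothetical.

```python
import struct


def legacy_point_to_wkb(legacy) -> bytes:
    """Convert a legacy (typeId, srid, boundaries, holes) POINT to plain WKB.

    The srid field is deliberately ignored: plain WKB carries no SRID.
    """
    type_id, _srid, boundaries, _holes = legacy
    if type_id != 1:
        raise ValueError("this sketch handles POINT (typeId=1) only")
    x, y = boundaries[0][0][:2]
    # Byte order 1 (little-endian), geometry type 1 (Point), then x and y.
    return struct.pack("<BIdd", 1, 1, x, y)
```

Two legacy points that differ only in srid produce identical bytes here, which mirrors the note above that the CRS has to be assigned when the WKB is read back.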

Common setup

Run this once before the examples below. It registers VectorX so you can use st_legacyaswkb in Python and gbx_st_legacyaswkb in SQL.

from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
vx.register(spark)
Example output
VectorX registered. You can now use st_legacyaswkb in Python and gbx_st_legacyaswkb in SQL.

Python:

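from pyspark.sql import Row
from pyspark.sql.types import ArrayType, DoubleType, IntegerType, StructField, StructType

# Legacy Mosaic geometry struct: typeId, srid, boundaries, holes
legacy_schema = StructType([
    StructField("typeId", IntegerType()),
    StructField("srid", IntegerType()),
    StructField("boundaries", ArrayType(ArrayType(ArrayType(DoubleType())))),
    StructField("holes", ArrayType(ArrayType(ArrayType(ArrayType(DoubleType()))))),
])
schema = StructType([StructField("geom_legacy", legacy_schema)])

# Point (30, 10): typeId=1 (POINT), srid=0, boundaries=[[[30.0, 10.0]]], holes=[]
row = Row(geom_legacy=(1, 0, [[[30.0, 10.0]]], []))
shapes = spark.createDataFrame([row], schema)
shapes.select(vx.st_legacyaswkb("geom_legacy").alias("wkb")).show()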
Example output
+-----------+
|wkb |
+-----------+
|[BINARY] |
+-----------+

SQL:

SELECT gbx_st_legacyaswkb(geom_legacy) AS wkb FROM legacy_table;
Example output
One row per input legacy geometry; wkb column contains binary WKB.

Quick Start example (point geometry round-trip):

# Register VectorX, create legacy point struct, convert to WKB
from databricks.labs.gbx.vectorx.jts.legacy import functions as vx
from pyspark.sql import Row
from pyspark.sql.types import ArrayType, DoubleType, IntegerType, StructField, StructType
vx.register(spark)
legacy_schema = StructType([
    StructField("typeId", IntegerType()),
    StructField("srid", IntegerType()),
    StructField("boundaries", ArrayType(ArrayType(ArrayType(DoubleType())))),
    StructField("holes", ArrayType(ArrayType(ArrayType(ArrayType(DoubleType()))))),
])
row = Row(geom_legacy=(1, 0, [[[30.0, 10.0]]], []))
shapes = spark.createDataFrame([row], StructType([StructField("geom_legacy", legacy_schema)]))
shapes.select(vx.st_legacyaswkb("geom_legacy").alias("wkb")).show()
Example output
+-----------+
|wkb |
+-----------+
|[BINARY] |
+-----------+

Next Steps