Developers
This page is for contributors and developers working in the GeoBrix repository. It describes how the project is organized and how to use the Cursor integration (rules, commands, agents, and skills) effectively.
How the project is organized
GeoBrix is a multi-artifact repo: Scala/JVM core, Python bindings, docs, and tooling share the same root and are wired for Databricks and local development.
Repository layout
| Path | Purpose |
|---|---|
| `src/main/scala/com/databricks/labs/gbx/` | Core implementation: readers, expressions, RasterX, GridX, VectorX |
| `src/test/scala/` | Scala unit and integration tests |
| `python/geobrix/` | Python package: PySpark bindings and sample-data bundle |
| `docs/` | Docusaurus site: `docs/` (content), `src/` (components), tests under `docs/tests/` |
| `notebooks/` | Sample notebooks (e.g. `sample-data/setup_sample_data.ipynb`) and `notebooks/tests/` |
| `scripts/` | CI, Docker, and one-off scripts |
| `sample-data/` | Scripts and outputs for sample data on the host; on a cluster, sample data lives under a Volumes path |
| `.cursor/` | Cursor integration: rules, commands, agents, skills (see below) |
Packages and readers
- RasterX: Raster operations and expressions (GDAL-backed); functions `rst_*` / `gbx_rst_*`.
- GridX: Grid systems (BNG, H3); functions `bng_*` / `gbx_bng_*`.
- VectorX: Vector geometry and OGR-backed readers; functions `st_*` / `gbx_st_*`.
- Readers: Format-specific data sources (GDAL, OGR, GeoTIFF, Shapefile, GeoJSON, GeoPackage, etc.) registered as Spark data sources.
Tests and docs
- Unit tests: `src/test/scala/` (Scala), `python/geobrix/test/` (Python).
- Documentation tests: `docs/tests/python/`, `docs/tests/scala/`. They validate the code examples used in the docs and are the single source of truth for those examples.
- Notebook tests: `notebooks/tests/` mirrors `notebooks/`; run them via Cursor commands or CI.
Development and CI use a Docker image (geobrix-dev) for a consistent environment; many Cursor commands run inside that container.
Git LFS — required to clone the GDAL platform tarball
The GDAL platform tarball at resources/static/geobrix-gdal-platform-noble.tar.gz (~90 MB, ships in every GeoBrix release as the runtime GDAL bundle) is stored via Git LFS so the binary lives in LFS storage instead of the git pack. The matching .sha256 sidecar is small enough to live in git directly and is NOT LFS-tracked. The tracking rule is in .gitattributes at the repo root.
One-time install per machine
```shell
brew install git-lfs   # macOS; or apt-get install git-lfs on Debian/Ubuntu
git lfs install        # writes LFS filters into ~/.gitconfig
```
Cloning the repo
After git lfs install, a normal git clone of geobrix automatically fetches LFS objects:
```shell
git clone git@github.com:databrickslabs/geobrix.git
```
If you cloned before installing git-lfs, run git lfs pull from inside the working tree to fetch the binary. Without that step, resources/static/geobrix-gdal-platform-noble.tar.gz will be a ~130-byte LFS pointer file rather than the real 90 MB tarball, and the package-geobrix-artifacts.yml workflow's lfs: true checkout will fail an integrity check.
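A quick way to tell whether you have the real tarball or only the pointer is to look at the file itself: a Git LFS pointer is a tiny text file whose first line is `version https://git-lfs.github.com/spec/v1`, followed by `oid sha256:<hex>` and `size <bytes>` lines. The Python sketch below relies only on that pointer format; `is_lfs_pointer` and `parse_lfs_pointer` are hypothetical helpers, not part of GeoBrix, and `git lfs ls-files` remains the authoritative check.

```python
from pathlib import Path

# First line of every Git LFS pointer file (pointer spec v1).
LFS_SPEC_LINE = b"version https://git-lfs.github.com/spec/v1"
# Assumption: pointers are tiny (~130 bytes here); anything this large is real content.
MAX_POINTER_BYTES = 1024


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like an LFS pointer instead of the real file."""
    if path.stat().st_size >= MAX_POINTER_BYTES:
        return False
    with path.open("rb") as fh:
        return fh.readline().rstrip(b"\r\n") == LFS_SPEC_LINE


def parse_lfs_pointer(path: Path) -> dict[str, str]:
    """Split pointer lines such as 'oid sha256:<hex>' and 'size <bytes>' into a dict."""
    fields: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields


if __name__ == "__main__":
    tarball = Path("resources/static/geobrix-gdal-platform-noble.tar.gz")
    if tarball.exists() and is_lfs_pointer(tarball):
        print("Only the LFS pointer is present; run `git lfs pull` in the working tree.")
        print(parse_lfs_pointer(tarball))
```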
Updating the platform tarball
Rebuild only when GDAL_PPA_VERSION changes, when DBR moves to a new Ubuntu LTS, or for a security advisory against one of the bundled libs. See resources/static/README.md for the full Docker-based recipe. The short version:
- Run `scripts/build-gdal-artifacts.sh --platform-only` inside a fresh `ubuntu:24.04` container (Docker recipe in the README).
- Move the resulting `geobrix-gdal-platform-noble.tar.gz` and its `.sha256` from `dist/` into `resources/static/`.
- `git add resources/static/geobrix-gdal-platform-noble.tar.gz`: the LFS filter intercepts it via `.gitattributes`. Verify with `git lfs ls-files` (should list the tarball) and `git diff --cached --stat resources/static/geobrix-gdal-platform-noble.tar.gz` (should show about 3 lines added, i.e. the pointer, not 90 MB).
- `git add resources/static/geobrix-gdal-platform-noble.tar.gz.sha256`: committed normally, not through LFS.
- Open a PR. The reviewer re-runs the build script locally in their own `ubuntu:24.04` container and confirms that the resulting sha256 matches the committed sidecar before approving. That PR review is the trust anchor for every cluster that subsequently installs from this bundle. See Security for the full chain.
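The reviewer's check in the last step comes down to hashing the tarball and comparing the digest with the committed sidecar. Below is a minimal Python sketch, assuming the `.sha256` file follows the `sha256sum` output format (`<hex>  <filename>`, or just the hex digest); the README recipe may use a different tool (for example `sha256sum -c`), and `sha256_of` / `matches_sidecar` are illustrative names only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a ~90 MB tarball is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(tarball: Path) -> bool:
    """Compare the tarball's digest with '<tarball name>.sha256' in the same directory.

    Assumption: the sidecar's first whitespace-separated token is the hex digest.
    """
    sidecar = tarball.with_name(tarball.name + ".sha256")
    expected = sidecar.read_text(encoding="utf-8").split()[0].lower()
    return sha256_of(tarball) == expected


if __name__ == "__main__":
    tarball = Path("resources/static/geobrix-gdal-platform-noble.tar.gz")
    print("sha256 OK" if matches_sidecar(tarball) else "sha256 MISMATCH")
```

Running this against a pointer file instead of the real tarball will report a mismatch, which is another symptom of a missing `git lfs pull`.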
Storage considerations
LFS bandwidth and storage come from the databrickslabs GitHub org quota. Each tarball bump consumes both. Don't rebuild the tarball just to bump GeoBrix versions — the release workflow grafts the per-release JAR onto the committed platform tarball without changing it.
Testing on a Databricks cluster
You can run the Essential bundle and primitive Volume tests on a live Databricks cluster. There, Volume paths are FUSE-mounted and the bundle accesses them with pathlib/shutil only (no Databricks Files API).
Config: Copy `notebooks/tests/databricks_cluster_config.example.env` to `notebooks/tests/databricks_cluster_config.env` and set:
- `DATABRICKS_HOST`, `DATABRICKS_TOKEN` (or `DATABRICKS_CONFIG_PROFILE`).
- `CLUSTER_ID`: the existing cluster that runs the job.
- `GBX_BUNDLE_VOLUME_CATALOG`, `GBX_BUNDLE_VOLUME_SCHEMA`, `GBX_BUNDLE_VOLUME_NAME`: the Volume root is `/Volumes/<catalog>/<schema>/<volume_name>`. The volume name must match Data Explorer exactly (e.g. `sample-data`, not `sample_data`).
- `GBX_ARTIFACT_VOLUME`: directory for artifacts (e.g. `/Volumes/.../artifacts`). The JAR and wheel are uploaded directly into it (no subpaths). Unless overridden, the notebook's wheel path is derived as `GBX_ARTIFACT_VOLUME/geobrix-<version>-py3-none-any.whl`.
- Optional: `GBX_BUNDLE_WHEEL_VOLUME_PATH` overrides the full wheel path used by the notebook's pip cells.
- Optional: `GBX_BUNDLE_SKIP_WHEEL_UPLOAD=1` reuses the existing wheel (no build/upload); the notebook still gets the pip and restart cells.
- Optional: `GBX_BUNDLE_SKIP_JAR_UPLOAD=1` skips the JAR build/upload, whether you run push-wheel or push-jar alone.
- Optional: `GBX_RUNNER_DIR`, `GBX_BUNDLE_RUNNER_NOTEBOOK`, `GBX_PRIMITIVE_RUNNER_NOTEBOOK` set where the runner notebooks are uploaded.
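The path rules in the config list can be expressed as a small Python sketch. It assumes the `.env` file holds plain `KEY=VALUE` lines (optionally quoted, with `#` comments) and that the caller knows the wheel version; `load_env`, `volume_root`, and `wheel_path` are hypothetical helpers and may not match how the test tooling actually reads the file.

```python
from pathlib import Path, PurePosixPath


def load_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped.

    Assumption: no multi-line values, 'export' prefixes, or variable interpolation.
    """
    cfg: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip().strip('"').strip("'")
    return cfg


def volume_root(cfg: dict[str, str]) -> PurePosixPath:
    # /Volumes/<catalog>/<schema>/<volume_name>; the name is used verbatim,
    # so "sample-data" and "sample_data" are different volumes.
    return PurePosixPath(
        "/Volumes",
        cfg["GBX_BUNDLE_VOLUME_CATALOG"],
        cfg["GBX_BUNDLE_VOLUME_SCHEMA"],
        cfg["GBX_BUNDLE_VOLUME_NAME"],
    )


def wheel_path(cfg: dict[str, str], version: str) -> str:
    # An explicit override wins; otherwise the wheel sits directly in the artifact dir.
    override = cfg.get("GBX_BUNDLE_WHEEL_VOLUME_PATH")
    if override:
        return override
    artifacts = PurePosixPath(cfg["GBX_ARTIFACT_VOLUME"])
    return str(artifacts / f"geobrix-{version}-py3-none-any.whl")


if __name__ == "__main__":
    cfg = load_env(Path("notebooks/tests/databricks_cluster_config.env"))
    print(volume_root(cfg))
```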
Commands (run from the repo root):
- `gbx:test:primitive-databricks`: Pushes the primitive notebook and runs it on the cluster. It validates that the volume exists, creates subdirectories, and reads, writes, and copies files via FUSE (pathlib/shutil). It has no GeoBrix dependency.
- `gbx:test:bundle-databricks`: Pushes the bundle runner notebook and runs it on the cluster. If `GBX_BUNDLE_WHEEL_VOLUME_PATH` is set, the notebook contains (1) `%pip install --quiet <wheel>`, (2) `%pip install --quiet --no-deps --force-reinstall <wheel>`, and (3) `dbutils.library.restartPython()`, followed by the bundle cell. Run those cells in order so that the restarted Python process loads the new GeoBrix code.
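For orientation, the primitive test's steps (volume exists, create subdirectories, read/write/copy via pathlib/shutil) look roughly like the following. This is a hypothetical re-creation, not the notebook's code: the function name and the `_gbx_primitive_<random>` scratch directory are invented for illustration, and the sketch removes its scratch directory when done.

```python
import shutil
import uuid
from pathlib import Path


def primitive_volume_check(volume_root: str) -> None:
    """Hypothetical sketch of the primitive Volume test using only pathlib/shutil.

    Raises FileNotFoundError if the root is missing and RuntimeError on a bad round trip.
    """
    root = Path(volume_root)
    if not root.is_dir():
        raise FileNotFoundError(f"Volume root not found: {root}")

    # Unique scratch directory so concurrent or repeated runs do not collide.
    work = root / f"_gbx_primitive_{uuid.uuid4().hex[:8]}"
    nested = work / "nested" / "dir"
    nested.mkdir(parents=True)  # create subdirectories
    try:
        src = nested / "hello.txt"
        src.write_text("hello volume\n", encoding="utf-8")  # whole-file sequential write
        if src.read_text(encoding="utf-8") != "hello volume\n":  # read it back
            raise RuntimeError("read-back mismatch")

        dst = work / "copy.txt"
        shutil.copyfile(src, dst)  # copy through the mount
        if dst.read_bytes() != src.read_bytes():
            raise RuntimeError("copy mismatch")
    finally:
        shutil.rmtree(work)  # remove the scratch directory even on failure
```

On a cluster you would pass the Volume root, e.g. `primitive_volume_check("/Volumes/<catalog>/<schema>/<volume_name>")`; locally the same function runs against any ordinary directory, which makes it easy to check the logic before pushing.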
Rule: For Volume path handling (FUSE, pathlib, no random access), see `.cursor/rules/unity-catalog-volumes.mdc`.
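The rule file is the authority on Volume handling. As a hedged illustration of one common way to respect "no random access", you can do any seek-heavy I/O on local disk and move whole files to or from the Volume with a single sequential copy. `staged_write` and `staged_read` are hypothetical helpers, not part of GeoBrix.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def staged_write(volume_dest: str) -> Iterator[Path]:
    """Yield a local scratch path; on success, copy it to `volume_dest` in one pass.

    Assumption: the writer may seek, which the Volume FUSE mount is not relied on to support.
    If the body raises, nothing is copied and the scratch file is discarded.
    """
    dest = Path(volume_dest)
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / dest.name
        yield local
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local, dest)  # sequential copy onto the Volume


@contextmanager
def staged_read(volume_src: str) -> Iterator[Path]:
    """Copy a Volume file to local disk once, then yield the local path for random reads."""
    src = Path(volume_src)
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / src.name
        shutil.copyfile(src, local)  # sequential copy off the Volume
        yield local
```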