Additional
Generate Synthetic Data
For lightweight testing without downloads, create synthetic data directly in your notebook:
Synthetic Points (Vector)
This example uses the built-in st_point and st_astext functions (Databricks Spatial SQL), which are only available on Databricks Runtime (DBR) 17.1+. The code and its tests live under docs/tests-dbr/; the tests are skipped when run on open-source Spark.
# Generate synthetic point data
from pyspark.sql import functions as f
# Create 1000 random points in the London area.
# Python's random.seed() has no effect on Spark's rand(), so each rand() call is seeded directly.
# Distinct seeds keep the three columns independent of each other.
points = spark.range(1000).select(
    f.col("id"),
    (f.lit(51.5) + (f.rand(seed=42) - 0.5) * 0.5).alias("latitude"),   # 51.25 to 51.75
    (f.lit(-0.1) + (f.rand(seed=43) - 0.5) * 0.5).alias("longitude"),  # -0.35 to 0.15
    (f.rand(seed=44) * 100).cast("int").alias("value")
)
# Add WKT geometry (st_point, st_astext require DBR)
points = points.withColumn(
    "geom",
    f.expr("st_astext(st_point(longitude, latitude))")
)
# Save as table or use directly
points.write.format("delta").mode("overwrite").saveAsTable(f"{CATALOG}.{SCHEMA}.synthetic_points")
print(f"✅ Created {points.count()} synthetic points")
points.show(5)
✅ Created 1000 synthetic points
+---+--------+---------+-----+
|id |latitude|longitude|value|
+---+--------+---------+-----+
|0 |51.23 |-0.15 |42 |
|1 |51.67 |0.08 |91 |
|...|... |... |... |
+---+--------+---------+-----+
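Where st_point and st_astext are unavailable (for example on open-source Spark), similar test data can be sketched in plain Python. This is a minimal illustration rather than part of the library: `make_synthetic_points` is a hypothetical helper name, and its defaults simply mirror the ranges used in the Spark example above.

```python
import random


def make_synthetic_points(n=1000, center_lat=51.5, center_lon=-0.1, spread=0.5, seed=42):
    """Return n dicts with id, latitude, longitude, value, and a WKT point string.

    Hypothetical helper for illustration; mirrors the Spark example's ranges.
    """
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    rows = []
    for i in range(n):
        lat = center_lat + (rng.random() - 0.5) * spread
        lon = center_lon + (rng.random() - 0.5) * spread
        rows.append({
            "id": i,
            "latitude": lat,
            "longitude": lon,
            "value": int(rng.random() * 100),
            # WKT uses x y order, so longitude comes first
            "geom": f"POINT ({lon} {lat})",
        })
    return rows


rows = make_synthetic_points()
print(len(rows), rows[0]["geom"])
```

If a DataFrame is needed, the resulting list should be usable with `spark.createDataFrame(rows)`, although an explicit schema is the safer choice for anything beyond a quick test.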
Synthetic Raster (Small Test File)
# Generate a small synthetic raster using GDAL
from osgeo import gdal, osr
import numpy as np
from pathlib import Path
sample_path = "/Volumes/main/default/geobrix_samples/geobrix-examples"
# Output directory and file
output_dir = Path(f"{sample_path}/synthetic-raster")
output_dir.mkdir(parents=True, exist_ok=True)
output_file = output_dir / "synthetic_100x100.tif"
# Create 100x100 raster with random values in 0..254 (randint's upper bound is exclusive)
width, height = 100, 100
data = np.random.randint(0, 255, (height, width), dtype=np.uint8)
# Create GeoTIFF
driver = gdal.GetDriverByName('GTiff')
dataset = driver.Create(str(output_file), width, height, 1, gdal.GDT_Byte)
# Set geotransform (top-left corner at 0,0, pixel size 1x1)
dataset.SetGeoTransform([0, 1, 0, 0, 0, -1])
# Set projection (WGS84)
srs = osr.SpatialReference()
srs.ImportFromEPSG(4326)
dataset.SetProjection(srs.ExportToWkt())
# Write data
band = dataset.GetRasterBand(1)
band.WriteArray(data)
band.SetNoDataValue(0)
# Flush, then release the band and dataset references so GDAL closes the file
band.FlushCache()
band = None
dataset = None
print("✅ Created synthetic 100x100 raster")
print(f" Path: {output_file}")
# Verify
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)
synthetic = spark.read.format("gdal").load(str(output_file))
synthetic.select(
    rx.rst_width("tile"),
    rx.rst_height("tile"),
    rx.rst_min("tile"),
    rx.rst_max("tile")
).show()
✅ Created synthetic 100x100 raster
Path: .../synthetic-raster/synthetic_100x100.tif
+-----+------+----+----+
|width|height|min |max |
+-----+------+----+----+
|100 |100 |0 |254 |
+-----+------+----+----+
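The six-element geotransform passed to SetGeoTransform maps pixel indices to world coordinates. The sketch below applies that mapping by hand, following GDAL's documented affine convention; `pixel_to_world` is a hypothetical helper name for illustration, not a GDAL or GeoBrix function.

```python
def pixel_to_world(gt, col, row):
    """Apply a GDAL-style geotransform to a (col, row) pixel index.

    gt = [origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height]
    Returns the world coordinates of the pixel's top-left corner.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


gt = [0, 1, 0, 0, 0, -1]  # same geotransform as the synthetic raster above

top_left = pixel_to_world(gt, 0, 0)
bottom_right = pixel_to_world(gt, 100, 100)  # outer corner of the 100x100 grid
print(top_left, bottom_right)
```

With this geotransform the raster spans x from 0 to 100 and y from 0 to -100. Since y = -100 lies outside the valid latitude range for EPSG:4326, a smaller pixel size (for example 0.01) may be a better choice if the file will be reprojected or overlaid on real data.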
Alternative Data Sources
NYC Open Data
Explore 100+ additional NYC datasets at https://data.cityofnewyork.us/
Popular datasets:
- NYC Parks: https://data.cityofnewyork.us/api/geospatial/enfh-gkve?method=export&format=GeoJSON
- Street Centerlines: Vector road network
- Land Use: Zoning and property data
- 311 Service Requests: Point data with coordinates
- Building Footprints: All NYC buildings
London Open Data
Explore London datasets at https://data.london.gov.uk/
Popular datasets:
- Transport networks: Underground, bus routes
- Crime statistics: With geographic data
- Green spaces: Parks and nature reserves
More Sentinel-2 Imagery via STAC
Access global Sentinel-2 data:
import pystac_client
import planetary_computer
# Open the Planetary Computer STAC catalog (covers any location worldwide)
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
# Search by your area of interest
my_bbox = [west, south, east, north]  # Placeholders: replace with your coordinates in degrees
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=my_bbox,
    datetime="2023-01-01/2023-12-31",
    query={"eo:cloud_cover": {"lt": 20}},
)
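The `bbox` argument expects `[west, south, east, north]` in degrees. One way to build such a box around a center point, and to catch swapped coordinates before searching, is sketched below. `bbox_around` is a hypothetical helper, and the half-width values are arbitrary choices for illustration.

```python
def bbox_around(lon, lat, half_size_deg=0.1):
    """Return [west, south, east, north] centered on (lon, lat), clamped to valid ranges."""
    west = max(lon - half_size_deg, -180.0)
    east = min(lon + half_size_deg, 180.0)
    south = max(lat - half_size_deg, -90.0)
    north = min(lat + half_size_deg, 90.0)
    bbox = [west, south, east, north]
    # Guard against swapped or degenerate coordinates
    if not (west < east and south < north):
        raise ValueError(f"Invalid bbox: {bbox}")
    return bbox


# Central London, matching the synthetic points example
my_bbox = bbox_around(-0.1, 51.5, half_size_deg=0.25)
print(my_bbox)
```

The resulting list can be passed as `bbox=my_bbox` to `catalog.search`. Note that this simple clamp does not handle boxes that cross the antimeridian.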
Other STAC Catalogs
- Earth Search: https://earth-search.aws.element84.com/v1
  - Sentinel-2, Landsat on AWS
- USGS STAC: https://landsatlook.usgs.gov/stac-server
  - Landsat collection
- Copernicus Data Space: https://dataspace.copernicus.eu/
  - European satellite data
Global Elevation Data
- SRTM 30m: https://dwtkns.com/srtm30m/
  - Global coverage (60°N to 56°S)
  - 30m resolution
- Copernicus DEM: https://portal.opentopography.org/
  - Higher resolution for Europe
  - 30m global
US Data
- US Census TIGER: https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
  - All US administrative boundaries
  - Roads, water features, census tracts
- USGS National Map: https://www.usgs.gov/programs/national-geospatial-program/national-map
  - Elevation, land cover, imagery
UK Data
- Ordnance Survey Open Data: https://www.ordnancesurvey.co.uk/business-government/products/open-map-data
  - UK administrative boundaries
  - Topographic data
- Environment Agency: https://environment.data.gov.uk/
  - Flood zones, elevation, land cover