
Additional

Generate Synthetic Data

For lightweight testing without downloads, create synthetic data directly in your notebook:

Synthetic Points (Vector)

Databricks Runtime only

This example uses the built-in st_point and st_astext functions (Databricks Spatial SQL), which are available only on Databricks Runtime (DBR) 17.1+. The code and its tests live under docs/tests-dbr/ and are skipped when run on open-source Spark.
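If you want a notebook cell to fail fast on an unsupported runtime, one option is to check the runtime version before calling the spatial functions. The sketch below is an assumption, not the mechanism used by the tests under docs/tests-dbr/: it assumes the cluster exposes its version in the DATABRICKS_RUNTIME_VERSION environment variable as a string such as "17.1", and dbr_version and has_spatial_sql are hypothetical helper names.

```python
import os


def dbr_version(env=None):
    """Return (major, minor) parsed from DATABRICKS_RUNTIME_VERSION, or None.

    Assumption: the variable holds a dotted version string such as "17.1".
    Anything that does not start with two integer parts yields None.
    """
    env = os.environ if env is None else env
    parts = env.get("DATABRICKS_RUNTIME_VERSION", "").split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def has_spatial_sql(env=None, minimum=(17, 1)):
    """True when the detected runtime is at least `minimum` (DBR 17.1 by default)."""
    version = dbr_version(env)
    return version is not None and version >= minimum


if not has_spatial_sql():
    print("st_point / st_astext need DBR 17.1+; skipping the geometry step.")
```

On open-source Spark the variable is normally absent, so has_spatial_sql() returns False and the geometry step can be skipped.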

# Generate synthetic point data
from pyspark.sql import functions as f

# Create 1000 random points in the London area.
# f.rand() is not affected by Python's random.seed(), so each column gets an
# explicit seed to keep the write, count, and show below reproducible.
points = spark.range(1000).select(
    f.col("id"),
    (f.lit(51.5) + (f.rand(seed=42) - 0.5) * 0.5).alias("latitude"),   # 51.25 to 51.75
    (f.lit(-0.1) + (f.rand(seed=43) - 0.5) * 0.5).alias("longitude"),  # -0.35 to 0.15
    (f.rand(seed=44) * 100).cast("int").alias("value"),
)

# Add WKT geometry (st_point, st_astext require DBR)
points = points.withColumn(
    "geom",
    f.expr("st_astext(st_point(longitude, latitude))"),
)

# Save as a table or use directly
# (CATALOG and SCHEMA are assumed to be defined earlier in your notebook)
points.write.format("delta").mode("overwrite").saveAsTable(f"{CATALOG}.{SCHEMA}.synthetic_points")
print(f"✅ Created {points.count()} synthetic points")
points.show(5)
Example output (abridged and illustrative: actual values differ, and the geom column is omitted)
✅ Created 1000 synthetic points
+---+--------+---------+-----+
|id |latitude|longitude|value|
+---+--------+---------+-----+
|0 |51.23 |-0.15 |42 |
|1 |51.67 |0.08 |91 |
|...|... |... |... |
+---+--------+---------+-----+
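When neither DBR nor a Spark session is available, the same kind of test data can be sketched in plain Python. This is an illustrative alternative, not part of the tested examples: synthetic_points is a hypothetical helper, and the WKT string is built by hand, so its exact text may differ from what st_astext returns.

```python
import random


def synthetic_points(n=1000, center_lat=51.5, center_lon=-0.1, spread=0.5, seed=42):
    """Return n dicts with id, latitude, longitude, value, and a WKT point.

    Coordinates are uniform within +/- spread/2 degrees of the center,
    mirroring the ranges used in the Spark example above.
    """
    rng = random.Random(seed)  # local generator, so the global seed is untouched
    rows = []
    for i in range(n):
        lat = center_lat + (rng.random() - 0.5) * spread
        lon = center_lon + (rng.random() - 0.5) * spread
        rows.append(
            {
                "id": i,
                "latitude": lat,
                "longitude": lon,
                "value": int(rng.random() * 100),
                # WKT uses x y order, i.e. longitude first
                "geom": f"POINT ({lon} {lat})",
            }
        )
    return rows


rows = synthetic_points()
print(f"Created {len(rows)} synthetic points")
```

The resulting list can be loaded into a DataFrame later, for example with spark.createDataFrame(rows), once a Spark session exists.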

Synthetic Raster (Small Test File)

# Generate a small synthetic raster using GDAL
from osgeo import gdal, osr
import numpy as np
from pathlib import Path

sample_path = "/Volumes/main/default/geobrix_samples/geobrix-examples"
# Output directory and file
output_dir = Path(f"{sample_path}/synthetic-raster")
output_dir.mkdir(parents=True, exist_ok=True)
output_file = output_dir / "synthetic_100x100.tif"

# Create 100x100 raster with random values in 0..254
# (the upper bound of np.random.randint is exclusive)
width, height = 100, 100
data = np.random.randint(0, 255, (height, width), dtype=np.uint8)

# Create GeoTIFF
driver = gdal.GetDriverByName('GTiff')
dataset = driver.Create(str(output_file), width, height, 1, gdal.GDT_Byte)

# Set geotransform (top-left corner at 0,0, pixel size 1x1)
dataset.SetGeoTransform([0, 1, 0, 0, 0, -1])

# Set projection (WGS84)
srs = osr.SpatialReference()
srs.ImportFromEPSG(4326)
dataset.SetProjection(srs.ExportToWkt())

# Write data
band = dataset.GetRasterBand(1)
band.WriteArray(data)
band.SetNoDataValue(0)

# Flush and close (release the band too, since it keeps the dataset referenced)
band.FlushCache()
band = None
dataset = None

print(f"✅ Created synthetic 100x100 raster")
print(f" Path: {output_file}")

# Verify
from databricks.labs.gbx.rasterx import functions as rx
rx.register(spark)

synthetic = spark.read.format("gdal").load(str(output_file))
synthetic.select(
    rx.rst_width("tile"),
    rx.rst_height("tile"),
    rx.rst_min("tile"),
    rx.rst_max("tile"),
).show()
Example output
✅ Created synthetic 100x100 raster
Path: .../synthetic-raster/synthetic_100x100.tif
+-----+------+----+----+
|width|height|min |max |
+-----+------+----+----+
|100 |100 |0 |254 |
+-----+------+----+----+
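The six numbers passed to SetGeoTransform define an affine mapping from pixel (column, row) to world (x, y) coordinates. The sketch below applies that standard GDAL formula in plain Python to show the extent the raster above covers; pixel_to_world and raster_extent are hypothetical helper names, and the 0.01-degree variant is only an illustration, not a value used in the example.

```python
def pixel_to_world(gt, col, row):
    """Apply a GDAL-style geotransform to a pixel corner.

    gt = [origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height]
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def raster_extent(gt, width, height):
    """Return (min_x, min_y, max_x, max_y) for a raster of width x height pixels."""
    corners = [pixel_to_world(gt, c, r) for c in (0, width) for r in (0, height)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Geotransform from the example: origin (0, 0), 1 x 1 degree pixels, north-up
print(raster_extent([0, 1, 0, 0, 0, -1], 100, 100))  # (0, -100, 100, 0)

# Illustrative alternative: 0.01-degree pixels keep the extent within 1 degree
print(raster_extent([0, 0.01, 0, 0, 0, -0.01], 100, 100))
```

With 1-degree pixels the southern edge lands at y = -100, which is outside the valid latitude range for EPSG:4326. That is harmless for a lightweight test file, but a smaller pixel size is worth considering if the raster will be reprojected or combined with real data.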

Alternative Data Sources

NYC Open Data

Explore 100+ additional NYC datasets at https://data.cityofnewyork.us/

Popular datasets:

  • NYC Parks: https://data.cityofnewyork.us/api/geospatial/enfh-gkve?method=export&format=GeoJSON
  • Street Centerlines: Vector road network
  • Land Use: Zoning and property data
  • 311 Service Requests: Point data with coordinates
  • Building Footprints: All NYC buildings
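The NYC Parks link above follows a dataset-ID export pattern, which can be sketched as a small helper. This is an assumption rather than documented behavior: only the Parks ID (enfh-gkve) comes from this page, other datasets may not expose the same endpoint, and geojson_export_url and fetch_geojson are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NYC_OPEN_DATA = "https://data.cityofnewyork.us"


def geojson_export_url(dataset_id):
    """Build a GeoJSON export URL, assuming the pattern of the NYC Parks link."""
    query = urlencode({"method": "export", "format": "GeoJSON"})
    return f"{NYC_OPEN_DATA}/api/geospatial/{dataset_id}?{query}"


def fetch_geojson(dataset_id, timeout=60):
    """Download and parse the export (needs network access; not called here)."""
    with urlopen(geojson_export_url(dataset_id), timeout=timeout) as response:
        return json.load(response)


print(geojson_export_url("enfh-gkve"))
```

For large datasets it is usually better to save the response to a file and read it from there than to hold the parsed JSON in driver memory.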

London Open Data

Explore London datasets at https://data.london.gov.uk/

Popular datasets:

  • Transport networks: Underground, bus routes
  • Crime statistics: With geographic data
  • Green spaces: Parks and nature reserves

More Sentinel-2 Imagery via STAC

Access global Sentinel-2 data:

# Any location worldwide
import planetary_computer
import pystac_client

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

# Search by your area of interest
# [west, south, east, north] in degrees; London shown as an example, replace with your own
my_bbox = [-0.35, 51.25, 0.15, 51.75]
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=my_bbox,
    datetime="2023-01-01/2023-12-31",
    query={"eo:cloud_cover": {"lt": 20}},
)
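The bbox argument expects [west, south, east, north] in longitude/latitude degrees. If you only know a center point, a small helper can build the box for you. This is a minimal sketch: bbox_around is a hypothetical name, the half-size is given in degrees rather than meters, and boxes that would cross the antimeridian are simply clamped.

```python
def bbox_around(lon, lat, half_size_deg):
    """Return [west, south, east, north] centered on (lon, lat).

    Values are clamped to valid longitude/latitude ranges; a box that
    would cross the antimeridian is truncated at +/-180 rather than split.
    """
    if half_size_deg <= 0:
        raise ValueError("half_size_deg must be positive")
    west = max(lon - half_size_deg, -180.0)
    east = min(lon + half_size_deg, 180.0)
    south = max(lat - half_size_deg, -90.0)
    north = min(lat + half_size_deg, 90.0)
    return [west, south, east, north]


# A box of roughly half a degree around central London
my_bbox = bbox_around(-0.1, 51.5, 0.25)
print(my_bbox)
```

The result can be passed directly as bbox=my_bbox in the search call above.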

Other STAC Catalogs

Global Elevation Data

US Data

UK Data