R Toolkit for Databricks • brickster

Overview

brickster is the R toolkit for Databricks. It includes:

Wrappers for Databricks APIs (e.g. db_cluster_list, db_volume_read)
Browse workspace assets via the RStudio Connections Pane (open_workspace())
DBI + dbplyr backend (no more ODBC installs!)
Interactive Databricks REPL

Quick Start

library(brickster)

# only requires `DATABRICKS_HOST` if using OAuth U2M
# first request will open browser window to login
Sys.setenv(DATABRICKS_HOST = "https://<workspace-prefix>.cloud.databricks.com")

# open RStudio/Positron connection pane to view Databricks resources
open_workspace()

# list all SQL warehouses
warehouses <- db_sql_warehouse_list()

Refer to the “Connect to a Databricks Workspace” article for more details on getting authentication configured.
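Because every request depends on `DATABRICKS_HOST`, it can help to check the variable before calling any brickster function. The snippet below is a minimal sketch in base R. `check_databricks_host()` is a hypothetical helper, not part of brickster, and the example host is made up for illustration.

```r
# Hypothetical helper (not part of brickster): fail early when
# DATABRICKS_HOST is missing or still contains a <placeholder>.
check_databricks_host <- function(host = Sys.getenv("DATABRICKS_HOST")) {
  if (!nzchar(host)) {
    stop("DATABRICKS_HOST is not set", call. = FALSE)
  }
  if (grepl("[<>]", host)) {
    stop("DATABRICKS_HOST still contains a <placeholder>", call. = FALSE)
  }
  # return the host invisibly so the check can be used inline
  invisible(host)
}

# Illustrative host only; substitute your own workspace URL
checked_host <- check_databricks_host("https://example.cloud.databricks.com")
```

Called without arguments, the helper reads the environment variable set in the quick start, so it could run once at the top of a script.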

Usage

`{DBI}` Backend

library(brickster)
library(DBI)

# Connect to Databricks using DBI (assumes you followed quickstart to authenticate)
con <- dbConnect(
  DatabricksSQL(),
  warehouse_id = "<warehouse-id>"
)

# Standard {DBI} operations
tables <- dbListTables(con)
dbGetQuery(con, "SELECT * FROM samples.nyctaxi.trips LIMIT 5")

# Use with {dbplyr} for {dplyr} syntax
library(dplyr)
library(dbplyr)

nyc_taxi <- tbl(con, I("samples.nyctaxi.trips"))

result <- nyc_taxi |>
  filter(year(tpep_pickup_datetime) == 2016) |>
  group_by(pickup_zip) |>
  summarise(
    trip_count = n(),
    avg_fare = mean(fare_amount, na.rm = TRUE),
    avg_distance = mean(trip_distance, na.rm = TRUE)
  ) |>
  collect()
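Since {dbplyr} translates {dplyr} verbs to SQL, the aggregation above can be wrapped in a function and tried on a local data frame before pointing it at a warehouse. This is a sketch under assumptions: `summarise_trips()` is a hypothetical helper, not part of brickster, and `local_trips` is a small made-up stand-in that only borrows the column names used above.

```r
library(dplyr)

# Hypothetical helper (not part of brickster): per-zip trip summary.
# Accepts any tbl, e.g. a lazy Databricks table or a local data frame.
summarise_trips <- function(trips) {
  trips |>
    group_by(pickup_zip) |>
    summarise(
      trip_count = n(),
      avg_fare = mean(fare_amount, na.rm = TRUE),
      avg_distance = mean(trip_distance, na.rm = TRUE)
    )
}

# Made-up local data with the same column names as the sample table
local_trips <- data.frame(
  pickup_zip = c(10001, 10001, 10002),
  fare_amount = c(10, 20, 7.5),
  trip_distance = c(1, 3, 2)
)

local_summary <- summarise_trips(local_trips)
```

Against Databricks, the same function would be applied to the lazy table and followed by `collect()`, for example `summarise_trips(nyc_taxi) |> collect()`.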

Download & Upload to Volume

library(readr)
library(brickster)

# upload `data.csv` to a volume
local_file <- tempfile(fileext = ".csv")
write_csv(x = iris, file = local_file)
db_volume_write(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  file = local_file
)

# read `data.csv` from a volume and write to a file
downloaded_file <- tempfile(fileext = ".csv")
file <- db_volume_read(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  destination = downloaded_file
)
volume_csv <- read_csv(downloaded_file)
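The volume paths above follow the pattern `/Volumes/<catalog>/<schema>/<volume>/<file>`. A small base R helper can assemble them so the pieces are not repeated by hand. This is only a sketch: `volume_path()` is a hypothetical helper, not part of brickster, and the catalog, schema, and volume names are illustrative.

```r
# Hypothetical helper (not part of brickster): build a volume path from
# catalog, schema, volume, and any further folder or file names.
volume_path <- function(catalog, schema, volume, ...) {
  parts <- c(catalog, schema, volume, ...)
  if (any(!nzchar(parts))) {
    stop("all path components must be non-empty", call. = FALSE)
  }
  paste0("/Volumes/", paste(parts, collapse = "/"))
}

# Illustrative names only
csv_path <- volume_path("main", "default", "landing", "data.csv")
```

The result can then be passed as the `path` argument in the upload and download calls shown above.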

Databricks REPL

Run commands against an existing interactive Databricks cluster. Read the REPL vignette for more details.

library(brickster)

# commands after this will run on the interactive cluster
# read the vignette for more details
db_repl(cluster_id = "<interactive_cluster_id>")

Installation

install.packages("brickster")

Development Version

# install.packages("pak")
pak::pak("databrickslabs/brickster")

API Coverage

brickster is deliberate about which APIs it wraps. It isn’t intended to replace IaC tooling (e.g. Terraform) or to be used for account/workspace administration.

API	Available	Version
DBFS	Yes	2.0
Secrets	Yes	2.0
Repos	Yes	2.0
MLflow Model Registry	Yes	2.0
Clusters	Yes	2.0
Libraries	Yes	2.0
Workspace	Yes	2.0
Endpoints	Yes	2.0
Query History	Yes	2.0
Jobs	Yes	2.1
Volumes (Files)	Yes	2.0
SQL Statement Execution	Yes	2.0
REST 1.2 Commands	Partially	1.2
Unity Catalog - Tables	Yes	2.1
Unity Catalog - Volumes	Yes	2.1
Unity Catalog	Partially	2.1
