Overview

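brickster is the R toolkit for Databricks. It includes wrappers for Databricks APIs, a {DBI} backend, a connection pane for browsing workspace resources in RStudio/Positron, and an interactive Databricks REPL.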

Quick Start

library(brickster)

# OAuth U2M only requires `DATABRICKS_HOST` to be set;
# the first request will open a browser window to log in
Sys.setenv(DATABRICKS_HOST = "https://<workspace-prefix>.cloud.databricks.com")

# open RStudio/Positron connection pane to view Databricks resources
open_workspace()

# list all SQL warehouses
warehouses <- db_sql_warehouse_list()

Refer to the “Connect to a Databricks Workspace” article for more details on getting authentication configured.
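
Before the first request, it can help to confirm that the host is set as expected. The following is a minimal sketch using only base R. `check_databricks_host()` is a hypothetical helper, not part of brickster, and the `https://` check is an assumption about how workspace URLs are usually written.

```r
# Hypothetical helper (not part of brickster): fail early if the
# `DATABRICKS_HOST` environment variable is missing or malformed.
check_databricks_host <- function(host = Sys.getenv("DATABRICKS_HOST")) {
  if (!nzchar(host)) {
    stop("`DATABRICKS_HOST` is not set", call. = FALSE)
  }
  # Assumption: the workspace URL is given with an explicit https scheme
  if (!startsWith(host, "https://")) {
    stop("`DATABRICKS_HOST` should start with 'https://'", call. = FALSE)
  }
  # Return the validated host invisibly so the call can be chained
  invisible(host)
}
```

Calling `check_databricks_host()` right after `Sys.setenv()` surfaces a missing or mistyped host before any browser login is attempted.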

Usage

{DBI} Backend

library(brickster)
library(DBI)

# Connect to Databricks using DBI (assumes you followed quickstart to authenticate)
con <- dbConnect(
  DatabricksSQL(),
  warehouse_id = "<warehouse-id>"
)

# Standard {DBI} operations
tables <- dbListTables(con)
dbGetQuery(con, "SELECT * FROM samples.nyctaxi.trips LIMIT 5")

# Use with {dbplyr} for {dplyr} syntax
library(dplyr)
library(dbplyr)

nyc_taxi <- tbl(con, I("samples.nyctaxi.trips"))

result <- nyc_taxi |>
  filter(year(tpep_pickup_datetime) == 2016) |>
  group_by(pickup_zip) |>
  summarise(
    trip_count = n(),
    avg_fare = mean(fare_amount, na.rm = TRUE),
    avg_distance = mean(trip_distance, na.rm = TRUE)
  ) |>
  collect()
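
To see what kind of SQL a pipeline like the one above turns into without contacting a warehouse, {dbplyr} can render a query against a simulated connection. The sketch below is an illustration under assumptions: `trips_sim` is a stand-in with the same column names as `samples.nyctaxi.trips`, and `simulate_dbi()` is a generic backend, so the SQL generated for a real Databricks connection may differ in its details.

```r
library(dplyr)
library(dbplyr)

# Stand-in for the remote table: a lazy frame with the same column names,
# backed by a simulated connection (no warehouse is contacted).
trips_sim <- lazy_frame(
  tpep_pickup_datetime = as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
  pickup_zip = 10001L,
  fare_amount = 10,
  trip_distance = 2,
  con = simulate_dbi()
)

# Same pipeline as above, minus `collect()`, so nothing is executed
query <- trips_sim |>
  filter(year(tpep_pickup_datetime) == 2016) |>
  group_by(pickup_zip) |>
  summarise(
    trip_count = n(),
    avg_fare = mean(fare_amount, na.rm = TRUE),
    avg_distance = mean(trip_distance, na.rm = TRUE)
  )

# Render the SQL that {dbplyr} would generate for this generic backend
generated_sql <- as.character(sql_render(query))
cat(generated_sql, "\n")
```

With a live connection, calling `show_query()` on the `nyc_taxi` pipeline before `collect()` prints the SQL that is actually sent to the warehouse.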

Download & Upload to Volume

library(readr)
library(brickster)

# upload `data.csv` to a volume
local_file <- tempfile(fileext = ".csv")
write_csv(x = iris, file = local_file)
db_volume_write(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  file = local_file
)

# read `data.csv` from a volume and write to a file
downloaded_file <- tempfile(fileext = ".csv")
file <- db_volume_read(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  destination = downloaded_file
)
volume_csv <- read_csv(downloaded_file)
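
The upload and download calls above both take a path of the form `/Volumes/<catalog>/<schema>/<volume>/...`. A small helper can assemble that path from its parts so the same value is reused for both calls. This is a sketch only: `volume_path()` is a hypothetical helper, not part of brickster, and the catalog, schema, and volume names are placeholders.

```r
# Hypothetical helper (not part of brickster): build a Unity Catalog
# volume path of the form /Volumes/<catalog>/<schema>/<volume>/<file>.
volume_path <- function(catalog, schema, volume, ...) {
  parts <- c(catalog, schema, volume, ...)
  # Reject empty or missing components so a malformed path fails early
  stopifnot(is.character(parts), !anyNA(parts), all(nzchar(parts)))
  paste0("/Volumes/", paste(parts, collapse = "/"))
}

# Placeholder names; substitute your own catalog, schema, and volume
csv_path <- volume_path("main", "default", "landing", "data.csv")
csv_path
```

The resulting string can be passed as the `path` argument to both `db_volume_write()` and `db_volume_read()`.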

Databricks REPL

Run commands against an existing interactive Databricks cluster. See the REPL vignette for more details.

library(brickster)

# commands after this will run on the interactive cluster
# read the vignette for more details
db_repl(cluster_id = "<interactive_cluster_id>")

Installation

install.packages("brickster")

Development Version

# install.packages("pak")
pak::pak("databrickslabs/brickster")

API Coverage

brickster is deliberate about which APIs it wraps. It isn’t intended to replace IaC tooling (e.g. Terraform) or to be used for account/workspace administration.

API                        Available   Version
DBFS                       Yes         2.0
Secrets                    Yes         2.0
Repos                      Yes         2.0
mlflow Model Registry      Yes         2.0
Clusters                   Yes         2.0
Libraries                  Yes         2.0
Workspace                  Yes         2.0
Endpoints                  Yes         2.0
Query History              Yes         2.0
Jobs                       Yes         2.1
Volumes (Files)            Yes         2.0
SQL Statement Execution    Yes         2.0
REST 1.2 Commands          Partially   1.2
Unity Catalog - Tables     Yes         2.1
Unity Catalog - Volumes    Yes         2.1
Unity Catalog              Partially   2.1