Overview
brickster is the R toolkit for Databricks. It includes:
- Wrappers for Databricks APIs (e.g. db_cluster_list, db_volume_read)
- Browsing workspace assets via the RStudio Connections Pane (open_workspace())
- Interactive Databricks REPL
Quick Start
library(brickster)
# only requires `DATABRICKS_HOST` if using OAuth U2M
# first request will open browser window to login
Sys.setenv(DATABRICKS_HOST = "https://<workspace-prefix>.cloud.databricks.com")
# open RStudio/Positron connection pane to view Databricks resources
open_workspace()
# list all SQL warehouses
warehouses <- db_sql_warehouse_list()
Refer to the “Connect to a Databricks Workspace” article for more details on getting authentication configured.
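Before the first request, it can help to confirm that `DATABRICKS_HOST` is actually set and that the `<workspace-prefix>` placeholder from the Quick Start has been replaced. The following is a minimal base-R sketch under those assumptions; `check_databricks_host()` is a hypothetical helper written for illustration and is not part of brickster, and the host in the usage line is a made-up placeholder.

```r
# Hypothetical helper (not part of brickster): sanity-check the host value
# before making any requests.
check_databricks_host <- function(host = Sys.getenv("DATABRICKS_HOST")) {
  # Sys.getenv() returns "" when the variable is unset
  if (!nzchar(host)) {
    stop("DATABRICKS_HOST is not set.")
  }
  # Catch a copy-pasted template value such as "<workspace-prefix>"
  if (grepl("<", host, fixed = TRUE) || grepl(">", host, fixed = TRUE)) {
    stop("DATABRICKS_HOST still contains a placeholder: ", host)
  }
  # Return the value invisibly so the call can be used inline
  invisible(host)
}

# Usage with an explicit (made-up) value; call with no arguments to check
# the environment variable instead.
host <- check_databricks_host("https://example.cloud.databricks.com")
```

The check is deliberately limited to what the Quick Start shows. It does not validate credentials or contact the workspace.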
Usage
{DBI} Backend
library(brickster)
library(DBI)
# Connect to Databricks using DBI (assumes you followed quickstart to authenticate)
con <- dbConnect(
  DatabricksSQL(),
  warehouse_id = "<warehouse-id>"
)
# Standard {DBI} operations
tables <- dbListTables(con)
dbGetQuery(con, "SELECT * FROM samples.nyctaxi.trips LIMIT 5")
# Use with {dbplyr} for {dplyr} syntax
library(dplyr)
library(dbplyr)
nyc_taxi <- tbl(con, I("samples.nyctaxi.trips"))
result <- nyc_taxi |>
  filter(year(tpep_pickup_datetime) == 2016) |>
  group_by(pickup_zip) |>
  summarise(
    trip_count = n(),
    avg_fare = mean(fare_amount, na.rm = TRUE),
    avg_distance = mean(trip_distance, na.rm = TRUE)
  ) |>
  collect()
Download & Upload to Volume
library(readr)
library(brickster)
# upload `data.csv` to a volume
local_file <- tempfile(fileext = ".csv")
write_csv(x = iris, file = local_file)
db_volume_write(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  file = local_file
)
# read `data.csv` from a volume and write to a file
downloaded_file <- tempfile(fileext = ".csv")
file <- db_volume_read(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  destination = downloaded_file
)
volume_csv <- read_csv(downloaded_file)
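The volume paths in the example above follow the pattern `/Volumes/<catalog>/<schema>/<volume>/<file>`. As a minimal sketch, the path could be assembled from its parts with base R; `volume_path()` is a hypothetical helper for illustration and is not part of brickster, and the catalog, schema, and volume names below are placeholders.

```r
# Hypothetical helper (not part of brickster): build a volume path from
# its components, following the pattern used in the example above.
volume_path <- function(catalog, schema, volume, ...) {
  # Optional extra components (sub-directories, file name) go in `...`
  parts <- c(catalog, schema, volume, ...)
  # Every component must be a non-missing, non-empty string
  stopifnot(is.character(parts), !anyNA(parts), all(nzchar(parts)))
  paste0("/Volumes/", paste(parts, collapse = "/"))
}

# Placeholder names; substitute your own catalog, schema, and volume
path <- volume_path("main", "default", "landing", "data.csv")
print(path)
# [1] "/Volumes/main/default/landing/data.csv"
```

The resulting string can then be supplied as the `path` argument shown in the `db_volume_write()` and `db_volume_read()` calls above.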
Installation
install.packages("brickster")
Development Version
# install.packages("pak")
pak::pak("databrickslabs/brickster")
API Coverage
brickster is deliberate about which APIs it wraps. It isn’t intended to replace IaC tooling (e.g. Terraform) or to be used for account/workspace administration.
API | Available | Version |
---|---|---|
DBFS | Yes | 2.0 |
Secrets | Yes | 2.0 |
Repos | Yes | 2.0 |
MLflow Model Registry | Yes | 2.0 |
Clusters | Yes | 2.0 |
Libraries | Yes | 2.0 |
Workspace | Yes | 2.0 |
Endpoints | Yes | 2.0 |
Query History | Yes | 2.0 |
Jobs | Yes | 2.1 |
Volumes (Files) | Yes | 2.0 |
SQL Statement Execution | Yes | 2.0 |
REST 1.2 Commands | Partially | 1.2 |
Unity Catalog - Tables | Yes | 2.1 |
Unity Catalog - Volumes | Yes | 2.1 |
Unity Catalog | Partially | 2.1 |