Skip to contents

Defining Credentials

The brickster package connects to a Databricks workspace is two ways:

  1. OAuth user-to-machine (U2M) authentication
  2. Personal Access Tokens (PAT)

It’s recommended to use option (1) when using brickster interactively, if you need to run code via an automated process the only option currently is (2).

Personal Access Tokens can be generated in a few steps, for a step-by-step breakdown refer to the documentation.

Once you have a token you’ll be able to store it alongside the workspace URL in an .Renviron file. The .Renviron is used for storing the variables, such as those which may be sensitive (e.g. credentials) and de-couple them from the code (additional reading: 1, 2).

To get started add the following to your .Renviron:

  • DATABRICKS_HOST: The workspace URL

  • DATABRICKS_TOKEN: Personal access token (not required if using OAuth U2M)

  • DATABRICKS_WSID: The workspace ID (docs)

DATABRICKS_WSID is only required for the RStudio IDE integration with the connection pane.

Example of entries in .Renviron:

DATABRICKS_HOST=xxxxxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapi123456789012345678a9bc01234defg5
DATABRICKS_WSID=123123123123123

Note: Recommend creating an .Renviron for each project. You can create .Renviron within your user home directory if required.

Restarting your R session will allow those variable to be picked up via the brickster package.

Using Credentials with {brickster}

Authentication should now be possible without specifying the credentials in your R code. You can load brickster and list the clusters within the workspace using db_cluster_list(), to access the host/token use db_host()/db_token() respectively.

library(brickster)

# using db_host() and db_token() to get credentials
clusters <- db_cluster_list(host = db_host(), token = db_token())

All brickster functions have their host/token parameters default to calling db_host()/db_token() therefore we can omit explicit calls to the functions.

# all host/token parameters default to db_host()/db_token()
clusters <- db_cluster_list()

When using OAuth U2M authentication you don’t define a token in .Renviron and therefore db_token() will return NULL.

Managing Multiple Credentials

There are two methods that brickster supports to simplify switching of credentials within an R project/session:

  1. Adding multiple credentials to .Renviron, each additional set of credentials is differentiated via a suffix (e.g. DATABRICKS_TOKEN_DEV)
  2. Using a .databrickscfg file (primary method in Databricks CLI)

To differentiate between (1) and (2) the option use_databrickscfg is used, the following example shows how to switch the session to use .databrickscfg.

# will use the `DEFAULT` profile in `.databrickscfg`
options(use_databrickscfg = TRUE)

# values returned should be those in profile of `.databrickscfg`
db_host()
db_token()

The default behaviour is to read credentials from .Renviron. If you wish to change this it’s recommended to set the option within .Rprofile so that it’s set during initialization of the R session.

Switching Between Credentials

The db_profile option controls which profiles credentials are returned by db_host()/db_token()/db_wsid().

Profiles enable you to switch contexts between:

  • Different workspaces (e.g. development or production)

  • Different permissions (e.g. admin or restricted user)

This behaviour works when using credentials specified in either .Renviron or .databrickscfg:

# using .Renviron
db_host() # returns `DB_HOST` (.Renviron)

# switch profile to 'prod'
options(db_profile = "prod")
db_host() # returns `DB_HOST_PROD` (.Renviron)

# set back to default (NULL)
options(db_profile = NULL)
# use .databrickcfg
options(use_databrickscfg = TRUE)
db_host() # returns host from `DEFAULT` profile (.databrickscfg)

options(db_profile = "prod")
db_host() # returns host from `prod` profile in (.datarickscfg)

It is expected that profiles in .Renviron will adhere to the same naming convention as default but add an additional suffix.

Here is an example of an .Renviron file that has three profiles (default, dev, prod):

# default
DATABRICKS_HOST=xxxxxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID=123123123123123
# dev
DATABRICKS_HOST_DEV=xxxxxxx-dev.cloud.databricks.com
DATABRICKS_TOKEN_DEV=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID_DEV=123123123123124
# prod
DATABRICKS_HOST_PROD=xxxxxxx-prod.cloud.databricks.com
DATABRICKS_TOKEN_PROD=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID_PROD=123123123123125

Configuring .databrickscfg

For details on configuring please refer to documentation from Databricks CLI.

There is only one brickster specific feature and it is the inclusion of wsid alongside host/token.

wsid is used by the connections pane integration in RStudio as the underlying API’s require it.