Authentication
It’s recommended that you authenticate via the .Renviron
file using DATABRICKS_HOST
and DATABRICKS_TOKEN
environment variables. You can also use Databricks CLI Configuration Profiles and DATABRICKS_CONFIG_FILE
or DATABRICKS_CONFIG_PROFILE
environment variables, but only the PAT Authentication works at the moment. If you need more authentication methods, please fork this GitHub repository and send pull request with the feature suggestion.
Example of overriding authentication profile. Look at databricks auth profiles
to know which ones are working.
client <- DatabricksClient(profile="your-cli-profile")
Complete with Databricks workspace-level APIs
Databricks SDK for R comes with all public workspace-level API and is consistent with Databricks SDK for Python, Databricks SDK for Go, and Databricks SDK for Java. Databricks SDK for R does not expose account-level API and you’re recommended to use Go, Python, or Java SDK to build an account-level automation.
library(dplyr)
library(databricks)
client <- DatabricksClient()
running <- list_clusters(client) %>% filter(state == 'RUNNING')
context <- create_command_execution_and_wait(client, cluster_id=running$cluster_id, language='python')
res <- execute_command_and_wait(client, cluster_id=running$cluster_id, context_id=context$id, language='sql', command='show tables')
res
Pagination
All list
methods (and those, which return any list of results), do consistently return a data.frame
of all entries from all pages, regardless of the underlying implementation.
list_clusters(client)[1:10,c("cluster_id", "cluster_name", "state")]
# cluster_id cluster_name state
# 1 1109-110110-kjfoeopq DEFAULT Test Cluster TERMINATED
# 2 0110-221212-oqqpodoa GO_SDK Test Cluster TERMINATED
# 3 1109-012301-qlwlwqpq BRICKS Test Cluster TERMINATED
# 4 1109-110012-qpwoepqq VSCODE Test Cluster TERMINATED
# 5 0110-201022-oqooqpqp JS_SDK Test Cluster TERMINATED
Long-running operations
All long-running operations do poll Databricks backend until the entity reaches desired state:
create_cluster_and_wait(client, spark_version = "12.x-snapshot-scala2.12", cluster_name = "r-sdk-cluster", num_workers = 1, autotermination_minutes=20, node_type_id="i3.xlarge")
# PENDING: Finding instances for new nodes, acquiring more instances if necessary
Interface stability
API clients for all services are generated from specification files that are synchronized from the main platform. Databricks may have minor documented backward-incompatible changes, such as renaming the methods or some type names to bring more consistency.
Project Support
Please note that all projects in the databrickslabs
github space are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.
Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.