Edit the configuration of a cluster to match the provided attributes and size.

Usage

db_cluster_edit(
  cluster_id,
  spark_version,
  node_type_id,
  num_workers = NULL,
  autoscale = NULL,
  name = NULL,
  spark_conf = NULL,
  cloud_attrs = NULL,
  driver_node_type_id = NULL,
  custom_tags = NULL,
  init_scripts = NULL,
  spark_env_vars = NULL,
  autotermination_minutes = NULL,
  log_conf = NULL,
  ssh_public_keys = NULL,
  driver_instance_pool_id = NULL,
  instance_pool_id = NULL,
  idempotency_token = NULL,
  enable_elastic_disk = NULL,
  apply_policy_default_values = NULL,
  enable_local_disk_encryption = NULL,
  docker_image = NULL,
  policy_id = NULL,
  host = db_host(),
  token = db_token(),
  perform_request = TRUE
)
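
For example, a minimal call that resizes a cluster to a fixed number of workers might look like the following sketch (the cluster ID, runtime version, and node type are illustrative placeholders, not defaults):

db_cluster_edit(
  cluster_id = "1234-567890-abc123",
  spark_version = "14.3.x-scala2.12",
  node_type_id = "m5.xlarge",
  num_workers = 4
)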

Arguments

cluster_id

Canonical identifier for the cluster.

spark_version

The runtime version of the cluster. You can retrieve a list of available runtime versions by using db_cluster_runtime_versions().

node_type_id

The node type for the worker nodes. db_cluster_list_node_types() can be used to see available node types.

num_workers

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

autoscale

Instance of cluster_autoscale().
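
For example, to switch a cluster from a fixed size to autoscaling (a sketch assuming cluster_autoscale() takes min_workers and max_workers bounds; identifiers are illustrative):

db_cluster_edit(
  cluster_id = "1234-567890-abc123",
  spark_version = "14.3.x-scala2.12",
  node_type_id = "m5.xlarge",
  autoscale = cluster_autoscale(min_workers = 2, max_workers = 8)
)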

name

Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

spark_conf

Named list. An optional set of user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively. E.g. list("spark.speculation" = TRUE, "spark.streaming.ui.retainedBatches" = 5).
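
As a sketch, such a configuration could be supplied during an edit (identifiers are illustrative):

db_cluster_edit(
  cluster_id = "1234-567890-abc123",
  spark_version = "14.3.x-scala2.12",
  node_type_id = "m5.xlarge",
  num_workers = 2,
  spark_conf = list(
    "spark.speculation" = TRUE,
    "spark.streaming.ui.retainedBatches" = 5
  )
)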

cloud_attrs

Attributes related to clusters running on a specific cloud provider. Defaults to aws_attributes(). Must be one of aws_attributes(), azure_attributes(), gcp_attributes().
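
For example, keeping the AWS defaults explicit (a sketch; identifiers are illustrative, and aws_attributes() is called with its defaults as the argument description suggests):

db_cluster_edit(
  cluster_id = "1234-567890-abc123",
  spark_version = "14.3.x-scala2.12",
  node_type_id = "m5.xlarge",
  num_workers = 2,
  cloud_attrs = aws_attributes()
)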

driver_node_type_id

The node type of the Spark driver. This field is optional; if unset, the driver node type will be set as the same value as node_type_id defined above. db_cluster_list_node_types() can be used to see available node types.

custom_tags

Named list of tags for cluster resources. Databricks tags all cluster resources with these tags in addition to default_tags. Databricks allows at most 45 custom tags.

init_scripts

Instance of init_script_info().

spark_env_vars

Named list. User-specified environment variable key-value pairs. To supply an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as in the following example; this ensures that all default Databricks-managed environment variables are included as well. E.g. list("SPARK_DAEMON_JAVA_OPTS" = "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true").

autotermination_minutes

Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 120.

log_conf

Instance of cluster_log_conf().

ssh_public_keys

List. SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. Up to 10 keys can be specified.

driver_instance_pool_id

ID of the instance pool to use for the driver node. You must also specify instance_pool_id. Optional.

instance_pool_id

ID of the instance pool to use for cluster nodes. If driver_instance_pool_id is present, instance_pool_id is used for worker nodes only. Otherwise, it is used for both the driver and worker nodes. Optional.

idempotency_token

An optional token that can be used to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the ID of the existing cluster instead. The existence of a cluster with the same token is not checked against terminated clusters. If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one cluster will be launched with that idempotency token. This token should have at most 64 characters.

enable_elastic_disk

When enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space.

apply_policy_default_values

Boolean (Default: TRUE), whether to use policy default values for missing cluster attributes.

enable_local_disk_encryption

Boolean (Default: TRUE), whether encryption of disks locally attached to the cluster is enabled.

docker_image

Instance of docker_image().

policy_id

String, ID of a cluster policy.

host

Databricks workspace URL, defaults to calling db_host().

token

Databricks workspace token, defaults to calling db_token().

perform_request

If TRUE (default), the request is performed; if FALSE, the httr2 request is returned without being performed.
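
For instance, to build the request without sending it (a sketch; identifiers are illustrative, and the returned object can later be sent with httr2::req_perform()):

req <- db_cluster_edit(
  cluster_id = "1234-567890-abc123",
  spark_version = "14.3.x-scala2.12",
  node_type_id = "m5.xlarge",
  num_workers = 2,
  perform_request = FALSE
)
req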

Details

You can edit a cluster if it is in a RUNNING or TERMINATED state. If you edit a cluster while it is in a RUNNING state, it will be restarted so that the new attributes can take effect. If you edit a cluster while it is in a TERMINATED state, it will remain TERMINATED. The next time it is started using the clusters/start API, the new attributes will take effect. An attempt to edit a cluster in any other state will be rejected with an INVALID_STATE error code.

Clusters created by the Databricks Jobs service cannot be edited.
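
For example, editing a TERMINATED cluster stages the new attributes until the next start (a sketch; identifiers are illustrative, and db_cluster_start() is assumed to be the companion start helper in the same package):

db_cluster_edit(
  cluster_id = "1234-567890-abc123",
  spark_version = "14.3.x-scala2.12",
  node_type_id = "m5.2xlarge",
  num_workers = 8
)

# The cluster stays TERMINATED; the new attributes take effect here.
db_cluster_start(cluster_id = "1234-567890-abc123")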