This is a long-running operation that blocks until the cluster on Databricks reaches a RUNNING state, with a default timeout of 20 minutes that you can change via the timeout parameter. By default, the state of the cluster is reported to the console; you can change this behavior via the callback parameter.

Usage

create_cluster_and_wait(
  client,
  spark_version,
  apply_policy_default_values = NULL,
  autoscale = NULL,
  autotermination_minutes = NULL,
  aws_attributes = NULL,
  azure_attributes = NULL,
  cluster_log_conf = NULL,
  cluster_name = NULL,
  cluster_source = NULL,
  custom_tags = NULL,
  data_security_mode = NULL,
  docker_image = NULL,
  driver_instance_pool_id = NULL,
  driver_node_type_id = NULL,
  enable_elastic_disk = NULL,
  enable_local_disk_encryption = NULL,
  gcp_attributes = NULL,
  init_scripts = NULL,
  instance_pool_id = NULL,
  node_type_id = NULL,
  num_workers = NULL,
  policy_id = NULL,
  runtime_engine = NULL,
  single_user_name = NULL,
  spark_conf = NULL,
  spark_env_vars = NULL,
  ssh_public_keys = NULL,
  workload_type = NULL,
  timeout = 20,
  callback = cli_reporter
)
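
A minimal invocation sketch. The cluster name, Spark version, and node type below are placeholders, not values from this documentation; list the versions and node types valid for your workspace before using them.

library(databricks)

# Reads authentication from environment variables or a configuration profile
client <- DatabricksClient()

cluster <- create_cluster_and_wait(
  client,
  cluster_name            = "r-sdk-demo",        # placeholder name
  spark_version           = "13.3.x-scala2.12",  # placeholder version
  node_type_id            = "i3.xlarge",         # placeholder node type
  num_workers             = 2,
  autotermination_minutes = 60,
  timeout                 = 30  # wait up to 30 minutes instead of the default 20
)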

Arguments

client

Required. Instance of DatabricksClient()

spark_version

Required. The Spark version of the cluster.

apply_policy_default_values

This field has no description yet.

autoscale

Parameters needed in order to automatically scale clusters up and down based on load.
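
For an autoscaling cluster, omit num_workers and pass autoscale instead. A sketch, assuming autoscale is expressed as a named list mirroring the min_workers/max_workers fields of the AutoScale object in the Clusters REST API:

# Assumed shape; mirrors the AutoScale object of the Clusters REST API
autoscale <- list(min_workers = 1, max_workers = 4)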

autotermination_minutes

Automatically terminates the cluster after it is inactive for this time in minutes.

aws_attributes

Attributes related to clusters running on Amazon Web Services.

azure_attributes

Attributes related to clusters running on Microsoft Azure.

cluster_log_conf

The configuration for delivering spark logs to a long-term storage destination.

cluster_name

Cluster name requested by the user.

cluster_source

Determines whether the cluster was created by a user through the UI, created by the Databricks Jobs Scheduler, or through an API request.

custom_tags

Additional tags for cluster resources.

data_security_mode

Data security mode decides what data governance model to use when accessing data from a cluster.

docker_image

This field has no description yet.

driver_instance_pool_id

The optional ID of the instance pool to which the driver of the cluster belongs.

driver_node_type_id

The node type of the Spark driver.

enable_elastic_disk

Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space.

enable_local_disk_encryption

Whether to enable LUKS on cluster VMs' local disks.

gcp_attributes

Attributes related to clusters running on Google Cloud Platform.

init_scripts

The configuration for storing init scripts.

instance_pool_id

The optional ID of the instance pool to which the cluster belongs.

node_type_id

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster.

num_workers

Number of worker nodes that this cluster should have.

policy_id

The ID of the cluster policy used to create the cluster if applicable.

runtime_engine

Decides which runtime engine to use.

single_user_name

Single user name if data_security_mode is SINGLE_USER.
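
A sketch of a single-user cluster, assuming data_security_mode is passed as the SINGLE_USER string used by the REST API; spark_version, node_type_id, and the user name are placeholders:

create_cluster_and_wait(
  client,
  spark_version      = "13.3.x-scala2.12",   # placeholder version
  node_type_id       = "i3.xlarge",          # placeholder node type
  num_workers        = 1,
  data_security_mode = "SINGLE_USER",
  single_user_name   = "someone@example.com" # placeholder user
)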

spark_conf

An object containing a set of optional, user-specified Spark configuration key-value pairs.

spark_env_vars

An object containing a set of optional, user-specified environment variable key-value pairs.
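
Both spark_conf and spark_env_vars are key-value maps; a sketch, assuming plain named lists are accepted for them:

# Assumed: named lists map directly to the REST API key-value objects
spark_conf <- list(
  "spark.speculation"            = "true",
  "spark.sql.shuffle.partitions" = "200"
)
spark_env_vars <- list(SPARK_WORKER_MEMORY = "4g")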

ssh_public_keys

SSH public key contents that will be added to each Spark node in this cluster.

workload_type

This field has no description yet.

timeout

Time, in minutes, to wait for the operation to complete.

callback

Function to report the status of the operation. By default, it reports to the console.
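
To silence progress reporting entirely, a hypothetical sketch; the exact arguments the package passes to the callback are defined by cli_reporter, so accepting ... keeps the function compatible with whatever is passed:

# Hypothetical: swallows whatever status arguments the package passes
silent_reporter <- function(...) invisible(NULL)

# then pass callback = silent_reporter to create_cluster_and_wait()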

Details

Creates a new Spark cluster. This method will acquire new instances from the cloud provider if necessary. Note: Databricks may not be able to acquire some of the requested nodes, due to cloud provider limitations (account limits, spot price, etc.) or transient network issues.

If Databricks acquires at least 85% of the requested on-demand nodes, cluster creation will succeed. Otherwise the cluster will terminate with an informative error message.