Create new cluster.
create_cluster_and_wait.Rd
This is a long-running operation that blocks until the cluster on Databricks reaches a
RUNNING state, with a default timeout of 20 minutes that you can change via the timeout
parameter.
By default, the state of the Databricks cluster is reported to the console. You can change this behavior
via the callback
parameter.
Usage
create_cluster_and_wait(
client,
spark_version,
apply_policy_default_values = NULL,
autoscale = NULL,
autotermination_minutes = NULL,
aws_attributes = NULL,
azure_attributes = NULL,
cluster_log_conf = NULL,
cluster_name = NULL,
cluster_source = NULL,
custom_tags = NULL,
data_security_mode = NULL,
docker_image = NULL,
driver_instance_pool_id = NULL,
driver_node_type_id = NULL,
enable_elastic_disk = NULL,
enable_local_disk_encryption = NULL,
gcp_attributes = NULL,
init_scripts = NULL,
instance_pool_id = NULL,
node_type_id = NULL,
num_workers = NULL,
policy_id = NULL,
runtime_engine = NULL,
single_user_name = NULL,
spark_conf = NULL,
spark_env_vars = NULL,
ssh_public_keys = NULL,
workload_type = NULL,
timeout = 20,
callback = cli_reporter
)
Arguments
- client
Required. An instance of DatabricksClient().
- spark_version
Required. The Spark version of the cluster.
- apply_policy_default_values
This field has no description yet.
- autoscale
Parameters needed in order to automatically scale clusters up and down based on load.
- autotermination_minutes
Automatically terminates the cluster after it is inactive for this time in minutes.
- aws_attributes
Attributes related to clusters running on Amazon Web Services.
- azure_attributes
Attributes related to clusters running on Microsoft Azure.
- cluster_log_conf
The configuration for delivering spark logs to a long-term storage destination.
- cluster_name
Cluster name requested by the user.
- cluster_source
Determines whether the cluster was created by a user through the UI, created by the Databricks Jobs Scheduler, or through an API request.
- custom_tags
Additional tags for cluster resources.
- data_security_mode
Data security mode decides what data governance model to use when accessing data from a cluster.
- docker_image
This field has no description yet.
- driver_instance_pool_id
The optional ID of the instance pool to which the driver of the cluster belongs.
- driver_node_type_id
The node type of the Spark driver.
- enable_elastic_disk
Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space.
- enable_local_disk_encryption
Whether to enable LUKS on cluster VMs' local disks.
- gcp_attributes
Attributes related to clusters running on Google Cloud Platform.
- init_scripts
The configuration for storing init scripts.
- instance_pool_id
The optional ID of the instance pool to which the cluster belongs.
- node_type_id
This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster.
- num_workers
Number of worker nodes that this cluster should have.
- policy_id
The ID of the cluster policy used to create the cluster if applicable.
- runtime_engine
Decides which runtime engine to use.
- single_user_name
Single user name if data_security_mode is SINGLE_USER.
- spark_conf
An object containing a set of optional, user-specified Spark configuration key-value pairs.
- spark_env_vars
An object containing a set of optional, user-specified environment variable key-value pairs.
- ssh_public_keys
SSH public key contents that will be added to each Spark node in this cluster.
- workload_type
This field has no description yet.
- timeout
Time to wait for the operation to complete in minutes.
- callback
Function to report the status of the operation. By default, it reports to the console.
Details
Creates a new Spark cluster. This method will acquire new instances from the cloud provider if necessary. Note: Databricks may not be able to acquire some of the requested nodes, due to cloud provider limitations (account limits, spot price, etc.) or transient network issues.
If Databricks acquires at least 85% of the requested on-demand nodes, cluster creation will succeed. Otherwise the cluster will terminate with an informative error message.
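Examples
The following is a minimal, hypothetical sketch rather than part of the generated documentation: the cluster name, Spark version, and node type are placeholder values that depend on your cloud provider and workspace, and the client is assumed to be configured with valid credentials.

# Assumes the Databricks SDK for R package is attached and authentication is
# already configured (for example via environment variables or a profile).
library(databricks)

client <- DatabricksClient()

cluster <- create_cluster_and_wait(
  client,
  cluster_name = "r-sdk-example",        # placeholder name
  spark_version = "13.3.x-scala2.12",    # placeholder runtime version
  node_type_id = "i3.xlarge",            # placeholder node type (AWS)
  num_workers = 1,
  autotermination_minutes = 30,
  timeout = 30,                          # wait up to 30 minutes instead of the default 20
  callback = cli_reporter                # default console reporter; pass your own function to customize
)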