Create new cluster.
create_cluster_and_wait.Rd
This is a long-running operation that blocks until the cluster on Databricks reaches a
RUNNING state, with a default timeout of 20 minutes that you can change via the timeout
parameter.
By default, the state of the Databricks cluster is reported to the console. You can change this behavior
via the callback
parameter.
Usage
create_cluster_and_wait(
client,
spark_version,
apply_policy_default_values = NULL,
autoscale = NULL,
autotermination_minutes = NULL,
aws_attributes = NULL,
azure_attributes = NULL,
cluster_log_conf = NULL,
cluster_name = NULL,
cluster_source = NULL,
custom_tags = NULL,
data_security_mode = NULL,
docker_image = NULL,
driver_instance_pool_id = NULL,
driver_node_type_id = NULL,
enable_elastic_disk = NULL,
enable_local_disk_encryption = NULL,
gcp_attributes = NULL,
init_scripts = NULL,
instance_pool_id = NULL,
node_type_id = NULL,
num_workers = NULL,
policy_id = NULL,
runtime_engine = NULL,
single_user_name = NULL,
spark_conf = NULL,
spark_env_vars = NULL,
ssh_public_keys = NULL,
workload_type = NULL,
timeout = 20,
callback = cli_reporter
)
Arguments
- client
Required. An instance of DatabricksClient().
- spark_version
Required. The Spark version of the cluster.
- apply_policy_default_values
This field has no description yet.
- autoscale
Parameters needed in order to automatically scale clusters up and down based on load.
- autotermination_minutes
Automatically terminates the cluster after it is inactive for this time in minutes.
- aws_attributes
Attributes related to clusters running on Amazon Web Services.
- azure_attributes
Attributes related to clusters running on Microsoft Azure.
- cluster_log_conf
The configuration for delivering spark logs to a long-term storage destination.
- cluster_name
Cluster name requested by the user.
- cluster_source
Determines whether the cluster was created by a user through the UI, created by the Databricks Jobs Scheduler, or through an API request.
- custom_tags
Additional tags for cluster resources.
- data_security_mode
Data security mode decides what data governance model to use when accessing data from a cluster.
- docker_image
This field has no description yet.
- driver_instance_pool_id
The optional ID of the instance pool to which the driver of the cluster belongs.
- driver_node_type_id
The node type of the Spark driver.
- enable_elastic_disk
Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space.
- enable_local_disk_encryption
Whether to enable LUKS on cluster VMs' local disks.
- gcp_attributes
Attributes related to clusters running on Google Cloud Platform.
- init_scripts
The configuration for storing init scripts.
- instance_pool_id
The optional ID of the instance pool to which the cluster belongs.
- node_type_id
This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster.
- num_workers
Number of worker nodes that this cluster should have.
- policy_id
The ID of the cluster policy used to create the cluster if applicable.
- runtime_engine
Decides which runtime engine to use.
- single_user_name
Single user name if data_security_mode is SINGLE_USER.
- spark_conf
An object containing a set of optional, user-specified Spark configuration key-value pairs.
- spark_env_vars
An object containing a set of optional, user-specified environment variable key-value pairs.
- ssh_public_keys
SSH public key contents that will be added to each Spark node in this cluster.
- workload_type
This field has no description yet.
- timeout
Time to wait for the operation to complete in minutes.
- callback
Function to report the status of the operation. By default, it reports to the console.
Details
Creates a new Spark cluster. This method will acquire new instances from the cloud provider if necessary. Note: Databricks may not be able to acquire some of the requested nodes, due to cloud provider limitations (account limits, spot price, etc.) or transient network issues.
If Databricks acquires at least 85% of the requested on-demand nodes, cluster creation will succeed. Otherwise the cluster will terminate with an informative error message.
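Examples
The following is a minimal, hypothetical sketch rather than part of the generated documentation: the cluster name, Spark version, and node type are placeholder values that depend on your cloud provider and workspace, and the client is assumed to be configured with valid credentials.

# Assumes the Databricks SDK for R package is attached and authentication is
# already configured (for example via environment variables or a profile).
library(databricks)

client <- DatabricksClient()

cluster <- create_cluster_and_wait(
  client,
  cluster_name = "r-sdk-example",        # placeholder name
  spark_version = "13.3.x-scala2.12",    # placeholder runtime version
  node_type_id = "i3.xlarge",            # placeholder node type (AWS)
  num_workers = 1,
  autotermination_minutes = 30,
  timeout = 30,                          # wait up to 30 minutes instead of the default 20
  callback = cli_reporter                # default console reporter; pass your own function to customize
)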