Automation

Automation is fundamental to building scalable, reliable, and secure Databricks deployments. By adopting infrastructure-as-code (IaC) principles and implementing robust CI/CD pipelines, organizations can ensure consistency, reduce manual errors, and accelerate delivery cycles.

Core Philosophy: Automate Everything

The guiding principle for Databricks deployments should be automate everything. This includes workspace provisioning, security configurations, data pipelines, jobs, and monitoring.

Automation provides:

  • Version control and auditability
  • Repeatability across environments
  • Rapid recovery from failures
  • Consistent scaling across deployments

Terraform for Infrastructure Provisioning

Terraform is the recommended tool for provisioning and managing Databricks workspaces and the underlying cloud infrastructure. The Databricks Terraform Provider enables you to define workspaces, clusters, jobs, notebooks, Unity Catalog resources, permissions, and more as declarative code.

See the Databricks Terraform Provider documentation for the complete API reference.
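As a minimal sketch of what this declarative style can look like, the following snippet writes a small Terraform configuration to `main.tf`. The file name, the `databricks_host` variable, and the cluster name and sizing are illustrative assumptions rather than values from this guide, and credentials are assumed to come from the environment.

```shell
# Write an illustrative Terraform configuration (all names and sizes are assumptions).
cat > main.tf <<'EOF'
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Hypothetical input: the workspace URL; credentials are expected from the environment.
variable "databricks_host" {
  type = string
}

provider "databricks" {
  host = var.databricks_host
}

# Look up a current LTS runtime and a small node type instead of hard-coding them.
data "databricks_spark_version" "latest_lts" {
  long_term_support = true
}

data "databricks_node_type" "smallest" {
  local_disk = true
}

resource "databricks_cluster" "shared_autoscaling" {
  cluster_name            = "shared-autoscaling"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20

  autoscale {
    min_workers = 1
    max_workers = 4
  }
}
EOF

echo "wrote main.tf"
```

From the same directory, `terraform init` followed by `terraform plan` would preview the change before anything is created.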

When to Use Terraform

Terraform is ideal for:

Security Reference Architecture (SRA)

For organizations with stringent security requirements, particularly those in regulated industries or government sectors, the Security Reference Architecture (SRA) Terraform Templates provide pre-configured security best practices.

The SRA deploys Databricks workspaces with hardened configurations modeled on the practices of Databricks' most security-conscious customers. It covers AWS, AWS GovCloud, Azure, and GCP deployments, providing a strong foundation for secure infrastructure that aligns with Databricks Security Best Practices.

Databricks Asset Bundles (DABs)

Databricks Asset Bundles facilitate the adoption of software engineering best practices for data and AI projects. Bundles provide an infrastructure-as-code approach specifically designed for managing Databricks resources like jobs, pipelines, and ML experiments.

What DABs Include

A bundle provides an end-to-end definition of a project:

When to Use DABs

DABs are ideal for:

  • Team-based development of data, analytics, and ML projects
  • Managing ML pipeline resources with production best practices from day one
  • Setting organizational standards through custom bundle templates
  • Maintaining version history for regulatory compliance

Getting Started with DABs

Install the Databricks CLI and initialize a new bundle:

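# Install the Databricks CLI. Bundle commands require the newer CLI (versions 0.205 and above);
# the legacy `pip install databricks-cli` package does not provide `databricks bundle`.
# Homebrew is shown here; see the Databricks CLI documentation for other platforms.
brew tap databricks/tap
brew install databricks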

# Initialize a new bundle from template
databricks bundle init

# Deploy to development
databricks bundle deploy -t dev

# Run a workflow
databricks bundle run my_job -t dev

See the Bundle CLI reference for complete command documentation.
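To show what `-t dev` and `my_job` refer to in the commands above, here is a minimal sketch of a `databricks.yml` written by hand instead of generated from a template. The bundle name, the notebook path, and the single-task job are assumptions for illustration. No compute is declared for the task, so this sketch assumes a workspace where serverless jobs compute is available; otherwise a cluster definition would have to be added.

```shell
# Write an illustrative bundle configuration (names and paths are assumptions).
cat > databricks.yml <<'EOF'
bundle:
  name: my_project

resources:
  jobs:
    my_job:                      # the key passed to `databricks bundle run`
      name: my_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/notebook.ipynb   # hypothetical notebook

targets:
  dev:                           # selected with `-t dev`
    mode: development
    default: true
EOF

echo "wrote databricks.yml"
```

Running `databricks bundle validate -t dev` in the same directory checks the configuration before anything is deployed.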

Terraform vs. DABs: Choosing the Right Tool

Both Terraform and DABs serve important but distinct purposes. Understanding when to use each is critical for building an effective automation strategy.

| Aspect | Terraform | DABs |
|---|---|---|
| Primary Purpose | Infrastructure provisioning and platform configuration | Application/project deployment and workflow management |
| Scope | Workspaces, cloud resources, Unity Catalog, IAM, networking | Jobs, pipelines, notebooks, ML experiments, dashboards |
| Change Frequency | Less frequent (infrastructure changes) | More frequent (code and workflow updates) |
| Typical Users | Platform/DevOps engineers, infrastructure teams | Data engineers, data scientists, ML engineers |
| State Management | Terraform state files (remote backend recommended) | Workspace-based (no external state) |
| Template Support | Terraform modules for reusability | Custom bundle templates for project standards |

The most effective automation strategy combines both tools:

  1. Use Terraform to provision and configure the foundational infrastructure: workspaces, networking, security configurations, Unity Catalog metastores, and identity management. Terraform establishes the secure, compliant platform foundation.

  2. Use DABs to deploy and manage the applications and workflows that run on that infrastructure: data pipelines, ML training jobs, model serving endpoints, and dashboards. DABs enable rapid iteration on business logic while maintaining production best practices.
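The two-step split above can be sketched as a small rollout script that applies Terraform first and then deploys the bundle. The `infra` directory, the `dev` target name, and the `run` and `deploy_all` helpers are hypothetical. `DRY_RUN` defaults to `1`, so the script only prints the commands it would run.

```shell
#!/bin/sh
# Two-stage rollout sketch: platform first (Terraform), then workloads (DABs).
# Assumptions: Terraform code lives in ./infra, the bundle in the current directory,
# and a bundle target named "dev" exists. DRY_RUN defaults to 1, so nothing is executed.
set -eu

# Print the command in dry-run mode; execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_all() {
  # Stage 1: provision or update the platform foundation.
  run terraform -chdir=infra init
  run terraform -chdir=infra apply -auto-approve

  # Stage 2: validate and deploy the bundle onto that foundation.
  run databricks bundle validate -t dev
  run databricks bundle deploy -t dev
}

deploy_all
```

Keeping the stages in this order reflects the dependency: the bundle can only be deployed once the workspace it targets exists. In practice the two stages often live in separate pipelines, because infrastructure changes far less often than workflow code.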

CI/CD Best Practices

Continuous integration and continuous delivery (CI/CD) automates the building, testing, and deployment of code, enabling more reliable and frequent releases.

See CI/CD Best Practices Documentation for detailed guidance.

High-Level CI/CD Flow

| Stage | Description |
|---|---|
| Version | Store code and notebooks in Git. Use Databricks Git folders for development and testing before committing changes. |
| Code | Develop in Databricks notebooks or locally with VS Code using the Databricks extension. |
| Build | Use DABs to automatically build artifacts during deployments. Leverage pylint with the Databricks Labs plugin for code quality. |
| Deploy | Deploy changes using DABs with GitHub Actions, Azure DevOps, or Jenkins. |
| Test | Run automated tests with pytest to validate code changes before production deployment. |
| Run | Execute bundle workflows using the Databricks CLI. |
| Monitor | Track performance with Databricks jobs monitoring to identify and resolve production issues. |

Authentication for CI/CD

Use service principals instead of user accounts for CI/CD authentication. For the most secure approach, implement OAuth token federation (workload identity federation), which eliminates the need to store Databricks secrets in your CI/CD system.

See CI/CD authentication best practices for detailed guidance on securing your deployment pipelines.
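As a rough sketch of how these recommendations might fit together with GitHub Actions, the snippet below writes a workflow that deploys a bundle as a service principal using GitHub's OIDC token, with no stored Databricks secret. Several things here are assumptions to verify against the current Databricks documentation: the `github-oidc` auth type, the repository variables `DATABRICKS_HOST` and `DATABRICKS_CLIENT_ID`, the `prod` target name, and the existence of a federation policy on the service principal that trusts this repository.

```shell
# Write an illustrative GitHub Actions workflow (names and targets are assumptions).
mkdir -p .github/workflows
cat > .github/workflows/deploy-bundle.yml <<'EOF'
name: deploy-bundle

on:
  push:
    branches: [main]

permissions:
  id-token: write    # allows the job to request a GitHub OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # Assumed settings for workload identity federation; no client secret is stored.
      DATABRICKS_AUTH_TYPE: github-oidc
      DATABRICKS_HOST: ${{ vars.DATABRICKS_HOST }}
      DATABRICKS_CLIENT_ID: ${{ vars.DATABRICKS_CLIENT_ID }}   # service principal ID
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle validate -t prod
      - run: databricks bundle deploy -t prod
EOF

echo "wrote .github/workflows/deploy-bundle.yml"
```

If federation is not available, the same workflow could instead pass a service principal's OAuth client secret through `DATABRICKS_CLIENT_SECRET`, at the cost of storing that secret in the CI/CD system.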

Quick Reference

| Tool/Resource | Link |
|---|---|
| Databricks Terraform Provider | github.com/databricks/terraform-provider-databricks |
| Terraform Provider Documentation | registry.terraform.io/providers/databricks/databricks/latest/docs |
| Security Reference Architecture (SRA) | github.com/databricks/terraform-databricks-sra |
| Databricks Asset Bundles | docs.databricks.com/dev-tools/bundles/ |
| Bundle Templates | docs.databricks.com/dev-tools/bundles/templates |
| Databricks CLI | docs.databricks.com/dev-tools/cli/ |
| CI/CD Best Practices | docs.databricks.com/dev-tools/ci-cd/ |
| GitHub Actions for DABs | docs.databricks.com/dev-tools/bundles/ci-cd#github-actions |
| VS Code Extension | marketplace.visualstudio.com/items?itemName=databricks.databricks |
