Cost Management
Cost management is fundamental to understanding margins, pricing customers accurately, and scaling profitably. Databricks provides a comprehensive suite of tools for monitoring, attributing, and controlling costs across your data and AI workloads. By implementing proper cost management practices from the start, you can gain visibility into per-customer spending, accurately attribute usage to customers and internal operations, and proactively manage budgets.
For additional tips, see Easy Ways to Optimize Your Costs.
Overview
The Databricks cost management framework is built around four key capabilities:
| Capability | Description |
|---|---|
| System Tables | Billable usage logs stored in system.billing.usage provide granular details about account usage, including metadata about resources, custom tags, and user identity |
| Tagging | Custom tags enable accurate attribution of Databricks usage to business units, teams, and projects for chargeback purposes |
| Budgets | Create budget thresholds with email alerts to stay informed about usage across your account, workspaces, or specific tag-based groups |
| AI/BI Dashboards | Pre-built cost management dashboards and AI/BI Genie spaces for visualizing and exploring usage data |
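To make these capabilities concrete, the following minimal query sketches what the billable usage log exposes. Column names such as custom_tags, usage_metadata, and identity_metadata reflect the current system table schema and should be verified against the documentation; the query assumes a Unity Catalog-enabled workspace with SELECT access on the system.billing schema:
SELECT
  usage_date,
  sku_name,
  usage_quantity,
  usage_unit,
  custom_tags,
  usage_metadata.cluster_id,
  identity_metadata.run_as
FROM system.billing.usage
WHERE usage_date >= CURRENT_DATE() - INTERVAL 7 DAYS
LIMIT 100;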
Tagging Strategy
For Built On Databricks solutions, the tagging strategy should be a part of system design—not an operational afterthought. Whether you charge customers via usage-based pricing or flat subscriptions, attribution is essential for understanding true cost, gross margin, and scalability over time.
If you are a Built-On partner already, refer to guidance in the Partner Portal on the tagging requirements for the Built-On program.
Design-Time, Not Retrofit
Tagging should be designed alongside your pricing and packaging, and built into your automation from day one. Retroactive tagging is incomplete, error-prone, and often impossible at scale.
Key implications:
- Tagging decisions directly affect billing accuracy, margin visibility, and contract viability
- Manual tagging does not scale—automated enforcement is necessary
- Tags should be applied programmatically at resource creation (clusters, jobs, SQL warehouses, serverless workloads)
This differs from internal cost management models that can afford to "start small and iterate." Partners building commercial solutions need attribution in place before onboarding the first customer.
How Tagging Works
Tagging operates at the compute level—clusters, jobs, SQL warehouses, and serverless workloads are the resources that generate DBU consumption and carry attribution tags.
Tags follow a parent-child relationship: tags applied at the workspace level are inherited by the compute resources inside that workspace:
| Deployment Model | Tagging Approach |
|---|---|
| Workspace per customer | Tag at the workspace level; all underlying compute automatically inherits those tags. Simplifies attribution since all usage in the workspace belongs to one customer. |
| Shared workspace (multi-customer) | Tag at the compute level per customer. If you require per-customer attribution, provision dedicated compute resources for each customer. |
Your deployment model and tagging strategy are interrelated decisions. While workspace-per-customer simplifies attribution through inheritance, shared workspaces offer infrastructure efficiency—but require disciplined compute-level tagging.
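The two models surface differently in the billing data. The sketch below is illustrative: the customer_workspace_map table is a hypothetical mapping you would maintain yourself for workspace-per-customer attribution, while the second query relies only on the customer_id tag:
-- Workspace-per-customer: join usage to your own (hypothetical) workspace-to-customer mapping
SELECT m.customer_name, SUM(u.usage_quantity) AS total_dbus
FROM system.billing.usage u
JOIN customer_workspace_map m ON u.workspace_id = m.workspace_id
WHERE u.usage_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY m.customer_name;

-- Shared workspace: attribute by the customer_id tag carried on each compute resource
SELECT custom_tags['customer_id'] AS customer_id, SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE usage_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY custom_tags['customer_id'];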
Supported Resources for Custom Tags
| Resource | UI Tagging | API Tagging |
|---|---|---|
| Workspace | Azure Only | Account API |
| Pool | Pools UI | Instance Pool API |
| All-purpose & Job Compute | Compute UI | Clusters API |
| SQL Warehouse | SQL Warehouse UI | Warehouses API |
| Database Instance | Database Instance UI | Database API |
| Serverless Workloads | Account Console | Serverless Budget Policies |
For detailed guidance, see Use tags to attribute and track usage.
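Pools are the one resource above that the later examples do not cover. The following is a minimal sketch using the Python SDK's instance pools API; names and capacity values are illustrative, so verify parameters against the Instance Pool API documentation:
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create an instance pool carrying attribution tags
pool = w.instance_pools.create(
    instance_pool_name="customer-acme-pool",
    node_type_id="i3.xlarge",
    min_idle_instances=1,
    max_capacity=10,
    custom_tags={
        "customer_id": "acme_corp",
        "environment": "production",
    },
)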
Implementation Steps
Step 1: Set Up Tagging Best Practices
Establish tagging standards using customer_id as a foundational tag key for external billing attribution. Tags are formatted as key:value pairs and can be applied to compute resources, SQL warehouses, jobs, and serverless workloads.
Example Tagging Schema:
| Tag Key | Purpose | Example Values |
|---|---|---|
| customer_id | Primary billing entity | acme_corp, customer_12345 |
| tenant_id | Sub-customer or business unit | acme_marketing, acme_engineering |
| environment | Operational vs. customer workloads | production, internal, staging |
| service | Application component or feature | etl_pipeline, analytics_api, ml_inference |
| cost_center | Internal cost allocation | platform_ops, data_engineering |
For external consumption (customer billing), customer_id is typically sufficient. Additional tags help with internal margin analysis and operational cost tracking.
Implementation Example: Programmatic Tagging
The following examples demonstrate tagging patterns. Always verify API syntax and parameters against the official Databricks documentation.
Apply tags when creating clusters:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Create cluster with customer tags
cluster = w.clusters.create(
cluster_name="customer-acme-etl",
spark_version="13.3.x-scala2.12",
node_type_id="i3.xlarge",
num_workers=3,
autotermination_minutes=30,
custom_tags={
"customer_id": "acme_corp",
"environment": "production",
"service": "etl_pipeline"
}
)
Apply tags when creating SQL warehouses:
from databricks.sdk.service.sql import EndpointTagPair, EndpointTags

# Create SQL warehouse with customer tags
warehouse = w.warehouses.create(
    name="customer-acme-warehouse",
    cluster_size="Small",
    min_num_clusters=1,
    max_num_clusters=3,
    auto_stop_mins=15,
    tags=EndpointTags(
        custom_tags=[
            EndpointTagPair(key="customer_id", value="acme_corp"),
            EndpointTagPair(key="environment", value="production"),
            EndpointTagPair(key="service", value="analytics_api"),
        ]
    ),
)
Apply tags to jobs:
from databricks.sdk.service.jobs import NotebookTask, Task

# Create job with customer tags
job = w.jobs.create(
    name="customer-acme-daily-report",
    tasks=[
        Task(
            task_key="generate_report",
            notebook_task=NotebookTask(notebook_path="/Workspace/reports/daily_summary"),
            existing_cluster_id=cluster.cluster_id,
        )
    ],
    tags={
        "customer_id": "acme_corp",
        "service": "reporting",
    },
)
Step 2: Implement Tag Enforcement Policies
Implement fixed policies requiring pre-defined tags to be applied to all workloads. This ensures completeness and accuracy of your cost data and prevents untagged resources from incurring unattributed costs.
See Tag enforcement for detailed guidance.
| Policy Type | Description |
|---|---|
| Compute Policies | Enforce required tags on clusters and pools |
| Serverless Budget Policies | Apply tags to serverless compute workloads including notebooks, jobs, pipelines, and model serving endpoints |
Implementation Example: Tag Enforcement
The following examples demonstrate enforcement patterns. Always verify syntax against the official Databricks documentation.
Cluster policy requiring a customer_id tag (replace the {{customer_id}} placeholder with the customer's actual value when you generate the policy):
{
"custom_tags.customer_id": {
"type": "fixed",
"value": "{{customer_id}}",
"hidden": false
},
"custom_tags.environment": {
"type": "fixed",
"value": "production",
"hidden": true
},
"autotermination_minutes": {
"type": "fixed",
"value": 30,
"hidden": false
},
"spark_version": {
"type": "allowlist",
"values": [
"13.3.x-scala2.12",
"14.3.x-scala2.12"
],
"defaultValue": "14.3.x-scala2.12"
},
"node_type_id": {
"type": "allowlist",
"values": [
"i3.xlarge",
"i3.2xlarge",
"i3.4xlarge"
],
"defaultValue": "i3.xlarge"
}
}
Serverless budget policy (applied at account level):
Serverless policies automatically apply tags to notebooks, jobs, pipelines, and serving endpoints. Configure in Account Console → Compute → Serverless → Budget Policies:
- Policy name: customer-acme-serverless
- Monthly budget: $5,000
- Tags: customer_id=acme_corp, environment=production
- Scope: Apply to specific workspaces or users
Python SDK - Create cluster policy:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Create policy requiring customer_id tag
policy = w.cluster_policies.create(
name="customer-attribution-required",
definition="""{
"custom_tags.customer_id": {
"type": "fixed",
"value": "{{customer_id}}",
"hidden": false
},
"autotermination_minutes": {
"type": "fixed",
"value": 30,
"hidden": false
}
}"""
)
Step 3: Develop Budget Alerts
Create budgets and budget alerts to monitor usage associated with appropriate tags. Budgets help you stay informed about spending and can trigger email notifications when thresholds are exceeded.
Budgets are created through the Account Console UI. For automated cost monitoring beyond budget alerts, query system.billing.usage directly (see Step 4 below).
See Create and monitor budgets for complete instructions.
To create a budget:
- In the Account Console sidebar, click Usage
- Click the Budgets tab, then click Add budget
- Enter a name and monthly budget amount (in USD)
- In the Definitions section, limit tracking to specific workspaces and/or custom tags (e.g., customer_id=acme_corp)
- Enter email addresses for notifications when the budget is reached
- Click Create
Best practices for partner budgets:
- Create per-customer budgets filtered by the customer_id tag
- Set thresholds at 80% and 100% of expected monthly usage
- Include both technical and business stakeholders in alerts
- Review budget vs. actual monthly to refine capacity planning
Budgets improve your ability to monitor usage but do not stop usage or charges from occurring. Your final bill can exceed your budget amount.
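Because budgets alert but do not enforce, it can help to pair them with a scheduled query that compares month-to-date spend per customer against an expected amount. The sketch below uses the same list-price join as Step 4; the 5000 USD figure and the expected_budget_usd column are illustrative placeholders, not Databricks settings:
SELECT
  custom_tags['customer_id'] AS customer_id,
  SUM(usage.usage_quantity * list_prices.pricing.default) AS mtd_cost_usd,
  5000 AS expected_budget_usd,
  SUM(usage.usage_quantity * list_prices.pricing.default) / 5000 AS budget_consumed_ratio
FROM system.billing.usage
LEFT JOIN system.billing.list_prices
  ON usage.sku_name = list_prices.sku_name
  AND usage.cloud = list_prices.cloud
  AND usage.usage_start_time >= list_prices.price_start_time
  AND (list_prices.price_end_time IS NULL OR usage.usage_start_time < list_prices.price_end_time)
WHERE usage_date >= DATE_TRUNC('month', CURRENT_DATE())
  AND custom_tags['customer_id'] IS NOT NULL
GROUP BY custom_tags['customer_id'];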
Step 4: Analyze Usage Data
Account admins can import customizable AI/BI cost management dashboards to any Unity Catalog-enabled workspace. These dashboards contain usage breakdowns by product, SKU name, and custom tags, along with analysis of the most expensive usage sources.
See Usage dashboards for more information.
To import the dashboard:
- From the Account Console, click the Usage tab
- Click the Setup dashboard button
- Select whether to reflect the entire account's usage or just a single workspace
- Select the destination workspace and click Import
Once imported, the dashboard is fully customizable and can be published like any other dashboard. You can also use AI/BI Genie to explore spending trends, anomalies, and cost-saving recommendations through a natural language interface.
Implementation Example: Query System Tables for Cost Analysis
The following examples demonstrate cost analysis patterns. Always verify table schema and functions against the official Databricks documentation.
Analyze costs by customer:
SELECT
custom_tags['customer_id'] AS customer_id,
usage_date,
SUM(usage_quantity) AS total_dbus,
COUNT(DISTINCT workspace_id) AS workspaces_used,
SUM(usage_quantity * list_prices.pricing.default) AS estimated_cost_usd
FROM system.billing.usage
LEFT JOIN system.billing.list_prices
ON usage.sku_name = list_prices.sku_name
AND usage.cloud = list_prices.cloud
AND usage.usage_start_time >= list_prices.price_start_time
AND (list_prices.price_end_time IS NULL OR usage.usage_start_time < list_prices.price_end_time)
WHERE usage_date >= CURRENT_DATE() - INTERVAL 30 DAYS
AND custom_tags['customer_id'] IS NOT NULL
GROUP BY customer_id, usage_date
ORDER BY usage_date DESC, total_dbus DESC;
Identify untagged resources:
SELECT
usage_date,
workspace_id,
sku_name,
usage_unit,
SUM(usage_quantity) AS total_usage
FROM system.billing.usage
WHERE usage_date >= CURRENT_DATE() - INTERVAL 7 DAYS
AND custom_tags['customer_id'] IS NULL
GROUP BY usage_date, workspace_id, sku_name, usage_unit
ORDER BY total_usage DESC;
Monthly cost by customer and service:
SELECT
DATE_TRUNC('month', usage_date) AS month,
custom_tags['customer_id'] AS customer_id,
custom_tags['service'] AS service,
usage.sku_name,
SUM(usage_quantity) AS total_dbus,
SUM(usage_quantity * list_prices.pricing.default) AS estimated_cost_usd
FROM system.billing.usage
LEFT JOIN system.billing.list_prices
ON usage.sku_name = list_prices.sku_name
AND usage.cloud = list_prices.cloud
AND usage.usage_start_time >= list_prices.price_start_time
AND (list_prices.price_end_time IS NULL OR usage.usage_start_time < list_prices.price_end_time)
WHERE usage_date >= DATE_TRUNC('month', CURRENT_DATE() - INTERVAL 3 MONTHS)
GROUP BY month, customer_id, service, usage.sku_name
ORDER BY month DESC, estimated_cost_usd DESC;
Top 10 most expensive workloads:
SELECT
custom_tags['customer_id'] AS customer_id,
usage_metadata.cluster_id,
usage_metadata.job_id,
usage_metadata.job_name,
SUM(usage_quantity) AS total_dbus,
SUM(usage_quantity * list_prices.pricing.default) AS estimated_cost_usd
FROM system.billing.usage
LEFT JOIN system.billing.list_prices
ON usage.sku_name = list_prices.sku_name
AND usage.cloud = list_prices.cloud
AND usage.usage_start_time >= list_prices.price_start_time
AND (list_prices.price_end_time IS NULL OR usage.usage_start_time < list_prices.price_end_time)
WHERE usage_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY customer_id, usage_metadata.cluster_id, usage_metadata.job_id, usage_metadata.job_name
ORDER BY estimated_cost_usd DESC
LIMIT 10;
Workspace Management Scenarios
Your tagging implementation approach depends on how much control you have over the Databricks environment.
Partner-Managed Workspaces
When you provision and manage the workspace, you have full access to Databricks' native tagging and policy enforcement:
- Use cluster policies to require and lock attribution tags at compute creation
- Configure serverless budget policies to automatically apply tags to notebooks, jobs, pipelines, and serving endpoints
- Restrict warehouse creation to administrators who ensure proper tagging on setup
- Leverage system tables and the usage dashboard for usage reporting and margin analysis
For partner-managed workspaces, track two categories of consumption:
| Type | Purpose | Example Tags |
|---|---|---|
| Internal consumption | Track your own platform operations, engineering, and overhead | environment, team, service |
| External consumption | Attribute usage for customer billing or margin analysis | customer_id, line_of_business, cost_center |
For external consumption, tag granularity should match what you bill. In most cases this means per-customer attribution, but for large enterprise deployments you may need finer granularity. See Monitor usage using tags for implementation details.
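A simple way to keep the two categories separate in reporting is to classify usage by its tags. The sketch below assumes the tagging schema from Step 1 and treats anything without a customer_id tag as internal consumption:
SELECT
  CASE WHEN custom_tags['customer_id'] IS NOT NULL THEN 'external' ELSE 'internal' END AS consumption_type,
  COALESCE(custom_tags['customer_id'], custom_tags['service'], 'untagged') AS attribution,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE usage_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY 1, 2
ORDER BY total_dbus DESC;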
Implementation Example: Enforcing Tags at Workspace Provisioning
The following examples demonstrate enforcement patterns for partner-managed workspaces. Always verify API syntax against the official Databricks documentation.
Provision workspace with default policies and tags:
from databricks.sdk import AccountClient, WorkspaceClient
from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel
from databricks.sdk.service.sql import EndpointTagPair, EndpointTags

# Initialize account client
account = AccountClient()
# 1. Create workspace for customer and wait until it is running
workspace = account.workspaces.create(
    workspace_name="customer-acme-prod",
    aws_region="us-west-2",
    credentials_id="<credentials-id>",
    storage_configuration_id="<storage-config-id>"
).result()
# 2. Initialize workspace client against the new workspace URL
w = WorkspaceClient(host=f"https://{workspace.deployment_name}.cloud.databricks.com")
# 3. Create cluster policy requiring customer tags
policy = w.cluster_policies.create(
name="customer-acme-policy",
definition="""{
"custom_tags.customer_id": {
"type": "fixed",
"value": "acme_corp",
"hidden": false
},
"custom_tags.environment": {
"type": "fixed",
"value": "production",
"hidden": true
},
"autotermination_minutes": {
"type": "fixed",
"value": 30,
"hidden": false
}
}"""
)
# 4. Assign policy to customer's group
w.permissions.update(
    request_object_type="cluster-policies",
    request_object_id=policy.policy_id,
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-users",
            permission_level=PermissionLevel.CAN_USE,
        )
    ],
)
# 5. Create pre-tagged SQL warehouse
warehouse = w.warehouses.create(
    name="customer-acme-warehouse",
    cluster_size="Small",
    max_num_clusters=3,
    auto_stop_mins=15,
    tags=EndpointTags(
        custom_tags=[
            EndpointTagPair(key="customer_id", value="acme_corp"),
            EndpointTagPair(key="environment", value="production"),
        ]
    ),
).result()
# 6. Grant warehouse access to customer group via the Permissions API
w.permissions.update(
    request_object_type="warehouses",
    request_object_id=warehouse.id,
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-users",
            permission_level=PermissionLevel.CAN_USE,
        )
    ],
)
Enforce tagging compliance check (housekeeping job):
# Run daily to identify untagged resources
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Check for clusters without required tags
clusters = w.clusters.list()
for cluster in clusters:
tags = cluster.custom_tags or {}
if 'customer_id' not in tags:
print(f"⚠️ Untagged cluster: {cluster.cluster_id} ({cluster.cluster_name})")
# Option: Auto-terminate, alert, or tag programmatically
# Check for warehouses without required tags
warehouses = w.warehouses.list()
for warehouse in warehouses:
    tag_pairs = warehouse.tags.custom_tags if warehouse.tags and warehouse.tags.custom_tags else []
    tag_dict = {t.key: t.value for t in tag_pairs}
    if 'customer_id' not in tag_dict:
        print(f"⚠️ Untagged warehouse: {warehouse.id} ({warehouse.name})")
Customer-Managed Workspaces
Customer-controlled workspaces limit your ability to enforce tagging and view consumption. This requires alternative telemetry approaches—see Customer Managed for guidance.
Enforcement Best Practices
- Compute as a Managed Service — Compute should be preconfigured with tags and cost controls in place. Users consume resources, they don't configure them.
  - Cluster policies — Assign by group with VMs, libraries, auto-scaling, and auto-termination defined. Users select from governed options.
  - Serverless policies — Attach by default so tags apply automatically to all serverless resources.
  - Warehouses — Provision by group. Users can start or restart warehouses they have access to—not create new ones.
- Budget alerts — Configure budget monitoring to track usage against thresholds by tag
- Housekeeping jobs — Build automated compliance checks to identify and remediate untagged or mistagged resources
Additional Cost Controls
Beyond tagging and budgets, Databricks provides proactive controls to prevent cost overruns before they occur.
Compute Policies
Compute policies define governance rules for cluster creation, restricting instance types, autoscaling behavior, auto-termination settings, and enforcing required tags. Policies are assigned to groups and ensure users can only create compliant compute resources.
Key policy controls for cost management:
- Instance type restrictions: Limit to cost-effective instance families
- Auto-termination: Force clusters to terminate after idle period (e.g., 30 minutes)
- Autoscaling limits: Cap maximum workers to prevent runaway costs
- Spot instance usage: Require spot/preemptible instances for non-critical workloads
See Cluster policy definitions for complete reference.
Implementation Example: Cost-Optimized Compute Policy
The following example demonstrates cost control patterns. Always verify policy syntax against the official Databricks documentation.
Cost-optimized cluster policy:
{
"node_type_id": {
"type": "allowlist",
"values": ["i3.xlarge", "i3.2xlarge"],
"defaultValue": "i3.xlarge"
},
"driver_node_type_id": {
"type": "fixed",
"value": "i3.xlarge",
"hidden": true
},
"autotermination_minutes": {
"type": "range",
"minValue": 10,
"maxValue": 60,
"defaultValue": 30
},
"autoscale": {
"type": "fixed",
"value": {
"min_workers": 1,
"max_workers": 10
}
},
"aws_attributes.availability": {
"type": "fixed",
"value": "SPOT_WITH_FALLBACK",
"hidden": true
},
"aws_attributes.spot_bid_price_percent": {
"type": "fixed",
"value": 100,
"hidden": true
},
"custom_tags.customer_id": {
"type": "fixed",
"value": "{{customer_id}}",
"hidden": false
},
"custom_tags.environment": {
"type": "fixed",
"value": "production",
"hidden": true
}
}
This policy:
- Restricts to cost-effective i3 instances
- Forces 10-60 minute auto-termination
- Caps autoscaling at 10 workers
- Uses spot instances with on-demand fallback
- Requires customer_id tag
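With the policy assigned, users (or your provisioning automation) create compute against it and the enforced tags and defaults are applied automatically. The sketch below assumes the Python SDK and a policy_id captured when the policy was created; apply_policy_default_values asks Databricks to fill omitted fields from the policy's defaults:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import AutoScale

w = WorkspaceClient()

# Create a cluster governed by the cost-optimized policy; node type, spot
# settings, and customer tags come from the policy definition
cluster = w.clusters.create(
    cluster_name="acme-adhoc-analysis",
    spark_version="14.3.x-scala2.12",
    policy_id="<policy-id>",
    apply_policy_default_values=True,
    autoscale=AutoScale(min_workers=1, max_workers=10),
).result()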
SQL Warehouse Size and Auto-Stop
SQL warehouses offer built-in cost controls through sizing and auto-stop configuration:
- Cluster size: Start with "Small" or "Medium" for most workloads
- Scaling: Enable auto-scaling to handle burst traffic without over-provisioning
- Auto-stop: Configure 10-15 minute idle timeout to prevent runaway costs
- Serverless warehouses: Consider serverless SQL for variable workloads (pay only for active query time)
See SQL warehouse types for sizing guidance.
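As a concrete illustration of these settings, the sketch below creates a small serverless warehouse with a short auto-stop window using the Python SDK. The enable_serverless_compute flag assumes serverless SQL is available in your workspace and region; verify parameters against the Warehouses API documentation:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import EndpointTagPair, EndpointTags

w = WorkspaceClient()

# Small serverless warehouse that stops after 10 idle minutes
warehouse = w.warehouses.create(
    name="customer-acme-adhoc",
    cluster_size="Small",
    min_num_clusters=1,
    max_num_clusters=2,
    auto_stop_mins=10,
    enable_serverless_compute=True,
    tags=EndpointTags(custom_tags=[EndpointTagPair(key="customer_id", value="acme_corp")]),
).result()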
Serverless Overspend Protection
Serverless compute includes built-in timeout protection:
- Default timeout: 2.5 hours for serverless notebooks and jobs
- Configurable: Adjust via the spark.databricks.execution.timeout Spark config
- Budget policies: Apply spending limits to serverless workloads at account or workspace level
For partners, consider setting tighter timeouts for customer-facing workloads and monitoring serverless usage patterns to identify optimization opportunities.
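One way to apply a tighter ceiling on customer-facing work is a job-level timeout, which is independent of the Spark config above. The sketch below sets timeout_seconds on a job task using the Python SDK and assumes serverless jobs are enabled so a task without a cluster specification runs on serverless compute; the one-hour value is illustrative:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import NotebookTask, Task

w = WorkspaceClient()

# Job task capped at one hour; runs exceeding the timeout are cancelled
job = w.jobs.create(
    name="customer-acme-hourly-scoring",
    tasks=[
        Task(
            task_key="score",
            notebook_task=NotebookTask(notebook_path="/Workspace/pipelines/score_customers"),
            timeout_seconds=3600,
        )
    ],
    tags={"customer_id": "acme_corp", "service": "ml_inference"},
)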
Cost Visibility Enables Better Decisions
Even with flat subscription fees, your costs are usage-based. Customer-level attribution lets you:
- Analyze account profitability
- Understand margin by segment
- Make informed pricing and packaging decisions
- Position for a future shift to usage-based or hybrid models
Tagging is what makes your cost data actionable.
Beyond internal operations, proper tagging is also a requirement for co-sell motions within the Databricks ecosystem. The Built On Databricks Partner Program—which provides go-to-market support, technical resources, and partnership benefits—requires accurate customer attribution through standardized tagging.
What's Next
- Automation — Automate tagging enforcement with Terraform and DABs
- Onboarding — Configure tagging during customer onboarding
- Scale & Limits — Understand resource quotas and limits