Customer and User Onboarding
This section provides guidance on implementing a structured onboarding framework. Effective onboarding is foundational to realizing value from the governance, isolation, tagging, and automation capabilities established in preceding sections.
A well-designed onboarding process ensures that new customers and users can quickly and securely access the resources they need while maintaining compliance with organizational policies.
Reference implementation: See how Firefly automates organization onboarding and user onboarding.
Design Principles
Separation of Customer and User Workflows
The framework separates onboarding into two distinct workflows: customer onboarding and user onboarding. This separation reflects a core design principle that these processes operate on different timescales and serve different purposes.
| Workflow | Timing | Purpose | Automation |
|---|---|---|---|
| Customer onboarding | Once at setup | Establish foundational infrastructure | Terraform, DABs, Python SDK |
| User onboarding | Continuous | Identity provisioning and access | SCIM sync, API calls |
This separation:
- Reduces operational risk by ensuring that adding or removing users does not require infrastructure changes
- Enables different teams to own different parts of the process
- Creates a more auditable environment where infrastructure changes are tracked separately from user access changes
Group-Based Access Management
User access should be managed entirely through group membership. In a well-designed environment, onboarding a new user requires nothing more than adding that user to the appropriate groups.
All permissions for data access, compute resources, and platform features should be pre-configured at the group level. This approach:
- Simplifies user onboarding
- Ensures consistency across users with similar roles
- Makes access auditing straightforward
Groups should be defined by persona or role rather than by individual resource. For example, rather than having separate groups for each catalog a data analyst might need, define a Data Analyst group that has been granted access to all relevant catalogs, schemas, and compute resources for that role.
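To make the persona-first approach concrete, the persona-to-privilege mapping can be kept as plain data and expanded into GRANT statements. A minimal sketch, assuming hypothetical group and catalog names (`customer-acme-analysts`, `customer_acme`):

```python
# Map each persona group to the securables and privileges it needs.
# One group covers every catalog/schema the persona uses; adding a user
# to the group grants all of this at once. Names are illustrative.
PERSONA_GRANTS = {
    "customer-acme-analysts": [
        ("CATALOG", "customer_acme", ["USE CATALOG"]),
        ("SCHEMA", "customer_acme.sales", ["USE SCHEMA", "SELECT"]),
        ("SCHEMA", "customer_acme.marketing", ["USE SCHEMA", "SELECT"]),
    ],
}


def grant_statements(persona_grants: dict) -> list[str]:
    """Expand the persona mapping into Unity Catalog GRANT statements."""
    statements = []
    for group, grants in persona_grants.items():
        for securable_type, name, privileges in grants:
            statements.append(
                f"GRANT {', '.join(privileges)} ON {securable_type} {name} TO `{group}`;"
            )
    return statements


for stmt in grant_statements(PERSONA_GRANTS):
    print(stmt)
```

Because the mapping is data rather than ad hoc grants, adding a new catalog for a persona is a one-line change that can be reviewed and re-applied idempotently.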
Customer Onboarding Workflow
Customer onboarding establishes the foundational infrastructure that will support all users and workloads. This process should be executed once during initial setup, with subsequent modifications driven only by meaningful changes to business requirements.
All customer onboarding steps should be codified and automated using Terraform, Databricks Asset Bundles (DABs), or the Python SDK to ensure reproducibility.
Group Structure and Persona Definition
Define the groups that will govern access throughout the environment. Groups should map to logical personas:
- Data engineers
- Data analysts
- Data scientists
- Business intelligence users
- Administrators
Each persona should have a corresponding group in Unity Catalog. Access management should be performed exclusively through groups rather than direct user grants.
See Unity Catalog privileges and Grant, deny, and revoke privileges for implementation details.
Implementation Example: Create Groups and Assign Permissions
The following examples demonstrate onboarding patterns. Always verify API syntax against the official Databricks documentation.
Create groups using Python SDK:
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create persona-based groups. The display name is also the principal
# name referenced in GRANT statements, so keep it grant-friendly.
group_names = [
    "customer-acme-analysts",
    "customer-acme-engineers",
    "customer-acme-scientists",
    "customer-acme-admins",
]

for name in group_names:
    group = w.groups.create(display_name=name)
    print(f"Created group: {group.display_name} (ID: {group.id})")
```
Grant catalog access to groups:
```sql
-- Data Analysts: Read-only access to production data
GRANT USE CATALOG ON CATALOG `customer_acme` TO `customer-acme-analysts`;
GRANT USE SCHEMA ON SCHEMA `customer_acme`.`sales` TO `customer-acme-analysts`;
GRANT SELECT ON SCHEMA `customer_acme`.`sales` TO `customer-acme-analysts`;

-- Data Engineers: Full access to manage ETL pipelines
GRANT USE CATALOG ON CATALOG `customer_acme` TO `customer-acme-engineers`;
GRANT CREATE SCHEMA, USE SCHEMA, CREATE TABLE ON CATALOG `customer_acme` TO `customer-acme-engineers`;

-- Admins: Full control
GRANT ALL PRIVILEGES ON CATALOG `customer_acme` TO `customer-acme-admins`;
```
Data Strategy
Establish the data architecture within Unity Catalog:
| Component | Guidance |
|---|---|
| Catalog and schema hierarchy | Use Unity Catalog's three-level namespace to reflect organizational structure, data domains, or access patterns. See Unity Catalog object model. |
| Workspace bindings | Control which workspaces can see specific catalogs or schemas. See workspace bindings. |
| Fine-grained access | Implement row filters and column masks where needed. |
| Delta Sharing | Configure shares and recipients if data sharing across organizational boundaries is required. |
| Data ingestion | Establish methods based on data source, update frequency, and security requirements (Delta Sharing, SFTP, Unity Catalog Volumes). |
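Row filters and column masks from the fine-grained access row are attached with SQL DDL. The sketch below assembles the statements for a hypothetical orders table; the function and table names are assumptions, and the generated DDL should be verified against the Unity Catalog documentation before use:

```python
def row_filter_ddl(table: str, filter_fn: str, columns: list[str]) -> list[str]:
    """Build DDL that creates a row-filter UDF and attaches it to a table."""
    return [
        # UDF returns TRUE for rows the caller may see; admins bypass the filter
        f"CREATE OR REPLACE FUNCTION {filter_fn}(region STRING) "
        "RETURN is_account_group_member('customer-acme-admins') OR region = 'US';",
        f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({', '.join(columns)});",
    ]


def column_mask_ddl(table: str, column: str, mask_fn: str) -> str:
    """Build DDL that attaches an existing masking function to a column."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_fn};"


for stmt in row_filter_ddl(
    "customer_acme.sales.orders", "customer_acme.sales.us_only", ["region"]
):
    print(stmt)
print(column_mask_ddl("customer_acme.sales.orders", "email", "customer_acme.sales.mask_email"))
```

Keeping the DDL generation in code lets the same filters be re-applied consistently across every table in a customer's schema.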
Implementation Example: Unity Catalog Setup
The following examples demonstrate Unity Catalog provisioning. Always verify syntax against the official Databricks documentation.
Create catalog and schemas using Python SDK:
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create customer catalog
catalog = w.catalogs.create(
    name="customer_acme",
    comment="Acme Corp customer catalog",
)

# Create schemas by data domain
schemas = ["sales", "marketing", "operations", "analytics"]

for schema_name in schemas:
    schema = w.schemas.create(
        name=schema_name,
        catalog_name="customer_acme",
        comment=f"Acme Corp {schema_name} data",
    )
    print(f"Created schema: {catalog.name}.{schema.name}")
```
Bind catalog to customer workspace:
```python
# Bind catalog to a specific workspace to isolate customer data
workspace_id = 1234567890  # ID of the customer's workspace (example value)

w.workspace_bindings.update(
    name="customer_acme",
    assign_workspaces=[workspace_id],
    unassign_workspaces=[],
)
```
Set up external location for customer data:
```python
from databricks.sdk.service.catalog import PermissionsChange, Privilege, SecurableType

# Create external location for customer's cloud storage
external_location = w.external_locations.create(
    name="customer_acme_raw_data",
    url="s3://acme-databricks-data/raw/",
    credential_name="aws-acme-credential",
    comment="Acme Corp raw data location",
)

# Grant access to data engineers
w.grants.update(
    securable_type=SecurableType.EXTERNAL_LOCATION,
    full_name="customer_acme_raw_data",
    changes=[
        PermissionsChange(
            principal="customer-acme-engineers",
            add=[
                Privilege.CREATE_EXTERNAL_TABLE,
                Privilege.READ_FILES,
                Privilege.WRITE_FILES,
            ],
        )
    ],
)
```
Pre-Configured Compute Resources
Provision compute resources with appropriate tagging for cost attribution before users are onboarded:
| Resource | Configuration |
|---|---|
| SQL Warehouses | Provision for BI and SQL analytics workloads with sizing based on expected query patterns. Apply customer attribution tags and grant access to appropriate groups. |
| Clusters | Configure with policies that enforce tagging requirements and resource limits. Add custom tags that flow through to billing system tables. |
| AI Agents and Model Serving | Configure endpoints with appropriate tags and access controls if AI capabilities will be used. |
| Pre-Populated Notebooks | Create template notebooks demonstrating common patterns, library imports, and connections to customer data assets. |
Implementation Example: Provision Compute Resources
The following examples demonstrate compute provisioning patterns. Always verify syntax against the official Databricks documentation.
Create SQL warehouse with customer tags:
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel
from databricks.sdk.service.sql import EndpointTagPair, EndpointTags

w = WorkspaceClient()

# Create SQL warehouse for customer; .result() waits until it is running
warehouse = w.warehouses.create(
    name="customer-acme-analytics",
    cluster_size="Medium",
    min_num_clusters=1,
    max_num_clusters=3,
    auto_stop_mins=15,
    enable_serverless_compute=True,
    tags=EndpointTags(
        custom_tags=[
            EndpointTagPair(key="customer_id", value="acme_corp"),
            EndpointTagPair(key="environment", value="production"),
            EndpointTagPair(key="service", value="analytics"),
        ]
    ),
).result()

# Grant warehouse access to analysts group
w.permissions.update(
    request_object_type="warehouses",
    request_object_id=warehouse.id,
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-analysts",
            permission_level=PermissionLevel.CAN_USE,
        )
    ],
)
```
Create cluster policy for customer:
```python
from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel

# Create cluster policy with enforced tags
policy = w.cluster_policies.create(
    name="customer-acme-policy",
    definition="""{
        "custom_tags.customer_id": {
            "type": "fixed",
            "value": "acme_corp",
            "hidden": false
        },
        "custom_tags.environment": {
            "type": "fixed",
            "value": "production",
            "hidden": true
        },
        "autotermination_minutes": {
            "type": "fixed",
            "value": 30
        },
        "node_type_id": {
            "type": "allowlist",
            "values": ["i3.xlarge", "i3.2xlarge"],
            "defaultValue": "i3.xlarge"
        }
    }""",
)

# Assign policy to engineers group
w.permissions.update(
    request_object_type="cluster-policies",
    request_object_id=policy.policy_id,
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-engineers",
            permission_level=PermissionLevel.CAN_USE,
        )
    ],
)
```
Import template notebooks:
```python
import io

from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel
from databricks.sdk.service.workspace import ImportFormat, Language

# Source for the customer's getting-started notebook (placeholder content)
notebook_content = "# Databricks notebook source\nprint('Welcome, Acme!')\n"

# Import onboarding notebook for customer
w.workspace.upload(
    path="/Workspace/Customers/Acme/Getting Started",
    content=io.BytesIO(notebook_content.encode()),
    format=ImportFormat.SOURCE,
    language=Language.PYTHON,
    overwrite=True,
)

# The Permissions API identifies notebooks by numeric object ID, not path
notebook_id = w.workspace.get_status("/Workspace/Customers/Acme/Getting Started").object_id

# Grant notebook access to customer groups
w.permissions.update(
    request_object_type="notebooks",
    request_object_id=str(notebook_id),
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-analysts",
            permission_level=PermissionLevel.CAN_READ,
        ),
        AccessControlRequest(
            group_name="customer-acme-engineers",
            permission_level=PermissionLevel.CAN_EDIT,
        ),
    ],
)
```
User Onboarding Workflow
User onboarding should be lightweight and repeatable. If customer onboarding has been executed properly, adding a new user requires only identity provisioning and group assignment—no infrastructure or permission changes.
Identity Provider Integration
Users and service principals should be provisioned through the organization's identity provider (IdP). This ensures centralized authentication and that user lifecycle events are reflected automatically in Databricks access.
SCIM provisioning can automatically sync users and groups from enterprise identity systems like Azure AD, Okta, and OneLogin.
For automated workloads and CI/CD pipelines, service principals should represent applications or processes that need programmatic access. Service principals follow the same group-based access model as human users.
Group Assignment
The core of user onboarding is adding the user or service principal to the appropriate groups. This single action provisions all necessary access to data assets, compute resources, and platform features.
For example, a new data analyst might be added to groups such as:
- `analysts`
- `production-data-readers`
- `sql-warehouse-users`
These group memberships collectively grant access to relevant catalogs and schemas, permission to use SQL warehouses, and any other capabilities defined for analysts. No additional configuration should be required.
Implementation Example: User and Service Principal Onboarding
The following examples demonstrate user provisioning patterns. Always verify syntax against the official Databricks documentation.
Create user and assign to groups:
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.iam import Patch, PatchOp, PatchSchema

w = WorkspaceClient()

# Create user (or provision via SCIM from the IdP)
user = w.users.create(
    user_name="jane.doe@acmecorp.com",
    display_name="Jane Doe",
)

# Add user to persona-based groups
groups_to_join = [
    "customer-acme-analysts",
    "customer-acme-users",
]

for group_name in groups_to_join:
    # Look up the group by display name (groups.get requires a group ID)
    group = next(w.groups.list(filter=f'displayName eq "{group_name}"'))

    # Add user to group via a SCIM PATCH
    w.groups.patch(
        id=group.id,
        schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
        operations=[Patch(op=PatchOp.ADD, path="members", value=[{"value": user.id}])],
    )
    print(f"Added {user.user_name} to {group_name}")
```
Create service principal for automation:
```python
from databricks.sdk import AccountClient
from databricks.sdk.service.iam import Patch, PatchOp, PatchSchema

# Create service principal for customer's ETL pipeline.
# application_id is generated by Databricks; on Azure, pass the
# Microsoft Entra ID application ID instead of letting it default.
sp = w.service_principals.create(display_name="acme-etl-pipeline")

# Add service principal to engineers group
engineers_group = next(w.groups.list(filter='displayName eq "customer-acme-engineers"'))
w.groups.patch(
    id=engineers_group.id,
    schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
    operations=[Patch(op=PatchOp.ADD, path="members", value=[{"value": sp.id}])],
)

# Generate OAuth secret for service principal authentication.
# Note: requires AccountClient with account-level credentials, and the
# account-level service principal ID.
a = AccountClient()
secret = a.service_principal_secrets.create(service_principal_id=int(sp.id))
print(f"Service Principal ID: {sp.application_id}")
print(f"Secret: {secret.secret}")  # Store securely!
```
Bulk user provisioning:
```python
from databricks.sdk.service.iam import Patch, PatchOp, PatchSchema

# Onboard multiple users at once
new_users = [
    {"email": "john.smith@acmecorp.com", "name": "John Smith", "groups": ["customer-acme-analysts"]},
    {"email": "sarah.jones@acmecorp.com", "name": "Sarah Jones", "groups": ["customer-acme-engineers"]},
    {"email": "mike.wilson@acmecorp.com", "name": "Mike Wilson", "groups": ["customer-acme-scientists"]},
]

for user_def in new_users:
    # Create user
    user = w.users.create(
        user_name=user_def["email"],
        display_name=user_def["name"],
    )

    # Assign to groups (look up each group ID by display name)
    for group_name in user_def["groups"]:
        group = next(w.groups.list(filter=f'displayName eq "{group_name}"'))
        w.groups.patch(
            id=group.id,
            schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
            operations=[Patch(op=PatchOp.ADD, path="members", value=[{"value": user.id}])],
        )

    print(f"✓ Onboarded {user.display_name}")
```
Verify user access:
```python
# Look up the user by email (users.get requires the numeric SCIM ID)
user = next(
    w.users.list(
        filter='userName eq "jane.doe@acmecorp.com"',
        attributes="id,userName,displayName,groups",
    )
)

# Group memberships are returned on the SCIM user record
print(f"User {user.display_name} is a member of:")
for membership in user.groups or []:
    print(f"  - {membership.display}")
```
Onboarding Checklists
Customer Onboarding Checklist
Complete once during initial setup. Automate via Terraform, DABs, or Python SDK.
- Define personas and create corresponding groups with appropriate permissions
- Set up Unity Catalog hierarchy (catalogs, schemas, workspace bindings, row/column security)
- Configure data sharing and ingestion (Delta Sharing, SFTP, Volumes)
- Provision compute with attribution tags (SQL Warehouses, clusters, AI endpoints)
- Create template notebooks and grant compute access to groups
- Validate end-to-end access with a representative user from each persona
User Onboarding Checklist
Complete for each new user or service principal.
- Provision identity in IdP and verify SCIM sync to Databricks
- Add user/service principal to groups based on persona
- Validate access and share onboarding documentation
Summary: Customer vs. User Onboarding
| Aspect | Customer Onboarding | User Onboarding |
|---|---|---|
| Frequency | Once at setup; updates driven by business changes | Continuous, as users join or leave |
| Scope | Infrastructure, data architecture, compute resources | Identity provisioning and group membership |
| Primary Actions | Create groups, configure catalogs, provision compute, set up data sharing | Add user to IdP, assign to groups |
| Typical Owner | Platform engineering, data platform team | IT operations, identity team, hiring managers |
| Automation | Terraform, DABs, Python SDK | SCIM sync, API calls for group assignment |
| Impact of Changes | Affects all users; requires careful planning and testing | Affects individual user; low risk, easily reversible |
API & SDK Reference
| Resource | Documentation |
|---|---|
| Account API - Workspaces | docs.databricks.com/api/account/workspaces |
| Workspace API - Groups | docs.databricks.com/api/workspace/groups |
| Workspace API - Users | docs.databricks.com/api/workspace/users |
| Workspace API - Service Principals | docs.databricks.com/api/workspace/serviceprincipals |
| Workspace API - Catalogs | docs.databricks.com/api/workspace/catalogs |
| Workspace API - Schemas | docs.databricks.com/api/workspace/schemas |
| Workspace API - Workspace Bindings | docs.databricks.com/api/workspace/workspacebindings |
| Workspace API - Grants | docs.databricks.com/api/workspace/grants |
| Workspace API - Warehouses | docs.databricks.com/api/workspace/warehouses |
| Workspace API - Cluster Policies | docs.databricks.com/api/workspace/clusterpolicies |
| Databricks SDK for Python | databricks-sdk-py.readthedocs.io |
| SCIM Provisioning | docs.databricks.com/admin/users-groups/scim/ |
What's Next
- Automation — Infrastructure as code for onboarding automation
- Governance — Unity Catalog patterns for access control
- Cost Management — Tagging during customer onboarding