
Customer and User Onboarding

This section provides guidance on implementing a structured onboarding framework. Effective onboarding is foundational to realizing value from the governance, isolation, tagging, and automation capabilities established in preceding sections.

A well-designed onboarding process ensures that new customers and users can quickly and securely access the resources they need while maintaining compliance with organizational policies.

Reference implementation: See how Firefly automates organization onboarding and user onboarding.

Design Principles

Separation of Customer and User Workflows

The framework separates onboarding into two distinct workflows: customer onboarding and user onboarding. This separation reflects a core design principle that these processes operate on different timescales and serve different purposes.

| Workflow | Timing | Purpose | Automation |
| --- | --- | --- | --- |
| Customer onboarding | Once at setup | Establish foundational infrastructure | Terraform, DABs, Python SDK |
| User onboarding | Continuous | Identity provisioning and access | SCIM sync, API calls |

This separation:

  • Reduces operational risk by ensuring that adding or removing users does not require infrastructure changes
  • Enables different teams to own different parts of the process
  • Creates a more auditable environment where infrastructure changes are tracked separately from user access changes

Group-Based Access Management

User access should be managed entirely through group membership. In a well-designed environment, onboarding a new user requires nothing more than adding that user to the appropriate groups.

All permissions for data access, compute resources, and platform features should be pre-configured at the group level. This approach:

  • Simplifies user onboarding
  • Ensures consistency across users with similar roles
  • Makes access auditing straightforward

Groups should be defined by persona or role rather than by individual resource. For example, rather than having separate groups for each catalog a data analyst might need, define a Data Analyst group that has been granted access to all relevant catalogs, schemas, and compute resources for that role.
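
To make this concrete, the persona-to-access mapping can itself live in code. The sketch below (personas, catalog names, and privilege sets are illustrative examples, not a recommended set) expands a persona into the GRANT statements that configure its group:

```python
# Illustrative persona -> Unity Catalog access mapping.
# All names and privilege sets here are hypothetical examples.
PERSONA_GRANTS = {
    "data-analysts": {
        "catalogs": {"sales", "marketing"},
        "privileges": {"USE CATALOG", "USE SCHEMA", "SELECT"},
    },
    "data-engineers": {
        "catalogs": {"sales", "marketing", "raw"},
        "privileges": {"USE CATALOG", "USE SCHEMA", "CREATE TABLE", "MODIFY"},
    },
}


def grant_statements(persona: str) -> list[str]:
    """Expand a persona into the GRANT statements that configure its group."""
    spec = PERSONA_GRANTS[persona]
    privileges = ", ".join(sorted(spec["privileges"]))
    return [
        f"GRANT {privileges} ON CATALOG `{catalog}` TO `{persona}`;"
        for catalog in sorted(spec["catalogs"])
    ]
```

Granting at the persona level means a new catalog for a role is one entry in the mapping, and every member of the group picks it up with no per-user changes.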

Customer Onboarding Workflow

Customer onboarding establishes the foundational infrastructure that will support all users and workloads. This process should be executed once during initial setup, with subsequent modifications driven only by meaningful changes to business requirements.

All customer onboarding steps should be codified and automated using Terraform, DABs, or Python SDK to ensure reproducibility.

Group Structure and Persona Definition

Define the groups that will govern access throughout the environment. Groups should map to logical personas:

  • Data engineers
  • Data analysts
  • Data scientists
  • Business intelligence users
  • Administrators

Each persona should have a corresponding group in Unity Catalog. Access management should be performed exclusively through groups rather than direct user grants.

See Unity Catalog privileges and Grant, deny, and revoke privileges for implementation details.

Implementation Example: Create Groups and Assign Permissions
Note: The following examples demonstrate onboarding patterns. Always verify API syntax against the official Databricks documentation.

Create groups using Python SDK:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create persona-based groups. The display name is what later
# GRANT statements reference, so use the stable group names here.
group_names = [
    "customer-acme-analysts",
    "customer-acme-engineers",
    "customer-acme-scientists",
    "customer-acme-admins",
]

for name in group_names:
    group = w.groups.create(display_name=name)
    print(f"Created group: {group.display_name} (ID: {group.id})")

Grant catalog access to groups:

-- Data Analysts: Read-only access to production data
GRANT USE CATALOG ON CATALOG `customer_acme` TO `customer-acme-analysts`;
GRANT USE SCHEMA ON SCHEMA `customer_acme`.`sales` TO `customer-acme-analysts`;
GRANT SELECT ON SCHEMA `customer_acme`.`sales` TO `customer-acme-analysts`;

-- Data Engineers: Full access to manage ETL pipelines
GRANT USE CATALOG ON CATALOG `customer_acme` TO `customer-acme-engineers`;
GRANT CREATE SCHEMA, USE SCHEMA, CREATE TABLE ON CATALOG `customer_acme` TO `customer-acme-engineers`;

-- Admins: Full control
GRANT ALL PRIVILEGES ON CATALOG `customer_acme` TO `customer-acme-admins`;

Data Strategy

Establish the data architecture within Unity Catalog:

| Component | Guidance |
| --- | --- |
| Catalog and schema hierarchy | Use Unity Catalog's three-level namespace to reflect organizational structure, data domains, or access patterns. See Unity Catalog object model. |
| Workspace bindings | Control which workspaces can see specific catalogs or schemas. See workspace bindings. |
| Fine-grained access | Implement row filters and column masks where needed. |
| Delta Sharing | Configure shares and recipients if data sharing across organizational boundaries is required. |
| Data ingestion | Establish methods based on data source, update frequency, and security requirements (Delta Sharing, SFTP, Unity Catalog Volumes). |

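
As a small illustration of the three-level namespace, a helper like the following can assemble and sanity-check fully qualified names before they are used in grants or ingestion jobs (the lowercase snake_case naming rule is an assumption of this sketch, not a Unity Catalog requirement):

```python
import re

# Assumed naming convention for this sketch: lowercase snake_case identifiers.
_IDENTIFIER = re.compile(r"^[a-z][a-z0-9_]*$")


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Assemble the three-level Unity Catalog name, validating each level."""
    for part in (catalog, schema, table):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"invalid identifier: {part!r}")
    return f"{catalog}.{schema}.{table}"
```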
Implementation Example: Unity Catalog Setup
Note: The following examples demonstrate Unity Catalog provisioning. Always verify syntax against the official Databricks documentation.

Create catalog and schemas using Python SDK:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create customer catalog
catalog = w.catalogs.create(
    name="customer_acme",
    comment="Acme Corp customer catalog"
)

# Create schemas by data domain
schemas = ["sales", "marketing", "operations", "analytics"]
for schema_name in schemas:
    schema = w.schemas.create(
        name=schema_name,
        catalog_name="customer_acme",
        comment=f"Acme Corp {schema_name} data"
    )
    print(f"Created schema: {catalog.name}.{schema.name}")

Bind catalog to customer workspace:

# Bind catalog to a specific workspace (isolate customer data).
# workspace_id is a placeholder for the target workspace's numeric ID;
# bindings take effect once the catalog's isolation mode is ISOLATED.
w.workspace_bindings.update(
    name="customer_acme",
    assign_workspaces=[workspace_id],
    unassign_workspaces=[]
)

Set up external location for customer data:

from databricks.sdk.service.catalog import PermissionsChange, Privilege, SecurableType

# Create external location for customer's cloud storage
external_location = w.external_locations.create(
    name="customer_acme_raw_data",
    url="s3://acme-databricks-data/raw/",
    credential_name="aws-acme-credential",
    comment="Acme Corp raw data location"
)

# Grant access to data engineers
w.grants.update(
    securable_type=SecurableType.EXTERNAL_LOCATION,
    full_name="customer_acme_raw_data",
    changes=[
        PermissionsChange(
            principal="customer-acme-engineers",
            add=[Privilege.CREATE_EXTERNAL_TABLE, Privilege.READ_FILES, Privilege.WRITE_FILES]
        )
    ]
)

Pre-Configured Compute Resources

Provision compute resources with appropriate tagging for cost attribution before users are onboarded:

| Resource | Configuration |
| --- | --- |
| SQL Warehouses | Provision for BI and SQL analytics workloads with sizing based on expected query patterns. Apply customer attribution tags and grant access to appropriate groups. |
| Clusters | Configure with policies that enforce tagging requirements and resource limits. Add custom tags that flow through to billing system tables. |
| AI Agents and Model Serving | Configure endpoints with appropriate tags and access controls if AI capabilities will be used. |
| Pre-Populated Notebooks | Create template notebooks demonstrating common patterns, library imports, and connections to customer data assets. |

Implementation Example: Provision Compute Resources
Note: The following examples demonstrate compute provisioning patterns. Always verify syntax against the official Databricks documentation.

Create SQL warehouse with customer tags:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel
from databricks.sdk.service.sql import EndpointTagPair, EndpointTags

w = WorkspaceClient()

# Create SQL warehouse for customer; .result() blocks until it is running
warehouse = w.warehouses.create(
    name="customer-acme-analytics",
    cluster_size="Medium",
    min_num_clusters=1,
    max_num_clusters=3,
    auto_stop_mins=15,
    enable_serverless_compute=True,
    tags=EndpointTags(custom_tags=[
        EndpointTagPair(key="customer_id", value="acme_corp"),
        EndpointTagPair(key="environment", value="production"),
        EndpointTagPair(key="service", value="analytics"),
    ])
).result()

# Grant warehouse access to analysts group
w.permissions.update(
    request_object_type="warehouses",
    request_object_id=warehouse.id,
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-analysts",
            permission_level=PermissionLevel.CAN_USE
        )
    ]
)

Create cluster policy for customer:

from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel

# Create cluster policy with enforced tags
policy = w.cluster_policies.create(
    name="customer-acme-policy",
    definition="""{
        "custom_tags.customer_id": {"type": "fixed", "value": "acme_corp", "hidden": false},
        "custom_tags.environment": {"type": "fixed", "value": "production", "hidden": true},
        "autotermination_minutes": {"type": "fixed", "value": 30},
        "node_type_id": {
            "type": "allowlist",
            "values": ["i3.xlarge", "i3.2xlarge"],
            "defaultValue": "i3.xlarge"
        }
    }"""
)

# Assign policy to engineers group
w.permissions.update(
    request_object_type="cluster-policies",
    request_object_id=policy.policy_id,
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-engineers",
            permission_level=PermissionLevel.CAN_USE
        )
    ]
)

Import template notebooks:

import io

from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel
from databricks.sdk.service.workspace import ImportFormat

# Import onboarding notebook for customer
# (notebook_content is a placeholder holding the notebook source)
w.workspace.upload(
    path="/Workspace/Customers/Acme/Getting Started",
    content=io.BytesIO(notebook_content.encode()),
    format=ImportFormat.AUTO,
    overwrite=True
)

# The permissions API addresses notebooks by numeric object ID, not path
notebook = w.workspace.get_status("/Workspace/Customers/Acme/Getting Started")

# Grant notebook access to customer groups
w.permissions.update(
    request_object_type="notebooks",
    request_object_id=str(notebook.object_id),
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-analysts",
            permission_level=PermissionLevel.CAN_READ
        ),
        AccessControlRequest(
            group_name="customer-acme-engineers",
            permission_level=PermissionLevel.CAN_EDIT
        ),
    ]
)

User Onboarding Workflow

User onboarding should be lightweight and repeatable. If customer onboarding has been executed properly, adding a new user requires only identity provisioning and group assignment—no infrastructure or permission changes.

Identity Provider Integration

Users and service principals should be provisioned through the organization's identity provider (IdP). This ensures centralized authentication and that user lifecycle events are reflected automatically in Databricks access.

SCIM provisioning can automatically sync users and groups from enterprise identity systems like Azure AD, Okta, and OneLogin.
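
Under the hood, SCIM group-membership changes are RFC 7644 PatchOp messages. The helper below sketches the request body such a sync sends (the helper itself is illustrative; the payload shape follows the SCIM 2.0 specification):

```python
SCIM_PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def add_members_patch(member_ids: list[str]) -> dict:
    """Build a SCIM 2.0 PatchOp body that adds members to a group."""
    return {
        "schemas": [SCIM_PATCH_SCHEMA],
        "Operations": [
            {
                "op": "add",
                "path": "members",
                "value": [{"value": member_id} for member_id in member_ids],
            }
        ],
    }
```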

For automated workloads and CI/CD pipelines, service principals should represent applications or processes that need programmatic access. Service principals follow the same group-based access model as human users.

Group Assignment

The core of user onboarding is adding the user or service principal to the appropriate groups. This single action provisions all necessary access to data assets, compute resources, and platform features.

For example, a new data analyst might be added to groups such as:

  • analysts
  • production-data-readers
  • sql-warehouse-users

These group memberships collectively grant access to relevant catalogs and schemas, permission to use SQL warehouses, and any other capabilities defined for analysts. No additional configuration should be required.
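
The resulting model is easy to reason about: a user's effective access is simply the union of what their groups grant. A toy illustration (group names and grant strings are hypothetical):

```python
# Hypothetical group -> grants mapping, for illustration only.
GROUP_GRANTS = {
    "analysts": {"catalog:prod.read"},
    "production-data-readers": {"catalog:prod.read", "schema:sales.read"},
    "sql-warehouse-users": {"warehouse:analytics.use"},
}


def effective_access(memberships: list[str]) -> set[str]:
    """A user's effective access is the union of their groups' grants."""
    access: set[str] = set()
    for group in memberships:
        access |= GROUP_GRANTS.get(group, set())
    return access
```

Auditing one user reduces to listing group memberships; auditing one permission reduces to listing the groups that grant it.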

Implementation Example: User and Service Principal Onboarding
Note: The following examples demonstrate user provisioning patterns. Always verify syntax against the official Databricks documentation.

Create user and assign to groups:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.iam import Patch, PatchOp, PatchSchema

w = WorkspaceClient()

# Create user (or provision via SCIM from IdP)
user = w.users.create(
    user_name="jane.doe@acmecorp.com",
    display_name="Jane Doe"
)

# Add user to persona-based groups
groups_to_join = [
    "customer-acme-analysts",
    "customer-acme-users"
]

for group_name in groups_to_join:
    # Look up the group by display name (groups.get takes the SCIM ID)
    group = next(w.groups.list(filter=f'displayName eq "{group_name}"'))

    # Add user to group via a SCIM PatchOp
    w.groups.patch(
        id=group.id,
        operations=[Patch(op=PatchOp.ADD, path="members", value=[{"value": user.id}])],
        schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP]
    )
    print(f"Added {user.user_name} to {group_name}")

Create service principal for automation:

from databricks.sdk import AccountClient
from databricks.sdk.service.iam import Patch, PatchOp, PatchSchema

# Create service principal for customer's ETL pipeline
# (application_id is generated automatically on AWS; on Azure, pass the
# Microsoft Entra ID application's UUID instead)
sp = w.service_principals.create(
    display_name="acme-etl-pipeline"
)

# Add service principal to engineers group
engineers_group = next(w.groups.list(filter='displayName eq "customer-acme-engineers"'))
w.groups.patch(
    id=engineers_group.id,
    operations=[Patch(op=PatchOp.ADD, path="members", value=[{"value": sp.id}])],
    schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP]
)

# Generate OAuth secret for service principal authentication
# Note: requires an AccountClient with account-level credentials, and the
# service principal's account-level numeric ID
a = AccountClient()
secret = a.service_principal_secrets.create(
    service_principal_id=int(sp.id)
)
print(f"Service Principal ID: {sp.application_id}")
print(f"Secret: {secret.secret}")  # Store securely!

Bulk user provisioning:

from databricks.sdk.service.iam import Patch, PatchOp, PatchSchema

# Onboard multiple users at once
new_users = [
    {"email": "john.smith@acmecorp.com", "name": "John Smith", "groups": ["customer-acme-analysts"]},
    {"email": "sarah.jones@acmecorp.com", "name": "Sarah Jones", "groups": ["customer-acme-engineers"]},
    {"email": "mike.wilson@acmecorp.com", "name": "Mike Wilson", "groups": ["customer-acme-scientists"]}
]

for user_def in new_users:
    # Create user
    user = w.users.create(
        user_name=user_def["email"],
        display_name=user_def["name"]
    )

    # Assign to groups
    for group_name in user_def["groups"]:
        group = next(w.groups.list(filter=f'displayName eq "{group_name}"'))
        w.groups.patch(
            id=group.id,
            operations=[Patch(op=PatchOp.ADD, path="members", value=[{"value": user.id}])],
            schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP]
        )

    print(f"✓ Onboarded {user.display_name}")

Verify user access:

# Check user's group memberships
# (look up the user by userName via a SCIM filter; users.get takes the SCIM ID)
user = next(w.users.list(filter='userName eq "jane.doe@acmecorp.com"'))
groups = w.groups.list(filter=f'members/value eq "{user.id}"')

print(f"User {user.display_name} is member of:")
for group in groups:
    print(f" - {group.display_name}")

Onboarding Checklists

Customer Onboarding Checklist

Complete once during initial setup. Automate via Terraform, DABs, or Python SDK.

  • Define personas and create corresponding groups with appropriate permissions
  • Set up Unity Catalog hierarchy (catalogs, schemas, workspace bindings, row/column security)
  • Configure data sharing and ingestion (Delta Sharing, SFTP, Volumes)
  • Provision compute with attribution tags (SQL Warehouses, clusters, AI endpoints)
  • Create template notebooks and grant compute access to groups
  • Validate end-to-end access with a representative user from each persona
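
The tagging and validation items above lend themselves to automated checks. A minimal sketch, assuming a required-tag policy of customer_id and environment (the required keys are an assumption, not a platform rule):

```python
# Assumed attribution-tag policy: every customer resource carries these keys.
REQUIRED_TAG_KEYS = {"customer_id", "environment"}


def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Return required attribution tag keys absent from a resource's tags."""
    return REQUIRED_TAG_KEYS - resource_tags.keys()
```

Running such a check against every provisioned warehouse and cluster policy closes the loop on the cost-attribution items in the checklist.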

User Onboarding Checklist

Complete for each new user or service principal.

  • Provision identity in IdP and verify SCIM sync to Databricks
  • Add user/service principal to groups based on persona
  • Validate access and share onboarding documentation

Summary: Customer vs. User Onboarding

| Aspect | Customer Onboarding | User Onboarding |
| --- | --- | --- |
| Frequency | Once at setup; updates driven by business changes | Continuous, as users join or leave |
| Scope | Infrastructure, data architecture, compute resources | Identity provisioning and group membership |
| Primary Actions | Create groups, configure catalogs, provision compute, set up data sharing | Add user to IdP, assign to groups |
| Typical Owner | Platform engineering, data platform team | IT operations, identity team, hiring managers |
| Automation | Terraform, DABs, Python SDK | SCIM sync, API calls for group assignment |
| Impact of Changes | Affects all users; requires careful planning and testing | Affects individual user; low risk, easily reversible |

API & SDK Reference

| Resource | Documentation |
| --- | --- |
| Account API - Workspaces | docs.databricks.com/api/account/workspaces |
| Workspace API - Groups | docs.databricks.com/api/workspace/groups |
| Workspace API - Users | docs.databricks.com/api/workspace/users |
| Workspace API - Service Principals | docs.databricks.com/api/workspace/serviceprincipals |
| Workspace API - Catalogs | docs.databricks.com/api/workspace/catalogs |
| Workspace API - Schemas | docs.databricks.com/api/workspace/schemas |
| Workspace API - Workspace Bindings | docs.databricks.com/api/workspace/workspacebindings |
| Workspace API - Grants | docs.databricks.com/api/workspace/grants |
| Workspace API - Warehouses | docs.databricks.com/api/workspace/warehouses |
| Workspace API - Cluster Policies | docs.databricks.com/api/workspace/clusterpolicies |
| Databricks SDK for Python | databricks-sdk-py.readthedocs.io |
| SCIM Provisioning | docs.databricks.com/admin/users-groups/scim/ |

What's Next

  • Automation — Infrastructure as code for onboarding automation
  • Governance — Unity Catalog patterns for access control
  • Cost Management — Tagging during customer onboarding