Customer and User Onboarding
This section provides guidance on implementing a structured onboarding framework. Effective onboarding is foundational to realizing value from the governance, isolation, tagging, and automation capabilities established in preceding sections.
A well-designed onboarding process ensures that new customers and users can quickly and securely access the resources they need while maintaining compliance with organizational policies.
Reference implementation: See how Firefly automates organization onboarding and user onboarding.
Design Principles
Separation of Customer and User Workflows
The framework separates onboarding into two distinct workflows: customer onboarding and user onboarding. This separation reflects a core design principle that these processes operate on different timescales and serve different purposes.
| Workflow | Timing | Purpose | Automation |
|---|---|---|---|
| Customer onboarding | Once at setup | Establish foundational infrastructure | Terraform, DABs, Python SDK |
| User onboarding | Continuous | Identity provisioning and access | SCIM sync, API calls |
This separation:
- Reduces operational risk by ensuring that adding or removing users does not require infrastructure changes
- Enables different teams to own different parts of the process
- Creates a more auditable environment where infrastructure changes are tracked separately from user access changes
Group-Based Access Management
User access should be managed entirely through group membership. In a well-designed environment, onboarding a new user requires nothing more than adding that user to the appropriate groups.
All permissions for data access, compute resources, and platform features should be pre-configured at the group level. This approach:
- Simplifies user onboarding
- Ensures consistency across users with similar roles
- Makes access auditing straightforward
Groups should be defined by persona or role rather than by individual resource. For example, rather than having separate groups for each catalog a data analyst might need, define a Data Analyst group that has been granted access to all relevant catalogs, schemas, and compute resources for that role.
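To make the persona-first approach concrete, the persona-to-privilege mapping can be kept as plain data and expanded into GRANT statements. A minimal sketch, assuming hypothetical group and catalog names (`customer-acme-analysts`, `customer_acme`):

```python
# Map each persona group to the securables and privileges it needs.
# One group covers every catalog/schema the persona uses; adding a user
# to the group grants all of this at once. Names are illustrative.
PERSONA_GRANTS = {
    "customer-acme-analysts": [
        ("CATALOG", "customer_acme", ["USE CATALOG"]),
        ("SCHEMA", "customer_acme.sales", ["USE SCHEMA", "SELECT"]),
        ("SCHEMA", "customer_acme.marketing", ["USE SCHEMA", "SELECT"]),
    ],
}


def grant_statements(persona_grants: dict) -> list[str]:
    """Expand the persona mapping into Unity Catalog GRANT statements."""
    statements = []
    for group, grants in persona_grants.items():
        for securable_type, name, privileges in grants:
            statements.append(
                f"GRANT {', '.join(privileges)} ON {securable_type} {name} TO `{group}`;"
            )
    return statements


for stmt in grant_statements(PERSONA_GRANTS):
    print(stmt)
```

Because the mapping is data rather than ad hoc grants, adding a new catalog for a persona is a one-line change that can be reviewed and re-applied idempotently.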
Customer Onboarding Workflow
Customer onboarding establishes the foundational infrastructure that will support all users and workloads. This process should be executed once during initial setup, with subsequent modifications driven only by meaningful changes to business requirements.
All customer onboarding steps should be codified and automated using Terraform, Databricks Asset Bundles (DABs), or the Python SDK to ensure reproducibility.
Group Structure and Persona Definition
Define the groups that will govern access throughout the environment. Groups should map to logical personas:
- Data engineers
- Data analysts
- Data scientists
- Business intelligence users
- Administrators
Each persona should have a corresponding group in Unity Catalog. Access management should be performed exclusively through groups rather than direct user grants.
See Unity Catalog privileges and Grant, deny, and revoke privileges for implementation details.
Implementation Example: Create Groups and Assign Permissions
The following examples demonstrate onboarding patterns. Always verify API syntax against the official Databricks documentation.
Create groups using Python SDK:
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create persona-based groups. The display name is also the principal
# name referenced in GRANT statements, so keep it grant-friendly.
group_names = [
    "customer-acme-analysts",
    "customer-acme-engineers",
    "customer-acme-scientists",
    "customer-acme-admins",
]

for name in group_names:
    group = w.groups.create(display_name=name)
    print(f"Created group: {group.display_name} (ID: {group.id})")
```
Grant catalog access to groups:
```sql
-- Data Analysts: Read-only access to production data
GRANT USE CATALOG ON CATALOG `customer_acme` TO `customer-acme-analysts`;
GRANT USE SCHEMA ON SCHEMA `customer_acme`.`sales` TO `customer-acme-analysts`;
GRANT SELECT ON SCHEMA `customer_acme`.`sales` TO `customer-acme-analysts`;

-- Data Engineers: Full access to manage ETL pipelines
GRANT USE CATALOG ON CATALOG `customer_acme` TO `customer-acme-engineers`;
GRANT CREATE SCHEMA, USE SCHEMA, CREATE TABLE ON CATALOG `customer_acme` TO `customer-acme-engineers`;

-- Admins: Full control
GRANT ALL PRIVILEGES ON CATALOG `customer_acme` TO `customer-acme-admins`;
```
Data Strategy
Establish the data architecture within Unity Catalog:
| Component | Guidance |
|---|---|
| Catalog and schema hierarchy | Use Unity Catalog's three-level namespace to reflect organizational structure, data domains, or access patterns. See Unity Catalog object model. |
| Workspace bindings | Control which workspaces can see specific catalogs or schemas. See workspace bindings. |
| Fine-grained access | Implement row filters and column masks where needed. |
| Delta Sharing | Configure shares and recipients if data sharing across organizational boundaries is required. |
| Data ingestion | Establish methods based on data source, update frequency, and security requirements (Delta Sharing, SFTP, Unity Catalog Volumes). |
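Row filters and column masks from the fine-grained access row are attached with SQL DDL. The sketch below assembles the statements for a hypothetical orders table; the function and table names are assumptions, and the generated DDL should be verified against the Unity Catalog documentation before use:

```python
def row_filter_ddl(table: str, filter_fn: str, columns: list[str]) -> list[str]:
    """Build DDL that creates a row-filter UDF and attaches it to a table."""
    return [
        # UDF returns TRUE for rows the caller may see; admins bypass the filter
        f"CREATE OR REPLACE FUNCTION {filter_fn}(region STRING) "
        "RETURN is_account_group_member('customer-acme-admins') OR region = 'US';",
        f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({', '.join(columns)});",
    ]


def column_mask_ddl(table: str, column: str, mask_fn: str) -> str:
    """Build DDL that attaches an existing masking function to a column."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_fn};"


for stmt in row_filter_ddl(
    "customer_acme.sales.orders", "customer_acme.sales.us_only", ["region"]
):
    print(stmt)
print(column_mask_ddl("customer_acme.sales.orders", "email", "customer_acme.sales.mask_email"))
```

Keeping the DDL generation in code lets the same filters be re-applied consistently across every table in a customer's schema.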
Implementation Example: Unity Catalog Setup
The following examples demonstrate Unity Catalog provisioning. Always verify syntax against the official Databricks documentation.
Create catalog and schemas using Python SDK:
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create customer catalog
catalog = w.catalogs.create(
    name="customer_acme",
    comment="Acme Corp customer catalog",
)

# Create schemas by data domain
schemas = ["sales", "marketing", "operations", "analytics"]

for schema_name in schemas:
    schema = w.schemas.create(
        name=schema_name,
        catalog_name="customer_acme",
        comment=f"Acme Corp {schema_name} data",
    )
    print(f"Created schema: {catalog.name}.{schema.name}")
```
Bind catalog to customer workspace:
```python
# Bind catalog to a specific workspace to isolate customer data
workspace_id = 1234567890  # ID of the customer's workspace (example value)

w.workspace_bindings.update(
    name="customer_acme",
    assign_workspaces=[workspace_id],
    unassign_workspaces=[],
)
```
Set up external location for customer data:
```python
from databricks.sdk.service.catalog import PermissionsChange, Privilege, SecurableType

# Create external location for customer's cloud storage
external_location = w.external_locations.create(
    name="customer_acme_raw_data",
    url="s3://acme-databricks-data/raw/",
    credential_name="aws-acme-credential",
    comment="Acme Corp raw data location",
)

# Grant access to data engineers
w.grants.update(
    securable_type=SecurableType.EXTERNAL_LOCATION,
    full_name="customer_acme_raw_data",
    changes=[
        PermissionsChange(
            principal="customer-acme-engineers",
            add=[
                Privilege.CREATE_EXTERNAL_TABLE,
                Privilege.READ_FILES,
                Privilege.WRITE_FILES,
            ],
        )
    ],
)
```
Pre-Configured Compute Resources
Provision compute resources with appropriate tagging for cost attribution before users are onboarded:
| Resource | Configuration |
|---|---|
| SQL Warehouses | Provision for BI and SQL analytics workloads with sizing based on expected query patterns. Apply customer attribution tags and grant access to appropriate groups. |
| Clusters | Configure with policies that enforce tagging requirements and resource limits. Add custom tags that flow through to billing system tables. |
| AI Agents and Model Serving | Configure endpoints with appropriate tags and access controls if AI capabilities will be used. |
| Pre-Populated Notebooks | Create template notebooks demonstrating common patterns, library imports, and connections to customer data assets. |
Implementation Example: Provision Compute Resources
The following examples demonstrate compute provisioning patterns. Always verify syntax against the official Databricks documentation.
Create SQL warehouse with customer tags:
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel
from databricks.sdk.service.sql import EndpointTagPair, EndpointTags

w = WorkspaceClient()

# Create SQL warehouse for customer; .result() waits until it is running
warehouse = w.warehouses.create(
    name="customer-acme-analytics",
    cluster_size="Medium",
    min_num_clusters=1,
    max_num_clusters=3,
    auto_stop_mins=15,
    enable_serverless_compute=True,
    tags=EndpointTags(
        custom_tags=[
            EndpointTagPair(key="customer_id", value="acme_corp"),
            EndpointTagPair(key="environment", value="production"),
            EndpointTagPair(key="service", value="analytics"),
        ]
    ),
).result()

# Grant warehouse access to analysts group
w.permissions.update(
    request_object_type="warehouses",
    request_object_id=warehouse.id,
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-analysts",
            permission_level=PermissionLevel.CAN_USE,
        )
    ],
)
```
Create cluster policy for customer:
```python
from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel

# Create cluster policy with enforced tags
policy = w.cluster_policies.create(
    name="customer-acme-policy",
    definition="""{
        "custom_tags.customer_id": {
            "type": "fixed",
            "value": "acme_corp",
            "hidden": false
        },
        "custom_tags.environment": {
            "type": "fixed",
            "value": "production",
            "hidden": true
        },
        "autotermination_minutes": {
            "type": "fixed",
            "value": 30
        },
        "node_type_id": {
            "type": "allowlist",
            "values": ["i3.xlarge", "i3.2xlarge"],
            "defaultValue": "i3.xlarge"
        }
    }""",
)

# Assign policy to engineers group
w.permissions.update(
    request_object_type="cluster-policies",
    request_object_id=policy.policy_id,
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-engineers",
            permission_level=PermissionLevel.CAN_USE,
        )
    ],
)
```
Import template notebooks:
```python
import io

from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel
from databricks.sdk.service.workspace import ImportFormat, Language

# Source for the customer's getting-started notebook (placeholder content)
notebook_content = "# Databricks notebook source\nprint('Welcome, Acme!')\n"

# Import onboarding notebook for customer
w.workspace.upload(
    path="/Workspace/Customers/Acme/Getting Started",
    content=io.BytesIO(notebook_content.encode()),
    format=ImportFormat.SOURCE,
    language=Language.PYTHON,
    overwrite=True,
)

# The Permissions API identifies notebooks by numeric object ID, not path
notebook_id = w.workspace.get_status("/Workspace/Customers/Acme/Getting Started").object_id

# Grant notebook access to customer groups
w.permissions.update(
    request_object_type="notebooks",
    request_object_id=str(notebook_id),
    access_control_list=[
        AccessControlRequest(
            group_name="customer-acme-analysts",
            permission_level=PermissionLevel.CAN_READ,
        ),
        AccessControlRequest(
            group_name="customer-acme-engineers",
            permission_level=PermissionLevel.CAN_EDIT,
        ),
    ],
)
```
User Onboarding Workflow
User onboarding should be lightweight and repeatable. If customer onboarding has been executed properly, adding a new user requires only identity provisioning and group assignment—no infrastructure or permission changes.
Identity Provider Integration
Users and service principals should be provisioned through the organization's identity provider (IdP). This ensures centralized authentication and that user lifecycle events are reflected automatically in Databricks access.
SCIM provisioning can automatically sync users and groups from enterprise identity systems like Azure AD, Okta, and OneLogin.
For automated workloads and CI/CD pipelines, service principals should represent applications or processes that need programmatic access. Service principals follow the same group-based access model as human users.
Group Assignment
The core of user onboarding is adding the user or service principal to the appropriate groups. This single action provisions all necessary access to data assets, compute resources, and platform features.
For example, a new data analyst might be added to groups such as:
- `analysts`
- `production-data-readers`
- `sql-warehouse-users`
These group memberships collectively grant access to relevant catalogs and schemas, permission to use SQL warehouses, and any other capabilities defined for analysts. No additional configuration should be required.
Implementation Example: User and Service Principal Onboarding
The following examples demonstrate user provisioning patterns. Always verify syntax against the official Databricks documentation.
Create user and assign to groups:
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.iam import Patch, PatchOp, PatchSchema

w = WorkspaceClient()

# Create user (or provision via SCIM from the IdP)
user = w.users.create(
    user_name="jane.doe@acmecorp.com",
    display_name="Jane Doe",
)

# Add user to persona-based groups
groups_to_join = [
    "customer-acme-analysts",
    "customer-acme-users",
]

for group_name in groups_to_join:
    # Look up the group by display name (groups.get requires a group ID)
    group = next(w.groups.list(filter=f'displayName eq "{group_name}"'))

    # Add user to group via a SCIM PATCH
    w.groups.patch(
        id=group.id,
        schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
        operations=[Patch(op=PatchOp.ADD, path="members", value=[{"value": user.id}])],
    )
    print(f"Added {user.user_name} to {group_name}")
```
Create service principal for automation:
```python
from databricks.sdk import AccountClient
from databricks.sdk.service.iam import Patch, PatchOp, PatchSchema

# Create service principal for customer's ETL pipeline.
# application_id is generated by Databricks; on Azure, pass the
# Microsoft Entra ID application ID instead of letting it default.
sp = w.service_principals.create(display_name="acme-etl-pipeline")

# Add service principal to engineers group
engineers_group = next(w.groups.list(filter='displayName eq "customer-acme-engineers"'))
w.groups.patch(
    id=engineers_group.id,
    schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
    operations=[Patch(op=PatchOp.ADD, path="members", value=[{"value": sp.id}])],
)

# Generate OAuth secret for service principal authentication.
# Note: requires AccountClient with account-level credentials, and the
# account-level service principal ID.
a = AccountClient()
secret = a.service_principal_secrets.create(service_principal_id=int(sp.id))
print(f"Service Principal ID: {sp.application_id}")
print(f"Secret: {secret.secret}")  # Store securely!
```
Bulk user provisioning:
```python
from databricks.sdk.service.iam import Patch, PatchOp, PatchSchema

# Onboard multiple users at once
new_users = [
    {"email": "john.smith@acmecorp.com", "name": "John Smith", "groups": ["customer-acme-analysts"]},
    {"email": "sarah.jones@acmecorp.com", "name": "Sarah Jones", "groups": ["customer-acme-engineers"]},
    {"email": "mike.wilson@acmecorp.com", "name": "Mike Wilson", "groups": ["customer-acme-scientists"]},
]

for user_def in new_users:
    # Create user
    user = w.users.create(
        user_name=user_def["email"],
        display_name=user_def["name"],
    )

    # Assign to groups (look up each group ID by display name)
    for group_name in user_def["groups"]:
        group = next(w.groups.list(filter=f'displayName eq "{group_name}"'))
        w.groups.patch(
            id=group.id,
            schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
            operations=[Patch(op=PatchOp.ADD, path="members", value=[{"value": user.id}])],
        )

    print(f"✓ Onboarded {user.display_name}")
```
Verify user access:
```python
# Look up the user by email (users.get requires the numeric SCIM ID)
user = next(
    w.users.list(
        filter='userName eq "jane.doe@acmecorp.com"',
        attributes="id,userName,displayName,groups",
    )
)

# Group memberships are returned on the SCIM user record
print(f"User {user.display_name} is a member of:")
for membership in user.groups or []:
    print(f"  - {membership.display}")
```
Onboarding Checklists
Customer Onboarding Checklist
Complete once during initial setup. Automate via Terraform, DABs, or Python SDK.
- Define personas and create corresponding groups with appropriate permissions
- Set up Unity Catalog hierarchy (catalogs, schemas, workspace bindings, row/column security)
- Configure data sharing and ingestion (Delta Sharing, SFTP, Volumes)
- Provision compute with attribution tags (SQL Warehouses, clusters, AI endpoints)
- Create template notebooks and grant compute access to groups
- Validate end-to-end access with a representative user from each persona
User Onboarding Checklist
Complete for each new user or service principal.
- Provision identity in IdP and verify SCIM sync to Databricks
- Add user/service principal to groups based on persona
- Validate access and share onboarding documentation
Summary: Customer vs. User Onboarding
| Aspect | Customer Onboarding | User Onboarding |
|---|---|---|
| Frequency | Once at setup; updates driven by business changes | Continuous, as users join or leave |
| Scope | Infrastructure, data architecture, compute resources | Identity provisioning and group membership |
| Primary Actions | Create groups, configure catalogs, provision compute, set up data sharing | Add user to IdP, assign to groups |
| Typical Owner | Platform engineering, data platform team | IT operations, identity team, hiring managers |
| Automation | Terraform, DABs, Python SDK | SCIM sync, API calls for group assignment |
| Impact of Changes | Affects all users; requires careful planning and testing | Affects individual user; low risk, easily reversible |
API & SDK Reference
| Resource | Documentation |
|---|---|
| Account API - Workspaces | docs.databricks.com/api/account/workspaces |
| Workspace API - Groups | docs.databricks.com/api/workspace/groups |
| Workspace API - Users | docs.databricks.com/api/workspace/users |
| Workspace API - Service Principals | docs.databricks.com/api/workspace/serviceprincipals |
| Workspace API - Catalogs | docs.databricks.com/api/workspace/catalogs |
| Workspace API - Schemas | docs.databricks.com/api/workspace/schemas |
| Workspace API - Workspace Bindings | docs.databricks.com/api/workspace/workspacebindings |
| Workspace API - Grants | docs.databricks.com/api/workspace/grants |
| Workspace API - Warehouses | docs.databricks.com/api/workspace/warehouses |
| Workspace API - Cluster Policies | docs.databricks.com/api/workspace/clusterpolicies |
| Databricks SDK for Python | databricks-sdk-py.readthedocs.io |
| SCIM Provisioning | docs.databricks.com/admin/users-groups/scim/ |
What's Next
- Automation — Infrastructure as code for onboarding automation
- Governance — Unity Catalog patterns for access control
- Cost Management — Tagging during customer onboarding