Use cases

Databricks Clean Rooms unlock a wide range of collaborative analytics and AI use cases across industries. The common thread: multiple parties need to derive joint insights from sensitive data without exposing it to each other.

Use case overview

Industry	Use case
Advertising & Retail	Audience segmentation & targeting, measurement & attribution, lookalike modeling
Financial Services	Fraud detection & prevention, regulatory risk & compliance, targeted product development
Healthcare & Life Sciences	Population health research, drug discovery, genomic target identification, ML on EHR data
Cross-Industry	Identity resolution, cross-organizational ML model training

Advertising & retail media

Partner roles in adtech

As the partner (measurement firm, identity provider, or data provider): You own the clean room, publish the notebooks, and protect your matching logic or attribution methodology as IP. You are Party 1.

As the customer (brand, retailer, or broadcaster): You publish your first-party data (purchase records, CRM, viewership) into the room. You review and run the partner's notebooks. You receive the output tables. You are Party 2.

Reference architecture: measurement partner integration

What the partner owns: The clean room, the attribution notebook, the scoring library. The customer never sees the notebook source or library contents — only the input parameters and output schema.

What the customer owns: Their conversion and sales data, and the hashed key table used for joining. They review the notebook's declared behavior (inputs, outputs) and approve it before running.

What leaves the room: Campaign performance scores, ROAS by segment, audience reach metrics — aggregated results only, no row-level data from either party.
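As a rough illustration of "aggregated results only", the sketch below joins exposure and conversion records on a hashed key and emits ROAS (attributed revenue divided by ad spend) per segment. The function name, the record layout, and the `MIN_SEGMENT_SIZE` suppression threshold are assumptions made for this example. They are not part of the Clean Rooms product.

```python
from collections import defaultdict

# Hypothetical threshold: segments with fewer matched customers are suppressed
# so that small groups cannot be traced back to individuals.
MIN_SEGMENT_SIZE = 3

def roas_by_segment(exposures, conversions, spend_by_segment, min_size=MIN_SEGMENT_SIZE):
    """Aggregate attributed revenue per segment and divide by ad spend.

    exposures: iterable of (hashed_key, segment) pairs from the partner side
    conversions: dict of hashed_key -> revenue from the customer side
    spend_by_segment: dict of segment -> ad spend
    Returns only per-segment aggregates; hashed keys never appear in the output.
    """
    matched = defaultdict(set)
    revenue = defaultdict(float)
    for hashed_key, segment in exposures:
        # Count each converting customer once per segment.
        if hashed_key in conversions and hashed_key not in matched[segment]:
            matched[segment].add(hashed_key)
            revenue[segment] += conversions[hashed_key]

    report = {}
    for segment, keys in matched.items():
        spend = spend_by_segment.get(segment, 0.0)
        if len(keys) < min_size or spend <= 0:
            continue  # suppress small segments and segments without spend
        report[segment] = {
            "matched_customers": len(keys),
            "revenue": round(revenue[segment], 2),
            "roas": round(revenue[segment] / spend, 2),
        }
    return report

# Synthetic example data.
exposures = [("k1", "sports"), ("k2", "sports"), ("k3", "sports"),
             ("k4", "news"), ("k1", "sports")]
conversions = {"k1": 40.0, "k2": 60.0, "k3": 20.0, "k4": 15.0}
spend = {"sports": 60.0, "news": 10.0}

report = roas_by_segment(exposures, conversions, spend)
print(report)
```

In this synthetic run the "news" segment has a single matched customer, so it is dropped from the output rather than reported.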

Audience segmentation and targeting

Retailers and media companies collaborate with brands and advertisers to build audience segments from combined first-party data — without either party exposing their customer lists.

Pattern:

Retailer shares purchase transaction data (as tables) into the clean room
Brand shares CRM data (as tables) into the clean room
Approved notebook performs audience matching and segment creation
Output table contains anonymized segment IDs, not raw PII

Try-before-you-buy evaluation

Data providers can stand up a Clean Room where prospects explore a sample or production subset of their data under strict privacy rules, validating schema, join logic, and business value before any data is exported or a contract is signed. When the prospect is ready, they graduate to an OpenSharing subscription for ongoing delivery.

This pattern is especially useful for premium data providers, identity resolution vendors demonstrating match rates, and analytics firms showing attribution methodology before a deal closes.

Measurement and attribution

Ad platforms and publishers can run joint attribution models using impression data and conversion data without sharing the underlying datasets.

A streaming TV provider can give broadcasters access to viewership and ad impression data through an approved notebook — broadcasters get attribution insights, but never access the raw audience data directly. This enables advertisers to optimize campaigns with full data fidelity while the provider retains full privacy control.

Identity providers and measurement firms can own the notebooks and offer this as a recurring subscription service across multiple brands and retailers.

Lookalike modeling

Parties can collaboratively train lookalike models on combined customer data, with the model output (not the training data) as the clean room result.

Financial services

Partner roles in financial services

As the partner (data/analytics provider, fintech, or consulting firm): You typically publish the notebooks, the scoring methodology, and optionally a reference dataset (e.g., fraud signals, credit features). You are Party 1.

As the customer (bank, insurer, or fintech): You publish regulated transaction data, account records, or customer features. You run the partner's pre-approved notebooks and receive the output flags or scores. You are Party 2.

In consortium models (e.g., multiple banks jointly detecting fraud rings), each institution is a co-equal party. The partner may act as the neutral operator who provisions and manages the room.

Fraud detection and prevention

Financial institutions can combine transaction signals across organizations to train fraud detection models on shared behavioral patterns without exposing customer account data. Clean Rooms enable multi-cloud collaboration using approved notebooks, allowing organizations to standardize the landing zone for external data while meeting unique privacy requirements.

Consortiums of multiple financial institutions can run joint fraud detection across all parties, with each institution keeping regulated data in its own metastore. The Clean Room performs hashed joins and modeling according to strict policies — only flags and scores leave, not raw records.

Secure partner matching

A common pattern for co-branded card programs and fintech partnerships:

Each party hashes or HMACs shared identifiers (e.g., hashed email, device ID) using the same algorithm
Each party shares only hashed keys and derived features — not raw PII — into the Clean Room
The Clean Room runs approved notebooks that join on hashed keys, apply eligibility or risk logic, and emit only offer flags or scores back to each side
Each party receives only their portion of the output — no raw data from the other party crosses the boundary
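The hashing and join steps above can be sketched with Python's standard `hmac` and `hashlib` modules. This is a minimal illustration, assuming both parties have agreed on a shared secret and the same normalization rules out of band. The secret, the sample rows, and the `MAX_RISK` eligibility rule are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret agreed between the parties outside the clean room.
SHARED_SECRET = b"example-secret-agreed-out-of-band"

def normalize_email(email: str) -> str:
    """Both parties must normalize identically or the keys will not match."""
    return email.strip().lower()

def hmac_key(identifier: str, secret: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of an identifier, hex encoded."""
    return hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def to_shared(rows, secret, feature):
    """Keep only the hashed key and one derived feature; raw email is dropped."""
    return {hmac_key(normalize_email(r["email"]), secret): r[feature] for r in rows}

# Each party prepares its own table before sharing (synthetic data).
bank_rows = [{"email": "Ana@Example.com ", "risk_score": 0.12},
             {"email": "bo@example.com", "risk_score": 0.81}]
partner_rows = [{"email": "ana@example.com", "offer_tier": "gold"},
                {"email": "cy@example.com", "offer_tier": "silver"}]

bank_shared = to_shared(bank_rows, SHARED_SECRET, "risk_score")
partner_shared = to_shared(partner_rows, SHARED_SECRET, "offer_tier")

# Inside the clean room: join on hashed keys, apply a hypothetical
# eligibility rule, and emit only offer flags.
MAX_RISK = 0.5
offer_flags = {
    key: partner_shared[key]
    for key in bank_shared.keys() & partner_shared.keys()
    if bank_shared[key] <= MAX_RISK
}
print(offer_flags)
```

A keyed HMAC is used here rather than a plain hash because common identifiers such as email addresses are easy to guess and re-hash without the secret.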

Data and analytics providers can plug their datasets into this pattern as a neutral third party, and consulting partners can package it as a reusable collaboration workflow.

Regulatory risk and compliance

Regulated institutions can run joint compliance checks, risk models, or stress tests against combined datasets while maintaining strict data residency and access controls required by regulators.

Secure lending partner collaboration

Financial technology companies can share customer financial signals with lending partners to improve loan decisioning without exposing raw PII. Clean Rooms allow fintech providers to enforce privacy controls while seamlessly integrating with partners regardless of their platform or cloud provider.

Healthcare and life sciences

Partner roles in healthcare

As the partner (ML firm, ISV, or CRO): You develop the training notebooks and package your algorithms as private Python wheels. You publish the notebook and library into the clean room but never access the underlying health records. You are the ML Expert.

As the customer (health system, payer, or pharma company): You own and govern the sensitive data (EHR, claims, genomics). You create the clean room, review the partner's notebook, and run it against your data. The model or output stays in your metastore. You are the Data Owner.

ML on electronic health records

Healthcare organizations can train machine learning models on sensitive EHR data without the data science team ever directly accessing the underlying records.

Pattern (two-actor model):

Actor	Role
Data Owner	Governs EHR data, publishes tables to the clean room, runs the notebook
ML Expert	Develops the training code as a private Python library, publishes the library as a volume, publishes the notebook

Step-by-step:

Data Owner creates the clean room, invites the ML Expert using their sharing identifier
Data Owner publishes raw EHR tables — ML Expert can see column metadata, not data
ML Expert packages training code as a Python wheel, publishes it as a volume
ML Expert publishes a notebook that uses the private library and outputs a trained model
Data Owner reviews and runs the notebook — the model is the output, not the raw data
ML Expert can update the library at any time; each update requires a new round of review

This pattern supports readmission prediction, patient outcome classification, and other sensitive clinical models.
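To make the "model is the output, not the raw data" idea concrete, here is a minimal sketch of what a training entry point inside the ML Expert's private library might look like. It is a plain-Python logistic regression fitted on synthetic rows. The function names, column names, and hyperparameters are hypothetical, and a real library would more likely rely on an established ML framework.

```python
import math

def train_readmission_model(rows, feature_cols, label_col, epochs=200, lr=0.1):
    """Fit a logistic regression with batch gradient descent.

    Returns only the learned parameters. No input rows are included in the
    result, which is what lets the model leave the room while the data stays.
    """
    weights = [0.0] * len(feature_cols)
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(feature_cols)
        grad_b = 0.0
        for row in rows:
            x = [float(row[c]) for c in feature_cols]
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - row[label_col]
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return {"features": list(feature_cols), "weights": weights, "bias": bias}

def predict_proba(model, row):
    """Predicted readmission probability for one row of features."""
    z = model["bias"] + sum(
        w * float(row[c]) for w, c in zip(model["weights"], model["features"])
    )
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic stand-in for the Data Owner's table (not real patient data).
rows = [
    {"prior_admissions": 0, "length_of_stay": 1, "readmitted": 0},
    {"prior_admissions": 0, "length_of_stay": 2, "readmitted": 0},
    {"prior_admissions": 1, "length_of_stay": 2, "readmitted": 0},
    {"prior_admissions": 2, "length_of_stay": 5, "readmitted": 1},
    {"prior_admissions": 3, "length_of_stay": 6, "readmitted": 1},
    {"prior_admissions": 3, "length_of_stay": 4, "readmitted": 1},
]

model = train_readmission_model(rows, ["prior_admissions", "length_of_stay"], "readmitted")
high_risk = predict_proba(model, {"prior_admissions": 3, "length_of_stay": 6})
low_risk = predict_proba(model, {"prior_admissions": 0, "length_of_stay": 1})
print(sorted(model.keys()), high_risk > low_risk)
```

The notebook published by the ML Expert would call an entry point like this against the Data Owner's tables, and only the returned parameter dictionary would be written as output.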

Drug discovery and genomics

Pharmaceutical companies and CROs (contract research organizations) can collaborate on clinical trial data, genomic datasets, and observational research studies across organizational boundaries — enabling multi-party clinical analysis while maintaining strict data partitioning.

Population health research

Public health agencies and healthcare systems can run joint population health analyses across combined patient datasets without creating a centralized repository of protected health information.

Identity resolution

Partner roles in identity resolution

As the partner (identity provider): You own the identity graph, the matching notebooks, and the enrichment logic. You publish your graph as tables and your matching algorithm as a notebook (optionally hidden). You are Party 1.

As the customer (brand, retailer, or publisher): You publish a key table (hashed emails, MAIDs, device IDs) and optionally a seed audience. You run the partner's notebook. You receive enriched segment IDs or match rates as output — never the partner's raw graph. You are Party 2.

Identity resolution is one of the most common Clean Rooms use cases across industries. When joining disparate data assets, organizations need to match entities across datasets (e.g., matching an advertiser's customer list with a publisher's user graph) without sharing raw PII.

The challenge without Clean Rooms: Traditional approaches require sharing PII directly with an identity provider — creating privacy risk, compliance exposure, and dependency on third-party data movement.

Key benefits with Clean Rooms:

No raw PII exposure to the identity provider
No data movement to a third-party system
Scalable — works across large datasets using Databricks serverless compute
Auditable — all matching logic is captured in the approval workflow

Common identity resolution patterns

Graph extension / enrichment: A customer uploads a key table (emails, device IDs, MAIDs). The identity provider's notebook enriches with graph attributes — demographics, affinities, segment memberships — and returns only enhanced IDs or segment flags, not graph internals.

Audience overlap and lookalike: The customer passes a seed audience table. The provider computes overlap with graph segments and generates lookalike segment IDs. Only segment IDs or reach/frequency aggregates are returned — no raw cross-party data.

Multi-brand / retail media hub: The identity provider acts as a neutral hub connecting brand and retailer datasets, applying shared identifiers and audiences across multiple parties in a single room.
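The audience overlap pattern described above can be sketched with plain set operations. The example assumes the seed audience and the graph segments are already keyed by the same hashed identifier. The function name and the `min_overlap` suppression threshold are illustrative assumptions.

```python
def overlap_report(seed_keys, graph_segments, min_overlap=2):
    """Compute per-segment overlap between a seed audience and graph segments.

    seed_keys: iterable of hashed identifiers from the customer
    graph_segments: dict of segment_id -> iterable of hashed identifiers
    Returns only counts and rates per segment ID, never the matched keys.
    """
    seed = set(seed_keys)
    if not seed:
        return {}
    report = {}
    for segment_id, members in graph_segments.items():
        overlap = len(seed & set(members))
        if overlap >= min_overlap:  # suppress very small overlaps
            report[segment_id] = {
                "overlap": overlap,
                "overlap_rate": round(overlap / len(seed), 3),
            }
    return report

# Synthetic hashed keys.
seed = ["h1", "h2", "h3", "h4"]
graph_segments = {
    "seg_outdoor": ["h1", "h2", "h9"],
    "seg_auto": ["h4", "h7"],
    "seg_travel": ["h2", "h3", "h4", "h8"],
}

result = overlap_report(seed, graph_segments)
print(result)
```

Here "seg_auto" overlaps with only one seed member, so it is omitted from the returned aggregates.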

Identity providers can package these patterns using a Codeless Clean Room approach so customers see a simple guided interface — not schemas or notebook code. Leading identity resolution providers offer these capabilities natively through Databricks Clean Rooms.

Productization patterns

Most partners start with a one-off clean room for a single customer. Turning that into a scalable, repeatable product requires deliberate design from the beginning.

Templating notebooks across customers

Your core business logic — a scoring model, a matching algorithm, an attribution framework — is the same across every customer. What changes is which tables the customer brings, what they name their columns, and what outputs they need. Design your notebooks from day one to accept parameterized inputs:

Use notebook parameters for table names, schema names, and column mappings rather than hardcoding them
Document the expected input schema as a contract — column names, data types, join key format — so customers know exactly what to prepare
Keep a versioned notebook library internally: for example, v1.2.0 of your attribution notebook is what you deploy to all active rooms, and an upgrade is a deliberate release event, not a silent file change

This approach lets you manage dozens of active rooms without maintaining dozens of divergent notebook variants.
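One way to express the input contract and column mappings in code is sketched below. It assumes the notebook receives a column mapping as a parameter and can read the customer table's schema as a name-to-type dictionary. The `ColumnSpec` class, the contract contents, and the type names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnSpec:
    name: str   # canonical column name used inside the notebook
    dtype: str  # expected data type, e.g. "string" or "double"

# Hypothetical contract for an attribution notebook.
INPUT_CONTRACT = [
    ColumnSpec("hashed_email", "string"),
    ColumnSpec("order_total", "double"),
    ColumnSpec("order_ts", "timestamp"),
]

def resolve_columns(contract, customer_schema, column_mapping):
    """Map canonical names to the customer's column names and validate types.

    customer_schema: dict of customer column name -> data type
    column_mapping: dict of canonical name -> customer column name
                    (unmapped columns are assumed to use the canonical name)
    Raises ValueError listing every violation so the customer can fix them at once.
    """
    problems = []
    resolved = {}
    for spec in contract:
        source = column_mapping.get(spec.name, spec.name)
        actual = customer_schema.get(source)
        if actual is None:
            problems.append(f"missing column '{source}' for '{spec.name}'")
        elif actual != spec.dtype:
            problems.append(f"column '{source}' is {actual}, expected {spec.dtype}")
        else:
            resolved[spec.name] = source
    if problems:
        raise ValueError("; ".join(problems))
    return resolved

# Values that would arrive as notebook parameters for one customer.
mapping = {"hashed_email": "email_sha256", "order_total": "amount"}
schema = {"email_sha256": "string", "amount": "double", "order_ts": "timestamp"}

resolved = resolve_columns(INPUT_CONTRACT, schema, mapping)
print(resolved)
```

Failing fast with a complete list of violations keeps the core logic identical across customers while making onboarding errors easy to diagnose.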

Managing a fleet of clean rooms

When you have many customers, each in their own clean room, managing them manually through the UI becomes unworkable. Build operational tooling from the start:

Use the Databricks REST API to create rooms, attach assets, and manage approvals, rather than repeating manual one-off setup in the UI for each customer
Maintain a registry (a simple database table or config file) mapping customer ID → clean room name → current notebook etag → last run timestamp
Build a deployment script that, when you release a new notebook version, iterates over your registry and publishes the update to all active rooms. Notify customer operators that re-approval is needed before the new version can run
Set up monitoring on clean_room_events for each room — failed runs, unexpected auto-approvals, or long periods without any run activity are all signals worth alerting on
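The registry and alerting ideas in the list above might look like the following in code. This is a sketch only: the `RoomRecord` fields mirror the mapping described above, while the 14-day idle threshold and the alert labels are assumptions. It does not call any Databricks API; in practice the records would be populated from your own registry table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class RoomRecord:
    customer_id: str
    clean_room_name: str
    notebook_etag: str
    last_run_at: Optional[datetime]  # None if the room has never run

def rooms_needing_attention(registry, current_etag, now, max_idle=timedelta(days=14)):
    """Return (room name, reason) pairs for rooms that warrant an alert."""
    alerts = []
    for room in registry:
        if room.notebook_etag != current_etag:
            alerts.append((room.clean_room_name, "outdated notebook"))
        if room.last_run_at is None or now - room.last_run_at > max_idle:
            alerts.append((room.clean_room_name, "no recent run"))
    return alerts

# Synthetic registry contents.
utc = timezone.utc
registry = [
    RoomRecord("c1", "acme_room", "etag-v2", datetime(2025, 1, 30, tzinfo=utc)),
    RoomRecord("c2", "globex_room", "etag-v1", datetime(2025, 1, 1, tzinfo=utc)),
    RoomRecord("c3", "initech_room", "etag-v2", None),
]

alerts = rooms_needing_attention(registry, "etag-v2", now=datetime(2025, 1, 31, tzinfo=utc))
for name, reason in alerts:
    print(f"{name}: {reason}")
```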

Handling notebook version upgrades at scale

Every notebook update in Clean Rooms requires a new approval from the other party before it can run. In a one-customer context this is manageable. Across fifty customers it requires coordination.

A workable process:

Release candidate notebook is tested in your internal dry-run room
A release communication goes to customer operators explaining what changed and what they need to re-approve
You push the new notebook version to all rooms via the REST API or by updating each room through the UI
Customers approve on their timeline — old versions remain runnable until they upgrade
After a cutover window, you deprecate support for older versions and confirm all rooms are on the current etag

Treat this the same way you manage API version deprecations: with advance notice, a transition window, and a hard cutover date.
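A release process like this can be tracked with a small amount of code. The sketch below assumes a registry mapping each room name to the notebook etag its customer has currently approved, plus a `publish` callable that stands in for whatever mechanism you use to push a notebook to a room. Both are hypothetical placeholders, not Databricks APIs.

```python
from datetime import date

def publish_release(registry, target_etag, publish):
    """Push the new notebook to every room not yet on the target version.

    registry: dict of room name -> currently approved notebook etag
    publish: callable taking a room name (hypothetical; supplied by the caller)
    Returns the rooms whose operators must be asked to re-approve.
    """
    notified = []
    for room_name in sorted(registry):
        if registry[room_name] != target_etag:
            publish(room_name)
            notified.append(room_name)
    return notified

def rollout_status(registry, target_etag, cutover, today):
    """Classify each room relative to the target version and cutover date."""
    status = {}
    for room_name, approved_etag in registry.items():
        if approved_etag == target_etag:
            status[room_name] = "current"
        elif today <= cutover:
            status[room_name] = "in_transition"
        else:
            status[room_name] = "overdue"
    return status

# Synthetic example: one room has already approved the new version.
registry = {"acme_room": "etag-v2", "globex_room": "etag-v1", "initech_room": "etag-v1"}
published = []
notified = publish_release(registry, "etag-v2", published.append)

cutover = date(2025, 3, 31)
during = rollout_status(registry, "etag-v2", cutover, today=date(2025, 3, 1))
after = rollout_status(registry, "etag-v2", cutover, today=date(2025, 4, 2))
print(notified, during, after)
```

Rooms reported as "overdue" after the cutover date are the ones to follow up on before support for the old version ends.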

Multi-tenant billing and cost allocation

Serverless compute used by Clean Rooms is billed to whoever triggers the notebook run. In the standard model, the customer runs the notebook — compute cost lands in their Databricks account. If you operate rooms where your service principal triggers runs on behalf of customers, the cost lands in your account.

Think through your pricing model before your first customer conversation:

Pass-through: Customers run their own notebooks; they absorb compute cost directly. Your pricing covers the value of your data and IP, not compute
Bundled: You run notebooks on behalf of customers; you absorb compute cost and build it into your subscription pricing. Simpler for customers, but requires you to model and cap your compute exposure
Hybrid: Customers run standard workloads; you trigger premium or high-volume runs as a managed service add-on

Estimate expected run frequency and data volumes upfront. Serverless compute for Clean Rooms is billed per DBU; a room running hourly attribution jobs on large tables has meaningfully different economics than a room running a weekly match job on a small key table.
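A back-of-the-envelope estimate helps make this difference tangible. The sketch below multiplies run frequency by DBUs per run and price per DBU. Every number in it is a placeholder chosen for illustration. Actual DBU consumption and pricing depend on your workload, cloud, and contract.

```python
def monthly_compute_cost(runs_per_month, dbus_per_run, price_per_dbu):
    """Estimated monthly cost = runs x DBUs per run x price per DBU."""
    return runs_per_month * dbus_per_run * price_per_dbu

# All figures below are hypothetical placeholders, not published rates.
PRICE_PER_DBU = 0.50

# Hourly attribution job on large tables: 24 runs/day over a 30-day month.
hourly_room = monthly_compute_cost(runs_per_month=24 * 30, dbus_per_run=4.0,
                                   price_per_dbu=PRICE_PER_DBU)

# Weekly match job on a small key table: about 4 runs/month.
weekly_room = monthly_compute_cost(runs_per_month=4, dbus_per_run=0.5,
                                   price_per_dbu=PRICE_PER_DBU)

print(hourly_room)  # 1440.0
print(weekly_room)  # 1.0
```

Under these assumed inputs the hourly room costs over a thousand times more per month than the weekly one, which is why the choice between pass-through and bundled pricing deserves modeling per customer profile.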

What's next

Understand the architecture before building your first clean room
Review the security and IP protection model to prepare for discussions with collaborators
See Create clean rooms for hands-on setup
Read the privacy-centric ML blog post for a detailed walkthrough of the EHR use case
