Security
When you bring a Clean Rooms-based product to enterprise customers, your customer's security, privacy, and legal teams will conduct a review before approving any shared data environment. This page helps you prepare for and pass those reviews. It is organized around the questions that come up most often.
For the underlying security model (no-trust architecture, Delta Sharing data flow, serverless isolation), see What is Databricks Clean Rooms?
Customer security review preparation
The questions below represent the eight areas that enterprise security teams consistently examine. For each area, key questions are listed alongside the information you need to provide.
1. Architecture, isolation, and trust model
- How are clean rooms isolated from each participant's production workspaces and data plane?
- What trust boundaries exist between provider, collaborator, and Databricks (central clean room account, no human access to production tenants)?
- Does enabling Clean Rooms require enabling external Delta Sharing, and if so, how do you prevent arbitrary external shares (egress risk)?
- Can a user who has access to two workspaces join data from both into the same clean room, and how is that controlled?
Clean Rooms operate in a central clean room (CCR), a Databricks-managed account that is entirely separate from both parties' production workspaces. Neither party has administrative access to the CCR, and no workloads from either party's workspace run in the CCR directly. Instead, jobs execute in ephemeral, isolated serverless compute that is torn down after each run.
Enabling Clean Rooms does require Delta Sharing to be enabled on the metastore. However, sharing is scoped per room: only assets explicitly added to a specific clean room by their owner are accessible within that room. No implicit cross-room or cross-party sharing occurs.
Unity Catalog enforces the privilege model: users can add an asset to a clean room only if they own it or have SELECT on it. A user with access to two workspaces cannot automatically join data from both. The assets must be explicitly added to the room from each workspace independently, and the clean room owner controls which assets are published.
2. Network and egress controls
- What network modes exist (full internet vs. restricted internet), and what guarantees does "restricted internet access" provide about data exfiltration?
- How do Secure Egress Gateway, NCC, and Private Link interact with Clean Rooms when input tables are in firewall- or Private Link-protected storage?
- Are there additional outbound calls during materialization (e.g., serverless NAT IPs) that must be allow-listed?
Clean Rooms support serverless egress controls, which can restrict outbound network access from the clean room compute environment. In restricted mode, the compute cannot make arbitrary outbound internet connections — this provides a meaningful barrier against data exfiltration via notebook code that tries to POST data to an external endpoint.
If a party's data is behind Private Link or a network firewall, the clean room's serverless compute plane must be able to reach it. This typically means either adding the serverless plane's stable NAT IPs to your allowlist, or configuring a Network Connectivity Configuration (NCC) to route clean room traffic through a Databricks-managed Private Endpoint. Your Databricks account team can provide the relevant regional IP ranges.
During Delta Sharing materialization, the serverless job pulls data files directly from the data owner's cloud storage using short-lived, scoped credentials — not through any Databricks-managed intermediary. The outbound calls are to the data owner's object storage (S3, ADLS, or GCS), which customers with network controls will need to allow from the serverless compute plane IPs.
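Before a review, it can help to confirm that every serverless NAT range is covered by the storage firewall allowlist. The following is a minimal sketch of that check using Python's standard `ipaddress` module. The CIDR values are placeholders from documentation-reserved ranges, and the names `SERVERLESS_NAT_RANGES`, `STORAGE_FIREWALL_ALLOWLIST`, and `uncovered_ranges` are hypothetical; substitute the regional ranges your Databricks account team provides.

```python
import ipaddress

# Hypothetical placeholder values (RFC 5737 documentation ranges).
# Replace with the ranges supplied by your Databricks account team.
SERVERLESS_NAT_RANGES = ["192.0.2.0/28", "198.51.100.16/29"]
STORAGE_FIREWALL_ALLOWLIST = ["192.0.2.0/24", "203.0.113.0/24"]


def uncovered_ranges(required, allowlist):
    """Return the required CIDR ranges that no single allowlist entry fully contains."""
    allowed = [ipaddress.ip_network(cidr) for cidr in allowlist]
    missing = []
    for cidr in required:
        network = ipaddress.ip_network(cidr)
        covered = any(
            # subnet_of raises TypeError on mixed IPv4/IPv6, so compare versions first.
            network.version == entry.version and network.subnet_of(entry)
            for entry in allowed
        )
        if not covered:
            missing.append(cidr)
    return missing


print(uncovered_ranges(SERVERLESS_NAT_RANGES, STORAGE_FIREWALL_ALLOWLIST))
```

Any range the function returns still needs to be added to the firewall, or routed through an NCC, before materialization can reach the storage account.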
3. Data residency, compliance, and certifications
- In which regions does the central clean room actually run, and how is data residency respected across clouds and regions?
- Do Clean Rooms inherit the workspace compliance security profile (HIPAA, FedRAMP Moderate, etc.)? Are there gaps per cloud?
- Are Clean Rooms explicitly in scope for SOC 2 / ISO / VAPT reports?
The central clean room runs in the cloud and region you specify at creation time. No data crosses regions unless you explicitly create a room whose CCR region differs from where your data is stored. For customers with strict data residency requirements (e.g., EU data sovereignty, India, FedRAMP), you must select a CCR region that satisfies both parties' requirements before creating the room. Discuss available regions with your Databricks account team, as Clean Rooms support varies by cloud and region.
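Choosing a CCR region that satisfies both parties amounts to intersecting three lists: the regions where Clean Rooms are available, and the regions each party's residency policy permits. A small sketch, in which the function name and the region identifiers are illustrative assumptions rather than a statement of actual availability:

```python
def candidate_ccr_regions(supported, party_a_allowed, party_b_allowed):
    """Return supported regions acceptable to both parties, sorted for stable output."""
    return sorted(set(supported) & set(party_a_allowed) & set(party_b_allowed))


# Hypothetical region lists for illustration only.
supported = ["eu-west-1", "eu-central-1", "us-east-1"]
provider_allowed = ["eu-west-1", "eu-central-1"]
customer_allowed = ["eu-central-1", "us-east-1"]

print(candidate_ccr_regions(supported, provider_allowed, customer_allowed))
```

An empty result means no single room can satisfy both residency policies, which is worth discovering before the customer's legal review rather than during it.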
Clean Rooms inherit the Databricks platform's compliance posture: SOC 1 and SOC 2 Type II; ISO 27001, 27017, 27018, and 27701; GDPR/CCPA; HIPAA on AWS, Azure, and GCP; and FedRAMP Moderate on AWS. The serverless compute plane that Clean Rooms run on is included in Databricks' compliance audit scope. For cloud-specific gaps (e.g., PCI on GCP or specific FedRAMP regions), refer customers to the Databricks Security & Trust Center and engage your Databricks account team for current coverage.
4. Identity, access control, and approvals
- How are permissions modeled for creating clean rooms, adding collaborators, attaching assets, and running notebooks?
- How does the notebook approval workflow work (auto-approval rules, runner vs. creator, hidden asset mode)?
- Can we separate data owners, governance teams, and analyst runners?
Clean Rooms use Unity Catalog's standard privilege model. Key privileges include CREATE CLEAN ROOM (at the metastore or catalog level), MODIFY CLEAN ROOM (required to add assets, approve notebooks), and MANAGE CLEAN ROOM (required to add/remove collaborators and configure auto-approval rules). You can grant these to different principals to achieve separation of duties — for example, a governance team holds MANAGE CLEAN ROOM while analysts receive only RUN CLEAN ROOM for pre-approved notebooks.
Every notebook must be approved by the other collaborator(s) before it can execute. The approval is version-locked via an etag: if the notebook is updated, re-approval is required before the new version can run. Auto-approval rules can streamline trusted, repeating workflows (for example, "auto-approve any notebook uploaded by partner X that is run by analyst group Y"), but all approvals — including automatic ones — are logged in the clean_room_events system table.
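The version-locking behavior can be illustrated with a toy model. The sketch below is not the Databricks implementation: how Clean Rooms actually computes an etag is not described here, so a SHA-256 digest of the notebook source stands in for it, and `ApprovalLedger` is a hypothetical class used only to show why any edit invalidates a prior approval.

```python
import hashlib


def etag_for(source: str) -> str:
    """Stand-in etag (assumption): a SHA-256 digest of the notebook source."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class ApprovalLedger:
    """Tracks which exact notebook versions a collaborator has approved."""

    def __init__(self):
        self._approved = set()  # (notebook_name, etag) pairs

    def approve(self, name: str, source: str) -> str:
        tag = etag_for(source)
        self._approved.add((name, tag))
        return tag

    def can_run(self, name: str, source: str) -> bool:
        # Approval applies to one version only: changed source yields a new etag.
        return (name, etag_for(source)) in self._approved


ledger = ApprovalLedger()
v1 = "SELECT segment, COUNT(*) FROM overlap GROUP BY segment"
v2 = v1 + " HAVING COUNT(*) >= 50"
ledger.approve("overlap_report", v1)

print(ledger.can_run("overlap_report", v1))  # the approved version
print(ledger.can_run("overlap_report", v2))  # edited version, not yet approved
```

Because the approval is keyed on the content-derived tag rather than the notebook name, even a one-line change puts the notebook back into a pending state.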
For partners, the "hidden asset" mode (sometimes called Codeless Clean Rooms) allows the provider to share notebooks whose source code is not visible to the collaborator. The collaborator can approve the behavior of the notebook (what inputs it requires, what outputs it produces) without seeing the implementation. This is the recommended pattern for protecting proprietary algorithms. See Approve a notebook for details.
5. Data protection, PII/PHI, and output controls
- Are customers allowed to place PII or PHI in clean rooms, and what controls exist to prevent re-identification or row-level leakage?
- What controls exist on outputs (minimum thresholds, aggregation policies, row-count suppression)?
- Can providers protect model weights and IP (encryption, private wheels/JARs, no code access)?
Databricks does not restrict what data types can be placed in clean rooms — customers can share PII and PHI tables, but Clean Rooms do not inherently de-identify or anonymize that data. The customer is responsible for ensuring that any PII or PHI they publish into a clean room is appropriate under their data sharing agreements and privacy policy. For partners: the safest posture is to design your data model so you never need the customer to share raw PII at all (see Architecture — Data modeling for the derived features pattern).
Output controls are the notebook author's responsibility: Clean Rooms do not automatically enforce k-anonymity, differential privacy, or minimum thresholds. If your product makes privacy guarantees (e.g., "results are suppressed for groups smaller than N"), those guarantees must be encoded in the notebook logic itself. Document and version those controls explicitly, because they become part of the security narrative you present to customers.
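As an illustration of encoding a minimum-threshold guarantee in notebook logic, here is a minimal pure-Python sketch. The threshold value, the `suppressed_counts` helper, and the sample rows are all hypothetical; in a real clean room notebook the same rule would more likely be expressed in Spark SQL, for example with a `HAVING COUNT(*) >= N` clause.

```python
from collections import Counter

MIN_GROUP_SIZE = 50  # hypothetical threshold N; set it per your data sharing agreement


def suppressed_counts(rows, key, min_group_size=MIN_GROUP_SIZE):
    """Count rows per group and drop every group smaller than the threshold."""
    counts = Counter(row[key] for row in rows)
    return {group: n for group, n in counts.items() if n >= min_group_size}


# Synthetic sample data: segment B falls one row short of the threshold.
rows = (
    [{"segment": "A"}] * 120
    + [{"segment": "B"}] * 49
    + [{"segment": "C"}] * 50
)

print(suppressed_counts(rows, "segment"))
```

Keeping the threshold in a single named constant makes it easy to cite in customer documentation and to audit when the notebook version changes.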
Model IP and proprietary code can be protected by compiling logic into a JAR or Python wheel, storing it in a UC volume, and publishing the volume into the clean room. The other party can reference and run the compiled library without seeing its source. See IP protection patterns below.
6. Multi-party and non-Databricks collaborators
- How do you onboard partners who don't have Databricks?
- Can Clean Rooms be used within the same account (multi-BU scenarios), and what risk does that create vs. cross-metastore designs?
Collaborators who don't have a Databricks workspace can participate via a Databricks Express account, which provides limited, cost-effective access to accept clean room invitations and run pre-approved notebooks. For partners who want to fully manage the experience on behalf of a customer who has no Databricks presence, the clean room creator can structure the room so the customer's data is ingested via a managed Delta Sharing connection while the provider controls the execution environment.
Same-account clean rooms (where both parties are business units in the same Databricks account) are supported but reduce the isolation guarantee: both parties share the same administrative control plane. For multi-BU scenarios where the point of the clean room is to enforce data separation between internal teams, cross-metastore designs are more defensible — each BU has its own metastore, and the clean room enforces the boundary.
7. Logging, monitoring, and incident response
- What audit logs exist for clean room operations (asset additions, notebook runs, output table creation, approval changes)?
- How are security tests run (deletion probes, notebook probes)?
All clean room operations are captured in the clean_room_events system table — who added which asset, which notebook version ran, who triggered the run, who approved it, and when each output table was written. Account-level audit logs capture clean room activity under the clean-room service, with Unity Catalog changes appearing under unityCatalog. Both log streams are available for SIEM integration.
Databricks runs ongoing security probes (notebook probes, deletion/creation probes) against the serverless compute environment used by Clean Rooms. These probes test for data persistence across workloads, unauthorized access patterns, and unexpected exfiltration paths. They are part of Databricks' internal security operations and are not customer-configurable. For customer-side monitoring, Databricks recommends alerting on unexpected auto-approvals and reviewing etag history for unauthorized notebook updates.
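A customer-side alert on unexpected auto-approvals could be sketched as a filter over exported event records. The field names (`event_type`, `auto_approved`, `publisher`, `notebook`) and the `NOTEBOOK_APPROVED` value below are illustrative assumptions and not the actual clean_room_events schema; map them to the real columns before using anything like this.

```python
# Hypothetical allowlist of publishers whose notebooks may be auto-approved.
EXPECTED_PUBLISHERS = {"partner-sa@example.com"}


def unexpected_auto_approvals(events, expected_publishers=EXPECTED_PUBLISHERS):
    """Return auto-approval events whose publisher is not on the expected list."""
    return [
        e for e in events
        if e.get("event_type") == "NOTEBOOK_APPROVED"
        and e.get("auto_approved")
        and e.get("publisher") not in expected_publishers
    ]


# Synthetic records with assumed field names, for illustration only.
events = [
    {"event_type": "NOTEBOOK_APPROVED", "auto_approved": True,
     "publisher": "partner-sa@example.com", "notebook": "overlap_report"},
    {"event_type": "NOTEBOOK_APPROVED", "auto_approved": True,
     "publisher": "unknown-sa@example.com", "notebook": "export_rows"},
    {"event_type": "NOTEBOOK_APPROVED", "auto_approved": False,
     "publisher": "unknown-sa@example.com", "notebook": "manual_review"},
    {"event_type": "ASSET_ADDED", "publisher": "unknown-sa@example.com"},
]

flagged = unexpected_auto_approvals(events)
print([e["notebook"] for e in flagged])
```

Manual approvals and unrelated event types pass through untouched, so the alert fires only when an automatic approval comes from a publisher nobody planned for.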
8. Serverless compute, "no human access," and egress
- What hardening exists for serverless compute (disk wiping, GPU wiping, no human access to central tenants, workload segregation)?
- How do serverless egress controls interact with clean room jobs and Delta Sharing materialization?
- Are there any scenarios where Databricks employees could access customer data inside clean room central tenants?
Serverless compute used by Clean Rooms runs in ephemeral VMs that are provisioned for each job and deprovisioned immediately after it completes. Disks, and GPUs where applicable, are securely wiped between workloads. Serverless compute for Clean Rooms is isolated from general-purpose serverless compute: workloads do not share infrastructure with non-clean-room jobs.
The central clean room account is managed by Databricks and has no standing human access. Databricks employees access production infrastructure only through approved, audited break-glass procedures with time-limited credentials, not through persistent admin accounts. These access events are logged and monitored. Customer data flowing through a clean room (table data fetched from the data owner's storage via Delta Sharing) is never persisted in the central clean room account beyond the scope of a single job run.
Serverless egress controls can be applied at the clean room level to restrict outbound network access from clean room jobs. When configured, the compute plane can reach only the endpoints you explicitly allow, such as the cloud storage endpoints needed for Delta Sharing materialization. See serverless egress control for configuration details.
Managing notebook approvals at scale
If your product ships notebooks to many customers simultaneously, or if you maintain multiple clean rooms per customer (dev, staging, prod), managing approvals manually becomes unworkable. The following patterns help you handle approvals at scale.
Auto-approval rules with scope limits. Configure auto-approval rules to cover specific publishers and runners rather than using a blanket "approve everything" rule. A rule like "auto-approve notebooks published by service principal partner-sa@example.com and runnable by group analysts" gives you velocity without granting open-ended approval. Scope rules as narrowly as your operational tempo allows.
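The difference between a scoped rule and a blanket rule can be shown with a small matching sketch. This models the idea only; `AutoApprovalRule` and `is_auto_approved` are hypothetical names and do not correspond to a Databricks API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoApprovalRule:
    publisher: str     # principal allowed to publish the notebook
    runner_group: str  # group allowed to run it


def is_auto_approved(rules, publisher, runner_groups):
    """A notebook is auto-approved only if one rule matches both publisher and runner."""
    return any(
        rule.publisher == publisher and rule.runner_group in runner_groups
        for rule in rules
    )


rules = [AutoApprovalRule(publisher="partner-sa@example.com", runner_group="analysts")]

print(is_auto_approved(rules, "partner-sa@example.com", {"analysts"}))
print(is_auto_approved(rules, "partner-sa@example.com", {"contractors"}))
print(is_auto_approved(rules, "other-sa@example.com", {"analysts"}))
```

Requiring both conditions to match is what keeps the rule narrow: a trusted publisher alone, or a trusted runner group alone, is not enough to bypass manual review.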
Etag change management. Each time you update a notebook, a new approval is required in every room where the notebook is shared. If you have dozens of rooms, uncoordinated notebook updates create a surge of pending approvals. Manage notebook versions deliberately: use a staging room to test new versions, approve across production rooms in a coordinated window, and communicate version changes to customer operators before updating. Treat notebook version upgrades the same way you would treat a software deployment to production.
Separating reviewer and runner. The party who reviews and approves a notebook does not have to be the same person who runs it. Structure your customer onboarding so that a governance or security contact approves notebooks during setup, and operational users only interact with the "Run" button on already-approved versions. This separation reduces the cognitive burden on security reviewers and makes approvals repeatable.
See Approve a notebook for the full workflow.
IP protection for partners
Partners building solutions on top of Clean Rooms frequently ask: "Will customers see my schemas, notebooks, or model code? Can they reverse-engineer my algorithms from logs?"
What collaborators can see by default
| Asset | Visible to other parties? |
|---|---|
| Table / view data rows | No |
| Column names and types (aliases) | Yes |
| Notebook source code | Yes — required for approval |
| Volume contents (JAR, wheel, model files) | No |
| Job run history (who ran what, when) | Yes |
| Clean room name, cloud, region | Yes |
| Secret values (tokens, keys) | No |
IP protection patterns
1. Use hidden asset mode for an API-like experience
Configure the clean room so recipients see only parameter fields (required input tables) and outputs — not your schemas or notebook source. Your logic runs invisibly. This is the recommended pattern for any notebook containing proprietary scoring, ranking, or matching logic.
2. Package algorithms and models as JARs or wheels
Compile proprietary logic into a JAR or Python wheel, store it in a UC volume, and share the volume into the clean room. Recipients see only UDF names or high-level function calls — not the implementation.
3. Limit outputs to safe aggregates or scores
Define explicit output rules: no row-level results that would let customers reconstruct models or training data. Return only scores, classifications, segment memberships, or aggregated metrics.
4. Anti-patterns to avoid
- Do not assume Clean Rooms anonymize or de-identify your data. It remains your responsibility to share only the minimum necessary data, or masked datasets
- Do not use Clean Rooms for simple file delivery where Delta Sharing alone is sufficient — the added complexity is not warranted
- Do not share notebook source that contains credentials, proprietary constants, or implementation details you want to protect — notebooks are visible to collaborators during the approval process unless you use hidden asset mode
Compliance certifications
Databricks Clean Rooms inherits the Databricks platform's compliance posture:
| Certification | Status |
|---|---|
| SOC 1 and 2 Type II | ✓ |
| SOC 3 | ✓ |
| ISO 27001:2013 | ✓ |
| ISO 27017:2015 | ✓ |
| ISO 27018:2019 | ✓ |
| ISO 27701:2019 | ✓ |
| GDPR / CCPA | ✓ |
| HIPAA (AWS, Azure, GCP) | ✓ |
| FedRAMP Moderate (AWS) | ✓ |
For full details, see the Databricks Security & Trust Center.
What's next
- Review architecture for partner integration patterns, data modeling, and output design
- Explore use cases and industry-specific patterns
- See Create clean rooms to get started