Built-On Databricks Architecture Statement of Work

Partner Name: [Company Name]
Product Name: [Product/Solution Name]
Document Version: [1.0]
Date: [YYYY-MM-DD]
Author(s): [Technical Lead Name(s)]
Databricks Sponsor: [Partner Development Manager]
Databricks SA: [Solutions Architect Name]


Document Purpose

This Statement of Work (SOW) serves as the technical blueprint for your Built-On Databricks solution. It should be completed before implementation begins to ensure alignment with Databricks best practices and Partner Well-Architected Framework (PWAF) guidelines, and to identify potential risks early.

Review Process:

  1. Partner completes this document with proposed architecture
  2. Databricks Built-On team reviews and provides feedback
  3. Joint architecture review session to address gaps/questions
  4. Document approval before implementation begins
  5. Quarterly reviews during build phase

Executive Summary

Business Overview

Problem Statement:
[Describe the business problem your solution solves in 2-3 sentences]

Target Market:

  • Industry/Vertical: [e.g., Retail/CPG, Healthcare, Financial Services]
  • Customer Profile: [e.g., Mid-market retailers with $50M-500M revenue]
  • Geographic Focus: [e.g., North America, EMEA, Global]

Value Proposition:
[What unique value does your solution deliver? How does Databricks enable it?]

Product Overview

Product Name: [Name]
Product Type: [e.g., AI-powered demand forecasting, Supply chain optimization, Customer analytics platform]
Go-Live Target: [YYYY-MM-DD]
Initial Customer Target: [# of customers at GA]

Key Features:

  1. [Feature 1]
  2. [Feature 2]
  3. [Feature 3]

Business Metrics

| Metric | Current State | 6-Month Target | 12-Month Target |
|---|---|---|---|
| Customers | [#] | [#] | [#] |
| Est. Monthly DBUs | [#] | [#] | [#] |
| Est. Annual DBU Spend | [$] | [$] | [$] |
| Workspaces | [#] | [#] | [#] |

1. Business Requirements

1.1 Customer Segments

Primary Customer Segment:
[Describe your primary target customer]

Expected Usage Patterns:

  • Data Volume per Customer: [TB/customer/month]
  • Query Frequency: [queries/day or jobs/day]
  • Concurrent Users: [# of concurrent users per customer]
  • Peak Usage Times: [Time zones, business hours, batch windows]

1.2 Pricing & Packaging

Pricing Model:

  • Subscription (flat fee)
  • Usage-based (consumption)
  • Hybrid (base + overage)
  • Tiered (Good/Better/Best)

Databricks Cost Structure:

  • How do Databricks costs factor into your pricing? [Pass-through, margin, bundled]
  • What visibility do customers have into DBU consumption? [Full transparency, summary only, none]

1.3 Regulatory & Compliance Requirements

Data Residency:

  • Data must remain in customer's region
  • Data must remain in customer's cloud account
  • Data can be centralized in partner environment
  • Hybrid (some data centralized, some in customer environment)

Compliance Frameworks:

  • GDPR
  • HIPAA
  • SOC 2
  • FedRAMP / GovCloud
  • Other: [Specify]

Data Classification:

  • PII (Personally Identifiable Information)
  • PHI (Protected Health Information)
  • PCI (Payment Card Data)
  • Confidential Business Data
  • Public/Non-sensitive

2. Deployment Model

2.1 Selected Deployment Model

  • Partner Hosted SaaS — You own and operate Databricks; customers access as a service
  • Hybrid — You manage control plane; data plane in customer's cloud account
  • Side Car — Federated pattern using Delta Sharing
  • Customer Managed — Customer owns Databricks infrastructure
  • Multiple Models — Different models for different customer segments

If Multiple Models:

| Customer Segment | Deployment Model | Justification |
|---|---|---|
| [Segment 1] | [Model] | [Why this model?] |
| [Segment 2] | [Model] | [Why this model?] |

2.2 Workspace Tenancy Strategy

For SaaS deployments, how will you handle multi-tenancy?

  • Workspace per Tenant — Each customer gets dedicated workspace
  • Multi-Tenant Workspace — Multiple customers share workspace(s)
  • Hybrid Workspace Tenancy — Mix of dedicated and shared based on customer tier/requirements

Justification:
[Explain your tenancy choice. Consider: isolation requirements, operational overhead, scale limits, cost]

Scale Projections:

| Timeframe | # Customers | # Workspaces | Avg Workspaces/Customer |
|---|---|---|---|
| 6 months | [#] | [#] | [#] |
| 12 months | [#] | [#] | [#] |
| 24 months | [#] | [#] | [#] |

2.3 Multi-Cloud Strategy

Primary Cloud:

  • AWS
  • Azure
  • GCP

Multi-Cloud Plans:

  • Single cloud only
  • Multi-cloud (specify secondary clouds): [AWS, Azure, GCP]

GovCloud/Sovereign Cloud:

  • Yes, required for [Government, Defense, specific countries]
  • Not required
  • Roadmapped

3. Technical Architecture

3.1 Reference Architecture Diagram

REQUIRED: Attach or embed a detailed architecture diagram showing:

  • Data sources and ingestion paths
  • Databricks components (compute, storage, governance)
  • External services and integrations
  • End-user interfaces
  • Network boundaries and security zones

[Insert architecture diagram here]

Diagram Tool: [e.g., Lucidchart, Draw.io, Mermaid]
Link to Editable Version: [URL to source file]

3.2 Data Flow

Data Sources:

| Source Type | Description | Volume | Frequency | Ingestion Method |
|---|---|---|---|---|
| [e.g., Salesforce] | [CRM data] | [100GB/day] | [Hourly] | [REST API, Lakeflow Connect] |
| [e.g., S3 bucket] | [Clickstream logs] | [500GB/day] | [Streaming] | [Auto Loader] |
| [...] | [...] | [...] | [...] | [...] |

Data Transformation Pipeline:

[Raw/Bronze] → [Cleansed/Silver] → [Curated/Gold] → [Analytics/Serving]

Describe each stage:

  1. Bronze Layer: [Raw data ingestion, minimal transformations]
  2. Silver Layer: [Data quality, deduplication, normalization]
  3. Gold Layer: [Business logic, aggregations, denormalization]
  4. Serving Layer: [How data is exposed to end users/applications]
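
For illustration, a minimal Delta Live Tables sketch of the Bronze → Silver hand-off, assuming a JSON event feed in a hypothetical `/Volumes/raw/events` landing path (this code only runs inside a DLT pipeline):

```python
import dlt
from pyspark.sql import functions as F

# Bronze: raw ingestion via Auto Loader, minimal transformation
@dlt.table(comment="Raw events, as landed")
def events_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/raw/events")  # hypothetical landing path
    )

# Silver: quality-checked, deduplicated records
@dlt.table(comment="Cleansed, deduplicated events")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")
def events_silver():
    return (
        dlt.read_stream("events_bronze")
        .dropDuplicates(["event_id"])
        .withColumn("ingested_at", F.current_timestamp())
    )
```

Gold and serving layers follow the same pattern, typically as aggregated tables read by DBSQL warehouses.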

3.3 Databricks Product Usage

For each Databricks product area, indicate your usage plan:

Unity Catalog

  • Using — [Describe how: governance, lineage, access control]
  • Roadmapped — [Timeline]
  • Not Using — [Why not?]

Unity Catalog Design Pattern:

  • Catalog per customer
  • Schema per tenant
  • Multi-tenant tables with RLS
  • Hub-and-spoke (shared + customer catalogs)
  • Other: [Describe]
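
To make the options concrete: under the catalog-per-customer pattern, tenant isolation reduces to a few Unity Catalog statements per onboarded customer. A minimal sketch, with hypothetical tenant and group names:

```python
# Hypothetical tenant identifiers; in practice these come from your
# onboarding system.
tenant = "acme_corp"
tenant_group = f"{tenant}_users"  # account-level group for this customer

# One catalog per customer keeps grants, lineage, and billing cleanly scoped.
spark.sql(f"CREATE CATALOG IF NOT EXISTS {tenant}")
spark.sql(f"GRANT USE CATALOG ON CATALOG {tenant} TO `{tenant_group}`")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {tenant}.gold")
spark.sql(f"GRANT USE SCHEMA, SELECT ON SCHEMA {tenant}.gold TO `{tenant_group}`")
```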

Lakeflow (Data Engineering)

  • Using — Components:
    • Lakeflow Connect (ingestion connectors)
    • Auto Loader (cloud storage ingestion)
    • Delta Live Tables (declarative pipelines)
    • Lakeflow Jobs (orchestration)
    • Other: [Specify]
  • Roadmapped — [Timeline]
  • Not Using — [Why not? What alternative?]

DBSQL (Data Warehousing)

  • Using — [SQL warehouses for: analytics, serving, BI integration]
    • Warehouse Type: [ ] Classic [ ] Serverless
    • Estimated # of Warehouses: [#]
    • Warehouse Sizing: [Small, Medium, Large, X-Large]
  • Roadmapped — [Timeline]
  • Not Using — [Why not? What alternative?]

Lakebase (OLTP Database)

  • Using — [Use case: transactional data, app state, caching]
  • Roadmapped — [Timeline]
  • Not Using — [Why not? What alternative?]

If NOT using Lakebase, what external OLTP database?

  • PostgreSQL
  • MySQL
  • SingleStore
  • SQL Server
  • Other: [Specify]

AI/BI & Genie

  • Using — Components:
    • AI/BI Dashboards (low-code visualization)
    • Genie (conversational analytics)
    • Databricks One (simplified consumption UI)
  • Roadmapped — [Timeline]
  • Not Using — [Why not? What alternative?]

If NOT using AI/BI Dashboards, what BI tool?

  • Power BI
  • Tableau
  • Looker
  • Custom/Bespoke
  • Other: [Specify]

Databricks Apps

  • Using — [Streamlit, Dash, Gradio apps hosted on Databricks]
  • Roadmapped — [Timeline]
  • Not Using — [Apps run externally]

If apps run externally, where?

  • Kubernetes
  • Cloud VMs (EC2, Azure VM)
  • Serverless (Lambda, Cloud Functions, Azure Functions)
  • Container services (ECS, Cloud Run, Azure Container Apps)
  • Other: [Specify]

Agent Bricks (GenAI/ML)

  • Using — Components:
    • Foundation Model APIs (LLM serving)
    • Model Serving (custom models)
    • Vector Search (embeddings, RAG)
    • Agent Framework (multi-agent workflows)
    • AI Gateway (centralized model access)
    • MLflow (ML lifecycle management)
    • Feature Store
  • Roadmapped — [Timeline]
  • Not Using — [Why not? What alternative?]

If NOT using Databricks Agent Bricks, what AI/ML stack?

  • LLM Provider: [ ] OpenAI [ ] Anthropic [ ] Google Gemini [ ] Other: [Specify]
  • Vector DB: [ ] Pinecone [ ] Weaviate [ ] OpenSearch [ ] Other: [Specify]
  • ML Platform: [ ] SageMaker [ ] Vertex AI [ ] Azure ML [ ] Other: [Specify]

Delta Sharing

  • Using — [Describe sharing patterns: internal, external, Marketplace]
  • Roadmapped — [Timeline]
  • Not Using — [Why not?]

ZeroBus (Streaming Ingestion)

  • Using — [Push-based ingestion via gRPC/REST]
  • Roadmapped — [Timeline]
  • Not Using — [Why not? What alternative?]

3.4 External Dependencies

Critical External Services:

| Service | Purpose | Why External? | Migration Plan to Databricks? |
|---|---|---|---|
| [e.g., PostgreSQL] | [OLTP database] | [Legacy system] | [Migrate to Lakebase in Q2] |
| [e.g., Airflow] | [Orchestration] | [Existing investment] | [Evaluate Lakeflow Jobs] |
| [e.g., OpenAI] | [LLM inference] | [No in-house LLM] | [Test Databricks FM APIs] |
| [...] | [...] | [...] | [...] |

3.5 Compute Strategy

Compute Types:

  • Classic Compute (customer VPC)
  • Serverless Compute (fully managed)
  • Hybrid (mix of classic and serverless)

Compute Policies:

  • Yes, cluster policies will enforce: [instance types, auto-termination, autoscaling, tags]
  • No, users create ad-hoc clusters
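
If policies will enforce these settings, they can be created programmatically. A sketch using the Databricks Python SDK; the instance types, limits, and policy name below are illustrative, not recommendations:

```python
import json
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # resolves credentials from env/config; see section 4.2

# Illustrative policy: pin instance families, cap auto-termination,
# and require the customer_id tag described in section 5.1.
definition = {
    "autotermination_minutes": {"type": "range", "maxValue": 60, "defaultValue": 30},
    "node_type_id": {"type": "allowlist", "values": ["m5.xlarge", "m5.2xlarge"]},
    "custom_tags.customer_id": {"type": "unlimited", "isOptional": False},
}

policy = w.cluster_policies.create(
    name="saas-tenant-default",  # hypothetical policy name
    definition=json.dumps(definition),
)
print(policy.policy_id)
```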

Autoscaling Strategy:

  • Min Workers: [#]
  • Max Workers: [#]
  • Auto-termination: [minutes]

Spot/Preemptible Instances:

  • Yes, for non-critical workloads
  • No, on-demand only

4. Data Governance & Security

4.1 Unity Catalog Implementation

Unity Catalog Adoption:

  • 100% Unity Catalog — All data assets governed by UC
  • Partial[% of data in UC, what's outside and why?]
  • Not Yet[Migration plan and timeline]

Catalog Structure:

metastore/
├── [catalog_name]/
│   ├── [schema_1]/
│   │   ├── [table_1]
│   │   └── [table_2]
│   └── [schema_2]/
└── [catalog_name_2]/
    └── ...

[Document your planned catalog/schema hierarchy]

Workspace Bindings:

  • How are catalogs bound to workspaces? [Describe strategy]

4.2 Access Control

Identity & Access Management:

End-User Authentication:

  • Users log into Databricks directly (OAuth U2M)
  • Users log into your app (app authenticates to Databricks on their behalf)
    • Method: [ ] SSO-mapped OAuth [ ] Service Principal per user [ ] Shared Service Principal

Backend API Authentication:

  • OAuth M2M (Service Principals) — RECOMMENDED
  • Personal Access Tokens (PATs) ⚠️ NOT RECOMMENDED
  • Session tokens (OAuth token refresh)
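
For the recommended OAuth M2M path, a minimal sketch with the Databricks Python SDK; the environment variable names are placeholders for values sourced from your secret manager:

```python
import os
from databricks.sdk import WorkspaceClient

# OAuth M2M: the SDK exchanges the service principal's client credentials
# for short-lived tokens; no PATs are involved.
w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],  # e.g. https://<workspace>.cloud.databricks.com
    client_id=os.environ["DATABRICKS_CLIENT_ID"],
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
)

# Every subsequent API call authenticates as the service principal.
print(w.current_user.me().user_name)
```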

RBAC Strategy:

| Role | Permissions | Implementation |
|---|---|---|
| [e.g., Customer Admin] | [Full access to customer catalog] | [Unity Catalog grants] |
| [e.g., Customer Analyst] | [Read-only to gold layer] | [Unity Catalog grants + views] |
| [e.g., Platform Admin] | [All catalogs, system admin] | [Account admin role] |

4.3 Data Security

Encryption:

  • At Rest: [Databricks managed keys, Customer managed keys (CMK)]
  • In Transit: [TLS 1.2+]

Network Security:

  • Public internet access (with IP allowlists)
  • Private connectivity (AWS PrivateLink, Azure Private Link, GCP Private Service Connect)
  • VPN tunnel
  • Other: [Specify]

Data Masking:

  • Column-level masking for PII
  • Row-level security (RLS) for multi-tenancy
  • Dynamic views
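
As one example of the RLS option, Unity Catalog row filters can scope a shared table per tenant. A sketch assuming a `tenant_id` column and per-tenant account groups named `<tenant>_users` (all object names hypothetical):

```python
# Filter function: a row is visible only when the caller belongs to the
# account group matching the row's tenant_id.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.tenant_filter(tenant_id STRING)
    RETURN is_account_group_member(tenant_id || '_users')
""")

# Bind the filter to the shared multi-tenant table.
spark.sql("""
    ALTER TABLE main.gold.orders
    SET ROW FILTER main.security.tenant_filter ON (tenant_id)
""")
```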

Audit Logging:

  • Unity Catalog audit logs
  • System tables monitoring
  • External SIEM integration: [e.g., Splunk, Datadog]

5. Cost Management & Telemetry

5.1 Tagging Strategy

Tagging Implementation:

  • Fully Implemented — All resources tagged from day 1
  • Partial — [What's tagged, what's not?]
  • Not Yet — [Implementation timeline]

Required Tags:

| Tag Key | Purpose | Example Values | Enforcement |
|---|---|---|---|
| customer_id | Billing attribution | acme_corp, cust_12345 | [ ] Required [ ] Optional |
| environment | Internal vs. production | production, internal, staging | [ ] Required [ ] Optional |
| service | Feature/component | etl_pipeline, ml_inference, api | [ ] Required [ ] Optional |
| cost_center | Internal allocation | platform_ops, data_eng | [ ] Required [ ] Optional |
| [Custom tag] | [Purpose] | [Values] | [ ] Required [ ] Optional |

Tag Enforcement Mechanisms:

  • Cluster policies (compute tagging)
  • Serverless budget policies (serverless workload tagging)
  • SQL warehouse tags
  • Manual process (not recommended)

5.2 Cost Allocation

Customer Cost Attribution:

  • Fully Automated — System tables + tags provide per-customer costs
  • Semi-Automated — [What's manual?]
  • Manual — [How do you track costs?]

Can you differentiate:

  • Product usage (customer-facing workloads) vs. Internal usage (platform operations)?
  • Per-customer usage in shared workspaces?
  • Per-feature usage (which product features drive costs)?
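
If the `customer_id` tag from section 5.1 is applied consistently, per-customer attribution becomes a query over the billing system tables. A sketch (list prices approximate spend; negotiated discounts are not reflected):

```python
# Estimated per-customer spend by month, joining usage with list prices
# over each price's validity window.
per_customer_cost = spark.sql("""
    SELECT
        u.custom_tags['customer_id']              AS customer_id,
        DATE_TRUNC('month', u.usage_date)         AS month,
        SUM(u.usage_quantity * p.pricing.default) AS est_list_price_spend
    FROM system.billing.usage u
    JOIN system.billing.list_prices p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    GROUP BY 1, 2
    ORDER BY month, est_list_price_spend DESC
""")
per_customer_cost.show(truncate=False)
```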

5.3 Budget Monitoring

Budget Alerts:

  • Per-customer budgets (using customer_id tag)
  • Per-workspace budgets
  • Account-level budget
  • Not yet implemented

Notification Recipients:

  • Technical team
  • Finance team
  • Customer success team
  • Automated systems (PagerDuty, Slack)

5.4 Telemetry & Observability

For Customer-Managed Deployments: How will you collect telemetry?

  • Databricks APIs (system tables, usage APIs)
  • Agent/SDK embedded in customer workspace
  • Customer reports usage via UI
  • No telemetry collected

Observability Stack:

| Layer | Tool | Purpose |
|---|---|---|
| Metrics | [e.g., Datadog, Prometheus] | [Resource utilization, query performance] |
| Logs | [e.g., Splunk, ELK] | [Application logs, error tracking] |
| Traces | [e.g., OpenTelemetry, Jaeger] | [Distributed tracing across services] |
| Databricks-Native | [System tables, Query History] | [DBU consumption, query performance] |

6. Integration Requirements

6.1 Data Sources

Ingestion Connectors:

| Source System | Connector Type | Volume | Frequency | Notes |
|---|---|---|---|---|
| [e.g., Salesforce] | [Lakeflow Connect] | [10GB/day] | [Hourly] | [CRM data sync] |
| [e.g., AWS S3] | [Auto Loader] | [500GB/day] | [Continuous] | [Event logs] |
| [e.g., PostgreSQL] | [JDBC] | [50GB/day] | [Daily batch] | [Transactional data] |
| [...] | [...] | [...] | [...] | [...] |

6.2 Downstream Integrations

Data Egress:

| Destination | Protocol | Purpose | Volume |
|---|---|---|---|
| [e.g., Snowflake] | [Delta Sharing] | [Share curated data] | [100GB/month] |
| [e.g., Customer's S3] | [Direct write] | [Export reports] | [10GB/day] |
| [...] | [...] | [...] | [...] |

6.3 APIs & SDKs

Databricks SDK Usage:

  • Python SDK
  • Java SDK
  • REST API (direct)
  • Terraform Provider
  • DABs (Databricks Asset Bundles)

Custom APIs:

  • Do you expose APIs for customers to integrate with your product?
    • Yes — [REST, GraphQL, gRPC]
    • No — [UI-only access]

7. Operations & Automation

7.1 Infrastructure as Code

IaC Tool:

  • Terraform
  • Pulumi
  • CloudFormation / ARM Templates
  • Databricks Asset Bundles (DABs)
  • Manual configuration (not recommended)

What is automated?

  • Workspace provisioning
  • Cluster/warehouse creation
  • Unity Catalog setup (catalogs, schemas, grants)
  • Job deployment
  • Cluster policy creation
  • Secret management
  • Network configuration

7.2 CI/CD Pipeline

Deployment Pipeline:

[Code Commit] → [Build/Test] → [Staging Deploy] → [Prod Deploy]

CI/CD Tool:

  • GitHub Actions
  • GitLab CI
  • Jenkins
  • Azure DevOps
  • CircleCI
  • Other: [Specify]

Deployment Cadence:

  • Code deployments: [Daily, Weekly, Bi-weekly]
  • Infrastructure changes: [As needed, Monthly, Quarterly]

7.3 Customer Onboarding

Onboarding Process:

  1. Customer Provisioning: [Describe steps: workspace creation, catalog setup, user accounts]
  2. Data Onboarding: [How is customer data ingested initially?]
  3. Access Setup: [User roles, SSO integration, API keys]
  4. Validation: [How do you verify the customer is ready?]
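
Step 4 lends itself to a scripted smoke test. A sketch with the Databricks Python SDK, assuming the catalog-per-customer layout from section 3.3 (object names hypothetical):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import SecurableType

w = WorkspaceClient()

def validate_tenant(tenant: str) -> list[str]:
    """Return a list of readiness problems; empty means the tenant is good to go."""
    problems = []

    # 1. The tenant catalog exists.
    try:
        w.catalogs.get(tenant)
    except Exception:
        return [f"catalog {tenant} is missing"]

    # 2. The tenant group holds grants on its catalog.
    grants = w.grants.get(securable_type=SecurableType.CATALOG, full_name=tenant)
    principals = {a.principal for a in (grants.privilege_assignments or [])}
    if f"{tenant}_users" not in principals:
        problems.append("tenant group has no grants on the catalog")

    # 3. Curated data is queryable.
    try:
        spark.sql(f"SELECT 1 FROM {tenant}.gold.orders LIMIT 1").collect()
    except Exception as exc:
        problems.append(f"smoke query failed: {exc}")

    return problems

print(validate_tenant("acme_corp"))  # hypothetical tenant slug
```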

Onboarding Time:

  • Manual onboarding time: [hours/days per customer]
  • Automated onboarding time: [hours/days per customer]
  • Onboarding automation level: [0-100%]

7.4 Monitoring & Alerting

Key Metrics Monitored:

  • Job success/failure rates
  • Query performance (p50, p95, p99)
  • Data freshness (SLA: [minutes/hours])
  • DBU consumption (per customer, per service)
  • Error rates
  • API latency
  • Other: [Specify]

Alerting Channels:

  • PagerDuty
  • Slack
  • Email
  • Other: [Specify]

On-Call Coverage:

  • 24/7 on-call
  • Business hours only
  • Best-effort

8. Scale & Performance

8.1 Scale Targets

Current Scale:

  • Customers: [#]
  • Data Volume: [TB]
  • Daily Queries: [#]
  • Daily Jobs: [#]

6-Month Targets:

  • Customers: [#] (% growth)
  • Data Volume: [TB] (% growth)
  • Daily Queries: [#] (% growth)
  • Daily Jobs: [#] (% growth)

12-Month Targets:

  • Customers: [#] (% growth)
  • Data Volume: [TB] (% growth)
  • Daily Queries: [#] (% growth)
  • Daily Jobs: [#] (% growth)

8.2 Performance SLAs

Customer-Facing SLAs:

| Metric | SLA Target | Measurement |
|---|---|---|
| Query latency (p95) | [< 5 seconds] | [DBSQL query history] |
| Data freshness | [< 30 minutes] | [DLT pipeline monitoring] |
| API availability | [99.9%] | [Uptime monitoring] |
| Job success rate | [99.5%] | [Job run history] |
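
The query-latency row can be measured directly from the query history system table (availability varies by cloud and tier; verify the `system.query.history` schema in your workspace before relying on the column names below):

```python
# p95 DBSQL statement latency over the trailing 7 days, per warehouse.
p95_latency = spark.sql("""
    SELECT
        compute.warehouse_id,
        PERCENTILE_APPROX(total_duration_ms / 1000.0, 0.95) AS p95_seconds
    FROM system.query.history
    WHERE start_time >= CURRENT_TIMESTAMP() - INTERVAL 7 DAYS
    GROUP BY compute.warehouse_id
""")
p95_latency.show()
```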

8.3 Capacity Planning

Resource Limits:

  • Unity Catalog: [10,000 schemas per catalog, 10,000 tables per schema]
  • Workspaces: [How many workspaces at scale?]
  • Concurrent queries: [Peak concurrency estimate]

Scaling Strategy:

  • Vertical scaling (larger compute)
  • Horizontal scaling (more compute instances)
  • Serverless (auto-scale)
  • Hybrid

9. Risk Assessment

9.1 Technical Risks

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| [e.g., Unity Catalog adoption < 100%] | [High] | [Medium] | [Prioritize UC migration in Phase 1] |
| [e.g., External orchestration tool] | [Medium] | [High] | [Evaluate Lakeflow Jobs as alternative] |
| [e.g., Manual tagging process] | [High] | [High] | [Automate with cluster policies] |
| [...] | [...] | [...] | [...] |

9.2 Operational Risks

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| [e.g., Insufficient on-call coverage] | [High] | [Medium] | [Hire DevOps engineer, SLAs] |
| [e.g., Manual customer onboarding] | [Medium] | [High] | [Build automation, Terraform] |
| [...] | [...] | [...] | [...] |

9.3 Security Risks

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| [e.g., PATs instead of OAuth M2M] | [High] | [High] | [Migrate to Service Principals] |
| [e.g., Insufficient RBAC] | [Medium] | [Medium] | [Implement UC grants, auditing] |
| [...] | [...] | [...] | [...] |

9.4 Cost Risks

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| [e.g., Untagged resources] | [High] | [High] | [Tag enforcement policies] |
| [e.g., Runaway compute costs] | [High] | [Medium] | [Auto-termination, budget alerts] |
| [...] | [...] | [...] | [...] |

10. Implementation Plan

10.1 Phases

Phase 1: Foundation (Months 1-2)

  • Workspace provisioning automation
  • Unity Catalog setup
  • Tagging strategy implementation
  • Authentication (OAuth M2M)
  • Basic monitoring & alerting

Phase 2: Core Product (Months 3-4)

  • Data ingestion pipelines
  • Transformation layer (Bronze → Silver → Gold)
  • SQL warehouses for analytics
  • Initial BI/reporting capabilities
  • Customer onboarding automation

Phase 3: Advanced Features (Months 5-6)

  • AI/ML workloads (if applicable)
  • Advanced analytics
  • Databricks Apps (if applicable)
  • Delta Sharing (if applicable)
  • Optimization & tuning

Phase 4: GA Launch (Month 6+)

  • Full production readiness
  • Customer pilots
  • General Availability

10.2 Milestones

| Milestone | Target Date | Success Criteria |
|---|---|---|
| Architecture Approved | [YYYY-MM-DD] | [Databricks Built-On team sign-off] |
| Unity Catalog Live | [YYYY-MM-DD] | [100% of tables governed by UC] |
| First Customer Onboarded | [YYYY-MM-DD] | [1 customer fully operational] |
| 10 Customers Live | [YYYY-MM-DD] | [10 customers in production] |
| General Availability | [YYYY-MM-DD] | [Public launch, all features complete] |

10.3 Resource Requirements

Team Structure:

| Role | Count | Key Responsibilities |
|---|---|---|
| Platform Engineer | [#] | [Infrastructure, IaC, automation] |
| Data Engineer | [#] | [Pipelines, data quality, transformations] |
| Backend Engineer | [#] | [APIs, authentication, integrations] |
| ML Engineer | [#] | [Model training, serving, monitoring] |
| DevOps Engineer | [#] | [CI/CD, monitoring, on-call] |
| Product Manager | [#] | [Requirements, roadmap, customers] |

External Support Needed:

  • Databricks Professional Services
  • Databricks Solutions Architect (ongoing)
  • Third-party consulting: [Specify firm/focus]

11. Success Metrics

11.1 Technical Success Metrics

| Metric | Target | Measurement Method |
|---|---|---|
| Unity Catalog coverage | 100% | [System tables query] |
| Tagged resources | 100% | [Compliance audit job] |
| Job success rate | 99.5% | [Job run history] |
| Query p95 latency | < 5 seconds | [DBSQL query history] |
| Customer onboarding time | < 2 hours | [Manual tracking → automated] |
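
The tagged-resources audit can run as a small scheduled job; a sketch with the Databricks Python SDK that flags classic compute missing the `customer_id` tag (serverless workloads are covered by budget policies instead, per section 5.1):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Flag any cluster missing the billing-attribution tag from section 5.1.
untagged = [
    c.cluster_name
    for c in w.clusters.list()
    if "customer_id" not in (c.custom_tags or {})
]
print(f"{len(untagged)} untagged clusters: {untagged}")
```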

11.2 Business Success Metrics

| Metric | Target | Measurement Method |
|---|---|---|
| Customers at GA | [#] | [CRM] |
| Monthly DBU consumption | [#] | [System billing tables] |
| Customer DBU cost visibility | 100% | [Tagging audit] |
| Gross margin per customer | > [%] | [Finance analysis] |

11.3 Review Cadence

Review Schedule:

  • Monthly architecture sync (during build phase)
  • Quarterly business review (post-GA)
  • Annual strategic planning

Review Participants:

  • Partner: [Technical Lead, Product Manager, Engineering Manager]
  • Databricks: [Partner Development Manager, Solutions Architect]

12. Appendices

Appendix A: Glossary

| Term | Definition |
|---|---|
| DBU | Databricks Unit — unit of compute consumption |
| PAT | Personal Access Token (not recommended for production) |
| OAuth M2M | Machine-to-Machine authentication using Service Principals |
| UC | Unity Catalog |
| DLT | Delta Live Tables |
| RLS | Row-Level Security |

Appendix B: Assumptions

[Document any assumptions made in this SOW]

  1. [e.g., Customer data is less than 10TB per tenant]
  2. [e.g., All customers are in AWS US regions]
  3. [e.g., No FedRAMP requirements in first 12 months]

Appendix C: Out of Scope

[Document what is explicitly out of scope for this implementation]

  1. [e.g., Real-time streaming (< 1 minute latency)]
  2. [e.g., Multi-region disaster recovery]
  3. [e.g., White-label embedding of Databricks UI]

Appendix D: Open Questions

[Document unresolved questions to be addressed during review]

  1. [e.g., Should we use serverless or classic compute for customer workloads?]
  2. [e.g., What's the recommended cluster policy for a SaaS multi-tenant scenario?]
  3. [...]

Document Approval

Partner Technical Lead:
Name: [Name]
Signature: ________________
Date: [YYYY-MM-DD]

Databricks Built-On Team:
Name: [Partner Development Manager]
Signature: ________________
Date: [YYYY-MM-DD]

Databricks Solutions Architect:
Name: [SA Name]
Signature: ________________
Date: [YYYY-MM-DD]


Version History

| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | [YYYY-MM-DD] | [Name] | Initial draft |
| 1.0 | [YYYY-MM-DD] | [Name] | First review submission |
| 1.1 | [YYYY-MM-DD] | [Name] | Addressed Databricks feedback |
| 2.0 | [YYYY-MM-DD] | [Name] | Final approved version |