Built-On Architecture Requirements Checklist

Purpose: This checklist provides a quick reference for the minimum technical requirements and best practices for Built-On Databricks architectures. Partners must complete the full SOW Template before implementation begins.

When to Use: Use this checklist during initial partner discovery conversations to quickly assess architecture maturity and identify gaps.


✅ Critical Requirements (Must Have)

These are mandatory for all Built-On partners:

1. Unity Catalog

  • 100% Unity Catalog adoption for all production data assets
  • Clear catalog/schema hierarchy documented
  • Workspace bindings configured (if multi-workspace)
  • Data lineage enabled
  • Access control grants defined

❌ Common Anti-Pattern: Using legacy Hive metastore or external metastores (e.g., AWS Glue)
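A clear catalog/schema hierarchy with defined grants can be scripted per customer. The sketch below generates Unity Catalog grant statements for a hypothetical per-customer catalog naming convention (`customer_<id>`, with `_readers`/`_writers` groups); the catalog and group names are assumptions to adapt, not a prescribed standard.

```python
# Sketch: generate Unity Catalog grant statements for a per-customer
# catalog. Catalog and group names are hypothetical examples — adapt to
# your own naming convention before running against a workspace.

def customer_grants(customer_id: str) -> list[str]:
    """Build a minimal grant set for one customer's catalog."""
    catalog = f"customer_{customer_id}"       # e.g. customer_acme
    reader_group = f"{catalog}_readers"       # hypothetical group names
    writer_group = f"{catalog}_writers"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{reader_group}`;",
        f"GRANT USE SCHEMA, SELECT ON CATALOG {catalog} TO `{reader_group}`;",
        f"GRANT ALL PRIVILEGES ON CATALOG {catalog} TO `{writer_group}`;",
    ]

for stmt in customer_grants("acme"):
    print(stmt)
```

Generating grants from code (rather than issuing them ad hoc in the UI) keeps access control reviewable and repeatable across customers.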


2. Tagging & Cost Attribution

  • customer_id tag implemented on all compute resources
  • Tag enforcement via cluster policies and serverless budget policies
  • Can differentiate product usage from internal usage
  • System tables queries configured for cost analysis
  • Budget alerts configured per customer (or per workspace)

❌ Common Anti-Pattern: Manual tagging or "we'll add tags later"


3. Authentication

  • OAuth M2M (Service Principals) for backend API authentication ✅
  • NOT using Personal Access Tokens (PATs) ⚠️
  • SSO-mapped OAuth for end-user authentication (if applicable)
  • Secrets stored in Databricks Secrets or cloud provider secret manager

❌ Common Anti-Pattern: Long-lived PATs hardcoded in code or config files
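For backend services, the PAT is replaced by an OAuth client-credentials exchange against the workspace token endpoint. The sketch below only assembles the request (no network call); the workspace URL is a placeholder, and the client ID/secret belong to a Databricks service principal and should be read from a secret manager, never from source code.

```python
from urllib.parse import urlencode

# Sketch of the OAuth M2M (client credentials) exchange a backend
# service performs instead of using a PAT. Workspace URL is a
# placeholder; credentials must come from a secret manager.

WORKSPACE = "https://example.cloud.databricks.com"  # placeholder
TOKEN_ENDPOINT = f"{WORKSPACE}/oidc/v1/token"

def token_request(client_id: str, client_secret: str) -> tuple[str, str]:
    """Return (url, form-encoded body) for the client-credentials POST."""
    body = urlencode({
        "grant_type": "client_credentials",
        "scope": "all-apis",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return TOKEN_ENDPOINT, body

url, body = token_request("sp-client-id", "sp-secret")
# POST `body` to `url` with Content-Type: application/x-www-form-urlencoded;
# the JSON response carries a short-lived access_token for API calls.
```

Because the resulting access token is short-lived, a leaked token is far less damaging than a leaked long-lived PAT.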


4. Infrastructure as Code (IaC)

  • All infrastructure defined as code (Terraform, Pulumi, DABs)
  • Workspace provisioning automated
  • Cluster/warehouse templates defined
  • Unity Catalog setup automated
  • CI/CD pipeline for deployments

❌ Common Anti-Pattern: Manual workspace configuration via UI (doesn't scale)


5. Deployment Model Clarity

  • Deployment model selected and documented: Partner Hosted, Hybrid, Side Car, or Customer Managed
  • Tenancy model selected (if SaaS): Workspace per tenant, Multi-tenant workspace, or Hybrid
  • Justification documented for model choice
  • Scale projections documented (# customers, # workspaces, DBU consumption)

❌ Common Anti-Pattern: Unclear tenancy model or mixing models without strategy


⚠️ Recommended Practices (Should Have)

These are best practices that significantly improve operational efficiency and customer experience:

6. Databricks-Native Orchestration

  • Using Lakeflow Jobs for orchestration (not external Airflow, ADF, etc.)
  • Delta Live Tables for declarative pipelines
  • Auto Loader for cloud storage ingestion

⚠️ Gap: Using external orchestration (Airflow, ADF) increases operational complexity and costs


7. Databricks-Native Analytics

  • Using DBSQL for data warehousing (not external Snowflake, Redshift)
  • Using AI/BI Dashboards for reporting (not external Power BI, Tableau)
  • Serverless SQL warehouses for variable workloads

⚠️ Gap: External BI tools add cost and complexity; limited Databricks telemetry


8. Lakebase for OLTP

  • Using Lakebase for transactional data (not external PostgreSQL, MySQL, SingleStore)
  • Consolidated OLTP and analytical workloads on Databricks

⚠️ Gap: External OLTP databases split the stack and add operational overhead


9. Databricks-Native AI/ML

  • Using Databricks FM APIs for LLM serving (not external OpenAI, Anthropic)
  • Using Vector Search for embeddings/RAG (not external Pinecone, Weaviate)
  • Using Model Serving for custom ML models
  • Using MLflow for ML lifecycle management

⚠️ Gap: External AI providers limit telemetry, governance, and cost visibility


10. Monitoring & Observability

  • System tables queries for DBU consumption tracking
  • Job/query monitoring with alerts
  • Data quality monitoring (DLT expectations or Lakehouse Monitoring)
  • Customer-specific dashboards for usage visibility

⚠️ Gap: No proactive monitoring leads to cost surprises and SLA breaches
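Per-customer DBU tracking combines a system-tables query with a simple budget check. The sketch below assumes the `customer_id` tag from section 2 is applied consistently; the query follows the `system.billing.usage` schema, but verify column names against current system-tables documentation, and treat the budget helper as illustrative.

```python
# Sketch: month-to-date DBU usage per customer from system tables,
# plus a simple budget check. Assumes the customer_id tag is enforced;
# verify column names against current system-table docs.

USAGE_BY_CUSTOMER = """
SELECT custom_tags['customer_id'] AS customer_id,
       SUM(usage_quantity)        AS dbus
FROM system.billing.usage
WHERE usage_date >= date_trunc('MONTH', current_date())
GROUP BY 1
"""

def over_budget(usage: dict[str, float],
                budgets: dict[str, float]) -> list[str]:
    """Return customers whose month-to-date DBUs exceed their budget."""
    return sorted(c for c, dbus in usage.items()
                  if dbus > budgets.get(c, float("inf")))

# Feed query results into the check, then page/alert on the output:
print(over_budget({"acme": 1200.0, "initech": 90.0},
                  {"acme": 1000.0, "initech": 500.0}))
# → ['acme']
```

Running a check like this on a schedule, with alerts wired to the output, is what turns cost surprises into routine notifications.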


📊 Architecture Maturity Assessment

Use this rubric to score partner architecture readiness:

| Category | Score | Criteria |
|---|---|---|
| Unity Catalog | 0-5 | 0 = No UC; 3 = Partial UC; 5 = 100% UC with full governance |
| Tagging | 0-5 | 0 = No tags; 3 = Manual tags; 5 = Automated tag enforcement |
| Authentication | 0-5 | 0 = PATs; 3 = Mix of PATs + OAuth; 5 = Full OAuth M2M |
| IaC | 0-5 | 0 = Manual; 3 = Partial IaC; 5 = Full automation with CI/CD |
| Databricks-Native Stack | 0-5 | 0 = Heavy external tools; 3 = Mix; 5 = 100% Databricks |
| Monitoring | 0-5 | 0 = None; 3 = Basic alerts; 5 = Comprehensive observability |

Total Score: ____ / 30

Maturity Level:

  • 0-10: ❌ Not Ready — Significant architecture gaps, high risk
  • 11-20: ⚠️ Needs Improvement — Some gaps, requires a remediation plan
  • 21-25: ✅ Good — Minor gaps, acceptable for GA
  • 26-30: ✅✅ Excellent — Best-in-class architecture
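The rubric above is mechanical enough to encode directly. This sketch computes the total and maturity level from the six category scores (each 0-5, total out of 30), using the band boundaries exactly as listed:

```python
# Sketch: compute the maturity score and level from the six rubric
# categories above (each scored 0-5, total out of 30).

CATEGORIES = ("unity_catalog", "tagging", "authentication",
              "iac", "native_stack", "monitoring")

def maturity(scores: dict[str, int]) -> tuple[int, str]:
    missing = set(CATEGORIES) - scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    total = sum(scores[c] for c in CATEGORIES)
    if total <= 10:
        level = "Not Ready"
    elif total <= 20:
        level = "Needs Improvement"
    elif total <= 25:
        level = "Good"
    else:
        level = "Excellent"
    return total, level

print(maturity({"unity_catalog": 5, "tagging": 3, "authentication": 5,
                "iac": 3, "native_stack": 3, "monitoring": 3}))
# → (22, 'Good')
```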

🚨 Red Flags (Must Address Before Launch)

These are blocking issues that must be resolved:

  1. ❌ No Unity Catalog — All data must be governed by Unity Catalog
  2. ❌ No tagging — Customer cost attribution is impossible
  3. ❌ Using PATs — Security risk, not production-ready
  4. ❌ Manual customer onboarding — Won't scale past 10 customers
  5. ❌ No IaC — Infrastructure drift, repeatability issues
  6. ❌ No monitoring/alerting — Will miss SLA breaches and cost overruns

📋 Pre-Review Checklist

Before submitting your SOW for Databricks review, ensure:

  • Full SOW Template completed
  • Architecture diagram created (Lucidchart, Draw.io, Mermaid)
  • All "Critical Requirements" (1-5) are met
  • Red flags addressed or remediation plan documented
  • Open questions clearly documented in SOW
  • Team has reviewed internally

📞 Next Steps

  1. Complete Full SOW: Use the SOW Template
  2. Schedule Architecture Review: Contact your Partner Development Manager
  3. Prepare for Joint Review: Bring architecture diagram, technical lead, product manager
  4. Address Feedback: Iterate on SOW based on Databricks feedback
  5. Get Approval: Final sign-off before implementation begins

🎯 Quick Decision Trees

Deployment Model Selection

```
Does customer data need to stay in their cloud account?
├─ YES → Hybrid, Side Car, or Customer Managed
│   └─ Do they have a Databricks platform team?
│       ├─ YES → Side Car or Customer Managed
│       └─ NO → Hybrid
└─ NO → Partner Hosted SaaS
    └─ How many customers?
        ├─ < 50 → Workspace per tenant (simple)
        ├─ 50-500 → Multi-tenant workspace (efficient)
        └─ > 500 → Multi-tenant workspace + schema per tenant
```
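The deployment-model tree above reduces to a small function. The customer-count thresholds (50/500) are taken directly from the tree; treat them as the checklist's guidance, not hard platform limits.

```python
# Sketch: the deployment-model decision tree above as a function.
# Thresholds (50 / 500 customers) come straight from the tree.

def deployment_model(data_stays_in_customer_cloud: bool,
                     has_platform_team: bool,
                     num_customers: int) -> str:
    if data_stays_in_customer_cloud:
        return ("Side Car or Customer Managed" if has_platform_team
                else "Hybrid")
    if num_customers < 50:
        return "Partner Hosted SaaS: workspace per tenant"
    if num_customers <= 500:
        return "Partner Hosted SaaS: multi-tenant workspace"
    return "Partner Hosted SaaS: multi-tenant workspace + schema per tenant"

print(deployment_model(False, False, 120))
# → Partner Hosted SaaS: multi-tenant workspace
```

Encoding the tree this way lets you record the inputs (and therefore the justification) for the model choice alongside the SOW.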

Unity Catalog Pattern Selection

```
How many customers per workspace?
├─ 1 customer (dedicated workspace) → Catalog per customer (Hub & Spoke)
└─ Multiple customers (shared workspace)
    └─ Do customers need custom schemas/tables?
        ├─ YES → Schema per tenant
        └─ NO → Multi-tenant tables with RLS (row-level security)
```

Authentication Pattern Selection

```
How do end users access the product?
├─ Directly in Databricks → OAuth U2M (user login)
└─ Via your application
    └─ Does your app act on behalf of users?
        ├─ YES → SSO-mapped OAuth (app gets tokens for users)
        └─ NO → OAuth M2M (Service Principal for app backend)
```

Version: 1.0
Last Updated: 2026-04-07
Maintained By: Databricks Built-On Team