Built-On Architecture Requirements Checklist

Purpose: This checklist provides a quick reference for the minimum technical requirements and best practices for Built-On Databricks architectures. Partners must complete the full SOW Template before implementation begins.

When to Use: Use this checklist during initial partner discovery conversations to quickly assess architecture maturity and identify gaps.


✅ Critical Requirements (Must Have)

These are mandatory for all Built-On partners:

1. Unity Catalog

  • 100% Unity Catalog adoption for all production data assets
  • Clear catalog/schema hierarchy documented
  • Workspace bindings configured (if multi-workspace)
  • Data lineage enabled
  • Access control grants defined

❌ Common Anti-Pattern: Using legacy Hive metastore or external metastores (e.g., AWS Glue)
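A clear catalog/schema hierarchy with defined grants can be scripted per customer. The sketch below generates Unity Catalog grant statements for a hypothetical per-customer catalog naming convention (`customer_<id>`, with `_readers`/`_writers` groups); the catalog and group names are assumptions to adapt, not a prescribed standard.

```python
# Sketch: generate Unity Catalog grant statements for a per-customer
# catalog. Catalog and group names are hypothetical examples — adapt to
# your own naming convention before running against a workspace.

def customer_grants(customer_id: str) -> list[str]:
    """Build a minimal grant set for one customer's catalog."""
    catalog = f"customer_{customer_id}"       # e.g. customer_acme
    reader_group = f"{catalog}_readers"       # hypothetical group names
    writer_group = f"{catalog}_writers"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{reader_group}`;",
        f"GRANT USE SCHEMA, SELECT ON CATALOG {catalog} TO `{reader_group}`;",
        f"GRANT ALL PRIVILEGES ON CATALOG {catalog} TO `{writer_group}`;",
    ]

for stmt in customer_grants("acme"):
    print(stmt)
```

Generating grants from code (rather than issuing them ad hoc in the UI) keeps access control reviewable and repeatable across customers.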


2. Tagging & Cost Attribution

  • customer_id tag implemented on all compute resources
  • Tag enforcement via cluster policies and serverless budget policies
  • Can differentiate product usage from internal usage
  • System tables queries configured for cost analysis
  • Budget alerts configured per customer (or per workspace)

❌ Common Anti-Pattern: Manual tagging or "we'll add tags later"


3. Authentication

  • OAuth M2M (Service Principals) for backend API authentication ✅
  • NOT using Personal Access Tokens (PATs) ⚠️
  • SSO-mapped OAuth for end-user authentication (if applicable)
  • Secrets stored in Databricks Secrets or cloud provider secret manager

❌ Common Anti-Pattern: Long-lived PATs hardcoded in code or config files
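For backend services, the PAT is replaced by an OAuth client-credentials exchange against the workspace token endpoint. The sketch below only assembles the request (no network call); the workspace URL is a placeholder, and the client ID/secret belong to a Databricks service principal and should be read from a secret manager, never from source code.

```python
from urllib.parse import urlencode

# Sketch of the OAuth M2M (client credentials) exchange a backend
# service performs instead of using a PAT. Workspace URL is a
# placeholder; credentials must come from a secret manager.

WORKSPACE = "https://example.cloud.databricks.com"  # placeholder
TOKEN_ENDPOINT = f"{WORKSPACE}/oidc/v1/token"

def token_request(client_id: str, client_secret: str) -> tuple[str, str]:
    """Return (url, form-encoded body) for the client-credentials POST."""
    body = urlencode({
        "grant_type": "client_credentials",
        "scope": "all-apis",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return TOKEN_ENDPOINT, body

url, body = token_request("sp-client-id", "sp-secret")
# POST `body` to `url` with Content-Type: application/x-www-form-urlencoded;
# the JSON response carries a short-lived access_token for API calls.
```

Because the resulting access token is short-lived, a leaked token is far less damaging than a leaked long-lived PAT.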


4. Infrastructure as Code (IaC)

  • All infrastructure defined as code (Terraform, Pulumi, DABs)
  • Workspace provisioning automated
  • Cluster/warehouse templates defined
  • Unity Catalog setup automated
  • CI/CD pipeline for deployments

❌ Common Anti-Pattern: Manual workspace configuration via UI (doesn't scale)


5. Deployment Model Clarity

  • Deployment model selected and documented: Partner Hosted, Hybrid, Side Car, or Customer Managed
  • Tenancy model selected (if SaaS): Workspace per tenant, Multi-tenant workspace, or Hybrid
  • Justification documented for model choice
  • Scale projections documented (# customers, # workspaces, DBU consumption)

❌ Common Anti-Pattern: Unclear tenancy model or mixing models without strategy


⚠️ Recommended Practices (Should Have)

These are best practices that significantly improve operational efficiency and customer experience:

6. Databricks-Native Orchestration

  • Using Lakeflow Jobs for orchestration (not external Airflow, ADF, etc.)
  • Delta Live Tables for declarative pipelines
  • Auto Loader for cloud storage ingestion

⚠️ Gap: Using external orchestration (Airflow, ADF) increases operational complexity and costs


7. Databricks-Native Analytics

  • Using DBSQL for data warehousing (not external Snowflake, Redshift)
  • Using AI/BI Dashboards for reporting (not external Power BI, Tableau)
  • Serverless SQL warehouses for variable workloads

⚠️ Gap: External BI tools add cost and complexity; limited Databricks telemetry


8. Lakebase for OLTP

  • Using Lakebase for transactional data (not external PostgreSQL, MySQL, SingleStore)
  • Consolidated OLTP and analytical workloads on Databricks

⚠️ Gap: External OLTP databases split the stack and add operational overhead


9. Databricks-Native AI/ML

  • Using Databricks FM APIs for LLM serving (not external OpenAI, Anthropic)
  • Using Vector Search for embeddings/RAG (not external Pinecone, Weaviate)
  • Using Model Serving for custom ML models
  • Using MLflow for ML lifecycle management

⚠️ Gap: External AI providers limit telemetry, governance, and cost visibility


10. Monitoring & Observability

  • System tables queries for DBU consumption tracking
  • Job/query monitoring with alerts
  • Data quality monitoring (DLT expectations or Lakehouse Monitoring)
  • Customer-specific dashboards for usage visibility

⚠️ Gap: No proactive monitoring leads to cost surprises and SLA breaches
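Per-customer DBU tracking combines a system-tables query with a simple budget check. The sketch below assumes the `customer_id` tag from section 2 is applied consistently; the query follows the `system.billing.usage` schema, but verify column names against current system-tables documentation, and treat the budget helper as illustrative.

```python
# Sketch: month-to-date DBU usage per customer from system tables,
# plus a simple budget check. Assumes the customer_id tag is enforced;
# verify column names against current system-table docs.

USAGE_BY_CUSTOMER = """
SELECT custom_tags['customer_id'] AS customer_id,
       SUM(usage_quantity)        AS dbus
FROM system.billing.usage
WHERE usage_date >= date_trunc('MONTH', current_date())
GROUP BY 1
"""

def over_budget(usage: dict[str, float],
                budgets: dict[str, float]) -> list[str]:
    """Return customers whose month-to-date DBUs exceed their budget."""
    return sorted(c for c, dbus in usage.items()
                  if dbus > budgets.get(c, float("inf")))

# Feed query results into the check, then page/alert on the output:
print(over_budget({"acme": 1200.0, "initech": 90.0},
                  {"acme": 1000.0, "initech": 500.0}))
# → ['acme']
```

Running a check like this on a schedule, with alerts wired to the output, is what turns cost surprises into routine notifications.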


📊 Architecture Maturity Assessment

Use this rubric to score partner architecture readiness:

| Category | Score | Criteria |
|---|---|---|
| Unity Catalog | 0-5 | 0 = No UC; 3 = Partial UC; 5 = 100% UC with full governance |
| Tagging | 0-5 | 0 = No tags; 3 = Manual tags; 5 = Automated tag enforcement |
| Authentication | 0-5 | 0 = PATs; 3 = Mix of PATs + OAuth; 5 = Full OAuth M2M |
| IaC | 0-5 | 0 = Manual; 3 = Partial IaC; 5 = Full automation with CI/CD |
| Databricks-Native Stack | 0-5 | 0 = Heavy external tools; 3 = Mix; 5 = 100% Databricks |
| Monitoring | 0-5 | 0 = None; 3 = Basic alerts; 5 = Comprehensive observability |

Total Score: ____ / 30

Maturity Level:

  • 0-10: ❌ Not Ready — Significant architecture gaps, high risk
  • 11-20: ⚠️ Needs Improvement — Some gaps, requires a remediation plan
  • 21-25: ✅ Good — Minor gaps, acceptable for GA
  • 26-30: ✅✅ Excellent — Best-in-class architecture
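The rubric above is mechanical enough to encode directly. This sketch computes the total and maturity level from the six category scores (each 0-5, total out of 30), using the band boundaries exactly as listed:

```python
# Sketch: compute the maturity score and level from the six rubric
# categories above (each scored 0-5, total out of 30).

CATEGORIES = ("unity_catalog", "tagging", "authentication",
              "iac", "native_stack", "monitoring")

def maturity(scores: dict[str, int]) -> tuple[int, str]:
    missing = set(CATEGORIES) - scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    total = sum(scores[c] for c in CATEGORIES)
    if total <= 10:
        level = "Not Ready"
    elif total <= 20:
        level = "Needs Improvement"
    elif total <= 25:
        level = "Good"
    else:
        level = "Excellent"
    return total, level

print(maturity({"unity_catalog": 5, "tagging": 3, "authentication": 5,
                "iac": 3, "native_stack": 3, "monitoring": 3}))
# → (22, 'Good')
```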

🚨 Red Flags (Must Address Before Launch)

These are blocking issues that must be resolved:

  1. ❌ No Unity Catalog — All data must be governed by Unity Catalog
  2. ❌ No tagging — Customer cost attribution is impossible
  3. ❌ Using PATs — Security risk, not production-ready
  4. ❌ Manual customer onboarding — Won't scale past 10 customers
  5. ❌ No IaC — Infrastructure drift, repeatability issues
  6. ❌ No monitoring/alerting — Will miss SLA breaches and cost overruns

📋 Pre-Review Checklist

Before submitting your SOW for Databricks review, ensure:

  • Full SOW Template completed
  • Architecture diagram created (Lucidchart, Draw.io, Mermaid)
  • All "Critical Requirements" (1-5) are met
  • Red flags addressed or remediation plan documented
  • Open questions clearly documented in SOW
  • Team has reviewed internally

📞 Next Steps

  1. Complete Full SOW: Use the SOW Template
  2. Schedule Architecture Review: Contact your Partner Development Manager
  3. Prepare for Joint Review: Bring architecture diagram, technical lead, product manager
  4. Address Feedback: Iterate on SOW based on Databricks feedback
  5. Get Approval: Final sign-off before implementation begins

🎯 Quick Decision Trees

Deployment Model Selection

```
Does customer data need to stay in their cloud account?
├─ YES → Hybrid, Side Car, or Customer Managed
│   └─ Do they have a Databricks platform team?
│       ├─ YES → Side Car or Customer Managed
│       └─ NO → Hybrid
└─ NO → Partner Hosted SaaS
    └─ How many customers?
        ├─ < 50 → Workspace per tenant (simple)
        ├─ 50-500 → Multi-tenant workspace (efficient)
        └─ > 500 → Multi-tenant workspace + schema per tenant
```
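The deployment-model tree above reduces to a small function. The customer-count thresholds (50/500) are taken directly from the tree; treat them as the checklist's guidance, not hard platform limits.

```python
# Sketch: the deployment-model decision tree above as a function.
# Thresholds (50 / 500 customers) come straight from the tree.

def deployment_model(data_stays_in_customer_cloud: bool,
                     has_platform_team: bool,
                     num_customers: int) -> str:
    if data_stays_in_customer_cloud:
        return ("Side Car or Customer Managed" if has_platform_team
                else "Hybrid")
    if num_customers < 50:
        return "Partner Hosted SaaS: workspace per tenant"
    if num_customers <= 500:
        return "Partner Hosted SaaS: multi-tenant workspace"
    return "Partner Hosted SaaS: multi-tenant workspace + schema per tenant"

print(deployment_model(False, False, 120))
# → Partner Hosted SaaS: multi-tenant workspace
```

Encoding the tree this way lets you record the inputs (and therefore the justification) for the model choice alongside the SOW.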

Unity Catalog Pattern Selection

```
How many customers per workspace?
├─ 1 customer (dedicated workspace) → Catalog per customer (Hub & Spoke)
└─ Multiple customers (shared workspace)
    └─ Do customers need custom schemas/tables?
        ├─ YES → Schema per tenant
        └─ NO → Multi-tenant tables with RLS (row-level security)
```

Authentication Pattern Selection

```
How do end users access the product?
├─ Directly in Databricks → OAuth U2M (user login)
└─ Via your application
    └─ Does your app act on behalf of users?
        ├─ YES → SSO-mapped OAuth (app gets tokens for users)
        └─ NO → OAuth M2M (Service Principal for app backend)
```

Version: 1.0
Last Updated: 2026-04-07
Maintained By: Databricks Built-On Team