Built-On Architecture Requirements Checklist
Purpose: This checklist provides a quick reference for the minimum technical requirements and best practices for Built-On Databricks architectures. Partners must complete the full SOW Template before implementation begins.
When to Use: Use this checklist during initial partner discovery conversations to quickly assess architecture maturity and identify gaps.
✅ Critical Requirements (Must Have)
These are mandatory for all Built-On partners:
1. Unity Catalog
- 100% Unity Catalog adoption for all production data assets
- Clear catalog/schema hierarchy documented
- Workspace bindings configured (if multi-workspace)
- Data lineage enabled
- Access control grants defined
❌ Common Anti-Pattern: Using legacy Hive metastore or external metastores (e.g., AWS Glue)
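As a concrete illustration of "access control grants defined," the sketch below composes Unity Catalog `GRANT` statements for a catalog/schema hierarchy. The catalog, schema, and group names are hypothetical placeholders; adapt them to your own hierarchy.

```python
# Illustrative only: compose Unity Catalog GRANT statements for a
# catalog/schema hierarchy. Catalog, schema, and group names below
# are hypothetical placeholders.
def uc_grants(catalog: str, schema: str, reader_group: str) -> list[str]:
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{reader_group}`;",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{reader_group}`;",
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO `{reader_group}`;",
    ]

for stmt in uc_grants("acme_prod", "gold", "acme_readers"):
    print(stmt)
```

Generating grants from code (rather than granting ad hoc in the UI) keeps the governance model reviewable and repeatable per customer.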
2. Tagging & Cost Attribution
- `customer_id` tag implemented on all compute resources
- Tag enforcement via cluster policies and serverless budget policies
- Can differentiate product usage from internal usage
- System tables queries configured for cost analysis
- Budget alerts configured per customer (or per workspace)
❌ Common Anti-Pattern: Manual tagging or "we'll add tags later"
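A minimal sketch of tag enforcement via a cluster policy: the policy pins a `customer_id` custom tag so every cluster created under it is attributable. The tag value `"acme"` and the autoscale bound are placeholders.

```python
import json

# Sketch of a cluster policy that pins a customer_id custom tag on every
# cluster created under the policy. The value "acme" is a placeholder.
policy = {
    "custom_tags.customer_id": {
        "type": "fixed",
        "value": "acme",
    },
    # Optionally bound spend by capping autoscaling:
    "autoscale.max_workers": {"type": "range", "maxValue": 10},
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Because the tag is `fixed`, users cannot omit or override it at cluster creation time, which is what makes downstream cost attribution reliable.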
3. Authentication
- OAuth M2M (Service Principals) for backend API authentication ✅
- NOT using Personal Access Tokens (PATs) ⚠️
- SSO-mapped OAuth for end-user authentication (if applicable)
- Secrets stored in Databricks Secrets or cloud provider secret manager
❌ Common Anti-Pattern: Long-lived PATs hardcoded in code or config files
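For orientation, the sketch below composes (but does not send) an OAuth M2M client-credentials token request for a service principal. The host, client ID, and secret are placeholders; the `/oidc/v1/token` endpoint and `all-apis` scope follow commonly documented Databricks OAuth conventions, so verify against the current docs.

```python
import base64
from urllib.parse import urlencode

# Sketch: compose an OAuth M2M (client credentials) token request for a
# service principal. Host and credentials are placeholders; the request
# is built but intentionally not sent.
host = "https://example.cloud.databricks.com"
client_id = "my-service-principal-id"          # service principal application ID
client_secret = "my-service-principal-secret"  # fetch from a secret manager, never hardcode

token_url = f"{host}/oidc/v1/token"
body = urlencode({"grant_type": "client_credentials", "scope": "all-apis"})
auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
headers = {
    "Authorization": f"Basic {auth}",
    "Content-Type": "application/x-www-form-urlencoded",
}
print(token_url)
```

The returned access token is short-lived, which is exactly the property long-lived PATs lack.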
4. Infrastructure as Code (IaC)
- All infrastructure defined as code (Terraform, Pulumi, DABs)
- Workspace provisioning automated
- Cluster/warehouse templates defined
- Unity Catalog setup automated
- CI/CD pipeline for deployments
❌ Common Anti-Pattern: Manual workspace configuration via UI (doesn't scale)
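One way to keep cluster/warehouse templates "defined as code" is to render per-customer specs from a single function, whatever IaC tool then applies them. Field names below mirror the Databricks clusters API; values are placeholder assumptions.

```python
# Illustrative template: render a per-customer job-cluster spec from one
# source of truth, so onboarding a customer is a code change, not UI clicks.
# Runtime version, node type, and sizes are placeholder assumptions.
def cluster_spec(customer_id: str, env: str = "prod") -> dict:
    return {
        "cluster_name": f"{customer_id}-{env}-etl",
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "autoscale": {"min_workers": 1, "max_workers": 8},
        "custom_tags": {"customer_id": customer_id, "env": env},
        "data_security_mode": "USER_ISOLATION",  # Unity Catalog enabled
    }

spec = cluster_spec("acme")
print(spec["cluster_name"])
```

The same pattern applies to warehouses and jobs: one template function per resource type, parameterized by customer, consumed by Terraform/Pulumi/DABs in CI/CD.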
5. Deployment Model Clarity
- Deployment model selected and documented: Partner Hosted, Hybrid, Side Car, or Customer Managed
- Tenancy model selected (if SaaS): Workspace per tenant, Multi-tenant workspace, or Hybrid
- Justification documented for model choice
- Scale projections documented (# customers, # workspaces, DBU consumption)
❌ Common Anti-Pattern: Unclear tenancy model or mixing models without strategy
⚠️ Strongly Recommended (Should Have)
These are best practices that significantly improve operational efficiency and customer experience:
6. Databricks-Native Orchestration
- Using Lakeflow Jobs for orchestration (not external Airflow, ADF, etc.)
- Delta Live Tables for declarative pipelines
- Auto Loader for cloud storage ingestion
⚠️ Gap: Using external orchestration (Airflow, ADF) increases operational complexity and costs
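A sketch of what Databricks-native orchestration looks like as a Jobs-API-style payload: a two-task DAG with a daily schedule. Notebook paths, the job name, and the cron expression are placeholders.

```python
import json

# Sketch of a Jobs-API-style definition: a two-task DAG with a daily
# 02:00 UTC schedule. Paths and names are placeholders.
job = {
    "name": "acme-daily-etl",
    "tasks": [
        {"task_key": "ingest",
         "notebook_task": {"notebook_path": "/Repos/etl/ingest"}},
        {"task_key": "transform",
         "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/Repos/etl/transform"}},
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?",
                 "timezone_id": "UTC"},
}
print(json.dumps(job, indent=2))
```

Keeping the DAG in the platform (rather than in external Airflow/ADF) means retries, lineage, and cost telemetry all land in the same system tables you already monitor.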
7. Databricks-Native Analytics
- Using DBSQL for data warehousing (not external Snowflake, Redshift)
- Using AI/BI Dashboards for reporting (not external Power BI, Tableau)
- Serverless SQL warehouses for variable workloads
⚠️ Gap: External BI tools add cost and complexity; limited Databricks telemetry
8. Lakebase for OLTP
- Using Lakebase for transactional data (not external PostgreSQL, MySQL, SingleStore)
- Consolidated OLTP and analytical workloads on Databricks
⚠️ Gap: External OLTP databases split the stack and add operational overhead
9. Databricks-Native AI/ML
- Using Databricks FM APIs for LLM serving (not external OpenAI, Anthropic)
- Using Vector Search for embeddings/RAG (not external Pinecone, Weaviate)
- Using Model Serving for custom ML models
- Using MLflow for ML lifecycle management
⚠️ Gap: External AI providers limit telemetry, governance, and cost visibility
10. Monitoring & Observability
- System tables queries for DBU consumption tracking
- Job/query monitoring with alerts
- Data quality monitoring (DLT expectations or Lakehouse Monitoring)
- Customer-specific dashboards for usage visibility
⚠️ Gap: No proactive monitoring leads to cost surprises and SLA breaches
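As a starting point for "system tables queries for DBU consumption tracking," the sketch below builds a query attributing DBUs to the `customer_id` tag over a trailing window. Column names follow the `system.billing.usage` schema; verify them against your workspace before relying on the output.

```python
# Sketch: attribute DBU consumption to the customer_id tag over a
# trailing window. Column names follow system.billing.usage; verify
# against your workspace's system tables.
def dbu_by_customer_sql(days: int = 30) -> str:
    return f"""
SELECT
  custom_tags['customer_id'] AS customer_id,
  sku_name,
  SUM(usage_quantity)        AS dbus
FROM system.billing.usage
WHERE usage_date >= current_date() - INTERVAL {days} DAYS
GROUP BY 1, 2
ORDER BY dbus DESC
""".strip()

print(dbu_by_customer_sql())
```

Note that this query only works if the `customer_id` tag from requirement 2 is actually enforced; untagged compute shows up as `NULL` and breaks attribution.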
📊 Architecture Maturity Assessment
Use this rubric to score partner architecture readiness:
| Category | Score | Criteria |
|---|---|---|
| Unity Catalog | 0-5 | 0 = No UC; 3 = Partial UC; 5 = 100% UC with full governance |
| Tagging | 0-5 | 0 = No tags; 3 = Manual tags; 5 = Automated tag enforcement |
| Authentication | 0-5 | 0 = PATs; 3 = Mix of PATs + OAuth; 5 = Full OAuth M2M |
| IaC | 0-5 | 0 = Manual; 3 = Partial IaC; 5 = Full automation with CI/CD |
| Databricks-Native Stack | 0-5 | 0 = Heavy external tools; 3 = Mix; 5 = 100% Databricks |
| Monitoring | 0-5 | 0 = None; 3 = Basic alerts; 5 = Comprehensive observability |
Total Score: ____ / 30
Maturity Level:
- 0-10: ❌ Not Ready — Significant architecture gaps, high risk
- 11-20: ⚠️ Needs Improvement — Some gaps, requires remediation plan
- 21-25: ✅ Good — Minor gaps, acceptable for GA
- 26-30: ✅✅ Excellent — Best-in-class architecture
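The rubric above can be scored mechanically; this minimal helper sums the six category scores and maps the total to the maturity bands defined in this checklist.

```python
# Minimal scorer for the rubric above: six categories, each 0-5,
# mapped to the maturity bands defined in this checklist.
def maturity(scores: dict[str, int]) -> tuple[int, str]:
    assert len(scores) == 6 and all(0 <= s <= 5 for s in scores.values())
    total = sum(scores.values())
    if total <= 10:
        level = "Not Ready"
    elif total <= 20:
        level = "Needs Improvement"
    elif total <= 25:
        level = "Good"
    else:
        level = "Excellent"
    return total, level

print(maturity({"uc": 5, "tags": 4, "auth": 5, "iac": 4, "native": 3, "mon": 3}))
```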
🚨 Red Flags (Must Address Before Launch)
These are blocking issues that must be resolved:
- ❌ No Unity Catalog — All data must be governed by Unity Catalog
- ❌ No tagging — Customer cost attribution is impossible
- ❌ Using PATs — Security risk, not production-ready
- ❌ Manual customer onboarding — Won't scale past 10 customers
- ❌ No IaC — Infrastructure drift, repeatability issues
- ❌ No monitoring/alerting — Will miss SLA breaches and cost overruns
📋 Pre-Review Checklist
Before submitting your SOW for Databricks review, ensure:
- Full SOW Template completed
- Architecture diagram created (Lucidchart, Draw.io, Mermaid)
- All "Critical Requirements" (1-5) are met
- Red flags addressed or remediation plan documented
- Open questions clearly documented in SOW
- Team has reviewed internally
📞 Next Steps
- Complete Full SOW: Use the SOW Template
- Schedule Architecture Review: Contact your Partner Development Manager
- Prepare for Joint Review: Bring architecture diagram, technical lead, product manager
- Address Feedback: Iterate on SOW based on Databricks feedback
- Get Approval: Final sign-off before implementation begins
📚 Reference Documentation
- SOW Template (Full) — Complete architecture design document
- Cost Management Guide — Tagging and attribution
- Unity Catalog Governance — Catalog design patterns
- Deployment Models — Model selection guidance
- Partner Well-Architected Framework — Complete PWAF guidelines
🎯 Quick Decision Trees
Deployment Model Selection
Does customer data need to stay in their cloud account?
├─ YES → Hybrid, Side Car, or Customer Managed
│ └─ Do they have a Databricks platform team?
│ ├─ YES → Side Car or Customer Managed
│ └─ NO → Hybrid
└─ NO → Partner Hosted SaaS
└─ How many customers?
├─ < 50 → Workspace per tenant (simple)
├─ 50-500 → Multi-tenant workspace (efficient)
└─ > 500 → Multi-tenant workspace + schema per tenant
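The tree above can also be encoded as a function, which is handy in discovery tooling or intake forms. The labels match the deployment and tenancy models named in this checklist.

```python
# Encoding of the deployment-model decision tree above. Labels match
# the models named in this checklist.
def deployment_model(data_stays_in_customer_cloud: bool,
                     has_platform_team: bool = False,
                     n_customers: int = 0) -> str:
    if data_stays_in_customer_cloud:
        return ("Side Car or Customer Managed" if has_platform_team
                else "Hybrid")
    if n_customers < 50:
        return "Partner Hosted SaaS: workspace per tenant"
    if n_customers <= 500:
        return "Partner Hosted SaaS: multi-tenant workspace"
    return "Partner Hosted SaaS: multi-tenant workspace + schema per tenant"

print(deployment_model(False, n_customers=30))
```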
Unity Catalog Pattern Selection
How many customers per workspace?
├─ 1 customer (dedicated workspace) → Catalog per customer (Hub & Spoke)
└─ Multiple customers (shared workspace)
└─ Do customers need custom schema/tables?
├─ YES → Schema per tenant
└─ NO → Multi-tenant tables with RLS
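For the "multi-tenant tables with RLS" branch, the sketch below generates Unity Catalog row-filter DDL keyed on a `tenant_id` column. Table and column names are placeholders, and the syntax follows UC row filters as commonly documented; verify it against the current docs.

```python
# Sketch of Unity Catalog row-level security for multi-tenant tables:
# a row-filter UDF keyed on a tenant_id column. Table/column names are
# placeholders; verify the row-filter syntax against current docs.
def rls_ddl(table: str, tenant_col: str = "tenant_id") -> str:
    return (
        f"CREATE OR REPLACE FUNCTION tenant_filter({tenant_col} STRING)\n"
        f"RETURN is_account_group_member({tenant_col});\n"
        f"ALTER TABLE {table} SET ROW FILTER tenant_filter ON ({tenant_col});"
    )

print(rls_ddl("analytics.gold.usage_facts"))
```

With this pattern, tenants share physical tables but each query only sees rows whose `tenant_id` matches a group the caller belongs to.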
Authentication Pattern Selection
How do end users access the product?
├─ Directly in Databricks → OAuth U2M (user login)
└─ Via your application
└─ Does your app act on behalf of users?
├─ YES → SSO-mapped OAuth (app gets tokens for users)
└─ NO → OAuth M2M (Service Principal for app backend)
Version: 1.0
Last Updated: 2026-04-07
Maintained By: Databricks Built-On Team