AI readiness
When your data is self-describing and semantically rich, it becomes far easier for both humans and AI systems to interpret and apply it effectively. This page provides a comprehensive checklist for making your shared data AI-ready, with a focus on Databricks Genie and other AI-powered tools.
Why AI readiness matters
Good metadata and semantic clarity enable:
- Natural language querying: Users can ask questions in plain English and get accurate SQL
- Reduced ambiguity: AI tools understand exactly what each column and table represents
- Faster onboarding: New consumers can explore and understand your data without extensive documentation
- Higher trust: Verified answers and clear definitions build confidence in query results
AI readiness checklist
Core metadata
- Table descriptions — Every table has a description explaining its purpose and primary use case
- Column descriptions — Every column has a description with meaning, units, and format where applicable
- Clear naming — Column names are unambiguous (avoid generic names like value, data, type)
- Proper data types — Dates and numbers use proper types (not strings)
- Tags — Tags applied for discoverability (domain, sensitivity, refresh cadence)
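As a rough illustration of the core metadata items above, the following Python sketch builds SQL statements that attach table and column descriptions, and flags the generic column names the checklist warns about. The helper names, the table main.sales.orders, and its columns are hypothetical. The generated COMMENT ON TABLE and ALTER TABLE ... ALTER COLUMN ... COMMENT statements are intended to follow Databricks SQL syntax, and the escaping assumes default Spark SQL string-literal rules; verify both in your own workspace before running anything.

```python
# Hypothetical helpers: function names and the example table are illustrative only.
GENERIC_NAMES = {"value", "data", "type"}  # generic names called out in the checklist


def escape_literal(text: str) -> str:
    """Backslash-escape backslashes and single quotes (assumes default Spark SQL literal rules)."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def comment_statements(table: str, table_desc: str, columns: dict) -> list:
    """Build one statement for the table description and one per column description."""
    stmts = [f"COMMENT ON TABLE {table} IS '{escape_literal(table_desc)}'"]
    for name, desc in columns.items():
        stmts.append(
            f"ALTER TABLE {table} ALTER COLUMN {name} COMMENT '{escape_literal(desc)}'"
        )
    return stmts


def generic_names(columns) -> list:
    """Return column names that match the generic names to avoid (case-insensitive)."""
    return sorted(c for c in columns if c.lower() in GENERIC_NAMES)


for stmt in comment_statements(
    "main.sales.orders",
    "One row per customer order",
    {"order_ts": "Order creation time, UTC", "amount_usd": "Order total in US dollars"},
):
    print(stmt)
```

Keeping descriptions in a dictionary like this makes it straightforward to review them alongside the schema and to reapply them when a table is recreated.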
Semantic clarity
- Primary keys — Primary keys explicitly defined for each table
- Relationships — Foreign key relationships documented (join paths between tables)
- Table grain — Grain explicitly stated (one row = one customer, one transaction, etc.)
- Aggregation rules — Document how metrics should be aggregated (SUM, AVG, COUNT DISTINCT)
- Time zones — Time zone handling documented for all timestamp columns
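A stated grain can be checked against the data. The sketch below, in plain Python over a small in-memory sample, reports any key that appears more than once for the columns claimed to define the grain. The function name and sample rows are hypothetical, and a real check would more likely run as a GROUP BY query against the table itself.

```python
from collections import Counter


def grain_violations(rows, grain_columns):
    """Return key tuples that occur more than once, i.e. rows that break the stated grain."""
    counts = Counter(tuple(row[c] for c in grain_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample: the table is documented as "one row = one order".
rows = [
    {"customer_id": 1, "order_id": "A"},
    {"customer_id": 1, "order_id": "B"},
    {"customer_id": 2, "order_id": "C"},
]

print(grain_violations(rows, ["order_id"]))     # {} -> grain holds for order_id
print(grain_violations(rows, ["customer_id"]))  # {(1,): 2} -> not one row per customer
```

An empty result supports the documented grain and primary key; a non-empty one means either the data or the documentation needs correcting.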
Unity Catalog metrics
Unity Catalog metric views allow you to define reusable business calculations that can be referenced consistently across queries and tools.
- Key metrics defined — Business metrics (revenue, churn, LTV) defined in Unity Catalog
- Calculation logic — Formulas, filters, and exclusions documented for each metric
- Consistent usage — Metrics used instead of ad-hoc calculations in downstream queries
Unity Catalog metrics cannot be shared through Delta Sharing. Distribute their definitions via Volumes or Asset Bundles instead.
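The "define once, reference everywhere" idea behind consistent usage can be sketched in a few lines of Python. This is not Unity Catalog metric view syntax. The registry, metric names, expressions, and table name below are all hypothetical and only illustrate why a single shared definition avoids divergent ad-hoc calculations.

```python
# Hypothetical registry: each metric's SQL expression is defined exactly once.
METRICS = {
    "revenue": "SUM(amount)",
    "active_customers": "COUNT(DISTINCT customer_id)",
}


def metric_query(metric: str, table: str, group_by: str) -> str:
    """Render a query from the shared definition instead of retyping the formula."""
    expr = METRICS[metric]  # raises KeyError if the metric has no central definition
    return f"SELECT {group_by}, {expr} AS {metric} FROM {table} GROUP BY {group_by}"


print(metric_query("revenue", "main.sales.orders", "region"))
```

Because every query is rendered from the same expression, changing a formula in one place changes it everywhere it is used.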
Genie configuration
Genie Spaces allow you to configure how Genie interacts with your data.
- Genie Space created — Space configured with relevant tables for your data product
- Sample questions — Common questions and verified SQL answers provided
- General instructions — Domain-specific logic documented (e.g., "always filter by is_active = true")
- Trusted assets — Priority tables and views marked as trusted
- Terminology — Domain-specific terms mapped to table/column names
Genie configurations cannot be shared through Delta Sharing. Distribute configs and sample questions via Volumes or Asset Bundles instead.
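One possible way to package instructions and sample questions for distribution is a plain JSON file placed in a Volume. The sketch below assumes a made-up file layout (general_instructions, sample_questions); it is not a format that Genie itself imports or exports, and the function names and sample question are hypothetical. It writes to a temporary directory so it can run anywhere.

```python
import json
import tempfile
from pathlib import Path


def write_bundle(path, instructions, sample_questions):
    """Serialize instructions and verified question/SQL pairs to a JSON file."""
    bundle = {
        "general_instructions": instructions,
        "sample_questions": sample_questions,
    }
    Path(path).write_text(json.dumps(bundle, indent=2), encoding="utf-8")
    return bundle


def read_bundle(path):
    """Load a bundle previously written by write_bundle."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip through a temporary directory; in practice the target would be a Volume path.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "genie_bundle.json"
    written = write_bundle(
        target,
        ["always filter by is_active = true"],
        [{
            "question": "How many active customers are there?",
            "sql": "SELECT COUNT(*) FROM customers WHERE is_active = true",
        }],
    )
    loaded = read_bundle(target)

print(loaded["sample_questions"][0]["question"])
```

A consumer would read the file and recreate the instructions and sample questions in their own Genie Space.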
Data quality signals
- Freshness indicator — last_updated or similar timestamp available
- Completeness — Expected completeness documented (e.g., "99% of rows have email")
- Known limitations — Data gaps or known issues documented
- Expectations — Data quality expectations defined (DLT expectations, constraints)
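The freshness and completeness signals above can be expressed as simple checks. The following Python sketch works on in-memory rows purely for illustration. The function names, the email column, and the thresholds are assumptions, and in production these checks would more likely be written as pipeline expectations or table constraints.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows, column):
    """Fraction of rows where `column` is present and not None (0.0 for no rows)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def is_fresh(last_updated, max_age, now=None):
    """True if the last update happened within `max_age` of `now` (defaults to current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


# Hypothetical sample rows: three of four have an email.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
print(completeness(rows, "email"))  # 0.75

last_updated = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
check_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(last_updated, timedelta(days=1), now=check_time))  # True
```

Publishing the measured values next to the documented expectation lets consumers see at a glance whether the data currently meets it.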
Enablement
- Sample notebooks — Notebooks demonstrating common queries and use cases
- Verified queries — Known-good SQL for common business questions
- Join patterns — Recommended join paths documented
- Edge cases — Tricky scenarios and their solutions documented
Business context
Beyond technical metadata, AI tools benefit from business context:
- Glossary alignment: Define what terms mean in your domain (e.g., "customer" = active account, not trial)
- Calculation definitions: Document business logic (e.g., churn = no activity in 90 days)
- Time period conventions: Clarify fiscal year vs calendar year, reporting periods
- Regional considerations: Note when data is segmented by geography or business unit
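Writing a business definition as code removes ambiguity about its edges. The sketch below encodes the churn example from the list above (no activity in 90 days) in Python. The function name and the boundary choice, that a customer counts as churned only once more than 90 days have passed since their last activity, are assumptions; your own definition may treat day 90 differently.

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=90)  # from the example definition: no activity in 90 days


def is_churned(last_activity: date, as_of: date) -> bool:
    """Assumed boundary: churned when more than 90 days have passed since the last activity."""
    return as_of - last_activity > CHURN_WINDOW


print(is_churned(date(2024, 1, 1), date(2024, 3, 31)))  # False: exactly 90 days
print(is_churned(date(2024, 1, 1), date(2024, 4, 1)))   # True: 91 days
```

Documenting the boundary case alongside the definition gives both human analysts and AI tools the same answer for customers who sit exactly on the threshold.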
Quick start
If you're just getting started, prioritize these five items:
- Table descriptions - What is this table and what is it for?
- Column descriptions - What does each column mean?
- Primary keys - How do I uniquely identify a row?
- Table grain - What does one row represent?
- Sample queries - Show me how to use this data
These five items give AI tools and new consumers the minimum context they need to interpret your tables, so they improve both AI tool accuracy and consumer onboarding.
What's next
- Return to data products for productization best practices
- Learn about share types you can offer
- Set up dynamic views for fine-grained access control