AI/ML
AI/ML partners build data annotation, document processing, enterprise search, AI agents, observability, and security solutions. See the AI capabilities patterns for foundational context.
Core requirements for integration
Partners should follow these core requirements:
Govern everything with Unity Catalog
All data, models, vector indexes, features, functions, and agent tools (e.g., functions and tools used to access sensitive information) must be registered in Unity Catalog for consistent governance (access control, lineage, auditing, and more).
Documentation: What is Unity Catalog? | Data and AI governance for the data lakehouse
Register models with MLflow to Unity Catalog
All models, including classic ML models, LLMs, embedding models, and code-based agents, should be logged and registered using MLflow with Unity Catalog for versioning, governance, and deployment through Mosaic AI Model Serving. Partners are also encouraged to use the Mosaic AI Agent Framework for building Python-based agents with popular OSS frameworks.
Documentation: Log and register AI agents | MLflow for ML model lifecycle | MLflow 3 for GenAI
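In practice, registering a model to Unity Catalog means pointing the MLflow registry at UC and supplying a three-level name. A minimal sketch, assuming a scikit-learn model and a placeholder name `main.default.churn_model` (your customer's catalog layout determines the real name):

```python
def register_to_unity_catalog(model, model_name: str, run_name: str = "partner-model"):
    """Log a model and register it under a Unity Catalog three-level name.
    Sketch only; assumes mlflow is installed and a Databricks workspace
    is configured as the tracking/registry backend."""
    import mlflow  # third-party; shown inline to keep the helper below stdlib-only

    # Point the MLflow registry at Unity Catalog rather than the
    # legacy workspace model registry.
    mlflow.set_registry_uri("databricks-uc")

    with mlflow.start_run(run_name=run_name):
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            registered_model_name=model_name,  # e.g. "main.default.churn_model"
        )


def is_uc_model_name(name: str) -> bool:
    """Unity Catalog model names must be three-level: catalog.schema.model."""
    parts = name.split(".")
    return len(parts) == 3 and all(parts)
```

Validating the three-level name up front gives customers a clearer error than a failed registry call.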
Use Databricks-hosted foundation models first
Databricks hosts open foundation models (Meta Llama, GPT OSS, Google Gemma) and proprietary models (OpenAI, Anthropic Claude, Google Gemini) through the Foundation Model APIs, with governed REST and OpenAI-compatible access. Partners can also configure external models or deploy custom models through Model Serving.
Documentation: Databricks Foundation Model APIs | Supported foundation models on Mosaic AI Model Serving | Databricks-hosted foundation models available in Foundation Model APIs
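Because the serving endpoints are OpenAI-compatible, a plain HTTPS client is enough to call a hosted model. A stdlib-only sketch, assuming the chat route lives at `/serving-endpoints/chat/completions` and using a placeholder model name:

```python
import json
import urllib.request


def chat_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def call_foundation_model(host: str, token: str, model: str, prompt: str) -> str:
    """POST to the workspace's OpenAI-compatible serving route.
    The model name (e.g. a Databricks-hosted Llama endpoint) is an
    assumption; list the endpoints in your workspace to confirm."""
    req = urllib.request.Request(
        f"{host}/serving-endpoints/chat/completions",
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The official OpenAI SDK works the same way by setting `base_url` to `https://<workspace-host>/serving-endpoints`.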
Use standard APIs for all interactions
Integrations should rely on SQL (AI Functions), Python SDKs, REST, OpenAI-compatible APIs, MLflow, and MCP for agent and tool interoperability.
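For the SQL surface, AI Functions such as `ai_query` can be driven through the Statement Execution API. A sketch of composing such a statement, with the endpoint, table, and warehouse IDs as placeholders:

```python
def ai_query_statement(endpoint: str, source_table: str) -> str:
    """Compose a SQL statement that enriches rows with ai_query().
    Endpoint and table names are placeholders for illustration."""
    return (
        f"SELECT review, ai_query('{endpoint}', "
        f"CONCAT('Classify the sentiment of: ', review)) AS sentiment "
        f"FROM {source_table} LIMIT 10"
    )


def statement_request(warehouse_id: str, sql: str) -> dict:
    """Request body for POST /api/2.0/sql/statements
    (Databricks SQL Statement Execution API)."""
    return {"warehouse_id": warehouse_id, "statement": sql, "wait_timeout": "30s"}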
Use Genie for natural-language analytics and agent workflows
Partners can configure Genie Spaces on curated Unity Catalog datasets and invoke Genie via the Genie APIs or as a tool through Databricks MCP servers, enabling integration into multi-agent workflows orchestrated by a Multi-Agent Supervisor.
Documentation: What is an AI/BI Genie space | Use Genie in multi-agent systems | Use the Genie API to integrate Genie into your applications | Use Databricks managed MCP servers
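Calling Genie directly is a small REST exchange: start a conversation in a space, then poll for the answer. A stdlib sketch, assuming the Conversation API path shown below (confirm it against the current Genie API docs) and a placeholder space ID:

```python
import json
import urllib.request


def genie_start_url(host: str, space_id: str) -> str:
    """Genie Conversation API path for starting a new conversation.
    Path shape is an assumption to verify against the API reference."""
    return f"{host}/api/2.0/genie/spaces/{space_id}/start-conversation"


def start_genie_conversation(host: str, token: str, space_id: str, question: str) -> dict:
    """POST a natural-language question to a Genie space and return the
    raw response; follow-up polling of the message status is omitted."""
    req = urllib.request.Request(
        genie_start_url(host, space_id),
        data=json.dumps({"content": question}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```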
Connect external agents to Databricks via Managed MCP
Agents running outside Databricks should call Databricks capabilities through Databricks Managed MCP services. Managed MCP currently supports:
- Vector Search
- Genie
- Unity Catalog Functions
- DBSQL
Documentation: Use Databricks managed MCP servers
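External agents reach these capabilities by connecting any MCP-compatible client to a per-capability server URL, authenticating with a Databricks token. A sketch of the URL construction, where the path shapes are illustrative and should be confirmed in the managed MCP documentation:

```python
def managed_mcp_url(host: str, service: str, *path_parts: str) -> str:
    """Construct a Databricks managed MCP server URL.
    Illustrative path shapes (assumptions, not a published contract):
      vector-search: /api/2.0/mcp/vector-search/{catalog}/{schema}
      genie:         /api/2.0/mcp/genie/{space_id}
      functions:     /api/2.0/mcp/functions/{catalog}/{schema}
    Any MCP client can then connect over streamable HTTP with a
    Bearer token for the workspace."""
    return "/".join([host.rstrip("/"), "api/2.0/mcp", service, *path_parts])
```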
Publish MCP servers to Databricks Marketplace
Partners with agentic tools or APIs can publish MCP servers to the Databricks Marketplace, making them discoverable and installable by joint customers. Published MCP servers integrate directly with Databricks-hosted agents and AI Gateway, enabling seamless tool invocation without custom integration code.
Documentation: MCP Marketplace Validation | External MCP servers
Use Databricks-native AI infrastructure
Use foundation models, AI Functions, Vector Search, Model Serving, Genie, Feature Store, and MLflow as the primary building blocks for inference, retrieval, enrichment, agentic workflows, agents, and AI applications.
Integration scenarios and recommended patterns
Data annotation / labeling
Annotation partners read data from Databricks, apply labels/annotations, and write results back into Databricks.
Integration principles
- Read input data from Unity Catalog:
- Tables for structured data and Volumes for unstructured files (PDFs, images, audio, etc.)
- Access via SQL, Databricks SQL REST APIs, SDKs, JDBC/ODBC, or connectors.
- Write labeled output back into Unity Catalog Delta tables using SQL (PUT, MERGE, COPY INTO) or APIs/SDKs to maintain governance and lineage.
- Use consistent storage patterns:
- Raw files into Volumes
- Extracted text, metadata, and labels/annotations into Tables
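The write-back step above typically lands labels in a staging table and merges them into the governed target so lineage is preserved. A sketch of the MERGE statement a partner tool might issue over the SQL connector, with table and column names (`doc_id`, `label`, `labeled_at`) as placeholders:

```python
def merge_labels_sql(target: str, staging: str) -> str:
    """Compose a MERGE that upserts labeled rows from a staging table
    into the governed Delta target. Column names are illustrative."""
    return f"""
        MERGE INTO {target} AS t
        USING {staging} AS s
        ON t.doc_id = s.doc_id
        WHEN MATCHED THEN
            UPDATE SET t.label = s.label, t.labeled_at = s.labeled_at
        WHEN NOT MATCHED THEN
            INSERT (doc_id, label, labeled_at)
            VALUES (s.doc_id, s.label, s.labeled_at)
    """
```

Executing this through SQL (rather than overwriting files) keeps Unity Catalog lineage and audit intact.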
Document AI
Document AI partners process unstructured documents, extract metadata, and generate structured output.
Integration principles
- Use Databricks foundation and embedding models through Foundation Model APIs or Model Serving.
- Store data consistently under Unity Catalog:
- Raw files and text/JSON files into Volumes
- Extracted text/metadata/classifications into Tables
- Support search, RAG, or AI agents by generating embeddings and writing them to Mosaic AI Vector Search via Delta Sync or Direct Access.
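For the embedding step, the OpenAI-compatible embeddings route works the same way as chat, and a Delta Sync index only needs the source table to carry a key, the text, and the vector. A sketch of the two payload shapes, with the endpoint name and column names as assumptions:

```python
def embedding_payload(model: str, texts: list) -> dict:
    """OpenAI-compatible embeddings request body for Foundation Model APIs.
    The model name (e.g. a Databricks-hosted GTE endpoint) is a placeholder."""
    return {"model": model, "input": texts}


def embedding_row(doc_id: str, chunk: str, vector: list) -> dict:
    """Row shape for a source Delta table behind a Delta Sync index:
    a primary key, the chunk text, and the embedding column.
    Column names are illustrative, not a required schema."""
    return {"id": doc_id, "text": chunk, "embedding": vector}
```

Writing such rows to a Delta table and attaching a Delta Sync index keeps the index current as documents are reprocessed; Direct Access indexes instead take upserts of these rows through the Vector Search API.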
Enterprise search & AI agents
Enterprise Search and Agent partners index enterprise knowledge and orchestrate agentic workflows.
Integration principles
- Index metadata by querying Unity Catalog REST APIs and system tables for schemas, descriptions, tags, permissions, users, and lineage.
- Call Genie via API for governed natural language to SQL analytics.
- Use Managed MCP to call Genie, Vector Search, Unity Catalog Functions, and DBSQL.
- Expose partner-hosted agents/tools via MCP to integrate with AI Agents on Databricks.
AI observability
AI Observability partners help customers trace, monitor, and analyze LLM and agent behavior. Integrations should standardize on MLflow Tracing to provide consistent, governed visibility across GenAI workflows.
Integration principles
- Push raw traces into Databricks tables in OpenTelemetry format. Partners should offer an export path that writes OTel spans/events into Unity Catalog–governed Delta tables, so partners and customers can query the same governed trace store with SQL to power in-product experiences (search, debugging, and reporting) while inheriting enterprise-grade access controls, lineage, and retention.
- Provide an autologging / auto-tracing library that emits MLflow Tracing data. The library should capture prompts, responses, tool calls, retrieval steps, and agent reasoning, and offer a simple API (e.g., autolog()) for minimal-code integration.
- Recommend dual logging when using OpenTelemetry. Partners who support OpenTelemetry (OTel) should enable dual logging, forwarding spans both to their OTel collector and to MLflow Tracing.
- Store aggregated observability signals in Unity Catalog. Evaluation results, metrics, summaries, and enriched signals should be written to Unity Catalog governed Delta tables to ensure secure access, lineage, and analytics.
Documentation: Contributing to MLflow Tracing | MLflow 3 for GenAI | MLflow Tracing Integrations | OpenTelemetry Export
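The OTel export path above comes down to flattening each span into a row of a governed Delta table. A sketch of such a converter, taking a span as a dict in OTLP/JSON field naming; the output column names are illustrative, not a Databricks-defined schema:

```python
import json


def otel_span_to_row(span: dict) -> dict:
    """Flatten one OpenTelemetry span (OTLP/JSON-style keys) into a row
    for a Unity Catalog-governed Delta trace table, so the same store
    can be queried with SQL for search, debugging, and reporting."""
    return {
        "trace_id": span["traceId"],
        "span_id": span["spanId"],
        "parent_span_id": span.get("parentSpanId"),  # None for root spans
        "name": span["name"],
        "start_time_unix_nano": int(span["startTimeUnixNano"]),
        "end_time_unix_nano": int(span["endTimeUnixNano"]),
        # Keep attributes queryable as a JSON string (or a MAP column).
        "attributes": json.dumps(span.get("attributes", {})),
    }
```

A batch of these rows can then be appended to the Delta table with any of the standard write paths (SQL INSERT, connectors, or Spark), inheriting the table's access controls and retention.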
AI security & governance
AI Security & Governance partners provide guardrails, safety scoring, moderation, classification, and compliance enforcement for LLMs and agents. Integrations should attach to agent execution at well-defined control points and make use of Databricks' governed model serving and AI Gateway capabilities.
Integration principles
- Provide wrapper libraries with pre/post execution hooks for major agent frameworks. Partners should offer a lightweight wrapper library that customers can import into code-based agents (built on LangChain, LangGraph, DSPy, or the OpenAI SDK). This wrapper should expose partner guardrails through the hook mechanisms each framework natively supports:
- LangChain: Callbacks (CallbackHandlers)
- LangGraph: Middleware
- DSPy: Tracers
- OpenAI Python SDK: RunHooks
- Use Custom Guardrails in Databricks AI Gateway (Preview). Databricks Custom Guardrails allow customers and partners to attach guardrail logic directly to Foundation Model endpoints served through AI Gateway, without modifying application code.
- Store governance and moderation outputs in Unity Catalog. Security and governance outputs (scores, classifications, filter decisions, risk levels, policy enforcement outcomes) should be written to Unity Catalog managed tables to provide audit trails, compliance reporting, and governance.
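The pre/post hook pattern above is framework-specific in practice, but the core shape is the same everywhere: intercept the input before the agent runs and the output before it is returned. A minimal framework-agnostic sketch (the real wrappers would adapt this to LangChain callbacks, LangGraph middleware, DSPy tracers, or OpenAI RunHooks):

```python
from typing import Callable


class GuardrailWrapper:
    """Wrap an agent callable with pre- and post-execution guardrails.
    Illustrative only; hook names and agent signature are assumptions."""

    def __init__(self, agent: Callable[[str], str],
                 pre: Callable[[str], str],
                 post: Callable[[str], str]):
        self.agent = agent
        self.pre = pre    # e.g. PII redaction, prompt-injection checks
        self.post = post  # e.g. moderation, safety scoring, policy filters

    def __call__(self, prompt: str) -> str:
        safe_prompt = self.pre(prompt)       # inspect/transform the input
        response = self.agent(safe_prompt)   # underlying agent or LLM call
        return self.post(response)           # inspect/transform the output
```

Scores and filter decisions produced inside `pre`/`post` are what the next bullet's Unity Catalog tables would record for audit and compliance reporting.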
What's next
- Review the integration requirements for foundational guidance
- Learn about telemetry and attribution for usage tracking
- Explore other Partner product categories for additional integration patterns