AI/ML
AI/ML partners build data annotation, document processing, enterprise search, AI agents, observability, and security solutions. See the AI capabilities patterns for foundational context.
Core requirements for integration
Partners should follow these core requirements:
Govern everything with Unity Catalog
All data, models, vector indexes, features, functions, and agent tools (e.g., functions and tools used to access sensitive information) must be registered in Unity Catalog for consistent governance (access control, lineage, auditing, and more).
Documentation: What is Unity Catalog? | Data and AI governance for the data lakehouse
Register models with MLflow to Unity Catalog
All models, including classic ML models, LLMs, embedding models, and code-based agents, should be logged and registered using MLflow with Unity Catalog for versioning, governance, and deployment through Mosaic AI Model Serving. Partners are also encouraged to use the Mosaic AI Agent Framework for building Python-based agents with popular OSS frameworks.
Documentation: Log and register AI agents | MLflow for ML model lifecycle | MLflow 3 for GenAI
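In practice, registering a model to Unity Catalog means pointing the MLflow registry at UC and supplying a three-level name. A minimal sketch, assuming a scikit-learn model and a placeholder name `main.default.churn_model` (your customer's catalog layout determines the real name):

```python
def register_to_unity_catalog(model, model_name: str, run_name: str = "partner-model"):
    """Log a model and register it under a Unity Catalog three-level name.
    Sketch only; assumes mlflow is installed and a Databricks workspace
    is configured as the tracking/registry backend."""
    import mlflow  # third-party; shown inline to keep the helper below stdlib-only

    # Point the MLflow registry at Unity Catalog rather than the
    # legacy workspace model registry.
    mlflow.set_registry_uri("databricks-uc")

    with mlflow.start_run(run_name=run_name):
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            registered_model_name=model_name,  # e.g. "main.default.churn_model"
        )


def is_uc_model_name(name: str) -> bool:
    """Unity Catalog model names must be three-level: catalog.schema.model."""
    parts = name.split(".")
    return len(parts) == 3 and all(parts)
```

Validating the three-level name up front gives customers a clearer error than a failed registry call.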
Use Databricks-hosted foundation models first
Databricks hosts open foundation models (Meta Llama, GPT OSS, Google Gemma) and proprietary models (OpenAI, Anthropic Claude, Google Gemini) through the Foundation Model APIs, with governed REST and OpenAI-compatible access. Partners can also configure external models or deploy custom models through Model Serving.
Documentation: Databricks Foundation Model APIs | Supported foundation models on Mosaic AI Model Serving | Databricks-hosted foundation models available in Foundation Model APIs
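Because the serving endpoints are OpenAI-compatible, a plain HTTPS client is enough to call a hosted model. A stdlib-only sketch, assuming the chat route lives at `/serving-endpoints/chat/completions` and using a placeholder model name:

```python
import json
import urllib.request


def chat_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def call_foundation_model(host: str, token: str, model: str, prompt: str) -> str:
    """POST to the workspace's OpenAI-compatible serving route.
    The model name (e.g. a Databricks-hosted Llama endpoint) is an
    assumption; list the endpoints in your workspace to confirm."""
    req = urllib.request.Request(
        f"{host}/serving-endpoints/chat/completions",
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The official OpenAI SDK works the same way by setting `base_url` to `https://<workspace-host>/serving-endpoints`.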
Use standard APIs for all interactions
Integrations should rely on SQL (AI Functions), Python SDKs, REST, OpenAI-compatible APIs, MLflow, and MCP for agent and tool interoperability.
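For the SQL surface, AI Functions such as `ai_query` can be driven through the Statement Execution API. A sketch of composing such a statement, with the endpoint, table, and warehouse IDs as placeholders:

```python
def ai_query_statement(endpoint: str, source_table: str) -> str:
    """Compose a SQL statement that enriches rows with ai_query().
    Endpoint and table names are placeholders for illustration."""
    return (
        f"SELECT review, ai_query('{endpoint}', "
        f"CONCAT('Classify the sentiment of: ', review)) AS sentiment "
        f"FROM {source_table} LIMIT 10"
    )


def statement_request(warehouse_id: str, sql: str) -> dict:
    """Request body for POST /api/2.0/sql/statements
    (Databricks SQL Statement Execution API)."""
    return {"warehouse_id": warehouse_id, "statement": sql, "wait_timeout": "30s"}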
Use Genie for natural-language analytics and agent workflows
Partners can configure Genie Spaces on curated Unity Catalog datasets and invoke Genie via the Genie APIs or as a tool through Databricks MCP servers, enabling integration into multi-agent workflows orchestrated by a Multi-Agent Supervisor.
Documentation: What is an AI/BI Genie space | Use Genie in multi-agent systems | Use the Genie API to integrate Genie into your applications | Use Databricks managed MCP servers
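Calling Genie directly is a small REST exchange: start a conversation in a space, then poll for the answer. A stdlib sketch, assuming the Conversation API path shown below (confirm it against the current Genie API docs) and a placeholder space ID:

```python
import json
import urllib.request


def genie_start_url(host: str, space_id: str) -> str:
    """Genie Conversation API path for starting a new conversation.
    Path shape is an assumption to verify against the API reference."""
    return f"{host}/api/2.0/genie/spaces/{space_id}/start-conversation"


def start_genie_conversation(host: str, token: str, space_id: str, question: str) -> dict:
    """POST a natural-language question to a Genie space and return the
    raw response; follow-up polling of the message status is omitted."""
    req = urllib.request.Request(
        genie_start_url(host, space_id),
        data=json.dumps({"content": question}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```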
Connect external agents to Databricks via Managed MCP
Agents running outside Databricks should call Databricks capabilities through Databricks Managed MCP services. Managed MCP currently supports:
- Vector Search
- Genie
- Unity Catalog Functions
- DBSQL
Documentation: Use Databricks managed MCP servers
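External agents reach these capabilities by connecting any MCP-compatible client to a per-capability server URL, authenticating with a Databricks token. A sketch of the URL construction, where the path shapes are illustrative and should be confirmed in the managed MCP documentation:

```python
def managed_mcp_url(host: str, service: str, *path_parts: str) -> str:
    """Construct a Databricks managed MCP server URL.
    Illustrative path shapes (assumptions, not a published contract):
      vector-search: /api/2.0/mcp/vector-search/{catalog}/{schema}
      genie:         /api/2.0/mcp/genie/{space_id}
      functions:     /api/2.0/mcp/functions/{catalog}/{schema}
    Any MCP client can then connect over streamable HTTP with a
    Bearer token for the workspace."""
    return "/".join([host.rstrip("/"), "api/2.0/mcp", service, *path_parts])
```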
Publish MCP servers to Databricks Marketplace
Partners with agentic tools or APIs can publish MCP servers to the Databricks Marketplace, making them discoverable and installable by joint customers. Published MCP servers integrate directly with Databricks-hosted agents and AI Gateway, enabling seamless tool invocation without custom integration code.
Documentation: MCP Marketplace Validation | External MCP servers
Use Databricks-native AI infrastructure
Use foundation models, AI Functions, Vector Search, Model Serving, Genie, Feature Store, and MLflow as the primary building blocks for inference, retrieval, enrichment, agentic workflows, agents, and AI applications.
Integration scenarios and recommended patterns
Data annotation / labeling
Annotation partners read data from Databricks, apply labels/annotations, and write results back into Databricks.
Integration principles
- Read input data from Unity Catalog:
- Tables for structured data and Volumes for unstructured files (PDFs, images, audio, etc.)
- Access via SQL, Databricks SQL REST APIs, SDKs, JDBC/ODBC, or connectors.
- Write labeled output back into Unity Catalog Delta tables using SQL (PUT, MERGE, COPY INTO) or APIs/SDKs to maintain governance and lineage.
- Use consistent storage patterns:
- Raw files into Volumes
- Extracted text, metadata, and labels/annotations into Tables
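The write-back step above typically lands labels in a staging table and merges them into the governed target so lineage is preserved. A sketch of the MERGE statement a partner tool might issue over the SQL connector, with table and column names (`doc_id`, `label`, `labeled_at`) as placeholders:

```python
def merge_labels_sql(target: str, staging: str) -> str:
    """Compose a MERGE that upserts labeled rows from a staging table
    into the governed Delta target. Column names are illustrative."""
    return f"""
        MERGE INTO {target} AS t
        USING {staging} AS s
        ON t.doc_id = s.doc_id
        WHEN MATCHED THEN
            UPDATE SET t.label = s.label, t.labeled_at = s.labeled_at
        WHEN NOT MATCHED THEN
            INSERT (doc_id, label, labeled_at)
            VALUES (s.doc_id, s.label, s.labeled_at)
    """
```

Executing this through SQL (rather than overwriting files) keeps Unity Catalog lineage and audit intact.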
Document AI
Document AI partners process unstructured documents, extract metadata, and generate structured output.
Integration principles
- Use Databricks foundation and embedding models through Foundation Model APIs or Model Serving.
- Store data consistently under Unity Catalog:
- Raw files and text/JSON files into Volumes
- Extracted text/metadata/classifications into Tables
- Support search, RAG, or AI agents by generating embeddings and writing them to Mosaic AI Vector Search via Delta Sync or Direct Access.
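For the embedding step, the OpenAI-compatible embeddings route works the same way as chat, and a Delta Sync index only needs the source table to carry a key, the text, and the vector. A sketch of the two payload shapes, with the endpoint name and column names as assumptions:

```python
def embedding_payload(model: str, texts: list) -> dict:
    """OpenAI-compatible embeddings request body for Foundation Model APIs.
    The model name (e.g. a Databricks-hosted GTE endpoint) is a placeholder."""
    return {"model": model, "input": texts}


def embedding_row(doc_id: str, chunk: str, vector: list) -> dict:
    """Row shape for a source Delta table behind a Delta Sync index:
    a primary key, the chunk text, and the embedding column.
    Column names are illustrative, not a required schema."""
    return {"id": doc_id, "text": chunk, "embedding": vector}
```

Writing such rows to a Delta table and attaching a Delta Sync index keeps the index current as documents are reprocessed; Direct Access indexes instead take upserts of these rows through the Vector Search API.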
Enterprise search & AI agents
Enterprise Search and Agent partners index enterprise knowledge and orchestrate agentic workflows.
Integration principles
- Index metadata by querying Unity Catalog REST APIs and system tables for schemas, descriptions, tags, permissions, users, and lineage.
- Call Genie via API for governed natural language to SQL analytics.
- Use Managed MCP to call Genie, Vector Search, Unity Catalog Functions, and DBSQL.
- Expose partner-hosted agents/tools via MCP to integrate with AI Agents on Databricks.
AI observability
AI Observability partners help customers trace, monitor, and analyze LLM and agent behavior. Integrations should standardize on MLflow Tracing to provide consistent, governed visibility across GenAI workflows.
Integration principles
- Push raw traces into Databricks tables in OpenTelemetry format. Partners should offer an export path that writes OTel spans/events into Unity Catalog–governed Delta tables, so partners and customers can query the same governed trace store with SQL to power in-product experiences (search, debugging, and reporting) while inheriting enterprise-grade access controls, lineage, and retention.
- Provide an autologging / auto-tracing library that emits MLflow Tracing data. The library should capture prompts, responses, tool calls, retrieval steps, and agent reasoning, and offer a simple API (e.g., autolog()) for minimal-code integration.
- Recommend dual logging when using OpenTelemetry. Partners who support OpenTelemetry (OTel) should enable dual logging, forwarding spans both to their OTel collector and to MLflow Tracing.
- Store aggregated observability signals in Unity Catalog. Evaluation results, metrics, summaries, and enriched signals should be written to Unity Catalog governed Delta tables to ensure secure access, lineage, and analytics.
Documentation: Contributing to MLflow Tracing | MLflow 3 for GenAI | MLflow Tracing Integrations | OpenTelemetry Export
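The OTel export path above comes down to flattening each span into a row of a governed Delta table. A sketch of such a converter, taking a span as a dict in OTLP/JSON field naming; the output column names are illustrative, not a Databricks-defined schema:

```python
import json


def otel_span_to_row(span: dict) -> dict:
    """Flatten one OpenTelemetry span (OTLP/JSON-style keys) into a row
    for a Unity Catalog-governed Delta trace table, so the same store
    can be queried with SQL for search, debugging, and reporting."""
    return {
        "trace_id": span["traceId"],
        "span_id": span["spanId"],
        "parent_span_id": span.get("parentSpanId"),  # None for root spans
        "name": span["name"],
        "start_time_unix_nano": int(span["startTimeUnixNano"]),
        "end_time_unix_nano": int(span["endTimeUnixNano"]),
        # Keep attributes queryable as a JSON string (or a MAP column).
        "attributes": json.dumps(span.get("attributes", {})),
    }
```

A batch of these rows can then be appended to the Delta table with any of the standard write paths (SQL INSERT, connectors, or Spark), inheriting the table's access controls and retention.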
AI security & governance
AI Security & Governance partners provide guardrails, safety scoring, moderation, classification, and compliance enforcement for LLMs and agents. Integrations should attach to agent execution at well-defined control points and make use of Databricks' governed model serving and AI Gateway capabilities.
Integration principles
- Provide wrapper libraries with pre/post execution hooks for major agent frameworks. Partners should offer a lightweight wrapper library that customers can import into code-based agents (built on LangChain, LangGraph, DSPy, or the OpenAI SDK). This wrapper should expose partner guardrails through the hook mechanisms each framework natively supports:
- LangChain: Callbacks (CallbackHandlers)
- LangGraph: Middleware
- DSPy: Tracers
- OpenAI Python SDK: RunHooks
- Use Custom Guardrails in Databricks AI Gateway (Preview). Databricks Custom Guardrails allow customers and partners to attach guardrail logic directly to Foundation Model endpoints served through AI Gateway, without modifying application code.
- Store governance and moderation outputs in Unity Catalog. Security and governance outputs (scores, classifications, filter decisions, risk levels, policy enforcement outcomes) should be written to Unity Catalog managed tables to provide audit trails, compliance reporting, and governance.
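The pre/post hook pattern above is framework-specific in practice, but the core shape is the same everywhere: intercept the input before the agent runs and the output before it is returned. A minimal framework-agnostic sketch (the real wrappers would adapt this to LangChain callbacks, LangGraph middleware, DSPy tracers, or OpenAI RunHooks):

```python
from typing import Callable


class GuardrailWrapper:
    """Wrap an agent callable with pre- and post-execution guardrails.
    Illustrative only; hook names and agent signature are assumptions."""

    def __init__(self, agent: Callable[[str], str],
                 pre: Callable[[str], str],
                 post: Callable[[str], str]):
        self.agent = agent
        self.pre = pre    # e.g. PII redaction, prompt-injection checks
        self.post = post  # e.g. moderation, safety scoring, policy filters

    def __call__(self, prompt: str) -> str:
        safe_prompt = self.pre(prompt)       # inspect/transform the input
        response = self.agent(safe_prompt)   # underlying agent or LLM call
        return self.post(response)           # inspect/transform the output
```

Scores and filter decisions produced inside `pre`/`post` are what the next bullet's Unity Catalog tables would record for audit and compliance reporting.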
What's next
- Review the integration requirements for foundational guidance
- Learn about telemetry and attribution for usage tracking
- Explore other Partner product categories for additional integration patterns