
Data engineering

Data Engineering partners build ingestion, ETL/ELT, transformation, and orchestration tools. See the data ingestion and transformation patterns for foundational context.

Data ingestion products

Requirements

  • Use Unity Catalog volumes as the default governed landing/staging zone for file drops. Volumes provide unified pathing and access control across clouds.
  • See common patterns for file-based ingestion and streaming/CDC.
  • Default targets must be Unity Catalog managed tables. Managed tables are governed and benefit from predictive optimization and automatic maintenance, delivering lower cost, faster performance, and broad interoperability. A short ingestion sketch follows the documentation links below.
    • You may support external tables as an option when required, but managed tables should remain your default.

Documentation: Unity Catalog Volumes | Working with Volume Files | Managed Tables
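
To make the pattern concrete, here is a minimal Python sketch, assuming a hypothetical main.partner.landing volume and main.partner.orders_raw managed table: the partner product lands a file through the Files API, and a Databricks notebook or job loads it into the managed table.

```python
# Ingestion sketch. Catalog/schema/volume/table names and the local file are hypothetical.
from databricks.sdk import WorkspaceClient
from pyspark.sql import SparkSession

# Part 1 - runs from your product: land the raw file in a Unity Catalog volume
# via the Files API (auth comes from env vars or a Databricks config profile).
w = WorkspaceClient()
with open("orders_2024_01.csv", "rb") as f:
    w.files.upload(
        "/Volumes/main/partner/landing/orders/orders_2024_01.csv", f, overwrite=True
    )

# Part 2 - runs in a Databricks notebook or job: load the file drop into a managed table.
# On Databricks, getOrCreate() returns the session provided by the runtime.
spark = SparkSession.builder.getOrCreate()
(
    spark.read.format("csv")
    .option("header", "true")
    .load("/Volumes/main/partner/landing/orders/")
    .write.mode("append")
    .saveAsTable("main.partner.orders_raw")  # managed table: no explicit LOCATION
)
```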

Data transformation products

Requirements

  • Perform all transformations within Databricks to ensure optimal performance and governance. Structure layers following the medallion architecture (bronze, silver, gold).
  • For incremental transformations, use Lakeflow SDP (Spark Declarative Pipelines) as the default; see the pipeline sketch below. For procedural logic beyond SDP, use Structured Streaming with Lakeflow Jobs.
  • For batch transformations, let Lakeflow Jobs handle orchestration, or orchestrate from your product if required.
  • Use SQL, UDFs, AI functions, notebooks, or Databricks Connect based on your transformation needs; a Databricks Connect example follows the documentation links below.

See data transformation patterns for detailed implementation guidance.
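
As an illustration of the incremental path, the following is a minimal declarative pipeline sketch using the dlt Python module; the volume path and table names are hypothetical, and Auto Loader (cloudFiles) handles incremental file discovery.

```python
# Lakeflow SDP sketch (Python API via the `dlt` module). The volume path and
# table names are hypothetical; `spark` is provided by the pipeline runtime.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw order files ingested incrementally with Auto Loader")
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")   # Auto Loader for incremental file discovery
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .load("/Volumes/main/partner/landing/orders/")
    )

@dlt.table(comment="Silver: typed and filtered orders")
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")        # incremental read of the bronze table
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .filter(F.col("order_id").isNotNull())
    )
```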

Documentation: Medallion Architecture | Lakeflow SDP | Lakeflow Jobs
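
For batch transformations driven from your own product, a sketch along these lines using Databricks Connect is one option; the table names are hypothetical and authentication is assumed to come from the environment or a Databricks config profile.

```python
# Batch transformation sketch driven from an external product via Databricks Connect.
# Table names are hypothetical; auth comes from the environment or a config profile.
from databricks.connect import DatabricksSession
from pyspark.sql import functions as F

spark = DatabricksSession.builder.getOrCreate()

# Aggregate silver data into a gold managed table (medallion architecture).
gold = (
    spark.read.table("main.partner.silver_orders")
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("lifetime_value"))
)
gold.write.mode("overwrite").saveAsTable("main.partner.gold_customer_ltv")
```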

Reverse ETL products

Requirements

  • Use SQL Warehouses to query data from Databricks. Recommend Serverless SQL Warehouses in your documentation for better performance; see the connector example after the documentation links.
  • Deliver curated datasets via Structured Streaming or SDP Sinks, or schedule Lakeflow Jobs for API pushes. Alternatively, expose governed datasets via managed tables and Unity Catalog's open APIs.

Documentation: Structured Streaming | SDP Sinks | Lakeflow Jobs | Catalogs API
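
As an example of the query path, here is a minimal sketch using the Databricks SQL Connector for Python against a (preferably serverless) SQL warehouse; the connection variables, table name, and push_to_destination helper are placeholders.

```python
# Reverse ETL sketch: read a curated gold table through a SQL warehouse, then push rows
# to your product's API. Hostname, HTTP path, token, and table name are placeholders.
import os
from databricks import sql

with sql.connect(
    server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
    http_path=os.environ["DATABRICKS_HTTP_PATH"],   # HTTP path of the SQL warehouse
    access_token=os.environ["DATABRICKS_TOKEN"],
) as conn:
    with conn.cursor() as cursor:
        cursor.execute(
            "SELECT customer_id, lifetime_value FROM main.partner.gold_customer_ltv"
        )
        for customer_id, lifetime_value in cursor.fetchall():
            # push_to_destination is a stand-in for your product's delivery logic
            push_to_destination(customer_id, lifetime_value)
```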

Orchestration products

Requirements

  • Use the Databricks REST API (and the SDKs and CLI built on it) to programmatically orchestrate Databricks resources and runs; see the SDK example below.

Documentation: REST API
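
For instance, a minimal sketch with the Databricks SDK for Python, which wraps the REST API, could trigger an existing Lakeflow Job and wait for the run to finish; the job ID is a placeholder and authentication comes from the environment or a config profile.

```python
# Orchestration sketch: trigger an existing Lakeflow Job and block until it finishes.
# The job ID is a placeholder; auth comes from the environment or a config profile.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

waiter = w.jobs.run_now(job_id=123456789)   # calls the Jobs run-now REST endpoint
run = waiter.result()                       # poll until the run reaches a terminal state

print(run.state.result_state)               # e.g. SUCCESS or FAILED
```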

What's next