Data engineering
Data engineering partners build ingestion, ETL/ELT, transformation, and orchestration products. See the data ingestion and transformation patterns for foundational context.
Data ingestion products
Requirements
- Use Unity Catalog volumes as the default governed landing/staging zone for file drops. Volumes provide unified pathing and access control across clouds.
- See common patterns for file-based ingestion and streaming/CDC.
- Default targets must be Unity Catalog managed tables. Managed tables are governed by Unity Catalog and benefit from predictive optimization and automatic maintenance, delivering lower cost, better performance, and interoperability (see the ingestion sketch below).
- You may support external tables when required, but managed tables should be your default.
Documentation: Unity Catalog Volumes | Working with Volume Files | Managed Tables
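A minimal sketch of the default ingestion path, assuming files have already been dropped into a Unity Catalog volume. The catalog, schema, volume, table names, and file format are placeholders, and `spark` is the session Databricks provides in notebooks and jobs.

```python
from pyspark.sql import SparkSession

# Placeholder names; substitute your product's catalog, schema, volume, and table.
VOLUME_PATH = "/Volumes/main/landing/raw_files/orders/"  # governed landing zone
TARGET_TABLE = "main.bronze.orders"                      # Unity Catalog managed table

spark = SparkSession.builder.getOrCreate()  # already available as `spark` on Databricks

# Read the dropped files from the volume and append them to the managed table.
raw = spark.read.format("json").load(VOLUME_PATH)
raw.write.mode("append").saveAsTable(TARGET_TABLE)
```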
Data transformation products
Requirements
- Perform all transformations within Databricks to ensure optimal performance and governance. Structure layers following the medallion architecture (bronze, silver, gold).
- For incremental transformations, use Lakeflow SDP as the default (see the pipeline sketch below). For procedural logic beyond SDP, use Structured Streaming with Lakeflow Jobs.
- For batch transformations, let Lakeflow Jobs handle orchestration, or orchestrate from your product if required.
- Use SQL, UDFs, AI functions, notebooks, or Databricks Connect based on your transformation needs.
See data transformation patterns for detailed implementation guidance.
Documentation: Medallion Architecture | Lakeflow SDP | Lakeflow Jobs
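A minimal sketch of an incremental declarative pipeline, assuming it runs inside a Lakeflow declarative pipeline where the `dlt` Python module and `spark` session are provided; the source path and table names are placeholders.

```python
import dlt
from pyspark.sql import functions as F

# Bronze: incrementally load raw files from a Unity Catalog volume (placeholder path).
@dlt.table(comment="Raw orders loaded incrementally with Auto Loader")
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/raw_files/orders/")
    )

# Silver: cleaned, typed records derived from the bronze table.
@dlt.table(comment="Validated orders")
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
        .where(F.col("order_id").isNotNull())
        .withColumn("order_ts", F.to_timestamp("order_ts"))
    )
```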
Reverse ETL products
Requirements
- Use SQL Warehouses to query data from Databricks. Recommend Serverless SQL Warehouses in your documentation for better performance (see the query sketch below).
- Deliver curated datasets via Structured Streaming or SDP Sinks, or schedule Lakeflow Jobs for API pushes. Alternatively, expose governed datasets via managed tables and Unity Catalog's open APIs.
Documentation: Structured Streaming | SDP Sinks | Lakeflow Jobs | Catalogs API
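A minimal sketch of the query path using the Databricks SQL Connector for Python against a Serverless SQL Warehouse; the hostname, HTTP path, token, and table name are placeholders.

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholder connection details for a Serverless SQL Warehouse.
with sql.connect(
    server_hostname="<workspace-hostname>",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<access-token>",
) as connection:
    with connection.cursor() as cursor:
        # Read a curated gold dataset to sync downstream.
        cursor.execute("SELECT * FROM main.gold.customer_metrics LIMIT 1000")
        for row in cursor.fetchall():
            ...  # push each record to your product's destination API
```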
Orchestration products
Requirements
- Use the Databricks REST API (and SDKs/CLI built on it) to programmatically orchestrate Databricks resources and runs (see the sketch below).
Documentation: REST API
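A minimal sketch using the Databricks SDK for Python, which wraps the REST API: trigger an existing Lakeflow Job and wait for it to finish. The job ID is a placeholder, and authentication is assumed to be configured in the environment.

```python
from databricks.sdk import WorkspaceClient

# Auth is resolved from the environment or .databrickscfg (assumption).
w = WorkspaceClient()

# Trigger an existing job by ID (placeholder) and block until the run completes.
run = w.jobs.run_now(job_id=123456789).result()
print(run.state.result_state)  # e.g. SUCCESS or FAILED
```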
What's next
- Review the integration requirements for foundational guidance
- Learn about telemetry and attribution for usage tracking
- Explore other Partner product categories for additional integration patterns