DLT-META

Project Overview

DLT-META is a metadata-driven framework designed to work with Databricks Lakeflow Declarative Pipelines. It automates bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.

In practice, a single generic pipeline reads the Dataflowspec and uses it to orchestrate and run the necessary data processing workloads. This approach streamlines the development and management of data pipelines, allowing for a more efficient and scalable data processing workflow.
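To make the idea concrete, here is a minimal sketch, in plain Python with no Databricks dependencies, of how a generic pipeline might read onboarding metadata and decide which tables to build for one layer. The JSON fields (`data_flow_id`, `data_flow_group`, `source_format`, `bronze_table`, `silver_table`) and the helpers `load_specs` and `plan_layer` are simplified, hypothetical stand-ins, not the actual Dataflowspec schema or DLT-META code.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified onboarding file. Field names are illustrative
# assumptions, not the exact DLT-META Dataflowspec schema.
ONBOARDING_JSON = """
[
  {"data_flow_id": "100", "data_flow_group": "A1", "source_format": "cloudFiles",
   "bronze_table": "customers", "silver_table": "customers_clean"},
  {"data_flow_id": "101", "data_flow_group": "A1", "source_format": "kafka",
   "bronze_table": "orders", "silver_table": null}
]
"""


@dataclass
class DataflowSpec:
    """One entry of metadata describing a single source-to-target flow."""
    data_flow_id: str
    data_flow_group: str
    source_format: str
    bronze_table: str
    silver_table: Optional[str]


def load_specs(raw: str) -> list[DataflowSpec]:
    """Parse the onboarding JSON into typed spec objects."""
    return [DataflowSpec(**entry) for entry in json.loads(raw)]


def plan_layer(specs: list[DataflowSpec], layer: str, group: str) -> list[str]:
    """Return the target tables a generic pipeline would build for one layer and group."""
    targets = []
    for spec in specs:
        if spec.data_flow_group != group:
            continue  # each pipeline run handles only its own group
        target = spec.bronze_table if layer == "bronze" else spec.silver_table
        if target is not None:  # a flow may not define every layer
            targets.append(target)
    return targets


specs = load_specs(ONBOARDING_JSON)
print(plan_layer(specs, "bronze", "A1"))  # ['customers', 'orders']
print(plan_layer(specs, "silver", "A1"))  # ['customers_clean']
```

Filtering by group and layer is what lets one pipeline definition serve many flows: in this model, adding a source means adding an onboarding entry rather than writing a new pipeline.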

DLT-META components:

Metadata Interface

Generic Lakeflow Declarative pipeline

  • Apply appropriate readers based on input metadata
  • Apply data quality rules with Lakeflow Declarative Pipelines expectations
  • Apply CDC (apply changes) if specified in metadata
  • Build the Lakeflow Declarative Pipelines graph based on input/output metadata
  • Launch the Lakeflow Declarative Pipelines pipeline
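The reader-selection and data-quality steps above can be sketched as follows. This is a hypothetical, plain-Python illustration: the readers return in-memory rows instead of Spark DataFrames, the rules are Python predicates rather than SQL expectation constraints, and `READERS`, `select_reader`, and `split_by_expectations` are invented names. Rows that fail any rule are routed to a separate list, mirroring the idea of a quarantine table.

```python
from typing import Any, Callable

Row = dict[str, Any]
Reader = Callable[[dict[str, Any]], list[Row]]
Rule = Callable[[Row], bool]


# Hypothetical stand-in readers; a real pipeline would return Spark DataFrames
# (for example from Auto Loader or a Kafka source) built from source details.
def read_files(source_details: dict[str, Any]) -> list[Row]:
    return [{"id": 1, "email": "a@x.com"}, {"id": None, "email": "b@x.com"}]


def read_delta(source_details: dict[str, Any]) -> list[Row]:
    return [{"id": 2, "email": None}]


# Map the metadata's source_format value to a reader function.
READERS: dict[str, Reader] = {"cloudFiles": read_files, "delta": read_delta}


def select_reader(source_format: str) -> Reader:
    """Pick a reader from metadata; unknown formats fail fast."""
    if source_format not in READERS:
        raise ValueError(f"unsupported source_format: {source_format}")
    return READERS[source_format]


def split_by_expectations(rows: list[Row], rules: dict[str, Rule]) -> tuple[list[Row], list[Row]]:
    """Return (passed, quarantined): rows failing any named rule go to quarantine."""
    passed: list[Row] = []
    quarantined: list[Row] = []
    for row in rows:
        ok = all(rule(row) for rule in rules.values())
        (passed if ok else quarantined).append(row)
    return passed, quarantined


rules: dict[str, Rule] = {"valid_id": lambda row: row["id"] is not None}
rows = select_reader("cloudFiles")({"path": "/hypothetical/landing/path"})
passed, quarantined = split_by_expectations(rows, rules)
print(passed)       # [{'id': 1, 'email': 'a@x.com'}]
print(quarantined)  # [{'id': None, 'email': 'b@x.com'}]
```

Driving the reader choice from a lookup table keeps the pipeline code generic: supporting a new source format in this sketch means registering one more reader, not changing the pipeline logic.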

High-Level Solution overview:

[Figure: High-Level Process Flow]

How does DLT-META work?

[Figure: DLT-META Stages]

DLT-META DLT Feature Support

Feature | DLT-META Support
--- | ---
Input data sources | Autoloader, Delta, Eventhub, Kafka, snapshot
Medallion architecture layers | Bronze, Silver
Custom transformations | Bronze and Silver layers accept custom functions
Data Quality Expectations support | Bronze, Silver layers
Quarantine table support | Bronze layer
create_auto_cdc_flow API support | Bronze, Silver layers
create_auto_cdc_from_snapshot_flow API support | Bronze layer
append_flow API support | Bronze layer
Liquid clustering support | Bronze, Bronze Quarantine, Silver, Silver Quarantine tables
DLT-META CLI | databricks labs dlt-meta onboard, databricks labs dlt-meta deploy
Bronze and Silver pipeline chaining | Deploy the dlt-meta pipeline with the layer=bronze_silver option using Direct publishing mode
DLT Sinks | Supported formats: external Delta table, Kafka; Bronze, Silver layers
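The create_auto_cdc_flow row refers to applying change data capture (CDC) records to a target table. The sketch below models SCD Type 1 semantics in plain Python, assuming a key column, a sequencing column, and an optional delete flag, loosely analogous to the `keys`, `sequence_by`, and `apply_as_deletes` arguments of the Lakeflow Declarative Pipelines CDC API. It is a conceptual model only: `apply_changes_scd1` is a hypothetical helper, and it does not model SCD Type 2 history, tombstone retention for late-arriving deletes, or column selection.

```python
from typing import Any, Optional

Record = dict[str, Any]


def apply_changes_scd1(
    target: dict[Any, Record],
    changes: list[Record],
    key: str,
    sequence_by: str,
    delete_flag: Optional[str] = None,
) -> dict[Any, Record]:
    """Fold change records into `target` (keyed by `key`) using SCD Type 1 semantics."""
    # Process in sequence order so out-of-order arrivals within a batch resolve correctly.
    for change in sorted(changes, key=lambda c: c[sequence_by]):
        k = change[key]
        current = target.get(k)
        if current is not None and current[sequence_by] > change[sequence_by]:
            continue  # stale record: a newer version is already stored
        if delete_flag is not None and change.get(delete_flag):
            target.pop(k, None)  # delete event removes the row
        else:
            # Upsert: overwrite the row, dropping the delete-flag column.
            target[k] = {c: v for c, v in change.items() if c != delete_flag}
    return target


changes = [
    {"id": 1, "name": "Ann", "ts": 2, "is_deleted": False},
    {"id": 1, "name": "Anne", "ts": 1, "is_deleted": False},
    {"id": 2, "name": "Bob", "ts": 1, "is_deleted": False},
    {"id": 2, "name": "Bob", "ts": 3, "is_deleted": True},
]
result = apply_changes_scd1({}, changes, key="id", sequence_by="ts", delete_flag="is_deleted")
print(result)  # {1: {'id': 1, 'name': 'Ann', 'ts': 2}}
```

Sorting by the sequence column makes the result independent of arrival order within a batch, and the staleness check carries that guarantee across batches for updates; a production CDC flow also has to handle deletes that arrive late, which this sketch deliberately leaves out.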

How much does it cost?

DLT-META has no direct cost of its own beyond the cost of running Databricks Lakeflow Declarative Pipelines in your environment. The overall cost is determined primarily by [Databricks Lakeflow Declarative Pipelines Pricing](https://www.databricks.com/product/pricing/lakeflow-declarative-pipelines).

More questions

Refer to the FAQ.

Getting Started

Refer to the Getting Started guide.

Project Support

Please note that all projects in the databrickslabs GitHub account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the repository. They will be reviewed as time permits, but there are no formal SLAs for support.

Contributing

See our CONTRIBUTING guide for more details.