DLT-META

Project Overview

DLT-META is a metadata-driven framework designed to work with Databricks Delta Live Tables (DLT). This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.

In practice, a single generic DLT pipeline reads the Dataflowspec and uses it to orchestrate and run the necessary data processing workloads. This approach streamlines the development and management of data pipelines, allowing for a more efficient and scalable data processing workflow.
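To make the idea concrete, here is a minimal sketch of how an onboarding JSON file could be expanded into per-layer flow specifications. It is not DLT-META's actual schema or code: the field names (`flow_id`, `source_format`, `source_path`, `bronze_table`, `silver_table`), the `FlowSpec` class, and the `load_flow_specs` helper are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical onboarding records. The field names are illustrative
# assumptions, not the real DLT-META onboarding schema.
ONBOARDING_JSON = """
[
  {"flow_id": "100", "source_format": "cloudFiles",
   "source_path": "/landing/customers",
   "bronze_table": "bronze.customers",
   "silver_table": "silver.customers"},
  {"flow_id": "101", "source_format": "kafka",
   "source_path": "orders-topic",
   "bronze_table": "bronze.orders",
   "silver_table": null}
]
"""


@dataclass(frozen=True)
class FlowSpec:
    """One source-to-target hop for a single medallion layer."""
    flow_id: str
    layer: str
    source: str
    target: str
    source_format: str


def load_flow_specs(raw: str) -> List[FlowSpec]:
    """Expand each onboarding record into bronze (and optional silver) specs."""
    specs: List[FlowSpec] = []
    for rec in json.loads(raw):
        # Bronze always reads from the declared external source.
        specs.append(FlowSpec(rec["flow_id"], "bronze", rec["source_path"],
                              rec["bronze_table"], rec["source_format"]))
        # Silver, when present, reads from the bronze table as a Delta source.
        if rec.get("silver_table"):
            specs.append(FlowSpec(rec["flow_id"], "silver", rec["bronze_table"],
                                  rec["silver_table"], "delta"))
    return specs


specs = load_flow_specs(ONBOARDING_JSON)
for spec in specs:
    print(f"{spec.layer}: {spec.source} -> {spec.target} ({spec.source_format})")
```

The point of the sketch is that adding a new table means adding a record to the metadata file rather than writing new pipeline code.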

DLT-META components:

Metadata Interface

Generic DLT pipeline

  • Apply appropriate readers based on input metadata
  • Apply data quality rules with DLT expectations
  • Apply CDC (apply changes) if specified in metadata
  • Build DLT graph based on input/output metadata
  • Launch DLT pipeline
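The steps listed above can be sketched in plain Python as a dispatch over metadata. This is a simulation under stated assumptions, not the DLT-META implementation: it does not call the `dlt` module (which is only available inside a Databricks pipeline), and the `READERS` registry, the spec keys, and the `build_table_plan` and `build_graph` helpers are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical reader registry: each entry stands in for a real reader
# (Auto Loader, Delta, Kafka) and simply returns a description string.
READERS: Dict[str, Callable[[str], str]] = {
    "cloudFiles": lambda src: f"autoloader({src})",
    "delta": lambda src: f"delta({src})",
    "kafka": lambda src: f"kafka({src})",
}


def build_table_plan(spec: dict) -> List[str]:
    """Turn one metadata record into an ordered list of pipeline steps."""
    fmt = spec["source_format"]
    if fmt not in READERS:
        raise ValueError(f"unsupported source format: {fmt}")
    # 1. Pick the reader that matches the input metadata.
    plan = [f"read: {READERS[fmt](spec['source'])}"]
    # 2. Attach data quality rules (expectations) if any are declared.
    for name, rule in spec.get("expectations", {}).items():
        plan.append(f"expect: {name} -> {rule}")
    # 3. Add a CDC step only when the metadata asks for it.
    cdc = spec.get("cdc")
    if cdc:
        plan.append(
            f"apply_changes: keys={cdc['keys']} sequence_by={cdc['sequence_by']}"
        )
    plan.append(f"write: {spec['target']}")
    return plan


def build_graph(specs: List[dict]) -> Dict[str, List[str]]:
    """4. Build the full graph, keyed by target table."""
    return {spec["target"]: build_table_plan(spec) for spec in specs}


example_specs = [
    {"source_format": "cloudFiles", "source": "/landing/customers",
     "target": "bronze.customers",
     "expectations": {"valid_id": "id IS NOT NULL"}},
    {"source_format": "delta", "source": "bronze.customers",
     "target": "silver.customers",
     "cdc": {"keys": ["id"], "sequence_by": "updated_at"}},
]

graph = build_graph(example_specs)
for table, steps in graph.items():
    print(table, steps)
```

In a real pipeline each step would be replaced by the corresponding DLT call; the sketch only shows how the same generic code can produce different tables from different metadata.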

High-Level Solution overview:

[Figure: High-Level Process Flow]

How does DLT-META work?

[Figure: DLT-META Stages]

DLT-META DLT Features support

| Features | DLT-META Support |
| --- | --- |
| Input data sources | Autoloader, Delta, Eventhub, Kafka, snapshot |
| Medallion architecture layers | Bronze, Silver |
| Custom transformations | Bronze, Silver layer accepts custom functions |
| Data Quality Expectations Support | Bronze, Silver layer |
| Quarantine table support | Bronze layer |
| apply_changes API support | Bronze, Silver layer |
| apply_changes_from_snapshot API support | Bronze layer |
| append_flow API support | Bronze layer |
| Liquid cluster support | Bronze, Bronze Quarantine, Silver tables |
| DLT-META CLI | databricks labs dlt-meta onboard, databricks labs dlt-meta deploy |
| Bronze and Silver pipeline chaining | Deploy dlt-meta pipeline with layer=bronze_silver option using Direct publishing mode |

How much does it cost?

DLT-META does not have any direct cost associated with it other than the cost of running Databricks Delta Live Tables in your environment. The overall cost will be determined primarily by the [Databricks Delta Live Tables Pricing](https://databricks.com/product/delta-live-tables-pricing-azure).

More questions

Refer to the FAQ

Getting Started

Refer to the Getting Started guide

Project Support

Please note that all projects in the databrickslabs GitHub account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.

Contributing

See our CONTRIBUTING guide for more details.