DLT-META
Project Overview
DLT-META is a metadata-driven framework designed to work with Databricks Delta Live Tables (DLT). This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.
In practice, a single generic DLT pipeline reads the Dataflowspec and uses it to orchestrate and run the necessary data processing workloads. This approach streamlines the development and management of data pipelines, allowing for a more efficient and scalable data processing workflow.
DLT-META components:
Metadata Interface
- Capture input/output metadata in onboarding file
- Capture Data Quality Rules
- Capture processing logic as SQL in Silver transformation file
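To make the metadata interface concrete, here is a minimal sketch of what loading and validating an onboarding entry could look like. Every field name (`flow_id`, `source_format`, `quality_rules`, `silver_select`, and so on) and the helper `load_flows` are hypothetical and chosen for illustration only; they are not DLT-META's actual onboarding schema, which is described in the project's own documentation.

```python
import json

# Hypothetical onboarding entry. Field names are illustrative assumptions,
# NOT the real DLT-META onboarding schema.
ONBOARDING_JSON = """
[
  {
    "flow_id": "100",
    "source_format": "cloudFiles",
    "source_path": "/demo/landing/customers",
    "bronze_table": "customers_bronze",
    "quality_rules": {"valid_id": "customer_id IS NOT NULL"},
    "silver_table": "customers_silver",
    "silver_select": ["customer_id", "upper(name) AS name"]
  }
]
"""

# Keys this sketch treats as mandatory for every flow (an assumption).
REQUIRED_KEYS = {"flow_id", "source_format", "source_path", "bronze_table"}


def load_flows(text):
    """Parse onboarding entries and reject any entry missing a required key."""
    flows = json.loads(text)
    for flow in flows:
        missing = REQUIRED_KEYS - set(flow)
        if missing:
            raise ValueError(
                f"flow {flow.get('flow_id')!r} is missing {sorted(missing)}"
            )
    return flows


flows = load_flows(ONBOARDING_JSON)
```

Keeping source, target, data quality rules, and transformation logic in one declarative record is what lets a single pipeline definition serve many tables.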
Generic DLT pipeline
- Apply appropriate readers based on input metadata
- Apply data quality rules with DLT expectations
- Apply CDC through the apply_changes API if specified in metadata
- Build DLT graph based on input/output metadata
- Launch DLT pipeline
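The steps above amount to metadata-driven dispatch: the spec decides which reader, which expectations, and whether CDC is applied. The sketch below illustrates that idea in plain Python without importing `dlt`, so it runs outside Databricks. `FlowSpec`, `READERS`, and `plan_flow` are hypothetical names invented for this example and are not part of DLT-META; in a real pipeline each planned step would presumably be realized with DLT expectations and the apply_changes API rather than strings.

```python
from dataclasses import dataclass, field


@dataclass
class FlowSpec:
    """Hypothetical, simplified stand-in for one Dataflowspec entry."""
    source_format: str
    target_table: str
    expectations: dict = field(default_factory=dict)  # rule name -> SQL condition
    cdc_keys: list = field(default_factory=list)      # empty list means no CDC


# Registry mapping a source format to a description of its reader step.
# The set of formats here is illustrative only.
READERS = {
    "cloudFiles": "read stream with Auto Loader",
    "delta": "read stream from Delta table",
    "kafka": "read stream from Kafka",
}


def plan_flow(spec):
    """Return the ordered processing steps implied by the metadata."""
    if spec.source_format not in READERS:
        raise ValueError(f"unsupported source format: {spec.source_format}")
    steps = [f"{READERS[spec.source_format]} into {spec.target_table}"]
    # One data quality step per rule recorded in the metadata.
    steps += [f"expect {name}: {cond}" for name, cond in spec.expectations.items()]
    # CDC is applied only when the metadata asks for it.
    if spec.cdc_keys:
        steps.append("apply_changes on keys " + ", ".join(spec.cdc_keys))
    return steps


spec = FlowSpec(
    source_format="kafka",
    target_table="orders_bronze",
    expectations={"valid_id": "id IS NOT NULL"},
    cdc_keys=["id"],
)
plan = plan_flow(spec)
```

Adding a new source in this design means registering one more reader, while the pipeline code itself stays unchanged.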
High-Level Solution overview:
How does DLT-META work?
Onboarding Job
- Option#1: DLT-META CLI
- Option#2: Manual Job
- Option#3: Databricks Notebook
Dataflow DLT Pipeline
- Option#1: DLT-META CLI
- Option#2: DLT-META Manual
DLT-META DLT Features support
Features | DLT-META Support |
---|---|
Input data sources | Autoloader, Delta, Eventhub, Kafka, snapshot |
Medallion architecture layers | Bronze, Silver |
Custom transformations | Bronze, Silver layer accepts custom functions |
Data Quality Expectations Support | Bronze, Silver layer |
Quarantine table support | Bronze layer |
apply_changes API support | Bronze, Silver layer |
apply_changes_from_snapshot API support | Bronze layer |
append_flow API support | Bronze layer |
Liquid clustering support | Bronze, Bronze Quarantine, Silver tables |
DLT-META CLI | databricks labs dlt-meta onboard, databricks labs dlt-meta deploy |
Bronze and Silver pipeline chaining | Deploy the DLT-META pipeline with the layer=bronze_silver option using Direct publishing mode |
How much does it cost?
DLT-META does not have any direct cost associated with it other than the cost to run Databricks Delta Live Tables in your environment. The overall cost will be determined primarily by the [Databricks Delta Live Tables Pricing](https://databricks.com/product/delta-live-tables-pricing-azure).
More questions
Refer to the FAQ
Getting Started
Refer to the Getting Started guide
Project Support
Please note that all projects in the databrickslabs GitHub account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.
Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.
Contributing
See our CONTRIBUTING for more details.