Go to your Databricks landing page and do one of the following:
In the sidebar, click Workflows and click Create Job.
In the sidebar, click New and select Job from the menu.
In the task dialog box that appears on the Tasks tab, replace Add a name for your job… with your job name, for example, Python wheel example.
In Task name, enter a name for the task, for example, dlt_meta_onboarding_pythonwheel_task.
In Type, select Python wheel.
In Package name, enter dlt_meta.
In Entry point, enter run.
Click Add under Dependent Libraries. In the Add dependent library dialog, under Library Type, click PyPI. In Package, enter dlt-meta.
Click Add.
In Parameters, select keyword argument, then select JSON and paste one of the JSON parameter sets below. For a non-Unity-Catalog workspace:
{
"onboard_layer": "bronze_silver",
"database": "dlt_demo",
"onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
"silver_dataflowspec_table": "silver_dataflowspec_table",
"silver_dataflowspec_path": "dbfs:/onboarding_tables_cdc/silver",
"bronze_dataflowspec_table": "bronze_dataflowspec_table",
"bronze_dataflowspec_path": "dbfs:/onboarding_tables_cdc/bronze",
"import_author": "Ravi",
"version": "v1",
"uc_enabled": "False",
"overwrite": "True",
"env": "dev"
}
For a Unity Catalog workspace, set database to <<uc_name>>.<<schema>> and use:
{
"onboard_layer": "bronze_silver",
"database": "uc_name.dlt_demo",
"onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
"silver_dataflowspec_table": "silver_dataflowspec_table",
"bronze_dataflowspec_table": "bronze_dataflowspec_table",
"import_author": "Ravi",
"version": "v1",
"uc_enabled": "True",
"overwrite": "True",
"env": "dev"
}
Alternatively, you can enter the keyword arguments individually: click + Add and enter a key and value (for example, key onboard_layer and value bronze_silver), then click + Add again to enter more arguments.
Click Save task.
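For reference, the completed task maps onto a python_wheel_task in the Databricks Jobs API; a minimal sketch, assuming the non-Unity-Catalog parameters above (cluster settings omitted, only a few named_parameters shown, the rest follow the JSON earlier):
{
  "name": "Python wheel example",
  "tasks": [
    {
      "task_key": "dlt_meta_onboarding_pythonwheel_task",
      "python_wheel_task": {
        "package_name": "dlt_meta",
        "entry_point": "run",
        "named_parameters": {
          "onboard_layer": "bronze_silver",
          "database": "dlt_demo",
          "onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json"
        }
      },
      "libraries": [
        {"pypi": {"package": "dlt-meta"}}
      ]
    }
  ]
}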
Click Run now.
Make sure the job runs successfully. Verify the metadata in the dataflow spec tables you entered in the parameters above, e.g. dlt_demo.bronze_dataflowspec_table and dlt_demo.silver_dataflowspec_table.
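For example, a minimal check from a notebook (a sketch; table names assume the dlt_demo parameters above):
# Inspect the onboarded dataflow specs (names assume the example parameters)
display(spark.read.table("dlt_demo.bronze_dataflowspec_table"))
display(spark.read.table("dlt_demo.silver_dataflowspec_table"))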
%pip install dlt-meta
onboarding_params_map = {
"database": "dlt_demo",
"onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
"bronze_dataflowspec_table": "bronze_dataflowspec_table",
"bronze_dataflowspec_path": "dbfs:/onboarding_tables_cdc/bronze",
"silver_dataflowspec_table": "silver_dataflowspec_table",
"silver_dataflowspec_path": "dbfs:/onboarding_tables_cdc/silver",
"overwrite": "True",
"env": "dev",
"version": "v1",
"import_author": "Ravi"
}
from src.onboard_dataflowspec import OnboardDataflowspec
OnboardDataflowspec(spark, onboarding_params_map).onboard_dataflow_specs()
onboarding_params_map = {
"database": "uc_name.dlt_demo",
"onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",,
"bronze_dataflowspec_table": "bronze_dataflowspec_table",
"silver_dataflowspec_table": "silver_dataflowspec_table",
"overwrite": "True",
"env": "dev",
"version": "v1",
"import_author": "Ravi"
}
from src.onboard_dataflowspec import OnboardDataflowspec
OnboardDataflowspec(spark, onboarding_params_map, uc_enabled=True).onboard_dataflow_specs()
Specify your onboarding config params in the onboarding_params_map above.
Run the notebook cells.
Go to your Databricks landing page and select Create a notebook, or click New in the sidebar and select Notebook. The Create Notebook dialog appears.
In the Create Notebook dialog, give your notebook a name, e.g. dlt_meta_pipeline, and select Python from the Default Language dropdown menu. You can leave Cluster set to the default value; the Delta Live Tables runtime creates a cluster before it runs your pipeline.
Click Create.
You can add the example DLT pipeline code below or import the iPython notebook as is.
%pip install dlt-meta
# "layer" (bronze or silver) is set via the DLT pipeline configuration parameters
layer = spark.conf.get("layer", None)
from src.dataflow_pipeline import DataflowPipeline
DataflowPipeline.invoke_dlt_pipeline(spark, layer)
Click Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline.
Give the pipeline a name, e.g. DLT_META_BRONZE, and use the file picker to select the dlt_meta_pipeline notebook created in the Create a dlt launch notebook step.
Optionally enter a storage location for output data from the pipeline. The system uses a default location if you leave Storage location empty.
Select Triggered for Pipeline Mode.
Enter the configuration parameters, e.g. (see the sketch after this list for how they appear in the pipeline settings):
"layer": "bronze",
"bronze.dataflowspecTable": "dataflowspec table name",
"bronze.group": "enter group name from metadata e.g. G1",
Enter the target schema where you want your bronze tables to be created.
Click Create.
Start the pipeline: click the Start button in the top panel. The system returns a message confirming that your pipeline is starting.
Click Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline.
Give the pipeline a name, e.g. DLT_META_SILVER, and use the file picker to select the dlt_meta_pipeline notebook created in the Create a dlt launch notebook step.
Optionally enter a storage location for output data from the pipeline. The system uses a default location if you leave Storage location empty.
Select Triggered for Pipeline Mode.
Enter the configuration parameters, e.g.:
"layer": "silver",
"bronze.dataflowspecTable": "dataflowspec table name",
"bronze.group": "enter group name from metadata e.g. G1",
Enter the target schema where you want your silver tables to be created.
Click Create.
Start the pipeline: click the Start button in the top panel. The system returns a message confirming that your pipeline is starting.
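Once both pipelines have completed, a quick sanity check from a notebook (a sketch; the target schema is assumed to be dlt_demo):
# List the tables the bronze and silver pipelines created in the target schema
spark.sql("SHOW TABLES IN dlt_demo").show(truncate=False)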