Go to your Databricks landing page and do one of the following:
In the sidebar, click Workflows and click Create Job.
In the sidebar, click New and select Job from the menu.
In the task dialog box that appears on the Tasks tab, replace Add a name for your job… with your job name, for example, Python wheel example.
In Task name, enter a name for the task, for example, dlt_meta_onboarding_pythonwheel_task.
In Type, select Python wheel.
In Package name, enter dlt_meta.
In Entry point, enter run.
Click Add under Dependent Libraries. In the Add dependent library dialog, under Library Type, click PyPI. In Package, enter dlt-meta.
Click Add.
In Parameters, select keyword argument, then select JSON and paste one of the JSON parameter sets below. For a non-Unity-Catalog workspace:
{
"onboard_layer": "bronze_silver",
"database": "dlt_demo",
"onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
"silver_dataflowspec_table": "silver_dataflowspec_table",
"silver_dataflowspec_path": "dbfs:/onboarding_tables_cdc/silver",
"bronze_dataflowspec_table": "bronze_dataflowspec_table",
"bronze_dataflowspec_path": "dbfs:/onboarding_tables_cdc/bronze",
"import_author": "Ravi",
"version": "v1",
"uc_enabled": "False",
"overwrite": "True",
"env": "dev"
}
For a Unity Catalog workspace, set database to <<uc_name>>.<<schema>> and use:
{
"onboard_layer": "bronze_silver",
"database": "uc_name.dlt_demo",
"onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
"silver_dataflowspec_table": "silver_dataflowspec_table",
"bronze_dataflowspec_table": "bronze_dataflowspec_table",
"import_author": "Ravi",
"version": "v1",
"uc_enabled": "True",
"overwrite": "True",
"env": "dev"
}
Alternatively, you can enter the keyword arguments individually: click + Add and enter a key and value (for example, key onboard_layer and value bronze_silver), then click + Add again to enter more arguments.
Click Save task.
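For reference, the completed task maps onto a python_wheel_task in the Databricks Jobs API; a minimal sketch, assuming the non-Unity-Catalog parameters above (cluster settings omitted, only a few named_parameters shown, the rest follow the JSON earlier):
{
  "name": "Python wheel example",
  "tasks": [
    {
      "task_key": "dlt_meta_onboarding_pythonwheel_task",
      "python_wheel_task": {
        "package_name": "dlt_meta",
        "entry_point": "run",
        "named_parameters": {
          "onboard_layer": "bronze_silver",
          "database": "dlt_demo",
          "onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json"
        }
      },
      "libraries": [
        {"pypi": {"package": "dlt-meta"}}
      ]
    }
  ]
}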
Click Run now.
Make sure the job runs successfully. Verify the metadata in the dataflow spec tables you entered in the parameters above, e.g. dlt_demo.bronze_dataflowspec_table and dlt_demo.silver_dataflowspec_table.
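For example, a minimal check from a notebook (a sketch; table names assume the dlt_demo parameters above):
# Inspect the onboarded dataflow specs (names assume the example parameters)
display(spark.read.table("dlt_demo.bronze_dataflowspec_table"))
display(spark.read.table("dlt_demo.silver_dataflowspec_table"))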
%pip install dlt-meta
onboarding_params_map = {
"database": "dlt_demo",
"onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
"bronze_dataflowspec_table": "bronze_dataflowspec_table",
"bronze_dataflowspec_path": "dbfs:/onboarding_tables_cdc/bronze",
"silver_dataflowspec_table": "silver_dataflowspec_table",
"silver_dataflowspec_path": "dbfs:/onboarding_tables_cdc/silver",
"overwrite": "True",
"env": "dev",
"version": "v1",
"import_author": "Ravi"
}
from src.onboard_dataflowspec import OnboardDataflowspec
OnboardDataflowspec(spark, onboarding_params_map).onboard_dataflow_specs()
onboarding_params_map = {
"database": "uc_name.dlt_demo",
"onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",,
"bronze_dataflowspec_table": "bronze_dataflowspec_table",
"silver_dataflowspec_table": "silver_dataflowspec_table",
"overwrite": "True",
"env": "dev",
"version": "v1",
"import_author": "Ravi"
}
from src.onboard_dataflowspec import OnboardDataflowspec
OnboardDataflowspec(spark, onboarding_params_map, uc_enabled=True).onboard_dataflow_specs()
Specify your onboarding config params in the onboarding_params_map above.
Run the notebook cells.
Go to your Databricks landing page and select Create a notebook, or click New in the sidebar and select Notebook. The Create Notebook dialog appears.
In the Create Notebook dialog, give your notebook a name, e.g. dlt_meta_pipeline, and select Python from the Default Language dropdown menu. You can leave Cluster set to the default value; the Delta Live Tables runtime creates a cluster before it runs your pipeline.
Click Create.
You can add the example DLT pipeline code below or import the iPython notebook as is.
%pip install dlt-meta
# "layer" (bronze or silver) is set via the DLT pipeline configuration parameters
layer = spark.conf.get("layer", None)
from src.dataflow_pipeline import DataflowPipeline
DataflowPipeline.invoke_dlt_pipeline(spark, layer)
Click Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline.
Give the pipeline a name, e.g. DLT_META_BRONZE, and use the file picker to select the dlt_meta_pipeline notebook created in the Create a dlt launch notebook step.
Optionally enter a storage location for output data from the pipeline. The system uses a default location if you leave Storage location empty.
Select Triggered for Pipeline Mode.
Enter the configuration parameters, e.g. (see the sketch after this list for how they appear in the pipeline settings):
"layer": "bronze",
"bronze.dataflowspecTable": "dataflowspec table name",
"bronze.group": "enter group name from metadata e.g. G1",
Enter the target schema where you want your bronze tables to be created.
Click Create.
Start the pipeline: click the Start button in the top panel. The system returns a message confirming that your pipeline is starting.
Click Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline.
Give the pipeline a name, e.g. DLT_META_SILVER, and use the file picker to select the dlt_meta_pipeline notebook created in the Create a dlt launch notebook step.
Optionally enter a storage location for output data from the pipeline. The system uses a default location if you leave Storage location empty.
Select Triggered for Pipeline Mode.
Enter the configuration parameters, e.g.:
"layer": "silver",
"bronze.dataflowspecTable": "dataflowspec table name",
"bronze.group": "enter group name from metadata e.g. G1",
Enter the target schema where you want your silver tables to be created.
Click Create.
Start the pipeline: click the Start button in the top panel. The system returns a message confirming that your pipeline is starting.
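Once both pipelines have completed, a quick sanity check from a notebook (a sketch; the target schema is assumed to be dlt_demo):
# List the tables the bronze and silver pipelines created in the target schema
spark.sql("SHOW TABLES IN dlt_demo").show(truncate=False)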