Execution

Q. How do I get started?

Please refer to the Getting Started guide.

Q. How do I create metadata for DLT-META?

DLT-META needs the following metadata files:

  • Onboarding file (JSON) defining the bronze/silver dataflows
  • Silver transformations file (JSON) with select expressions and filter conditions
  • Data quality expectations file(s), if you use expectations

Q. What is DataflowSpecs?

DLT-META translates the input onboarding metadata into a Delta table known as DataflowSpecs, which the DLT pipelines then read at runtime.
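
Once onboarding has run, you can inspect the generated specs like any other Delta table (a minimal sketch; the database and table names match the onboarding examples below):

spark.read.table("uc_name.dlt_demo.bronze_dataflowspec_table").show()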

Q. How many DLT pipelines will be launched using DLT-META?

DLT-META uses data_flow_group to launch DLT pipelines, so all tables belonging to the same group are executed under a single DLT pipeline.
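
For example, these two onboarding entries share a data_flow_group, so both tables will run in one DLT pipeline (an illustrative fragment; most onboarding fields are omitted for brevity):

[
   {
      "data_flow_id":"100",
      "data_flow_group":"A1",
      "bronze_table":"customers"
   },
   {
      "data_flow_id":"101",
      "data_flow_group":"A1",
      "bronze_table":"transactions"
   }
]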

Q. Can we run onboarding for the bronze layer only?

Yes! Please follow the steps below:

  1. Bronze metadata preparation (example)
  2. Onboarding job. Either pass the following job parameters:
    {
        "onboard_layer": "bronze",
        "database": "dlt_demo",
        "onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
        "bronze_dataflowspec_table": "bronze_dataflowspec_table",
        "import_author": "Ravi",
        "version": "v1",
        "uc_enabled": "True",
        "overwrite": "True",
        "env": "dev"
    }
    
    or run onboarding from a notebook:

        onboarding_params_map = {
            "database": "uc_name.dlt_demo",
            "onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
            "bronze_dataflowspec_table": "bronze_dataflowspec_table",
            "overwrite": "True",
            "env": "dev",
            "version": "v1",
            "import_author": "Ravi"
        }

        from src.onboard_dataflowspec import OnboardDataflowspec

        # Onboard only the bronze layer dataflow spec
        OnboardDataflowspec(spark, onboarding_params_map, uc_enabled=True).onboard_bronze_dataflow_spec()

Q. Can we run onboarding for the silver layer only?

Yes! Please follow the steps below:

  1. Silver metadata preparation (example)
  2. Onboarding job. Either pass the following job parameters:
    {
        "onboard_layer": "silver",
        "database": "dlt_demo",
        "onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
        "silver_dataflowspec_table": "silver_dataflowspec_table",
        "import_author": "Ravi",
        "version": "v1",
        "uc_enabled": "True",
        "overwrite": "True",
        "env": "dev"
    }
    
    or run onboarding from a notebook:

        onboarding_params_map = {
            "database": "uc_name.dlt_demo",
            "onboarding_file_path": "dbfs:/dlt-meta/conf/onboarding.json",
            "silver_dataflowspec_table": "silver_dataflowspec_table",
            "overwrite": "True",
            "env": "dev",
            "version": "v1",
            "import_author": "Ravi"
        }

        from src.onboard_dataflowspec import OnboardDataflowspec

        # Onboard only the silver layer dataflow spec
        OnboardDataflowspec(spark, onboarding_params_map, uc_enabled=True).onboard_silver_dataflow_spec()

Q. How to chain multiple silver tables after bronze table?

  • Example: After the customers_cdc bronze table, can I have a customers silver table reading from customers_cdc and another customers_clean silver table also reading from customers_cdc? If so, how do I define these in onboarding.json?

  • You can run onboarding for the additional silver customers_clean table by adding it to the onboarding file along with a silver transformation whose filter condition implements the fan-out (see the sketch after this list).

  • Run onboarding for the silver layer in append mode ("overwrite": "False") so it appends to the existing silver DataflowSpecs. When you launch the DLT pipeline, it will read the silver onboarding specs and run DLT with the bronze table as source and the silver tables as targets.
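
For illustration, the fan-out could look like this in the silver transformations file (a sketch; the where_clause values are placeholders for your own filter logic):

[
   {
      "target_table":"customers",
      "select_exp":["*"],
      "where_clause":["customer_id IS NOT NULL"]
   },
   {
      "target_table":"customers_clean",
      "select_exp":["*"],
      "where_clause":["_rescued_data IS NULL"]
   }
]

Both entries read from the same customers_cdc bronze source (per their onboarding entries) and differ only in their filter conditions.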

Q. How can I do a type 1 or type 2 merge into a target table?

  • Using DLT's dlt.apply_changes API, we can do type 1 or type 2 (SCD) merges.
  • DLT-META has bronze_cdc_apply_changes and silver_cdc_apply_changes tags in the onboarding file that map to DLT's apply_changes API, e.g.:
"silver_cdc_apply_changes": {
   "keys":[
      "customer_id"
   ],
   "sequence_by":"dmsTimestamp",
   "scd_type":"2",
   "apply_as_deletes":"Op = 'D'",
   "except_column_list":[
      "Op",
      "dmsTimestamp",
      "_rescued_data"
   ]
}
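
For reference, the spec above corresponds to a DLT call along these lines (a hand-written sketch, not DLT-META's exact generated code; the target table and source view names are illustrative):

import dlt
from pyspark.sql.functions import expr

dlt.apply_changes(
    target="customers",                     # illustrative silver target table
    source="customers_cdc_input_view",      # illustrative view over the bronze source
    keys=["customer_id"],
    sequence_by="dmsTimestamp",
    stored_as_scd_type="2",
    apply_as_deletes=expr("Op = 'D'"),
    except_column_list=["Op", "dmsTimestamp", "_rescued_data"],
)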

Q. How can I write to the same target table using different sources?

Use append flows: define multiple flow entries under the bronze_append_flows tag in the onboarding file, each reading a different source and writing to the same target, for example:

[
   {
      "name":"customer_bronze_flow1",
      "create_streaming_table":false,
      "source_format":"cloudFiles",
      "source_details":{
         "source_path_dev":"tests/resources/data/customers",
         "source_schema_path":"tests/resources/schema/customer_schema.ddl"
      },
      "reader_options":{
         "cloudFiles.format":"json",
         "cloudFiles.inferColumnTypes":"true",
         "cloudFiles.rescuedDataColumn":"_rescued_data"
      },
      "once":true
   },
   {
      "name":"customer_bronze_flow2",
      "create_streaming_table":false,
      "source_format":"delta",
      "source_details":{
         "source_database":"{uc_catalog_name}.{bronze_schema}",
         "source_table":"customers_delta"
      },
      "reader_options":{
         
      },
      "once":false
   }
]
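
Each entry above becomes a DLT append flow into one streaming table, roughly equivalent to this hand-written sketch (names are illustrative; the actual code is generated by DLT-META):

import dlt

# Target streaming table is created once; both flows append into it
dlt.create_streaming_table("customers")

@dlt.append_flow(target="customers", name="customer_bronze_flow1")
def customer_bronze_flow1():
    # Autoloader (cloudFiles) source; "once": true in the spec marks this flow as a one-time load
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.inferColumnTypes", "true")
        .load("tests/resources/data/customers")
    )

@dlt.append_flow(target="customers", name="customer_bronze_flow2")
def customer_bronze_flow2():
    # Delta source streamed into the same target
    return spark.readStream.table("uc_catalog_name.bronze_schema.customers_delta")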

Q. How do I add Autoloader file metadata to the bronze table?

DLT-META has a source_metadata tag in the onboarding JSON under source_details:

"source_metadata":{
   "include_autoloader_metadata_column":"True",
   "autoloader_metadata_col_name":"source_metadata",
   "select_metadata_cols":{
      "input_file_name":"_metadata.file_name",
      "input_file_path":"_metadata.file_path"
   }
}
  • include_autoloader_metadata_column: this flag adds the _metadata column to the target bronze dataframe.
  • autoloader_metadata_col_name: if provided, the _metadata column is renamed to this value; otherwise the default source_metadata is used.
  • select_metadata_cols: {key: value} pairs used to extract columns from _metadata, where the key is the target dataframe column name and the value is the expression that pulls it from the _metadata column. A sketch follows below.
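
In plain Spark terms, the configuration above roughly amounts to the following (a sketch, assuming an Autoloader JSON source; the load path is illustrative):

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("dbfs:/dlt-meta/demo/customers")  # illustrative source path
    .selectExpr(
        "*",
        "_metadata as source_metadata",            # include_autoloader_metadata_column + rename
        "_metadata.file_name as input_file_name",  # from select_metadata_cols
        "_metadata.file_path as input_file_path",
    )
)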