DAB Demo

Overview

This demo showcases how to use Databricks Asset Bundles (DABs) with DLT-Meta.

The demo performs the following steps:

  • Create DLT-Meta schemas for the dataflowspec and the bronze/silver layers
  • Upload the necessary resources to a Unity Catalog volume
  • Create DAB files with catalog, schema, and file locations populated
  • Deploy the DAB to a Databricks workspace
  • Run onboarding using DAB commands
  • Run the Bronze/Silver pipelines using DAB commands
  • Showcase the fan-out pattern in the Silver layer
  • Showcase custom transformations for the Bronze/Silver layers
  • Add custom columns and metadata to Bronze tables
  • Implement SCD Type 1 for Silver tables
  • Apply expectations to filter data in Silver tables

Steps:

  1. Launch a terminal (Command Prompt on Windows)

  2. Install the Databricks CLI

    • Once the Databricks CLI is installed, authenticate your machine to your Databricks workspace:
    databricks auth login --host WORKSPACE_HOST
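    • If the CLI is not installed yet, one common route (a sketch; check the Databricks CLI documentation for the recommended installer on your platform) is the official install script:
    # Install the Databricks CLI via the official install script (Linux/macOS)
    curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
    # Confirm the CLI is available on your PATH
    databricks --version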
  3. Install Python package requirements:

    # Core requirements
    pip install "PyYAML>=6.0" setuptools databricks-sdk
    
    # Development requirements
    pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
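    • Optionally, run the pip installs above inside a virtual environment so the packages stay isolated (a minimal sketch; the .venv directory name is just a convention):
    # Create and activate an isolated environment before installing requirements
    python3 -m venv .venv
    source .venv/bin/activate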
  4. Clone dlt-meta:

    git clone https://github.com/databrickslabs/dlt-meta.git
  5. Navigate to project directory:

    cd dlt-meta
  6. Set the PYTHONPATH environment variable in your terminal:

    dlt_meta_home=$(pwd)
    export PYTHONPATH=$dlt_meta_home
  7. Generate DAB resources and set up schemas. This command will:

    • Generate DAB configuration files
    • Create DLT-Meta schemas
    • Upload necessary files to volumes
    python demo/generate_dabs_resources.py --source=cloudfiles --uc_catalog_name=<your_catalog_name> --profile=<your_profile>

    Note: If you don’t specify --profile, you’ll be prompted for your Databricks workspace URL and access token.
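
    • A profile is a named entry in ~/.databrickscfg. One way to create and inspect profiles with the CLI (the profile name my_profile below is only an example) is:
    # Create a named profile via the OAuth login flow from step 2
    databricks auth login --host <workspace-url> --profile my_profile
    # List the profiles configured on this machine
    databricks auth profiles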

  8. Deploy and run the DAB bundle:

    • Navigate to the DAB directory:
    cd demo/dabs
    • Validate the bundle configuration:
    databricks bundle validate --profile=<your_profile>
    • Deploy the bundle to the dev environment:
    databricks bundle deploy --target dev --profile=<your_profile>
    • Run the onboarding job:
    databricks bundle run onboard_people -t dev --profile=<your_profile>
    • Execute the pipelines:
    databricks bundle run execute_pipelines_people -t dev --profile=<your_profile>
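    • When you are finished with the demo, the deployed resources can be reviewed or removed with the standard bundle commands (a sketch; availability of the bundle summary command depends on your CLI version):
    # Show what the bundle deployed for this target
    databricks bundle summary -t dev --profile=<your_profile>
    # Tear down the deployed jobs and pipelines
    databricks bundle destroy -t dev --profile=<your_profile>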

Screenshots: dab_onboarding_job.png (onboarding job run), dab_dlt_pipelines.png (bronze/silver DLT pipelines)