DeltaOMS can be configured through multiple methods. Command line parameters override Spark configurations.
For this tutorial we will use command line parameters. More details about the other configuration options can be found in the Additional Configurations section.
Follow the steps below to initialize the centralized DeltaOMS database and tables.
1. Import and open the DeltaOMS Setup notebook (STEP 1) into your Databricks environment.
2. Modify the values of the variables `omsBaseLocation`, `omsDBName`, `omsCheckpointSuffix`, and `omsCheckpointBase` as appropriate for your environment (see the table and example below).
| Variable | Description |
|---|---|
| `omsBaseLocation` | Base location/path of the OMS database on the Delta Lakehouse |
| `omsDBName` | DeltaOMS database name. This is the centralized database with all the Delta logs |
| `omsCheckpointBase` | DeltaOMS ingestion is a streaming process. This defines the base path for the checkpoints |
| `omsCheckpointSuffix` | Suffix to be added to the checkpoint path (helps in making the path unique) |
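For example, a Scala cell in the setup notebook might set these variables as follows. The values below are illustrative only, taken from the example job parameters later in this guide; substitute locations and names for your environment:

```scala
// Illustrative values only -- substitute locations and names for your environment
val omsBaseLocation = "dbfs:/user/hive/warehouse/oms"                  // base path of the OMS database
val omsDBName = "oms_test_aug31"                                       // centralized DeltaOMS database name
val omsCheckpointBase = "dbfs:/user/hive/warehouse/oms/_checkpoints"   // base path for streaming checkpoints
val omsCheckpointSuffix = "_aug31_171000"                              // suffix to keep the checkpoint path unique
```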
3. Attach the DeltaOMS jar (as a library through Maven) to a running cluster:
   - Click Install New from the cluster's Libraries tab.
   - In the Install Library window, select Maven and click Search Packages.
   - Select Maven Central from the drop-down.
   - Search for delta-oms and select the latest release version.
   - Click Install to install the DeltaOMS library into the cluster.
4. Attach the imported notebook to the cluster and start executing the cells.
5. Execute the `com.databricks.labs.deltaoms.init.InitializeOMS.main` method to create the OMS DB and tables (see the sketch after this list).
6. Validate that the DeltaOMS database and tables were created (Cmd 5 and Cmd 6).
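A minimal sketch of what steps 5 and 6 might look like in a notebook cell, reusing the variables set earlier and assuming `InitializeOMS.main` accepts the same command-line style flags used in the example job parameters later in this guide:

```scala
// Sketch only: assumes InitializeOMS.main accepts the same command-line style
// flags used in the example job parameters later in this guide.
import com.databricks.labs.deltaoms.init.InitializeOMS

InitializeOMS.main(Array(
  s"--dbName=$omsDBName",
  s"--baseLocation=$omsBaseLocation"
))

// Validate that the OMS database and tables were created
spark.sql(s"SHOW TABLES IN $omsDBName").show(false)
```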
Next, we will add a few input sources (existing Delta databases or tables) to be tracked by DeltaOMS. This is done using the same notebook.
Add the names of the databases you want to track via DeltaOMS to the `sourceconfig` table in the DeltaOMS DB.
This is done by using a simple SQL INSERT statement:
```sql
INSERT INTO <omsDBName>.sourceconfig VALUES('<Database Name>',false, Map('wildCardLevel','0'))
```
Refer to the Developer Guide for more details on the tables.
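For example, to track two databases you could issue the INSERTs through `spark.sql` in a Scala cell, reusing the `omsDBName` variable from the setup. The database names below are placeholders:

```scala
// "sales_db" and "marketing_db" are placeholder database names -- replace with your own.
spark.sql(s"INSERT INTO $omsDBName.sourceconfig VALUES('sales_db', false, Map('wildCardLevel','0'))")
spark.sql(s"INSERT INTO $omsDBName.sourceconfig VALUES('marketing_db', false, Map('wildCardLevel','0'))")

// Verify the configured sources
spark.sql(s"SELECT * FROM $omsDBName.sourceconfig").show(false)
```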
Configure the internal DeltaOMS configuration tables by executing `com.databricks.labs.deltaoms.init.ConfigurePaths.main`.
This will populate the internal configuration table `pathconfig` with the detailed path information for all Delta tables under the configured database(s).
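A sketch of this step in a notebook cell, assuming `ConfigurePaths.main` accepts the same `--dbName`/`--baseLocation` flags shown in the example job parameters below; the `pathconfig` query is just a quick sanity check:

```scala
import com.databricks.labs.deltaoms.init.ConfigurePaths

// Assumption: ConfigurePaths.main accepts the same flag style as the job parameters below.
ConfigurePaths.main(Array(
  s"--dbName=$omsDBName",
  s"--baseLocation=$omsBaseLocation"
))

// Inspect the generated path information for the tracked databases
spark.sql(s"SELECT * FROM $omsDBName.pathconfig").show(false)
```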
Next, we will create a couple of Databricks jobs for executing the solution. These jobs can be created manually by following the configuration options mentioned below.
The first Databricks job stream-ingests the Delta logs from the configured Delta tables and persists them into the `rawactions` DeltaOMS table.
For example, you could name the job OMSIngestion_Job. The main configurations for the job are:

- Main class: `com.databricks.labs.deltaoms.ingest.StreamPopulateOMS`
- Example Parameters: `["--dbName=oms_test_aug31","--baseLocation=dbfs:/user/hive/warehouse/oms","--checkpointBase=dbfs:/user/hive/warehouse/oms/_checkpoints","--checkpointSuffix=_aug31_171000","--skipPathConfig","--skipInitializeOMS","--startingStream=1","--endingStream=50"]`
The second job will process the raw actions and organize them into Commit Info and Action snapshots for querying and further analytics.
You could name the job OMSProcessing_Job. The main configurations for the job are:

- Main class: `com.databricks.labs.deltaoms.process.OMSProcessRawActions`
- Example Parameters: `["--dbName=oms_test_aug31","--baseLocation=dbfs:/user/hive/warehouse/oms"]`
The ingestion job can also be created through a sample script provided as part of the solution. The steps to run the sample script are:
1. Modify the values of `omsBaseLocation`, `omsDBName`, `omsCheckpointSuffix`, and `omsCheckpointBase` in the script as appropriate for your environment.
2. Modify `num_streams_per_job` to change the number of streams per job.
3. Run the script, then check the Jobs UI to look at the created jobs.

Note: Instead of setting up two different Databricks jobs, you could also set up a single job with multiple tasks using the Multi-task Job feature.
Refer to the Developer Guide for more details on the multiple-stream approach for DeltaOMS ingestion and the processing job.