DeltaOMS can be configured through multiple methods: command line parameters override Spark configurations.
For this tutorial we will use the command line parameters. More details about the other configuration options can be found in the Additional Configurations section.
Follow the steps below to initialize the centralized DeltaOMS database and tables.
Import and open the DeltaOMS Setup notebook (STEP 1) into your Databricks environment
Modify the values of the variables omsBaseLocation, omsDBName, omsCheckpointSuffix and omsCheckpointBase as appropriate for your environment (see the table and the sketch below)
| Variable | Description |
|---|---|
| omsBaseLocation | Base location/path of the OMS Database on the Delta Lakehouse |
| omsDBName | DeltaOMS Database Name. This is the centralized database with all the Delta logs |
| omsCheckpointBase | DeltaOMS ingestion is a streaming process. This defines the base path for the checkpoints |
| omsCheckpointSuffix | Suffix to be added to the checkpoint path (Helps in making the path unique) |
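
For illustration, the variables might be set in the first cell of the setup notebook as shown below. The values are borrowed from the example job parameters later in this tutorial, not required defaults.

```scala
// Example values only -- adjust for your environment.
val omsBaseLocation     = "dbfs:/user/hive/warehouse/oms"              // base location of the OMS database
val omsDBName           = "oms_test_aug31"                             // centralized DeltaOMS database name
val omsCheckpointBase   = "dbfs:/user/hive/warehouse/oms/_checkpoints" // base path for streaming checkpoints
val omsCheckpointSuffix = "_aug31_171000"                              // suffix to keep the checkpoint path unique
```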
Attach the DeltaOMS jar (as a library through Maven) to a running cluster:
- Click Install New from the cluster's Libraries tab
- In the Install Library window, select Maven and click Search Packages
- Select Maven Central from the drop-down
- Search for delta-oms and select the latest release version
- Click Install to install the DeltaOMS library into the cluster

Attach the imported notebook to the cluster and start executing the cells
Execute the com.databricks.labs.deltaoms.init.InitializeOMS.main method to create the OMS database and tables.
Validate that the DeltaOMS database and tables were created (Cmd 5 and Cmd 6); a sketch of these two steps is shown below.
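
As a rough sketch only (assuming the command-line flags shown in the example job parameters later in this tutorial are also accepted when calling the main method directly from a notebook cell), these two steps could look like:

```scala
// Create the OMS database and tables. The flag names mirror the example
// job parameters shown later in this tutorial; if your setup notebook relies
// on Spark configurations instead, the arguments may differ.
com.databricks.labs.deltaoms.init.InitializeOMS.main(Array(
  s"--dbName=$omsDBName",
  s"--baseLocation=$omsBaseLocation",
  s"--checkpointBase=$omsCheckpointBase",
  s"--checkpointSuffix=$omsCheckpointSuffix"))

// Validate that the database and tables were created (Cmd 5 / Cmd 6 in the notebook).
spark.sql(s"DESCRIBE DATABASE $omsDBName").show(truncate = false)
spark.sql(s"SHOW TABLES IN $omsDBName").show(truncate = false)
```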
Next, we will add a few input sources (existing Delta databases or tables) to be tracked by DeltaOMS. This is done using the same notebook.
Add the names of the databases you want to track via DeltaOMS to the sourceconfig table in the DeltaOMS database.
This is done by using a simple SQL INSERT statement:
INSERT INTO <omsDBName>.sourceconfig VALUES('<Database Name>',false, Map('wildCardLevel','0'))
Refer to the Developer Guide for more details on the tables.
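
For illustration, assuming a hypothetical source database named sales_db and the example OMS database name used elsewhere in this tutorial, the statement could be run from a notebook cell as follows:

```scala
// 'sales_db' is a hypothetical source database name; replace it with a
// database you want DeltaOMS to track.
spark.sql(s"""
  INSERT INTO $omsDBName.sourceconfig
  VALUES ('sales_db', false, Map('wildCardLevel', '0'))
""")

// Optionally confirm the entry was added.
spark.sql(s"SELECT * FROM $omsDBName.sourceconfig").show(truncate = false)
```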
Configure the internal DeltaOMS configuration tables by executing com.databricks.labs.deltaoms.init.ConfigurePaths.main.
This will populate the internal configuration table pathconfig with the detailed path information for all Delta tables under the configured database(s).
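
Continuing the earlier sketch (with the same assumption that the documented command-line flags are accepted when invoking the main method directly from a notebook), this step and a quick check of the resulting pathconfig entries might look like:

```scala
// Expand the configured sources into detailed per-table path information.
com.databricks.labs.deltaoms.init.ConfigurePaths.main(Array(
  s"--dbName=$omsDBName",
  s"--baseLocation=$omsBaseLocation"))

// Inspect the generated pathconfig entries for the tracked database(s).
spark.sql(s"SELECT * FROM $omsDBName.pathconfig").show(truncate = false)
```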
Next, we will create a couple of Databricks jobs to execute the solution. These jobs can be created manually by following the configuration options mentioned below.
The first Databricks job will stream-ingest the Delta logs from the configured Delta tables and persist them in the rawactions DeltaOMS table.
For example, you could name the job OMSIngestion_Job. The main configurations for the job are:
Main class: com.databricks.labs.deltaoms.ingest.StreamPopulateOMS
Example parameters: ["--dbName=oms_test_aug31","--baseLocation=dbfs:/user/hive/warehouse/oms","--checkpointBase=dbfs:/user/hive/warehouse/oms/_checkpoints","--checkpointSuffix=_aug31_171000","--skipPathConfig","--skipInitializeOMS","--startingStream=1","--endingStream=50"]

The second job will process the raw actions and organize them into Commit Info and Action snapshots for querying and further analytics.
You could name the job OMSProcessing_Job. The main configurations for the job are:
Main class: com.databricks.labs.deltaoms.process.OMSProcessRawActions
Example parameters: ["--dbName=oms_test_aug31","--baseLocation=dbfs:/user/hive/warehouse/oms"]
The ingestion job can also be created through a sample script provided as part of the solution. The steps to run the sample script are:
- Modify the values of omsBaseLocation, omsDBName, omsCheckpointSuffix and omsCheckpointBase as appropriate for your environment
- Modify num_streams_per_job to change the number of streams per job
- Use the Jobs UI to look at the created jobs

Note: Instead of setting up two different Databricks jobs, you could also set up a single job with multiple tasks using the Multi-task Job feature.
Refer to the Developer Guide for more details on the multiple-stream approach for DeltaOMS ingestion and the processing job.