Q. How do I get started?
Please refer to the Getting Started guide.
Q. How do I add databases to be monitored by DeltaOMS?
You can add a database name to the DeltaOMS configuration table (called sourceconfig by default) using a simple SQL INSERT statement.
Example:
INSERT INTO <omsDBName>.sourceconfig VALUES('<Database Name>',false, Map('wildCardLevel','0'))
For more details on the configurations and parameters, refer to the Getting Started and Developer Guides.
Q. What components will be deployed for DeltaOMS?
DeltaOMS deploys two primary components. The first ingests the Delta logs from the configured tables. This is a streaming component and can be run either on a schedule or as an always-running streaming job, depending on your SLA requirements for retrieving operational metrics.
The other is a batch component that processes and enriches the Delta actions into the different OMS-specific tables.
Q. How many ingestion jobs do we need to run?
DeltaOMS supports 50 streams per Databricks job by default. These values are configurable through the command-line parameters --startingStream and --endingStream (defaults 1 and 50, respectively).
We recommend setting up your jobs to handle groups of 40-50 stream/wildcard paths. For example, if you have 75 unique wildcard paths to process, we recommend creating two Databricks jobs for DeltaOMS ingestion. The sample notebook shows how to create jobs automatically based on your input sources.
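As an illustration only (a minimal sketch, not part of DeltaOMS itself), the --startingStream/--endingStream ranges for a given number of wildcard paths could be computed like this:

```python
import math

# Maximum streams per Databricks job (the DeltaOMS default).
STREAMS_PER_JOB = 50

def stream_ranges(num_paths, streams_per_job=STREAMS_PER_JOB):
    """Return one (--startingStream, --endingStream) pair per ingestion job."""
    num_jobs = math.ceil(num_paths / streams_per_job)
    ranges = []
    for job in range(num_jobs):
        start = job * streams_per_job + 1
        end = min((job + 1) * streams_per_job, num_paths)
        ranges.append((start, end))
    return ranges

# 75 wildcard paths -> two jobs: streams 1-50 and 51-75.
print(stream_ranges(75))  # → [(1, 50), (51, 75)]
```

Each returned pair maps to one Databricks job invocation with the corresponding --startingStream and --endingStream values.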
Q. What is the process flow for adding new input sources for DeltaOMS tracking?
Assuming you already have DeltaOMS running in your environment, new input sources can be added by:
1. Inserting the new source into the sourceconfig table.
2. Running com.databricks.labs.deltaoms.init.ConfigurePaths to update the internal path configuration. Example usage is in the Developer Guide.
Q. (Advanced) Is there an option to add all Delta tables under a path using wildcard expressions to be tracked by DeltaOMS?
You can add a special wildcard expression (PATH/**) to the sourceconfig table to track all Delta tables under that path.
An example syntax to add such a path is:
INSERT INTO <omsDBName>.sourceconfig VALUES('dbfs:/user/warehouse/**',false, Map('wildCardLevel','0'))
DeltaOMS will discover all Delta tables under the path 'dbfs:/user/warehouse/' and add them to the internal pathconfig table for tracking.
Q. (Advanced) How do I add arbitrary wildcard paths to be tracked by DeltaOMS?
We recommend using the sourceconfig configuration table to set up input source tracking for DeltaOMS (refer to the Developer Guide). There could be instances where a special wildcard path (one that does not fall under the wildCardLevel values provided by DeltaOMS) needs to be tracked. These arbitrary wildcards can be configured directly in the pathconfig configuration table.
Example: Say you need to configure paths like dbfs:/databricks-datasets/*/*/*/*/*/_delta_log/*.json, dbfs:/databricks-datasets/*/*/*/*/_delta_log/*.json and dbfs:/databricks-datasets/*/*/_delta_log/*.json. You could add them directly to the pathconfig table using SQL statements:
spark.sql(s"""INSERT INTO <omsDBName>.pathconfig VALUES (
'dbfs:/databricks-datasets/4/',
substring(sha1('dbfs:/databricks-datasets/4/'), 0, 7),
'dbfs:/databricks-datasets/*/*/*/*/*/_delta_log/*.json',
substring(sha1('dbfs:/databricks-datasets/*/*/*/*/*/_delta_log/*.json'), 0, 7),
Map('wildCardLevel','1'),false,'databricks-datasets-4',0,false,'2021-07-23T19:06:04.933+0000')
""")
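For reference, the short identifiers in the statement above are 7-character SHA-1 prefixes of the path strings. A minimal sketch of that computation (assuming Spark's sha1 produces the standard lowercase hex digest of the UTF-8 bytes; the helper name path_uid is our own, not a DeltaOMS API):

```python
import hashlib

def path_uid(path: str) -> str:
    """7-character SHA-1 prefix, mirroring substring(sha1(path), 0, 7) in the SQL above."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()[:7]

wildcard_paths = [
    "dbfs:/databricks-datasets/*/*/*/*/*/_delta_log/*.json",
    "dbfs:/databricks-datasets/*/*/*/*/_delta_log/*.json",
    "dbfs:/databricks-datasets/*/*/_delta_log/*.json",
]

# Print the identifier that would go into pathconfig for each wildcard path.
for p in wildcard_paths:
    print(path_uid(p), p)
```

This can help pre-compute the identifier columns when generating pathconfig INSERT statements for several wildcard paths at once.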
Once these are added, the operational metrics from tables under these wildcard locations can be captured and processed by the regular OMS jobs.