DeltaOMS uses multiple Spark configurations to control its different components.
DeltaOMS Spark configuration (spark.conf) details :
Configuration Key | Description | Required | Example | Default Value | Applies to components |
---|---|---|---|---|---|
spark.databricks.labs.deltaoms.location.url | Base location/path of the OMS catalog and schema on the Delta Lake. This is created as an EXTERNAL LOCATION on Unity Catalog (UC) | Y | s3:/delta-monitoring/deltaoms | None | All |
spark.databricks.labs.deltaoms.location.name | Name of the UC EXTERNAL LOCATION for the OMS catalog and schema on the Delta Lake | Y | deltaoms-external-location | None | All |
spark.databricks.labs.deltaoms.storage.credential.name | Storage credential name for the UC EXTERNAL LOCATION created for DeltaOMS. This is usually provided by your admin | Y | deltaoms-storage-credential | None | All |
spark.databricks.labs.deltaoms.catalog.name | OMS Catalog Name. This is the UC Catalog where all the DeltaOMS tables will be created | Y | oms.db | None | All |
spark.databricks.labs.deltaoms.schema.name | OMS Schema Name. This is the database where all the Delta log details will be collected in tables | Y | oms.db | None | All |
spark.databricks.labs.deltaoms.checkpoint.base | Base path for the checkpoints for OMS streaming pipeline for collecting the Delta logs for the configured tables | Y | s3:/delta-monitoring/_oms_checkpoints/ | None | Ingestion |
spark.databricks.labs.deltaoms.checkpoint.suffix | Suffix to be added to the checkpoint path. Useful during testing for starting off a fresh process | Y | _1234 | None | Ingestion |
spark.databricks.labs.deltaoms.raw.action.table | OMS table name for storing the raw delta logs collected from the configured tables | N | oms_raw_actions | rawactions | Initialization |
spark.databricks.labs.deltaoms.source.config.table | Configuration table name for setting the list of Delta Path, databases and/or tables for which the delta logs should be collected by OMS | N | oms_source_config | sourceconfig | Initialization |
spark.databricks.labs.deltaoms.path.config.table | Configuration table name for storing Delta path details and few related metadata for internal processing purposes by OMS | N | oms_path_config | pathconfig | Initialization |
spark.databricks.labs.deltaoms.processed.history.table | Configuration table name for storing processing details for OMS ETL Pipelines. Used internally by OMS | N | oms_processed_history | processedhistory | Initialization |
spark.databricks.labs.deltaoms.commitinfo.snapshot.table | Table name for storing the Delta Commit Information generated from the processed raw Delta logs for configured tables/paths | N | oms_commitinfo_snapshots | commitinfosnapshots | Initialization |
spark.databricks.labs.deltaoms.action.snapshot.table | Table name for storing the Delta Actions information snapshots. Generated from processing the Raw Delta logs | N | oms_action_snapshots | actionsnapshots | Initialization |
spark.databricks.labs.deltaoms.consolidate.wildcard.paths | Flag to enable/disable processing Delta logs using consolidated wildcard patterns extracted from the path configured for OMS | N | false | true | Ingestion |
spark.databricks.labs.deltaoms.truncate.path.config | Truncate the internal Path Config table | N | false | false | Configuration |
spark.databricks.labs.deltaoms.trigger.interval | Trigger interval for processing the Delta logs from the configured tables/paths | N | 30s | AvailableNow | Ingestion |
spark.databricks.labs.deltaoms.trigger.max.files | Maximum number of Delta log files to process for each Trigger interval | N | 2048 | 1024 | Ingestion |
spark.databricks.labs.deltaoms.starting.stream | Starting stream number for the Ingestion Job | N | 10 | 1 | Ingestion |
spark.databricks.labs.deltaoms.ending.stream | Ending stream number for the Ingestion Job | N | 30 | 50 | Ingestion |
spark.databricks.labs.deltaoms.use.autoloader | Use Autoloader for the Ingestion Job | N | false | true | Ingestion |