Default Name : sourceconfig
Table used for adding databases/paths/individual table to be tracked by DeltaOMS
Column | Type | Description |
---|---|---|
path | String | Path to the Delta object to be tracked by DeltaOMS. Could be a database (all tables will be included), directory, individual table or specific path |
skipProcessing | Boolean | Flag to exclude processing a row |
parameters | Map<String,String> | Placeholder for dynamic parameters to be passed to DeltaOMS (supports easy future expansion). Currently , the only required parameter is wildCardLevel |
Typical usage for adding input source configuration :
-- Adding a Database. OMS will discover all Delta tables in the database.
-- ConfigurePaths process could take longer depending on number of tables in the database.
INSERT INTO <OMSDBNAME>.sourceconfig values('<databaseName>', false, Map('wildCardLevel','0'))
-- Adding an entire directory. OMS will discover all Delta tables underneath the directory recursively.
-- The `**` is required after the directory name.
-- ConfigurePaths process could take longer depending on number of nested delta paths under the directory.
INSERT INTO <OMSDBNAME>.sourceconfig values('<dirName/**>', false, Map('wildCardLevel','0'))
-- Adding a table
INSERT INTO <OMSDBNAME>.sourceconfig values('<fully qualified table name>', false, Map('wildCardLevel','0'))
-- Adding an individual table path
INSERT INTO <OMSDBNAME>.sourceconfig values('<Full table path on storage>', false, Map('wildCardLevel','0'))
The wildcard level setting for a path determines how the path is modified into a wildcard path internally and used by DeltaOMS during ingestion.
Param Value | Description | Example |
---|---|---|
Map('wildCardLevel','0') |
Tracks all tables in a database path | dbfs:/user/hive/warehouse/sample.db/table1 -> dbfs:/user/hive/warehouse/sample.db/*/_delta_log/*.json |
Map('wildCardLevel','1') |
Tracks multiple database under a path | dbfs:/user/hive/warehouse/sample.db/table1 -> dbfs:/user/hive/warehouse/*/*/_delta_log/*.json |
Map('wildCardLevel','-1') |
Tracks only the individual path | dbfs:/user/hive/warehouse/sample.db/table1 -> dbfs:/user/hive/warehouse/sample.db/table1/_delta_log/*.json |
Default Name : pathconfig
Internal table that maintains the configuration for each individual Delta table path.
This is populated by executing the com.databricks.labs.deltaoms.init.ConfigurePaths.main
process
Column | Type | Description |
---|---|---|
path | String | Path to a individual Delta table to be tracked by DeltaOMS |
puid | String | Unique identifier for the table path . Used for partitioning of data inside DeltaOMS |
wildCardPath | String | Wildcard path representation from the individual table path . Used for generating optimized read streams. Behaviour controlled through the wildCardLevel parameter |
wuid | String | Unique identifier for wildcard path . Used for uniquely identifying the streams |
qualifiedName | String | Fully Qualified name of a table (if defined in the metastore). For tables not defined in metastore, defaults to the table path |
parameters | Map<String,String> | Placeholder for dynamic parameters to be passed to DeltaOMS (supports easy future expansion). Currently , the only required parameter is wildCardLevel |
Default Name : processedhistory
Internal tracking table for last version of Delta action
s processed by DeltaOMS. Populated during OMS data processing
Column | Type | Description |
---|---|---|
tableName | String | Table name tracked by DeltaOMS. Eg. - rawactions |
lastVersion | Long | Last Version of Actions processed by DeltaOMS |
update_ts | Timestamp | Last update timestamp |