Many validations are performed to minimize mistakes. The following section details each of them. Not all validation steps run on every Overwatch run, so all validations should be performed before the first run on any given workspace. For details on executing a full validation, see below.
Name | Validation Rule | Impacted columns |
---|---|---|
API URL validation | The API URL must return a valid response when called with the provided secret scope and key. | api_url |
Primordial date validation | Primordial date must be in yyyy-MM-dd format (e.g. 2022-01-30 for Jan 30, 2022) and must be earlier than the current date. | primordial_date |
Excluded scope validation | Excluded scope must be null or colon-delimited and may contain only the following scopes: audit:sparkEvents:jobs:clusters:clusterEvents:notebooks:pools:accounts:dbsql | excluded_scopes |
Consuming data from Event Hub (Azure) | Audit Log data must be present and readable from the provided Event Hub configuration fields. | eh_name, eh_scope_key, secret_scope |
Common consumer database name | All workspaces must have a common consumer database name. | consumer_database_name |
Common ETL database name | All workspaces must have a common ETL database name. | etl_database_name |
ETL storage prefix validation | All workspaces must share a single ETL storage prefix and the Overwatch cluster must have appropriate access to read/write. | etl_storage_prefix |
Cloud provider validation | Cloud must be either Azure or AWS. | cloud |
Max days validation | Max days must be a number. | max_days |
Secret scope validation | Secret scope must not be empty. | secret_scope |
PAT key validation | DBPAT must not be empty and must be able to authenticate to the workspace. | secret_key_dbpat |
Audit log location validation (AWS ONLY) | Audit logs must be present immediately within the provided path and must be readable. | auditlogprefix_source_aws |
Mount point validation (more details below) | Workspaces with more than 50 mount points are not supported in remote deployments at this time. | |
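For illustration only, the sketch below shows how two of these rules (the primordial date format and the excluded scopes list) could be pre-checked in a notebook before running the full validation. The helper names are hypothetical and not part of Overwatch; the authoritative checks are the ones performed by MultiWorkspaceDeployment.validate().
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try
// Scopes allowed in excluded_scopes (from the table above)
val allowedScopes = Set("audit", "sparkEvents", "jobs", "clusters", "clusterEvents", "notebooks", "pools", "accounts", "dbsql")
// primordial_date must parse as yyyy-MM-dd and be earlier than the current date
def primordialDateOk(primordialDate: String): Boolean = Try {
  LocalDate.parse(primordialDate, DateTimeFormatter.ofPattern("yyyy-MM-dd")).isBefore(LocalDate.now())
}.getOrElse(false)
// excluded_scopes must be null/empty or a colon-delimited list of allowed scopes
def excludedScopesOk(excludedScopes: String): Boolean =
  excludedScopes == null || excludedScopes.isEmpty || excludedScopes.split(":").forall(allowedScopes.contains)
primordialDateOk("2022-01-30")          // true (any past date in yyyy-MM-dd format)
excludedScopesOk("jobs:clusters:dbsql") // true
excludedScopesOk("jobs;clusters")       // false -- not a colon-delimited list of allowed scopes
To execute the full validation across all configured workspaces: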
import com.databricks.labs.overwatch.MultiWorkspaceDeployment
val configCsvPath = "dbfs:/FileStore/overwatch/workspaceConfig.csv" // path to the config.csv
// Temporary location used as scratch storage; it is cleaned automatically after each run
val tempLocation = "/tmp/overwatch/templocation"
// Number of workspaces to validate in parallel. Exceeding 20 may require a larger driver or additional cluster configuration.
// If total workspaces <= 20, set parallelism equal to the workspace count.
val parallelism = 4
// Run the validation
MultiWorkspaceDeployment(configCsvPath, tempLocation)
  .validate(parallelism)
The validation report is generated as a Delta table at <etl_storage_prefix>/report/validationReport. Run the queries below to check it; all records should show validated = true, otherwise action is necessary prior to deployment.
NOTE The validation report maintains a full history of validations. If you’ve validated multiple times, be sure to check the snapTS column and review only the validations from the latest run (see the snapshot-filtered example after the queries below).
-- full report
select * from delta.`<etl_storage_prefix>/report/validationReport`
order by snapTS desc
-- only validation errors
select * from delta.`<etl_storage_prefix>/report/validationReport` where validated = 'false'
order by snapTS desc
display(
  spark.read.format("delta").load("<etl_storage_prefix>/report/validationReport")
    .orderBy('snapTS.desc)
)
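If you prefer the DataFrame API, here is a small sketch (using the same report path as above) that keeps only the rows from the most recent validation run, per the snapTS note above:
import org.apache.spark.sql.functions.{col, max}
// Load the full report, find the latest snapshot timestamp, and keep only those rows
val report = spark.read.format("delta").load("<etl_storage_prefix>/report/validationReport")
val latestSnap = report.select(max(col("snapTS"))).first().get(0)
display(report.filter(col("snapTS") === latestSnap))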
As you can see from the validation, monitoring remote workspaces is not supported for workspaces with > 50 mount points. This is because the Databricks API that provides a map of mount points to fully-qualified URIs returns only 50 results and does not support pagination. Databricks is aware of this limitation and is working to prioritize pagination. In the meantime, the dev team is looking for workarounds and will publish updates as soon as a viable alternative is found.
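To check quickly whether a given workspace is affected, you can count its mount points with the standard dbutils file system utilities (run this on the target workspace itself):
// Remote monitoring of this workspace is unsupported if the count exceeds 50
val mountCount = dbutils.fs.mounts().size
println(s"Mount points: $mountCount")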
If Using AWS and All Cluster Logs Are Going to s3:// Direct Paths
Currently, workspaces with more than 50 mount points must have a local Overwatch deployment due to this limitation. Deactivate or remove the workspace in question from the multi-workspace config, add it to a new config where it is the only configured workspace, and create a local Overwatch deployment on that workspace.
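As a rough sketch of that split (assuming the config CSV has a workspace_id column; the ID value and output paths below are placeholders to adjust for your environment):
import org.apache.spark.sql.functions.col
// Read the existing multi-workspace config
val config = spark.read.option("header", "true").csv("dbfs:/FileStore/overwatch/workspaceConfig.csv")
// Hypothetical ID of the workspace with > 50 mount points
val offendingWorkspaceId = "1234567890123456"
// Remaining workspaces stay in the remote (multi-workspace) config
config.filter(col("workspace_id") =!= offendingWorkspaceId)
  .coalesce(1).write.mode("overwrite").option("header", "true")
  .csv("dbfs:/FileStore/overwatch/workspaceConfig_remote")
// The affected workspace gets a single-workspace config for its local deployment
config.filter(col("workspace_id") === offendingWorkspaceId)
  .coalesce(1).write.mode("overwrite").option("header", "true")
  .csv("dbfs:/FileStore/overwatch/workspaceConfig_local")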