Notebooks can either be run manually or scheduled as a job. While scheduling a notebook as a job is supported, it's strongly recommended that production Overwatch runs use a JAR instead; notebook execution is best suited for rapid testing and validation.
This deployment method requires Overwatch Version 0.7.1.0+
Below is an example deployment. When you're ready to get started, simply download the rapid start notebook linked below.
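The deployment is driven by the workspace configuration CSV (configCsvPath in the code that follows). The fragment below is purely illustrative: the exact column set varies by Overwatch version, so treat these column names as assumptions and consult the configuration reference for the authoritative schema.

workspace_name,workspace_id,workspace_url,api_url,cloud,primordial_date,storage_prefix,etl_database_name,consumer_database_name,secret_scope,secret_key_dbpat,active
myworkspace,1234567890123456,https://myworkspace.cloud.databricks.com,https://myworkspace.cloud.databricks.com,AWS,2023-01-01,/mnt/overwatch_global,overwatch_etl,overwatch,ow_scope,ow_pat_key,TRUE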
import com.databricks.labs.overwatch.MultiWorkspaceDeployment
val configCsvPath = "dbfs:/FileStore/overwatch/workspaceConfig.csv" // Path to the config.csv
// Temporary storage location; it is automatically cleaned up after each run.
val tempLocation = "/tmp/overwatch/templocation"
// Number of workspaces to deploy in parallel. Exceeding 20 may require a larger driver or additional cluster configuration.
// If the total workspace count is <= 20, it's recommended to set parallelism equal to the workspace count.
val parallelism = 4
// Run the deployment
MultiWorkspaceDeployment(configCsvPath, tempLocation)
.deploy(parallelism)
// To run only specific pipelines (i.e., Bronze / Silver / Gold), a second argument can be passed to deploy as a
// comma-delimited list of pipelines to run, e.g.:
// ex: MultiWorkspaceDeployment(configCsvPath, tempLocation).deploy(parallelism,"Bronze")
// MultiWorkspaceDeployment(configCsvPath, tempLocation).deploy(parallelism,"Silver")
// MultiWorkspaceDeployment(configCsvPath, tempLocation).deploy(parallelism,"Gold")
// MultiWorkspaceDeployment(configCsvPath, tempLocation).deploy(parallelism,"Bronze,Silver")
// MultiWorkspaceDeployment(configCsvPath, tempLocation).deploy(parallelism,"Silver,Gold")
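Recent Overwatch versions also expose a validate entry point on MultiWorkspaceDeployment that checks the config CSV and workspace connectivity without running the pipelines. Assuming the 0.7.1+ API, a pre-deployment validation pass would look like the sketch below (the results land in a validation report alongside the deployment report; path may vary by version).

// Sanity-check the config CSV and workspace connectivity before deploying
MultiWorkspaceDeployment(configCsvPath, tempLocation)
  .validate(parallelism)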
Once the deployment completes, the details of each deployment are stored in a deployment report, generated as a Delta table at <etl_storage_prefix>/report/deploymentReport. Run either of the queries below to inspect it.
NOTE The deployment report maintains a full history of deployments. If you've deployed multiple times, be sure to check snapTS and review only the entries relevant to the latest deployment.
select * from delta.`<etl_storage_prefix>/report/deploymentReport`
order by snapTS desc
display(
spark.read.format("delta").load("<etl_storage_prefix>/report/deploymentReport")
.orderBy('snapTS.desc)
)
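Since the report accumulates history across runs, it can help to isolate the most recent one. A minimal sketch, assuming snapTS marks the deployment run as noted above:

import org.apache.spark.sql.functions.{col, max}

// Keep only the rows written by the most recent deployment
val report = spark.read.format("delta").load("<etl_storage_prefix>/report/deploymentReport")
val latestSnap = report.agg(max(col("snapTS"))).first.get(0)
display(report.filter(col("snapTS") === latestSnap))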