Overwatch utilizes several APIs to normalize the platform data. Overwatch leverages secret scopes and keys to acquire a token that is authorized to access the platform. The account that owns the token (i.e. a dapi token) must have read access to the assets you wish to monitor. If the token owner is a non-admin account, that account must be granted read-level access to each asset to be monitored.
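As a concrete illustration, the token is typically stored in a secret scope and read at runtime rather than hard-coded. A minimal sketch, assuming a scope named `overwatch` and a key named `dapi-token` (both names are hypothetical):

```scala
// Read the API (dapi) token from a secret scope inside a Databricks notebook.
// The scope and key names are hypothetical -- substitute the ones you created.
val dapiToken: String = dbutils.secrets.get(scope = "overwatch", key = "dapi-token")
// The value is redacted in notebook output; Overwatch reads it at runtime to
// authenticate its API calls against the workspace.
```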
| Scope | Access Required |
|---|---|
| clusters | Can Attach To – all clusters |
| clusterEvents | Can Attach To – all clusters |
| pools | Read – all Pools |
| jobs | Read – all Jobs |
| dbsql | Read – all Warehouses |
| accounts | Token MUST BE ADMIN |
If a scope is not referenced above, it does not require API access.
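To grant the read-level access described above to a non-admin token owner, the Databricks Permissions API can be used. A hedged sketch, assuming a JDK 11+ runtime (for `java.net.http`); the workspace URL and cluster ID are placeholders, and the group name is hypothetical:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Grant "Can Attach To" on a cluster to a group via the Permissions API.
// <workspace-url> and <cluster-id> are placeholders; "overwatch-readers" is hypothetical.
val host  = "https://<workspace-url>"
val token = dbutils.secrets.get(scope = "overwatch", key = "dapi-token")

val body =
  """{"access_control_list": [
    |  {"group_name": "overwatch-readers", "permission_level": "CAN_ATTACH_TO"}
    |]}""".stripMargin

val request = HttpRequest.newBuilder()
  .uri(URI.create(s"$host/api/2.0/permissions/clusters/<cluster-id>"))
  .header("Authorization", s"Bearer $token")
  .header("Content-Type", "application/json")
  .method("PATCH", HttpRequest.BodyPublishers.ofString(body))
  .build()

val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
println(response.body())
```

The same pattern applies to pools, jobs, and warehouses with their respective permission levels.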
The owner of the Overwatch Job must have read access to the secret scopes used to store any/all credentials used by the Overwatch pipeline.
For example, if John runs a notebook, John must have access to the secrets referenced in the config. The same is true if John creates a job and sets it to run on a schedule. This changes when the job goes to production, however: once John sends it through the production promotion process and the job becomes owned by the etl-admin principal, etl-admin must have access to the relevant secrets.
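Granting the production job owner read access on a secret scope can be done with the Secrets API. A sketch under the same JDK 11+ assumption as above; the scope name is hypothetical and `etl-admin` follows the example principal:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Grant READ on the secret scope to the production job owner.
val host  = "https://<workspace-url>"
val token = dbutils.secrets.get(scope = "overwatch", key = "dapi-token")

val body = """{"scope": "overwatch", "principal": "etl-admin", "permission": "READ"}"""

val request = HttpRequest.newBuilder()
  .uri(URI.create(s"$host/api/2.0/secrets/acls/put"))
  .header("Authorization", s"Bearer $token")
  .header("Content-Type", "application/json")
  .POST(HttpRequest.BodyPublishers.ofString(body))
  .build()

val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
println(response.statusCode())
```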
In addition to API authentication and authorization, Overwatch has some storage access requirements:
- The Overwatch cluster must be able to read the audit logs.
- The Overwatch cluster must be able to read from and write to the output location (i.e. the storage prefix).
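A quick, non-authoritative way to validate both requirements from the Overwatch cluster (all paths below are hypothetical):

```scala
// Read check on the audit log source and read/write check on the output prefix.
val auditLogPath  = "abfss://logs@<storage-account>.dfs.core.windows.net/audit-logs"
val storagePrefix = "abfss://overwatch@<storage-account>.dfs.core.windows.net/output"

dbutils.fs.ls(auditLogPath)                                 // must not throw: read access
dbutils.fs.put(s"$storagePrefix/_access_check", "ok", true) // must not throw: write access
dbutils.fs.rm(s"$storagePrefix/_access_check")              // clean up the probe file
```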
Overwatch, by default, creates a single database that is accessible to everyone in your organization unless you specify a secured location for the database in the configuration. Several of the modules capture fairly sensitive data such as users, userIDs, etc. It is suggested that two databases be specified in the configuration:

- An ETL database, which holds the pipeline's internal entities and should be locked down.
- A consumer database, intended for end-user consumption.
For more information on how to configure the separation of ETL and consumption databases, please reference the configuration page.
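As a sketch of what that separation can look like, assuming the `DataTarget` shape described on the configuration page (field names and the import path may differ across Overwatch versions, and all locations here are hypothetical):

```scala
// Import path follows the Overwatch repo layout; verify against your version.
import com.databricks.labs.overwatch.utils.DataTarget

// The ETL database holds the raw, potentially sensitive entities; the consumer
// database exposes curated objects for end users. Lock down the ETL location.
val dataTarget = DataTarget(
  databaseName = Some("overwatch_etl"),
  databaseLocation = Some("abfss://overwatch@<storage-account>.dfs.core.windows.net/etl/overwatch_etl.db"),
  consumerDatabaseName = Some("overwatch"),
  consumerDatabaseLocation = Some("abfss://overwatch@<storage-account>.dfs.core.windows.net/consumer/overwatch.db")
)
```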
Additional steps can be taken to secure the storage location of the ETL entities as necessary. The method for securing access to these tables would be the same as with any set of tables in your organization.
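For example, with table access control enabled, standard GRANT/DENY statements apply; the database and group names below are hypothetical:

```scala
// Allow analysts to read the consumer database while denying broad access to
// the ETL database. Run as a privileged user.
spark.sql("GRANT USAGE, SELECT ON DATABASE overwatch TO `data-analysts`")
spark.sql("DENY SELECT ON DATABASE overwatch_etl TO `users`")
```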
In Azure, the audit logs must be acquired through an Azure Event Hubs stream. The details for configuring and provisioning access are covered in the Azure Cloud Infrastructure section.
Fill out the following and add it to your cluster as Spark configs. For more information, please reference the Azure documentation.
Note that there are two sets of configs; both are required because Spark on both the driver and the executors must be authorized to access the storage.
```
fs.azure.account.auth.type OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id <application-id>
fs.azure.account.oauth2.client.secret {{secrets/<SCOPE_NAME>/<KEY_NAME>}}
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<directory-id>/oauth2/token

spark.hadoop.fs.azure.account.auth.type OAuth
spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret {{secrets/<SCOPE_NAME>/<KEY_NAME>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<directory-id>/oauth2/token
```
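Once the configs are attached, a simple probe can confirm that both the driver and the executors can reach the storage account (the container, account, and path below are hypothetical):

```scala
val base = "abfss://overwatch@<storage-account>.dfs.core.windows.net/"

dbutils.fs.ls(base)                                   // driver-side access check
spark.read.text(base + "audit-logs").limit(1).count() // forces an executor-side read
```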
To provide access to the views in DBSQL, follow the steps below. Note that you will need an account admin to complete these steps; more details on them can be found in the Databricks documentation. These steps are necessary because the Overwatch tables are stored as external tables, so direct access to the underlying storage must be provisioned for users through DBSQL.
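Once the warehouse's data access is configured, the grants themselves are ordinary SQL. A hedged sketch (the database and group names are hypothetical), shown here via `spark.sql` but equally runnable as plain SQL in the DBSQL query editor:

```scala
// Grant end users the ability to browse and query the consumer database views.
spark.sql("GRANT USAGE ON DATABASE overwatch TO `dbsql-users`")
spark.sql("GRANT SELECT ON DATABASE overwatch TO `dbsql-users`")
```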