Reach out to your Databricks representative for help with these tasks as needed. To get started, we suggest you deploy a single workspace end to end so that you can learn the steps involved; you can then apply them to the remaining workspaces to be deployed.
There are two primary sources of data that need to be configured: the Audit Logs and the Cluster Logs.
As of 0.7.1, Overwatch can be deployed on a single workspace and retrieve data from one or more workspaces. For more details on requirements, see Multi-Workspace Consideration. There are many cases where some workspaces should be able to monitor many workspaces while others should only monitor themselves. Additionally, co-location of the output data and who should be able to access which data also come into play; this reference architecture can accommodate all of these needs. To learn more about the details, walk through the deployment steps.
There are two ways for Overwatch to read the Audit logs:

Using System tables | Using EventHubs |
---|---|
LIMIT: If the Event Hub subscription is Standard, there is a maximum of 10 Event Hubs per Namespace. If more than 10 workspaces are deployed in the same region, plan to distribute your Event Hubs and Namespaces appropriately.
Create or reuse an Event Hub namespace.
The Event Hub Namespace MUST be in the same location as your control plane.
If you select the “Basic” Pricing Tier, note that you will need at least two successful Overwatch runs per day to avoid data loss. This is because message retention is 1 day, meaning data expires out of the Event Hub every 24 hours. The “Standard” Pricing Tier doesn’t cost much more and gives you up to 7 days of retention, which is much safer if you grow dependent on Overwatch reports.
Throughput Units: In almost all cases, 1 static throughput unit is sufficient for Overwatch. Cases where this may not be true include those where many users across many workspaces share a single EH Namespace. Review the Throughput Units sizing and make the best decision for your situation.
Inside the namespace create an Event Hub
ALERT!!
If your Event Hub is behind a VNET, you must “Allow Trusted MSFT Services through the Firewall” or configure specific firewall rules. This should be done in the Event Hub Namespace –> Networking Tab; see screenshot below.
Create an Event Hub inside your chosen (or created) EH Namespace.
Every Workspace must have its own Event Hub – they cannot be shared. Event Hub Namespaces (up to 10 Event Hubs for Standard) can be shared, but the Event Hub itself must be different. This saves you time and money since Event Hubs act like topics and allow us to sub-select data without scanning / streaming it all in just to filter it out. If you try to share an Event Hub, you will end up reloading data and will run into several issues.
Partitions: The number of EH partitions equals the maximum number of cores (parallelism) that can simultaneously pull data from your Event Hub. In most cases this should be set to 32. It can be fewer; just be sure that Overwatch uses an autoscaling cluster to minimize costs while waiting to pull the data from EH. For example, if Overwatch is pulling 30 million events but the Event Hub only has 8 partitions, any worker cores beyond 8 will sit idle while all 30 million events are retrieved. In a situation like this, the minimum autoscaling compute size should approximately equal the number of partitions to minimize waste.
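As a rough illustration, a minimal autoscaling cluster spec (Clusters API JSON expressed as a Python dict) might align worker cores with the 32 partitions. The cluster name, node type, Spark version, and sizing below are assumptions, not prescriptions:

```python
# A minimal sketch (assumed values) of an autoscaling cluster spec for the
# Overwatch job, sized so max worker cores roughly match 32 EH partitions.
overwatch_cluster_spec = {
    "cluster_name": "overwatch-etl",       # hypothetical name
    "spark_version": "11.3.x-scala2.12",   # assumption: pick a current LTS runtime
    "node_type_id": "Standard_DS3_v2",     # 4 cores per worker (assumption)
    "autoscale": {
        "min_workers": 2,  # small floor to limit idle cost
        "max_workers": 8,  # 8 workers x 4 cores = 32 cores == 32 EH partitions
    },
}
```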
Overwatch needs access to the Event Hub. There are two methods for provisioning access: either a SAS Policy with keys in the connection string OR an AAD SPN. Choose your method and follow the docs below.
Once created, get the connection string from the SAS Policy for the Event Hub; it can be found at the following path in the Azure portal.
eh-namespace –> eventhub –> shared access policies –> Connection String-primary key
The connection string should begin with Endpoint=sb://. Note that the policy only needs the Listen permission.
Click the Add button and select the Listen option when generating the policy. Copy the Connection string-primary key and create a secret using Key Vault.
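For illustration, once the connection string is stored in a Key-Vault-backed Databricks secret scope, it might be retrieved in a notebook like this (the scope and key names below are hypothetical):

```python
# Hypothetical scope/key names -- substitute the ones you created in Key Vault.
eh_conn_string = dbutils.secrets.get(scope="overwatch-kv", key="eh-audit-logs-conn")

# Quick sanity check: a valid SAS connection string starts with "Endpoint=sb://"
assert eh_conn_string.startswith("Endpoint=sb://"), "Unexpected connection string format"
```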
Navigate either to the EH Namespace or the Event Hub (whichever is appropriate for you), click on “Access Control (IAM)” and then click add –> Add Role Assignment. Choose the Role “Azure Event Hubs Data Receiver” and add the principal you would like to provision –> review and assign.
Now that the Principal has access to the EH or EHNS, you just need to capture the details of the SPN to provide to the Overwatch configs. These can be found by going to the SPN Overview: Azure Active Directory –> App Registrations –> click the principal. Capture the details in the screenshot below.
Next, a secret needs to be created (if one doesn’t already exist): from the previous screen click “Certificates & Secrets” –> “New Client Secret”. Be sure to capture the secret value as it won’t be visible later. Create a Databricks Secret and place the SPN secret value in it. This is the value needed to use the AAD SPN and complete the rest of the AAD required configs.
The last thing you need is your connection string, which can be found by navigating back to your Event Hub. Be sure to get to the Event Hub, not the Event Hub Namespace (i.e. EHNS –> Event Hubs –> Event Hub). Then, in the Overview section, you will see “Namespace”; we will use this to construct the connection string.
Endpoint=sb://<namespace>.servicebus.windows.net
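As a small sketch, assembling the endpoint string from the Namespace value shown on the Overview blade might look like this (the names are hypothetical):

```python
# Values captured from the Azure portal (hypothetical examples)
eh_namespace = "my-overwatch-ehns"   # "Namespace" from the Event Hub Overview blade
event_hub_name = "overwatch-ws-123"  # the named Event Hub (one per workspace)

# Endpoint portion of the connection string used with the AAD SPN configs
connection_string = f"Endpoint=sb://{eh_namespace}.servicebus.windows.net"
```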
With your Event Hub Namespace and named Event Hub created, navigate to your Azure Databricks workspace[s] (in the portal) for which you’d like to enable Overwatch. Under the Monitoring section –> Diagnostics settings –> Add diagnostic setting. Configure your log delivery similar to the example in the image below.
Additionally, ensure that the Overwatch account has sufficient privileges to read data from the Event Hub[s] created above. A common method for providing Overwatch access to the Event Hub is to simply capture the connection string and store it as a secret. Proper ways of accessing this secret are covered in the Getting Started and Configuration sections. There are many methods through which to authorize Overwatch; just ensure it has access to read from the Event Hub stream.
DO NOT leave Event hub name empty! Even though Azure doesn’t require an event hub name, you must create an event hub underneath the event hub namespace and give it a name. Reference the name here.
Now that you have configured your Overwatch Event Hub Namespace, created a named Event Hub inside the namespace, and pointed diagnostic logging to the Event Hub, it’s time to validate that messages are flowing. You may have to wait several minutes to begin seeing messages depending on how busy the workspace is. There are two things that are commonly missed; please double-check the two bullet points below. There are images to help clarify as well.
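Beyond eyeballing the portal metrics, a quick programmatic check is to read a few raw events with the azure-eventhubs-spark connector (which must be installed on the cluster). The secret scope/key names below are the hypothetical ones from earlier:

```python
# Assumes the com.microsoft.azure:azure-eventhubs-spark connector is installed.
conn = dbutils.secrets.get(scope="overwatch-kv", key="eh-audit-logs-conn")
# Note: the connection string must include EntityPath=<event hub name>; append
# it if your SAS policy was created at the namespace level.

# The connector expects the connection string encrypted via its helper
eh_conf = {
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn),
}

# Pull a small batch of raw audit events to confirm messages are arriving
df = spark.read.format("eventhubs").options(**eh_conf).load()
display(df.selectExpr("cast(body as string) as event").limit(10))
```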
The following steps are meant to be a baseline reference for setting up the storage accounts for Overwatch targets. With regard to security, we cannot make recommendations as to what is best for your organization in a global forum like this. For specifics regarding security implications of your storage account, please contact your Databricks sales representative or talk with your Databricks / Cloud administrators.
That said, the minimum technical requirement for Overwatch to function is that the storage account exists and is accessible (read/write for Overwatch output, read for cluster logs) from the Databricks workspace. All security considerations above this basic technical requirement should be commensurate with your organizational policies/standards.
Data locality is important for cluster logs. Cluster logs can be large and plentiful and as such, sending these to a storage account in a different geographical location can have significant performance impact. Keep this in mind and note that the reference architecture recommends at least one storage account per region to minimize latency and maximize bandwidth.
Barring the security and networking sections, once your setup is complete, your configuration should look similar to the image below.
Select Storage Account from your Azure Portal and hit create
Enter your Subscription and Resource Group in the Basics tab. Locally-Redundant Storage is specified here, but choose whichever redundancy is right for your organization. Standard storage is fine; the required IOPS here are relatively low, and while premium storage will work, it comes at an added, unnecessary cost.
Recommendations
You may choose any options you prefer here. Note that BLOBs with soft-deletion enabled cannot be mounted to the Databricks workspace, and this is considered an experimental feature. It’s recommended that you do NOT enable BLOB soft deletes at this time for the Overwatch storage account, as this has not been fully tested. Container soft deletes are fine.
Add relevant tags and create
The Overwatch output storage account may be accessed directly via:
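For example, with the AAD SPN provisioned earlier, direct abfss access can be configured with the standard ADLS Gen2 OAuth Spark configs. The storage account, container, path, and secret names below are hypothetical placeholders:

```python
storage_acct = "overwatchstorage"          # hypothetical storage account name
tenant_id = "<your-tenant-id>"
client_id = "<your-spn-application-id>"
client_secret = dbutils.secrets.get(scope="overwatch-kv", key="spn-secret")

# Standard ADLS Gen2 OAuth configs for direct abfss:// access
host = f"{storage_acct}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{host}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{host}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{host}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{host}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{host}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# Overwatch output (Delta) can then be read directly, e.g.:
df = spark.read.format("delta").load(
    f"abfss://overwatch@{host}/path/to/overwatch_etl")
```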