Azure

Configuring Overwatch on Azure Databricks

Reach out to your Databricks representative for help with these tasks as needed. To get started, we suggest deploying a single workspace end to end; once you understand the steps involved, you can apply them to the remaining workspaces.

There are two primary sources of data that need to be configured:

  • Audit Logs
    • The audit logs contain data for every interaction within the environment and are used to track the state of various objects through time, along with which accounts interacted with them. This data is relatively small and contains no large data sets such as cluster/Spark logs.
    • For ingesting this data, you have the option of using system tables (RECOMMENDED) or setting up delivery through Event Hubs.
  • Cluster Logs - Crucial to get the most out of Overwatch
    • The cluster log delivery location is configured in the cluster spec –> Advanced Options –> Logging (see the sketch after this list). These logs can get quite large, and they are stored in a format that is very inefficient for querying and long-term storage. This is why it’s crucial to create a dedicated storage account for them and to enable a TTL (time-to-live) policy to minimize unnecessary long-term costs. It’s not recommended to store these directly on DBFS (DBFS mount points are fine).
    • Best Practice - Multi-Workspace – When multiple workspaces are using Overwatch within a single region, it’s best to ensure that each writes to its own prefix, even if sharing a storage account. This greatly reduces Overwatch scan times as the log files build up. If scan times get too long, the TTL can be reduced, or additional storage accounts can be created within the region to increase read IOPS throughput (rarely necessary).
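
For illustration, here is a minimal sketch of setting the log delivery location programmatically through the Databricks Clusters API (a sketch only; the workspace URL, secret scope, node type, and mount path are hypothetical). On Azure, cluster_log_conf takes a DBFS destination, typically a mount point backed by the dedicated log storage account with a per-workspace prefix:

    import requests

    # Hypothetical workspace URL and secret scope/key.
    WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
    TOKEN = dbutils.secrets.get(scope="overwatch-kv", key="api-token")

    cluster_spec = {
        "cluster_name": "example-cluster",
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2,
        # Per-workspace prefix under the shared log storage account mount,
        # as recommended above.
        "cluster_log_conf": {
            "dbfs": {"destination": "dbfs:/mnt/cluster-logs/workspace-01"}
        },
    }

    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=cluster_spec,
    )
    resp.raise_for_status()
    print(resp.json()["cluster_id"])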

[Image: AzureClusterLogging]

Reference Architecture

As of 0.7.1, Overwatch can be deployed on a single workspace and retrieve data from one or more workspaces. For more details on requirements, see Multi-Workspace Consideration. There are many cases where some workspaces should be able to monitor many workspaces while others should only monitor themselves. Additionally, co-location of the output data, and who should be able to access which data, also come into play; this reference architecture can accommodate all of these needs. To learn more about the details, walk through the deployment steps.

Using System tables: [Image: AzureArchSystemTables]
Using EventHubs: [Image: AzureArch]

Audit Log Delivery

There are two ways for Overwatch to read the Audit logs:

  • Using System Tables (see System Table Configuration Details ) or
  • Through Event Hubs via Azure diagnostic logging. Overwatch will consume the events as a batch stream (Trigger.Once) once per period when the job runs. To configure Event Hubs to deliver these logs, follow the steps below. A sketch of both ingestion options follows this list.
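
To make the two options concrete, here is a hedged PySpark sketch (Overwatch performs this internally; the secret scope, paths, and connector availability are assumptions for illustration). Option 1 simply queries the system audit table; option 2 reads the Event Hub as a Trigger.Once batch stream using the azure-event-hubs-spark connector, which must be attached to the cluster:

    # Option 1: system tables (requires Unity Catalog with the access schema enabled).
    audit_df = spark.table("system.access.audit")
    audit_df.select("event_time", "service_name", "action_name").show(5)

    # Option 2: Event Hubs via the azure-event-hubs-spark connector.
    conn_str = dbutils.secrets.get(scope="overwatch-kv", key="eh-conn-string")
    eh_conf = {
        # The connector requires the connection string to be encrypted.
        "eventhubs.connectionString":
            sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn_str),
    }

    (spark.readStream
        .format("eventhubs")
        .options(**eh_conf)
        .load()
        .writeStream
        .trigger(once=True)  # batch-style Trigger.Once run, as Overwatch does
        .option("checkpointLocation", "/mnt/overwatch/checkpoints/audit_raw")
        .start("/mnt/overwatch/raw/audit_log"))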

LIMIT If your Event Hub subscription is Standard, there is a maximum of 10 Event Hubs per namespace. If more than 10 workspaces are deployed in the same region, plan to distribute your Event Hubs and namespaces appropriately.

Configuring Event Hub

Step 1

Create or reuse an Event Hub namespace.

The Event Hub Namespace MUST be in the same location as your control plane

If you select the “Basic” pricing tier, note that you will need at least two successful Overwatch runs per day to ensure there is no data loss. This is because message retention is 1 day, meaning that data expires out of the Event Hub every 24 hours (a scheduling sketch follows below). The “Standard” pricing tier doesn’t cost much more, but it gives you up to 7 days of retention, which is much safer if you grow dependent on Overwatch reports.
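
One way to satisfy the two-runs-per-day requirement is to schedule the Overwatch job twice daily. A sketch of the schedule fragment of a Databricks Jobs API 2.1 job definition (the cron expression is illustrative):

    # Run at 06:00 and 18:00 UTC, comfortably inside the 1-day Basic-tier
    # message retention window.
    schedule = {
        "quartz_cron_expression": "0 0 6,18 * * ?",
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    }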

Throughput Units In almost all cases, 1 static throughput unit is sufficient for Overwatch. Cases where this may not be true include those where many users across many workspaces share a single EH namespace. Review the throughput unit sizing guidance and make the best decision for your environment.

Inside the namespace, create an Event Hub (detailed in Step 2).

ALERT!!

If your Event Hub is behind a VNET, you must “Allow Trusted MSFT Services through the Firewall” or configure specific firewall rules. This should be done in the Event Hub Namespace –> Networking Tab; see screenshot below.

[Image: EH_Network]

Step 2

Create an Event Hub inside your chosen (or created) EH Namespace.

Every workspace must have its own Event Hub – they cannot be shared. Event Hub namespaces (which hold up to 10 Event Hubs on the Standard tier) can be shared, but the Event Hub itself must be distinct per workspace. This saves you time and money, since Event Hubs act like topics and allow Overwatch to sub-select data without scanning/streaming it all in just to filter it out. If you try to share an Event Hub across workspaces, you will end up reloading data and running into several other issues.

Partitions The maximum number of cores (parallelism) that can simultaneously pull data from your Event Hub equals the number of EH partitions. In most cases this should be set to 32. It can be fewer; just be sure that Overwatch uses an autoscaling cluster to minimize costs while waiting to pull the data from EH. For example, if Overwatch is pulling 30 million events but the Event Hub only has 8 partitions, any worker cores beyond 8 will sit idle while all 30 million events are retrieved. In a situation like this, the minimum autoscaling compute size should approximately equal the number of partitions to minimize waste, as in the sketch below.
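
For example (a sketch; the node type and sizes are illustrative), with a 32-partition Event Hub and 4-core workers, a minimum autoscaling size of 8 workers keeps all 32 partitions busy during the pull:

    # Fragment of a cluster spec: 8 workers x 4 cores = 32 cores, matching the
    # 32 Event Hub partitions; autoscaling adds workers for the rest of the
    # pipeline without paying for them while waiting on the EH pull.
    autoscale_fragment = {
        "autoscale": {"min_workers": 8, "max_workers": 16},
        "node_type_id": "Standard_DS3_v2",  # 4 cores per worker
    }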

Overwatch needs access to the Event Hub. There are two methods for provisioning access: either a SAS policy with keys in the connection string OR an AAD SPN. Choose your method and follow the docs below.

Step 2.1 Authorizing Access Via SAS Policy

Once the Event Hub is created, get the connection string from its SAS policy, found at the following path in the Azure portal:

eh-namespace –> eventhub –> shared access policies –> Connection String-primary key

The connection string should begin with Endpoint=sb://. Note that the policy only needs the Listen permission.

Click the Add button and select the Listen option when generating the policy. [Images: sas1, sas2, sas3] Copy the Connection string-primary key and create a secret using Key Vault; a retrieval sketch follows below.
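
Once the secret exists, it can be retrieved from a Key Vault-backed Databricks secret scope; a minimal sketch with illustrative scope and key names:

    # Assumes a secret scope "overwatch-kv" backed by the Key Vault that holds
    # the Event Hub SAS connection string under the key "eh-conn-string".
    conn_str = dbutils.secrets.get(scope="overwatch-kv", key="eh-conn-string")

    # Sanity check only -- never print the full secret.
    assert conn_str.startswith("Endpoint=sb://")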

Step 2.2 Authorizing Access Via AAD SPN

Navigate either to the EH namespace or the Event Hub (whichever is appropriate for you), click “Access Control (IAM)”, then click Add –> Add role assignment. Choose the role “Azure Event Hubs Data Receiver”, add the principal you would like to provision, then review and assign.

Now that the principal has access to the EH or EHNS, you just need to capture the details of the SPN to provide to the Overwatch configs. These can be found in the SPN’s overview: Active Directory –> App Registrations –> click the principal. Capture the details shown in the screenshot below.

[Image: client_tenant_ids]

Now a secret needs to be created (if one doesn’t already exist), so from the previous screen click “Certificates & Secrets” –> “New Client Secret”. Be sure to capture the secret, as it won’t be visible later. Create a Databricks secret and place the SPN secret in it. This is the value needed to use the AAD SPN to complete the rest of the AAD-required configs.

The last thing you need is your connection string, which can be found by navigating back to your Event Hub (be sure to get to the Event Hub, not the Event Hub Namespace, i.e. EHNS –> Event Hubs –> Event Hub). In the Overview section you will see “Namespace”; we will use this to construct the connection string.

Endpoint=sb://<namespace>.servicebus.windows.net

[Image: EH_Namespace]
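
Putting the AAD pieces together in code (a sketch; the namespace and secret scope/key names are illustrative), the SPN path needs the endpoint-only connection string plus the tenant ID, client ID, and the client secret captured above:

    # Illustrative namespace and secret names.
    eh_namespace = "my-eh-namespace"
    connection_string = f"Endpoint=sb://{eh_namespace}.servicebus.windows.net"

    tenant_id = dbutils.secrets.get(scope="overwatch-kv", key="spn-tenant-id")
    client_id = dbutils.secrets.get(scope="overwatch-kv", key="spn-client-id")
    client_secret = dbutils.secrets.get(scope="overwatch-kv", key="spn-client-secret")

    # These four values feed the AAD SPN portions of the Overwatch configuration.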

Step 3

With your Event Hub namespace and named Event Hub created, navigate to the Azure Databricks workspace[s] (in the portal) for which you’d like to enable Overwatch. Under the Monitoring section –> Diagnostic settings –> Add diagnostic setting, configure your log delivery similar to the example in the image below.

Additionally, ensure that the Overwatch account has sufficient privileges to read data from the Event Hub[s] created above. A common method for providing Overwatch access to the Event Hub is to simply capture the connection string and store it as a secret. Proper ways of accessing this are covered in the Getting Started and Configuration sections. There are many methods through which to authorize Overwatch; just ensure it has access to read from the Event Hub stream.

DO NOT leave the event hub name empty! Even though Azure doesn’t require an event hub name, you must create an event hub underneath the event hub namespace, give it a name, and reference that name here.

[Image: EH_Base_Setup]

Step 4: Validate Messages Are Flowing

Now that you have configured your Overwatch Event Hub namespace, created a named Event Hub inside the namespace, and pointed diagnostic logging to the Event Hub, it’s time to validate that messages are flowing. You may have to wait several minutes before messages appear, depending on how busy the workspace is. Two things are commonly missed; please double-check the bullet points below (there are images to help clarify as well).

  • A named Event Hub has been created within the Namespace.
  • Messages are flowing into the named Event Hub.

[Images: Named_EH_Visual, EH_Validation]
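
If you prefer to validate from code rather than the portal, a quick sketch using the azure-eventhub Python SDK (install it first, e.g. with %pip install azure-eventhub; the Event Hub name and secret names are illustrative) listens briefly and counts what arrives:

    import threading
    import time
    from azure.eventhub import EventHubConsumerClient

    conn_str = dbutils.secrets.get(scope="overwatch-kv", key="eh-conn-string")
    client = EventHubConsumerClient.from_connection_string(
        conn_str,
        consumer_group="$Default",
        eventhub_name="my-workspace-eh",  # the named Event Hub, not the namespace
    )

    received = []

    def on_event(partition_context, event):
        received.append((partition_context.partition_id, event.body_as_str()[:200]))

    # receive() blocks, so run it in a thread and close the client after a while.
    worker = threading.Thread(
        target=client.receive,
        kwargs={"on_event": on_event, "starting_position": "-1"},  # from the start
    )
    worker.start()
    time.sleep(30)
    client.close()
    worker.join()

    print(f"received {len(received)} events")
    for pid, body in received[:5]:
        print(pid, body)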

Setting up Storage Accounts

The following steps are meant to be a baseline reference for setting up the storage accounts for Overwatch targets. With regard to security, we cannot make recommendations as to what is best for your organization in a global forum like this. For specifics regarding security implications of your storage account, please contact your Databricks sales representative or talk with your Databricks / Cloud administrators.

That said, the minimum technical requirement for Overwatch to function is that the storage account exists and is accessible (read/write for Overwatch output, read for cluster logs) from the Databricks workspace. Any security considerations beyond this basic technical requirement should be commensurate with your organizational policies/standards.

Data locality is important for cluster logs. Cluster logs can be large and plentiful, and sending them to a storage account in a different geographical location can have a significant performance impact. Keep this in mind, and note that the reference architecture recommends at least one storage account per region to minimize latency and maximize bandwidth.

OVERVIEW

Barring the security and networking sections, once your setup is complete, your configuration should look similar to the image below.

[Image: Storage Overview]

Step 1

Select Storage Account from your Azure Portal and hit Create.

Step 2

Enter your Subscription and Resource Group in the Basics tab. [Image: storage1] Locally-redundant storage is specified here, but choose whichever redundancy is right for your organization. Standard storage is fine; the required IOPS here are relatively low, and while premium storage will work, it comes at an added, unnecessary cost. [Image: storage2]

Step 3

Recommendations

  • Security – commensurate with your organization’s policies, but the account must be accessible from the Databricks workspace.
  • Hot access tier
    • Not tested on the cold tier, which will suffer significant performance loss.
  • Hierarchical namespace
    • While hierarchical namespace is technically optional, it’s STRONGLY recommended that this be enabled.
  • Networking – similar to security, this should be configured commensurate with your corporate policies. The only technical requirement is that the data in the storage account is accessible from the Databricks workspace.

[Images: storage3, storage4]

Step 4

You may choose any options you prefer here. Note that BLOB soft deletion cannot be mounted to the Databricks workspace and is considered an experimental feature. It’s recommended that you do NOT enable BLOB soft deletes for the Overwatch storage account at this time, as they have not been fully tested. Container soft deletes are fine.

[Image: storage5]

Step 5

Add relevant tags and create the storage account.

[Image: storage6]

Using the Storage Account

The Overwatch output storage account may be accessed directly via:

  • abfss://
  • wasbs:// (not supported)
  • A mounted file system within Databricks. More information on mounting storage accounts can be found here; a minimal mounting sketch follows this list.
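
A minimal mounting sketch, assuming an AAD service principal that has been granted access to the storage account (all scope, key, container, and account names are illustrative):

    # OAuth mount of an ADLS Gen2 (abfss) container using an AAD service principal.
    tenant_id = dbutils.secrets.get(scope="overwatch-kv", key="spn-tenant-id")
    client_id = dbutils.secrets.get(scope="overwatch-kv", key="spn-client-id")
    client_secret = dbutils.secrets.get(scope="overwatch-kv", key="spn-client-secret")

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

    dbutils.fs.mount(
        source="abfss://overwatch@mystorageaccount.dfs.core.windows.net/",
        mount_point="/mnt/overwatch",
        extra_configs=configs,
    )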