Welcome To OverWatch > Deploy Overwatch > Running Overwatch > Cluster Configuration

Cluster Configuration

Cluster Requirements

DBR 11.3LTS as of 0.7.1.0
- Overwatch will likely run on different versions of DBR but is built and tested on 11.3LTS since 0.7.1
- Overwatch < 0.7.1 – DBR 10.4LTS
- Overwatch < 0.6.1 – DBR 9.1LTS
Using Photon
- As of 0.7.1.0 Photon is recommended so long as the Overwatch cluster is using DBR 11.3LTS+. Photon does increase the DBU spend but the performance boost often results in the code running significantly more efficiently netting out a benefit. Mileage can vary between customers so if you really want to know which is most efficient, feel free to run on both and use Overwatch to determine which is best for you.
- Prior to 0.7.1.0 and DBR 11.3LTS Photon was untested
Disable Autoscaling - See Notes On Autoscaling
- External optimize cluster recommendations are different. See External Optimize for more details
Add the relevant dependencies

Add the following dependencies to your cluster

Overwatch Assembly (fat jar): com.databricks.labs:overwatch_2.12:<latest>
(Azure Only, if not using System Tables) azure-eventhubs-spark - integration with Azure EventHubs
- Maven Coordinate: com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.21
(Azure Only - With AAD Auth For EH, if not using system tables) msal4j - library to support AAD Authorization
- Maven Coordinate: com.microsoft.azure:msal4j:1.10.1
If maven isn’t accessible in your environment you have two options:
1. On Github, go to the Releases page, find the version you’re interested in, and scroll down to the Assets section, there you can download the jar or uber jar, as needed.
2. Compile an uber jar with all dependencies. To do this, download the uber_pom.xml (Ensure the Overwatch version specified in the pom file is the version you want, there are 2 places to check) and run mvn clean package from within the same directory as the pom.xml
  - This method does require you have maven installed and configured correctly

Auto-scaling compute – Not Recommended
- Note that autoscaling compute will not be extremely efficient due to some of the compute tails as a result of log file size skew and storage mediums. Additionally, some modules require thousands of API calls (for historical loads) and have to be throttled to protect the workspace.
Auto-scaling Local Storage – Strongly Recommended for historical loads
- Some of the data sources can grow to be quite large and require very large shuffle stages which requires sufficient local disk. If you choose not to use auto-scaling storage be sure you provision sufficient local disk space.
  - SSD or NVME (preferred) – It’s strongly recommended to use fast local disks as there can be very large shuffles