Below are the requirements for setting up storage access in GCP.
Unity Catalog Storage Credentials should have the ability to read and write to an External Location (GCS bucket); this is done by assigning appropriate IAM roles on that bucket to a Databricks-generated Google Cloud service account. Please refer to the docs for detailed steps.
After the Storage Credentials are created, the existing Service Account attached to the Overwatch job cluster needs to be granted read/write access to the storage target for the Overwatch output (which will ultimately become your external location). This Service Account can be attached to Overwatch job/interactive clusters.
The following steps need to be performed in the external location's GCS bucket permissions -
- Go to the Permissions tab of the external location's GCS bucket.
- Click Grant Access.
- Add the service account in the Add Principal section.
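Equivalently, the grant can be scripted instead of done through the console. The sketch below is a minimal example using the google-cloud-storage Python client; the bucket name, service account email, and role are placeholders for illustration only, and the role you actually assign should follow the IAM guidance referenced above.

```python
from google.cloud import storage

# Placeholder values - substitute your external location bucket, the
# Overwatch cluster's service account, and the role you intend to grant.
BUCKET_NAME = "my-overwatch-output-bucket"
SERVICE_ACCOUNT = "overwatch-cluster-sa@my-project.iam.gserviceaccount.com"
ROLE = "roles/storage.objectAdmin"  # example role for read/write on objects

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Fetch the current bucket-level IAM policy and append a new binding.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({"role": ROLE, "members": {f"serviceAccount:{SERVICE_ACCOUNT}"}})
bucket.set_iam_policy(policy)

print(f"Granted {ROLE} on gs://{BUCKET_NAME} to {SERVICE_ACCOUNT}")
```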
When you create a workspace, Databricks on Google Cloud creates two Google Cloud Storage (GCS) buckets in your GCP project:
- databricks-<workspace-id> - stores system data that is generated as you use various Databricks features. This bucket includes notebook revisions, job run details, command results, and Spark logs.
- databricks-<workspace-id>-system - contains the workspace's root storage for the Databricks File System (DBFS). Your DBFS root bucket is not intended for storage of production data.

Follow the databricks-docs to get more information on these buckets.
In order to fetch the cluster logs of the remote workspace, the cluster should have access to the GCS bucket databricks-<workspace-id>. This GCS bucket is created in the Google Cloud project that hosts your Databricks workspace.
The following steps need to be performed in the databricks-<workspace-id> GCS bucket permissions -
- Go to the Permissions tab of the databricks-<workspace-id> GCS bucket.
- Click Grant Access.
- Add the service account in the Add Principal section.
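As a quick sanity check that the grant took effect, you can list a few objects under the expected cluster log prefix from a cluster running as that service account. The sketch below assumes the google-cloud-storage Python client and uses a hypothetical bucket name and log prefix; substitute the remote workspace's actual values.

```python
from google.cloud import storage

# Hypothetical values - use the remote workspace's bucket and the
# cluster log path configured for that workspace's clusters.
BUCKET_NAME = "databricks-1234567890123456"
LOG_PREFIX = "cluster-logs/"

client = storage.Client()

# If the service account lacks read access, this raises a Forbidden error.
for blob in client.list_blobs(BUCKET_NAME, prefix=LOG_PREFIX, max_results=10):
    print(blob.name)
```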
GCP – Remote Cluster Logs - Databricks on GCP does not support mounted/GCS bucket locations for cluster log delivery. Customers must provide a DBFS root path as the target for log delivery.
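To illustrate what a DBFS log delivery target looks like, here is a minimal sketch of the cluster_log_conf fragment of a cluster specification as accepted by the Databricks Clusters API; the destination path is a placeholder and should be adjusted to your own log location under the DBFS root.

```python
import json

# Minimal cluster-spec fragment showing DBFS-based cluster log delivery.
# "dbfs:/cluster-logs" is a placeholder destination; choose your own path
# under the DBFS root.
cluster_spec_fragment = {
    "cluster_log_conf": {
        "dbfs": {
            "destination": "dbfs:/cluster-logs"
        }
    }
}

print(json.dumps(cluster_spec_fragment, indent=2))
```

The same setting is exposed in the cluster UI under Advanced Options > Logging, with DBFS as the destination type.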