Installation
This section covers UCX installation.
Installation requirements
UCX has the following installation requirements:
- Databricks CLI v0.213 or later. See instructions.
- Python 3.10 or later. See Windows instructions.
- Databricks Premium or Enterprise workspace.
- Network access to your Databricks Workspace used for the installation process.
- Network access to the Internet for pypi.org and github.com from the machine running the installation.
- Databricks Workspace Administrator privileges for the user who runs the installation. Running UCX as a Service Principal is not supported.
- Account level Identity Setup. See instructions for AWS, Azure, and GCP.
- A Unity Catalog metastore created (per region). See instructions for AWS, Azure, and GCP.
- If your Databricks Workspace relies on an external Hive Metastore (such as AWS Glue), make sure to read this guide.
- A PRO or Serverless SQL Warehouse to render the report for the assessment workflow.
Once you install UCX, you can proceed to the assessment workflow to ensure the compatibility of your workspace with Unity Catalog.
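The two tool-version requirements above can be checked before starting. The following is a minimal sketch and not part of UCX: the helper names version_ge, extract_version, and preflight are hypothetical, and it assumes that databricks --version and python3 --version print a dotted version number somewhere in their output (on Windows the interpreter may be called python instead).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; none of these functions ship with UCX.

# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Numeric components are compared one by one; missing components count as 0.
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}
    y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

# extract_version TEXT: prints the first dotted number (such as 0.213.1) in TEXT.
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# preflight: checks the Databricks CLI and Python versions against the
# minimums listed in the requirements (0.213 and 3.10).
preflight() {
  local cli py status=0
  cli=$(extract_version "$(databricks --version 2>/dev/null)")
  py=$(extract_version "$(python3 --version 2>&1)")
  if [ -z "$cli" ] || ! version_ge "$cli" 0.213; then
    echo "Databricks CLI v0.213 or later is required (found: ${cli:-none})" >&2
    status=1
  fi
  if [ -z "$py" ] || ! version_ge "$py" 3.10; then
    echo "Python 3.10 or later is required (found: ${py:-none})" >&2
    status=1
  fi
  return "$status"
}
```

After sourcing the file, running preflight returns a non-zero status and prints a message for each requirement that is not met. The remaining requirements (workspace tier, privileges, metastore) are not covered by this sketch.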
Authenticate Databricks CLI
We only support installations and upgrades through Databricks CLI, because UCX relies on an installation script to make sure all the necessary and correct configurations are in place.
Install Databricks CLI on macOS:
Install Databricks CLI on Windows:
Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace:
databricks auth login --host WORKSPACE_HOST
To enable debug logs, add the --debug flag to any command.
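If you work with several workspaces, it can help to store each login under a named configuration profile. The sketch below wraps the login command shown above; login_workspace and the default profile name ucx-install are hypothetical, and the https:// check is only a sanity check added for this example. It relies on the --profile flag of databricks auth login.

```shell
# login_workspace HOST [PROFILE]: authenticates against HOST and stores the
# credentials under PROFILE (hypothetical default name: ucx-install).
login_workspace() {
  local host=$1 profile=${2:-ucx-install}
  case $host in
    https://*) ;;
    *)
      echo "host must start with https://, got: $host" >&2
      return 2
      ;;
  esac
  databricks auth login --host "$host" --profile "$profile"
}
```

For example, login_workspace https://WORKSPACE_HOST my-profile logs in and saves the result as my-profile, which you can then pick when the installer asks for a configuration profile.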
Install UCX
Install UCX via Databricks CLI:
databricks labs install ucx
You'll be prompted to select a configuration profile created by the databricks auth login command.
Once you install, proceed to the assessment workflow to ensure the compatibility of your workspace with UCX.
The WorkspaceInstaller class creates a new configuration for Unity Catalog migration in a Databricks workspace.
It guides the user through a series of prompts to gather the necessary information, such as selecting an inventory database, choosing
a PRO or SERVERLESS SQL warehouse, specifying a log level and number of threads, and setting up an external Hive Metastore if necessary.
On the first installation, you're prompted for a workspace local group migration strategy.
Based on user input, the class creates a new cluster policy with the specified configuration. The user can review and confirm the configuration,
which is saved to the workspace and can be opened in a web browser.
The WorkspaceInstallation class manages the installation and uninstallation of UCX in a workspace, including configuration and exception handling.
The installation process creates dashboards, jobs, and a database with the given configuration, and deploys workflows with specific settings.
It uploads wheels to the workspace and creates cluster policies and wheel runners there. It also creates the job tasks for a given task, such as
job dashboard tasks, job notebook tasks, and job wheel tasks, and it can infer errors from job runs and task runs.
Overall, the class configures the workspace, installs the necessary libraries, and verifies the installation, making it easier for users to
migrate their workspaces to Unity Catalog.
At the end of the installation, the user is asked whether the current installation should join an existing collection (a new collection is created if none is present).
For large organizations with many workspaces, grouping workspaces into collections helps manage the UCX migration at the collection level instead of the workspace level.
The user must be an account admin to join a collection.
After this, UCX will be installed locally and a number of assets will be deployed in the selected workspace.
These assets are available under the installation folder; /Applications/ucx is the default installation folder. Please check here for more details.
Installing a specific version
You can also install a specific version by specifying it like @vX.Y.Z:
databricks labs install ucx@vX.Y.Z
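When the version comes from a script or a CI variable, a typo in the tag is easy to miss. The following sketch validates the tag format before calling the CLI; install_ucx is a hypothetical wrapper, and the vX.Y.Z pattern it enforces is taken from the text above rather than from any rule documented by the CLI.

```shell
# install_ucx [VERSION]: installs the latest UCX when VERSION is omitted,
# otherwise installs the given tag, which must look like vX.Y.Z.
install_ucx() {
  local version=${1:-}
  if [ -z "$version" ]; then
    databricks labs install ucx
  elif [[ $version =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    databricks labs install "ucx@$version"
  else
    echo "expected a tag like vX.Y.Z, got: $version" >&2
    return 2
  fi
}
```

Calling install_ucx with a tag that lacks the leading v fails fast with exit status 2 instead of reaching the CLI.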
Installation resources
The following resources are installed by UCX:
Installed UCX resources | Description |
---|---|
Inventory database | A Hive metastore database/schema in which UCX persists the inventory required for the upgrade process |
Workflows | Workflows to execute UCX |
Dashboards | Dashboards to visualize UCX outcomes |
Installation folder | A workspace folder containing UCX files in /Applications/ucx/. |
Installation folder
UCX is installed in the workspace folder /Applications/ucx/. This folder contains UCX's code resources, like the
source code from this GitHub repository and the dashboard. Generally, these resources are not
directly used by UCX users. Resources that can be of importance to users are detailed in the subsections below.
Readme notebook
Every installation creates a README notebook with a detailed description of all deployed workflows and their tasks,
providing quick links to the relevant workflows and dashboards.
Debug notebook
Every installation creates a DEBUG notebook that initializes UCX as a library for you to execute interactively.
Debug logs
The workflow runs store debug logs in the logs folder of the installation folder. The logs are flushed
every minute in a separate file. Debug logs for the command-line interface are shown
by adding the --debug flag:
databricks --debug labs ucx <command>
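To look at the workflow logs from a terminal, you can list the logs folder with the workspace commands of the Databricks CLI. This is a sketch under the assumption that the folder is literally named logs directly under the installation folder, as described above; ucx_logs_path and list_ucx_logs are hypothetical helper names, and the layout inside the folder is not assumed here.

```shell
# ucx_logs_path [INSTALL_FOLDER]: prints the logs folder of an installation.
# Defaults to the global installation folder /Applications/ucx.
ucx_logs_path() {
  local folder=${1:-/Applications/ucx}
  printf '%s/logs\n' "${folder%/}"
}

# list_ucx_logs [INSTALL_FOLDER]: lists the contents of the logs folder.
list_ucx_logs() {
  databricks workspace list "$(ucx_logs_path "${1:-}")"
}
```

For a user installation, pass the folder explicitly, for example list_ucx_logs /Users/<user>/.ucx.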
Installation configuration
The UCX configuration is kept in the installation folder.
Advanced installation options
Advanced installation options are detailed below.
Force install over existing UCX
Using the environment variable UCX_FORCE_INSTALL, you can force the installation of UCX over an existing installation.
The values for the environment variable are 'global' and 'user'.
- Global install: UCX is installed at /Applications/ucx
- User install: UCX is installed at /Users/<user>/.ucx
If there is an existing global installation of UCX, you can force a user installation of UCX over the existing installation by setting the environment variable UCX_FORCE_INSTALL to 'user'.
At this moment there is no global override over a user installation of UCX, as this requires migration and can break existing installations.
global install exists | user install exists | expected install location | install_folder | mode |
---|---|---|---|---|
no | no | default | /Applications/ucx | install |
yes | no | default | /Applications/ucx | upgrade |
no | yes | default | /Users/X/.ucx | upgrade (existing installations must not break) |
yes | yes | default | /Users/X/.ucx | upgrade |
yes | no | USER | /Users/X/.ucx | install (show prompt) |
no | yes | GLOBAL | ... | migrate |
- UCX_FORCE_INSTALL=user databricks labs install ucx: will force the installation to be for user only
- UCX_FORCE_INSTALL=global databricks labs install ucx: will force the installation to be for root only
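The table above can be read as a lookup from the current state and the forced location to an install folder and a mode. The sketch below encodes exactly the six rows of the table for illustration; resolve_install is a hypothetical helper and not a UCX command, the lowercase user and global arguments stand for the USER and GLOBAL entries in the table, and the elided folder of the migrate row is kept as "...".

```shell
# resolve_install GLOBAL_EXISTS USER_EXISTS [FORCE]
#   GLOBAL_EXISTS, USER_EXISTS: yes or no
#   FORCE: default, user, or global (defaults to default)
# Prints "<install_folder> <mode>" for combinations listed in the table.
resolve_install() {
  local global=$1 user=$2 force=${3:-default}
  case "$global/$user/$force" in
    no/no/default)   echo "/Applications/ucx install" ;;
    yes/no/default)  echo "/Applications/ucx upgrade" ;;
    no/yes/default)  echo "/Users/X/.ucx upgrade" ;;
    yes/yes/default) echo "/Users/X/.ucx upgrade" ;;
    yes/no/user)     echo "/Users/X/.ucx install" ;;
    no/yes/global)   echo "... migrate" ;;
    *)
      echo "combination not covered by the table: $global/$user/$force" >&2
      return 1
      ;;
  esac
}
```

For instance, resolve_install yes no user prints the user folder with mode install, which corresponds to forcing a user installation next to an existing global one. Combinations that the table does not list return a non-zero status rather than guessing.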
Installing UCX on all workspaces within a Databricks account
Setting the environment variable UCX_FORCE_INSTALL to 'account' will install UCX on all workspaces within a Databricks account.
UCX_FORCE_INSTALL=account databricks labs install ucx
After the first installation, UCX will prompt the user to confirm whether to install UCX on the remaining workspaces with the same answers. If confirmed, the remaining installations will be completed silently.
This installation mode will automatically select the following options:
- Automatically create and enable HMS lineage init script
- Automatically create a new SQL warehouse for UCX assessment
Installing UCX with company hosted PYPI mirror
Some enterprises block the public PyPI index and host a company-controlled PyPI mirror. To install UCX while using a
company-hosted PyPI mirror for finding its dependencies, add all UCX dependencies to the company-hosted PyPI mirror (see
"dependencies" in pyproject.toml) and set the environment variable PIP_INDEX_URL to the company-hosted PyPI mirror URL
while installing UCX:
PIP_INDEX_URL="https://url-to-company-hosted-pypi.internal" databricks labs install ucx
During installation, reply yes to the question "Does the given workspace block Internet access?".
Upgrading UCX for newer versions
Verify that UCX is installed:
databricks labs installed
Name Description Version
ucx Unity Catalog Migration Toolkit (UCX) <version>
Upgrade UCX via Databricks CLI:
databricks labs upgrade ucx
The prompts will be similar to Installation.
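The verify and upgrade steps can be combined in a small script. This is a sketch only: installed_ucx_version and upgrade_ucx are hypothetical helpers, and the parsing assumes that databricks labs installed prints one whitespace-separated row per project with the name first and the version last, as in the sample output above.

```shell
# installed_ucx_version: prints the last column of the ucx row and
# fails when no such row is present.
installed_ucx_version() {
  databricks labs installed |
    awk '$1 == "ucx" { print $NF; found = 1 } END { exit !found }'
}

# upgrade_ucx: upgrades UCX only when it is already installed.
upgrade_ucx() {
  local current
  if current=$(installed_ucx_version); then
    echo "UCX $current is installed, upgrading"
    databricks labs upgrade ucx
  else
    echo "UCX is not installed; run: databricks labs install ucx" >&2
    return 1
  fi
}
```

If the output format of the CLI differs from the sample, adjust the awk expression before relying on the result.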
Uninstall UCX
Uninstall UCX via Databricks CLI:
databricks labs uninstall ucx
Databricks CLI will confirm a few options:
- Whether you want to remove all ucx artefacts from the workspace as well. Defaults to no.
- Whether you want to delete the inventory database in hive_metastore. Defaults to no.