Installation

This section covers how to install, upgrade, and uninstall UCX.

Installation requirements

UCX has the following installation requirements:

  • Databricks CLI v0.213 or later. See instructions.
  • Python 3.10 or later. See Windows instructions.
  • Databricks Premium or Enterprise workspace.
  • Network access to the Databricks workspace used for the installation.
  • Network access to the Internet (pypi.org and github.com) from the machine running the installation.
  • Databricks Workspace Administrator privileges for the user who runs the installation. Running UCX as a Service Principal is not supported.
  • Account level Identity Setup. See instructions for AWS, Azure, and GCP.
  • A Unity Catalog metastore created (per region). See instructions for AWS, Azure, and GCP.
  • If your Databricks Workspace relies on an external Hive Metastore (such as AWS Glue), make sure to read this guide.
  • A PRO or Serverless SQL Warehouse to render the report for the assessment workflow.
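
The two version requirements above can be checked before starting the installation. The following is a minimal sketch and not part of UCX: version_ge and check_ucx_prereqs are hypothetical helper names, and the parsing assumes that `databricks --version` prints the version as the last word of its output and that python3 is on the PATH.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical and not part of UCX.

# version_ge A B: succeed when dotted version A is >= B.
# Fields are compared numerically, so 3.10 is correctly newer than 3.9.
version_ge() {
    awk -v a="${1#v}" -v b="${2#v}" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# check_ucx_prereqs: compare the local tool versions with the documented minimums.
check_ucx_prereqs() {
    # Assumption: the version is the last word of the CLI version output.
    cli_version=$(databricks --version | awk '{ print $NF }')
    # "python3 --version" prints "Python <version>"; take the second word.
    py_version=$(python3 --version | awk '{ print $2 }')

    version_ge "$cli_version" 0.213 || {
        echo "Databricks CLI $cli_version is older than v0.213" >&2
        return 1
    }
    version_ge "$py_version" 3.10 || {
        echo "Python $py_version is older than 3.10" >&2
        return 1
    }
    echo "Databricks CLI $cli_version and Python $py_version meet the requirements"
}
```

Calling check_ucx_prereqs after sourcing the snippet reports the first requirement that is not met. The numeric, field-by-field comparison is deliberate: a plain string comparison would rank 3.9 above 3.10.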

Once you install UCX, you can proceed to the assessment workflow to ensure the compatibility of your workspace with Unity Catalog.

Authenticate Databricks CLI

UCX can only be installed and upgraded through the Databricks CLI, because the installation script must run to put all the necessary configuration in place. Install Databricks CLI on macOS:

[Image: installing Databricks CLI on macOS]

Install Databricks CLI on Windows:

[Image: installing Databricks CLI on Windows]

Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace:

databricks auth login --host WORKSPACE_HOST

To enable debug logs, add the --debug flag to any command.

Install UCX

Install UCX via Databricks CLI:

databricks labs install ucx

You'll be prompted to select a configuration profile created by the databricks auth login command.

Once UCX is installed, proceed to the assessment workflow to ensure the compatibility of your workspace with Unity Catalog.

The WorkspaceInstaller class is used to create a new configuration for Unity Catalog migration in a Databricks workspace. It guides the user through a series of prompts to gather necessary information, such as selecting an inventory database, choosing a PRO or SERVERLESS SQL warehouse, specifying a log level and number of threads, and setting up an external Hive Metastore if necessary. Upon the first installation, you're prompted for a workspace local group migration strategy. Based on user input, the class creates a new cluster policy with the specified configuration. The user can review and confirm the configuration, which is saved to the workspace and can be opened in a web browser.

The WorkspaceInstallation class manages the installation and uninstallation of UCX in a workspace. It configures the workspace, creates the inventory database with the given configuration, and deploys dashboards and workflows (jobs) with their specific settings. It uploads wheels and creates cluster policies and wheel runners in the workspace, and it creates the job tasks for each workflow task, such as dashboard, notebook, and wheel tasks. It also installs the necessary libraries and verifies the installation. During the installation process it handles exceptions and infers errors from job runs and task runs.

At the end of the installation, you are asked whether the current installation should join an existing collection; a new collection is created if none is present. For large organizations with many workspaces, grouping workspaces into a collection helps to manage the UCX migration at the collection level instead of the workspace level. You must be an account admin to join a collection.

After this, UCX is installed locally and a number of assets are deployed in the selected workspace. These assets are available under the installation folder, which defaults to /Applications/ucx. Please check here for more details.

[Image: installing UCX on macOS]

Installing a specific version

To install a specific version, append it to the project name in the form @vX.Y.Z:

databricks labs install ucx@vX.Y.Z
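
When the version comes from a variable or a script argument, it can help to validate the tag before passing it to the CLI. The helper below is a small sketch under that assumption; ucx_install_target is a hypothetical name, not a UCX or Databricks CLI command, and the tag in the usage comment is a placeholder rather than a known release.

```shell
#!/bin/sh
# Sketch only: ucx_install_target is a hypothetical helper, not part of UCX.

# Print "ucx@<tag>" when the tag has the form vX.Y.Z, otherwise fail.
ucx_install_target() {
    if printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
        printf 'ucx@%s\n' "$1"
    else
        echo "expected a tag of the form vX.Y.Z, got: $1" >&2
        return 1
    fi
}

# Usage (v1.2.3 is a placeholder tag):
#   target=$(ucx_install_target v1.2.3) && databricks labs install "$target"
```

Chaining with && keeps the install command from running when the tag is malformed.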

Installation resources

The following resources are installed by UCX:

Installed UCX resources | Description
Inventory database | A Hive metastore database/schema in which UCX persists the inventory required for the upgrade process
Workflows | Workflows to execute UCX
Dashboards | Dashboards to visualize UCX outcomes
Installation folder | A workspace folder containing UCX files in /Applications/ucx/

Installation folder

UCX is installed in the workspace folder /Applications/ucx/. This folder contains UCX's code resources, such as the source code from this GitHub repository and the dashboards. Generally, UCX users do not use these resources directly. Resources that can be of importance to users are detailed in the subsections below.

Readme notebook

[Image: README notebook]

Every installation creates a README notebook with a detailed description of all deployed workflows and their tasks, providing quick links to the relevant workflows and dashboards.

Debug notebook

[Image: DEBUG notebook]

Every installation creates a DEBUG notebook that initializes UCX as a library for you to execute interactively.

Debug logs

[Image: debug logs]

Workflow runs store debug logs in the logs folder inside the installation folder. The logs are flushed every minute to a separate file. Debug logs for the command-line interface are shown by adding the --debug flag:

databricks --debug labs ucx <command>

Installation configuration

The UCX configuration is kept in the installation folder.

Advanced installation options

Advanced installation options are detailed below.

Force install over existing UCX

Set the environment variable UCX_FORCE_INSTALL to force the installation of UCX over an existing installation. The values for the environment variable are 'global' and 'user'.

  • Global install: UCX is installed at /Applications/ucx
  • User install: UCX is installed at /Users/<user>/.ucx

If there is an existing global installation of UCX, you can force a user installation of UCX over the existing installation by setting the environment variable UCX_FORCE_INSTALL to 'user'.

At this moment there is no global override of a user installation of UCX, because this requires migration and can break existing installations.

global | user | expected install location | install_folder | mode
no | no | default | /Applications/ucx | install
yes | no | default | /Applications/ucx | upgrade
no | yes | default | /Users/X/.ucx | upgrade (existing installations must not break)
yes | yes | default | /Users/X/.ucx | upgrade
yes | no | USER | /Users/X/.ucx | install (show prompt)
no | yes | GLOBAL | ... | migrate

  • UCX_FORCE_INSTALL=user databricks labs install ucx - will force the installation to be for user only
  • UCX_FORCE_INSTALL=global databricks labs install ucx - will force the installation to be for root only
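
The table above can be read as a lookup from the current state and the UCX_FORCE_INSTALL value to the resulting folder and mode. The sketch below encodes only the six documented rows; ucx_install_plan is a hypothetical helper for illustration and does not reflect how UCX implements this decision. The folder that the table elides in its last row is kept as "...".

```shell
#!/bin/sh
# Sketch only: ucx_install_plan is a hypothetical helper that mirrors the
# documented table; it is not UCX's implementation.
#
# Arguments:
#   $1  existing global installation: yes | no
#   $2  existing user installation:   yes | no
#   $3  UCX_FORCE_INSTALL value:      default (unset) | user | global
# Prints "<install_folder> <mode>", or fails for combinations the table omits.
ucx_install_plan() {
    case "$1,$2,$3" in
        no,no,default)   printf '%s %s\n' "/Applications/ucx" "install" ;;
        yes,no,default)  printf '%s %s\n' "/Applications/ucx" "upgrade" ;;
        no,yes,default)  printf '%s %s\n' "/Users/X/.ucx" "upgrade" ;;
        yes,yes,default) printf '%s %s\n' "/Users/X/.ucx" "upgrade" ;;
        yes,no,user)     printf '%s %s\n' "/Users/X/.ucx" "install" ;;
        no,yes,global)   printf '%s %s\n' "..." "migrate" ;;   # folder elided in the table
        *)
            echo "combination not covered by the table: $1,$2,$3" >&2
            return 1
            ;;
    esac
}
```

For example, `ucx_install_plan yes no user` corresponds to the row where a global installation exists and a user installation is forced.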

Installing UCX on all workspaces within a Databricks account

Setting the environment variable UCX_FORCE_INSTALL to 'account' will install UCX on all workspaces within a Databricks account.

  • UCX_FORCE_INSTALL=account databricks labs install ucx

After the first installation, UCX will prompt the user to confirm whether to install UCX on the remaining workspaces with the same answers. If confirmed, the remaining installations will be completed silently.

This installation mode will automatically select the following options:

  • Automatically create and enable HMS lineage init script
  • Automatically create a new SQL warehouse for UCX assessment

Installing UCX with a company-hosted PyPI mirror

Some enterprises block the public PyPI index and host a company-controlled PyPI mirror. To install UCX while using a company-hosted PyPI mirror to resolve its dependencies, add all UCX dependencies to the mirror (see "dependencies" in pyproject.toml) and set the environment variable PIP_INDEX_URL to the mirror URL while installing UCX:

PIP_INDEX_URL="https://url-to-company-hosted-pypi.internal" databricks labs install ucx

During installation, reply yes to the question "Does the given workspace block Internet access?".

Upgrading UCX for newer versions

Verify that UCX is installed:

databricks labs installed

Name  Description                            Version
ucx   Unity Catalog Migration Toolkit (UCX)  <version>
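
In a script, the same check can be made by reading the listing shown above. This is a minimal sketch that assumes the output keeps that layout, with the project name as the first word and the version as the last word of the row; ucx_installed_version is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: ucx_installed_version is a hypothetical helper, not part of UCX.

# Read the "databricks labs installed" listing on stdin and print the version
# from the ucx row. Exits non-zero when no ucx row is present.
ucx_installed_version() {
    awk '$1 == "ucx" { print $NF; found = 1 } END { exit(found ? 0 : 1) }'
}

# Usage:
#   if version=$(databricks labs installed | ucx_installed_version); then
#       echo "UCX $version is installed; upgrading"
#       databricks labs upgrade ucx
#   fi
```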

Upgrade UCX via Databricks CLI:

databricks labs upgrade ucx

The prompts are similar to those shown during installation.

[Image: upgrading UCX on macOS]

Uninstall UCX

Uninstall UCX via Databricks CLI:

databricks labs uninstall ucx

Databricks CLI will confirm a few options:

  • Whether you want to remove all UCX artefacts from the workspace as well. Defaults to no.
  • Whether you want to delete the inventory database in hive_metastore. Defaults to no.

[Image: uninstalling UCX on macOS]