
Migration progress

UCX tracks migration progress of business resources: workspace objects that contribute to business value. (The term "business resource" comes from the UCX team and is not Databricks terminology.) We identified the following business resources:

| Business resource | Motivation |
|---|---|
| Dashboard | Dashboards visualize data models supporting business processes |
| Job | Jobs create data models supporting business processes, not exclusively data models used by dashboards |
| Delta Live Table pipelines | Delta Live Table pipelines create data models supporting business processes, not exclusively data models used by dashboards |

Furthermore, UCX tracks migration of the following Hive and workspace objects:

| Hive or workspace object |
|---|
| Tables and views (Hive data objects) |
| Grant |
| User defined function (UDF) |
| Cluster |
| Cluster policies |

See the resource index for more details on the above objects.

Usage

View the migration progress in the migration progress dashboard after running the (experimental) migration progress workflow.

Failures

A key historical attribute in migration progress is the list of failures, which shows the incompatibilities with Unity Catalog. Once all failures for an object are resolved, UCX flags the object as Unity Catalog compatible. For Hive data objects, this means that the objects have been migrated to Unity Catalog.
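
The compatibility rule above can be sketched as follows. This is a minimal illustration, assuming an object's failures are available as a list of strings (as in the failures column of the historical table); the function name `is_unity_catalog_compatible` is hypothetical and not part of the UCX API.

```python
def is_unity_catalog_compatible(failures: list[str]) -> bool:
    """Hypothetical helper: an object counts as compatible once no failures remain."""
    return len(failures) == 0


# A Hive table that is still used by a notebook is not yet compatible.
print(is_unity_catalog_compatible(["Used by NOTEBOOK: test/test.py"]))  # False
print(is_unity_catalog_compatible([]))  # True
```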

Owner

Another key historical attribute in migration progress is the owner, which identifies who owns the object and is therefore key to making the object Unity Catalog compatible. Ownership is determined on a best-effort basis; ownership is a concept that Unity Catalog makes more central.
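
One way to use the owner attribute is to list, per owner, the objects that still have failures, so it is clear whom to contact. The sketch below assumes records shaped like rows of the historical table (plain dicts with `object_id`, `failures` and `owner` keys); the helper name, owners and failure texts are made up for illustration.

```python
from collections import defaultdict


def failures_by_owner(records: list[dict]) -> dict[str, list[str]]:
    """Hypothetical helper: map each owner to their objects that still have failures."""
    result: dict[str, list[str]] = defaultdict(list)
    for record in records:
        if record["failures"]:  # skip objects that are already compatible
            result[record["owner"]].append(".".join(record["object_id"]))
    return dict(result)


records = [
    {"object_id": ["hive_metastore", "schema", "table"], "failures": ["illustrative failure"], "owner": "alice@example.com"},
    {"object_id": ["hive_metastore", "schema", "other"], "failures": [], "owner": "alice@example.com"},
    {"object_id": ["hive_metastore", "sales", "orders"], "failures": ["illustrative failure"], "owner": "bob@example.com"},
]
print(failures_by_owner(records))
# {'alice@example.com': ['hive_metastore.schema.table'], 'bob@example.com': ['hive_metastore.sales.orders']}
```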

Tracking

The (experimental) migration progress workflow tracks the migration progress and populates migration progress tables.

Roll-up to business resources

The main intent of migration progress is to track whether business resources are migrated to Unity Catalog. UCX rolls up the failures of dependent resources to the business resources, so that each business resource shows the failures of its dependent resources.

| Business resource | Dependent resources |
|---|---|
| Dashboard | Queries |
| Job | Cluster, cluster policies, cluster configurations, code resources |
| Delta Live Table pipelines | TBD |
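
A roll-up of this kind can be sketched as a traversal of the dependency graph. This is an illustration only: the resource identifiers (`job:nightly`, `cluster:etl`, ...), the dict-based graph and the failure texts are assumptions, not UCX's internal representation. Because the walk is transitive, a failure on a cluster policy reaches the job through its cluster.

```python
def rolled_up_failures(
    resource: str,
    dependencies: dict[str, list[str]],
    failures: dict[str, list[str]],
) -> list[str]:
    """Collect the failures of a resource and, transitively, of its dependent resources."""
    collected: list[str] = []
    seen: set[str] = set()
    stack = [resource]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against shared or cyclic dependencies
            continue
        seen.add(current)
        collected.extend(failures.get(current, []))
        stack.extend(dependencies.get(current, []))
    return collected


# Hypothetical graph: a job depends on a cluster and code; the cluster depends on a policy.
dependencies = {"job:nightly": ["cluster:etl", "code:etl.py"], "cluster:etl": ["policy:default"]}
failures = {
    "cluster:etl": ["illustrative cluster failure"],
    "policy:default": ["illustrative policy failure"],
    "code:etl.py": ["illustrative code failure"],
}
print(sorted(rolled_up_failures("job:nightly", dependencies, failures)))
# ['illustrative cluster failure', 'illustrative code failure', 'illustrative policy failure']
```

The visited set keeps a dependency that is shared by several paths from being counted twice.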

Similarly, failures are rolled up for the Hive and workspace objects:

| Hive or workspace object | Dependent resources |
|---|---|
| Tables and views (Hive data objects) | Grants, TableMigrationStatus |
| Grant | |
| User defined function (UDF) | |
| Cluster | Cluster policies |
| Cluster policies | |

Dangling Hive or workspace objects

Hive or workspace objects that are not referenced by any business resource are considered dangling objects. For now, these objects are tracked separately and are therefore not rolled up to business resources.
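
Finding dangling objects amounts to a reachability check: any object that cannot be reached from a business resource through the dependency graph is dangling. The sketch below is a minimal version that uses the same kind of hypothetical identifiers and dict-based graph as an illustration. It is not the UCX implementation.

```python
def dangling_objects(
    objects: set[str],
    business_resources: list[str],
    dependencies: dict[str, list[str]],
) -> set[str]:
    """Return the objects that no business resource references, directly or transitively."""
    reachable: set[str] = set()
    stack = list(business_resources)
    while stack:
        current = stack.pop()
        if current in reachable:
            continue
        reachable.add(current)
        stack.extend(dependencies.get(current, []))
    return objects - reachable


objects = {"cluster:etl", "policy:default", "table:hive_metastore.schema.orphan"}
dependencies = {"job:nightly": ["cluster:etl"], "cluster:etl": ["policy:default"]}
print(dangling_objects(objects, ["job:nightly"], dependencies))
# {'table:hive_metastore.schema.orphan'}
```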

Persistence

The progress is persisted in the UCX catalog in Unity Catalog so that migration progress can be tracked across workspaces. The catalog contains the tables below.

Historical

The historical table contains historical records of inventory objects relevant to the migration progress.

| Column | Data type | Description |
|---|---|---|
| workspace_id | integer | The identifier of the workspace where this historical record was generated. |
| job_run_id | integer | The identifier of the job run that generated this historical record. |
| object_type | string | The inventory table for which this historical record was generated. |
| object_id | list[string] | The type-specific identifier for the corresponding inventory record. |
| data | mapping[string, string] | Type-specific JSON-encoded data of the inventory record. |
| failures | list[string] | The list of problems associated with the object that this inventory record covers. |
| owner | string | The identity that has ownership of the object. |
| ucx_version | string | The UCX semantic version. |

Example historical record:

| workspace_id | job_run_id | object_type | object_id | data | failures | owner | ucx_version |
|---|---|---|---|---|---|---|---|
| 123456789 | 1 | 'Table' | ['hive_metastore', 'schema', 'table'] | {'database': 'schema', 'name': 'table', 'catalog': 'hive_metastore', 'object_type': 'MANAGED', 'table_format': 'DELTA', 'is_partitioned': 'false'} | ['Used by NOTEBOOK: test/test.py'] | 'cor.zuurmond@databricks.com' | '0.50.0' |
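
Because records from every workspace share one table, progress can be compared across workspaces. The sketch below mirrors the columns as a Python dataclass and computes the share of records without failures for each workspace. The Python types, the `progress_by_workspace` helper and the second record are illustrative assumptions. The helper also assumes the input holds a single snapshot (one job run) per workspace, because the historical table accumulates records across runs.

```python
from dataclasses import dataclass


@dataclass
class HistoricalRecord:
    """One row of the historical table; field names mirror the columns above."""

    workspace_id: int
    job_run_id: int
    object_type: str
    object_id: list[str]
    data: dict[str, str]
    failures: list[str]
    owner: str
    ucx_version: str


def progress_by_workspace(records: list[HistoricalRecord]) -> dict[int, float]:
    """Share of records without failures per workspace (assumes one snapshot per workspace)."""
    totals: dict[int, int] = {}
    compatible: dict[int, int] = {}
    for record in records:
        totals[record.workspace_id] = totals.get(record.workspace_id, 0) + 1
        if not record.failures:
            compatible[record.workspace_id] = compatible.get(record.workspace_id, 0) + 1
    return {workspace: compatible.get(workspace, 0) / total for workspace, total in totals.items()}


example = HistoricalRecord(
    workspace_id=123456789,
    job_run_id=1,
    object_type="Table",
    object_id=["hive_metastore", "schema", "table"],
    data={"database": "schema", "name": "table", "catalog": "hive_metastore",
          "object_type": "MANAGED", "table_format": "DELTA", "is_partitioned": "false"},
    failures=["Used by NOTEBOOK: test/test.py"],
    owner="cor.zuurmond@databricks.com",
    ucx_version="0.50.0",
)
# A made-up record in another workspace that has no failures left.
migrated = HistoricalRecord(987654321, 1, "Table", ["hive_metastore", "sales", "orders"], {}, [], "someone@example.com", "0.50.0")
print(progress_by_workspace([example, migrated]))  # {123456789: 0.0, 987654321: 1.0}
```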

Workflow run

The auxiliary workflow_runs table tracks UCX workflow runs.

| Column | Data type | Description |
|---|---|---|
| started_at | dt.datetime | The timestamp of the workflow run start |
| finished_at | dt.datetime | The timestamp of the workflow run end |
| workspace_id | int | The workspace id in which the workflow ran |
| workflow_name | str | The workflow name that ran |
| workflow_id | int | The workflow id of the workflow that ran |
| workflow_run_id | int | The workflow run id |
| workflow_run_attempt | int | The workflow run attempt |

Example workflow run record:

| started_at | finished_at | workspace_id | workflow_name | workflow_id | workflow_run_id | workflow_run_attempt |
|---|---|---|---|---|---|---|
| datetime.datetime(2024, 11, 22, 13, 50, 37, tzinfo=datetime.timezone.utc) | datetime.datetime(2024, 11, 22, 13, 50, 58, tzinfo=datetime.timezone.utc) | 123456789 | 'Migration progress (experimental)' | 123 | 456 | 0 |
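
The column types above are Python types, so a workflow run maps naturally onto a dataclass. The minimal sketch below uses the example's timestamps to derive the run duration. The `WorkflowRun` class and its `duration` property are hypothetical, and the workflow and run IDs are illustrative placeholders.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass
class WorkflowRun:
    """Hypothetical mirror of a workflow_runs row."""

    started_at: dt.datetime
    finished_at: dt.datetime
    workspace_id: int
    workflow_name: str
    workflow_id: int
    workflow_run_id: int
    workflow_run_attempt: int

    @property
    def duration(self) -> dt.timedelta:
        """Wall-clock time between the start and the end of the run."""
        return self.finished_at - self.started_at


run = WorkflowRun(
    started_at=dt.datetime(2024, 11, 22, 13, 50, 37, tzinfo=dt.timezone.utc),
    finished_at=dt.datetime(2024, 11, 22, 13, 50, 58, tzinfo=dt.timezone.utc),
    workspace_id=123456789,
    workflow_name="Migration progress (experimental)",
    workflow_id=123,  # illustrative ID
    workflow_run_id=456,  # illustrative ID
    workflow_run_attempt=0,
)
print(run.duration.total_seconds())  # 21.0
```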

Resource index

| Resource | Description | Dependent resources |
|---|---|---|
| Redash dashboard | The Redash dashboard | Queries |
| Lakeview dashboard | The Lakeview dashboard | Queries |
| Dashboard | The Redash or Lakeview dashboard | Queries |
| Job | Jobs create data models supporting business processes, not exclusively data models used by dashboards | Tasks, Cluster |
| Job task | Job tasks, defined as part of the job definition | Code |
| Delta Live Table pipelines | Delta Live Table pipelines create data models supporting business processes, not exclusively data models used by dashboards | |
| Tables and views (Hive data objects) | Hive data objects | Grant, Table migration status |
| Grant | Data object privileges | Legacy grant, Interactive cluster grant |
| Legacy grant | Legacy Hive grant privileges managed through GRANT, REVOKE and DENY SQL statements | |
| Interactive cluster grant | Data object privileges inferred through interactive cluster data access | |
| User defined function (UDF) | Hive user defined functions | UDF code definition |
| Cluster | The cluster, either a job cluster or an interactive cluster | |
| Cluster policies | The cluster policies | |
| Table migration status | Whether a table or view has been migrated to Unity Catalog | |
| Code | Code definitions, either Python or SQL | |