Linter message codes

code compatibility problems

Here's the detailed explanation of the linter message codes:

`cannot-autofix-table-reference`

This message indicates that the linter has found a table reference that it cannot fix automatically. The user must manually update the table reference to point to the correct table in Unity Catalog. This mostly occurs when the table name is computed dynamically and is too complex for our static code analysis to resolve. We detect this problem anywhere a table name can be used: spark.sql, spark.catalog.*, spark.table, df.write.*, and many more. Code examples that trigger this message:

spark.table(f"foo_{some_table_name}")
# ..
df = spark.range(10)
df.write.saveAsTable(f"foo_{some_table_name}")
# .. or even
df.write.insertInto(f"foo_{some_table_name}")

Here the some_table_name variable is not defined anywhere in the visible scope. However, the analyser successfully detects the table name if the variable is defined:

some_table_name = 'bar'
spark.table(f"foo_{some_table_name}")

We even detect string constants that come either from dbutils.widgets.get (via job named parameters) or from loop variables. If the old.things table is migrated to brand.new.stuff in Unity Catalog, the following code triggers two messages: table-migrated-to-uc-{sql,python,python-sql} for the first query, as its contents are clearly analysable, and cannot-autofix-table-reference for the second query.

# ucx[table-migrated-to-uc-python-sql:+4:4:+4:20] Table old.things is migrated to brand.new.stuff in Unity Catalog
# ucx[cannot-autofix-table-reference:+3:4:+3:20] Can't migrate table_name argument in 'spark.sql(query)' because its value cannot be computed
table_name = f"table_{index}"
for query in ["SELECT * FROM old.things", f"SELECT * FROM {table_name}"]:
    spark.sql(query).collect()
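
To illustrate why a defined constant makes the difference, here is a minimal sketch of constant inference over f-strings using Python's standard ast module. It is not UCX's actual implementation: the function name resolve_first_args is hypothetical, and the sketch ignores scoping and assignment order.

```python
import ast


def resolve_first_args(source: str) -> list:
    """Return the inferred first argument of every call in `source`.

    None means the argument could not be computed statically.
    """
    tree = ast.parse(source)

    # Collect names bound to plain string literals, e.g. some_table_name = 'bar'.
    constants = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if isinstance(node.value.value, str):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        constants[target.id] = node.value.value

    def infer(expr):
        # String literal: the value is known.
        if isinstance(expr, ast.Constant) and isinstance(expr.value, str):
            return expr.value
        # Variable: known only if it was bound to a string literal.
        if isinstance(expr, ast.Name):
            return constants.get(expr.id)
        # f-string: known only if every piece is known.
        if isinstance(expr, ast.JoinedStr):
            parts = []
            for piece in expr.values:
                inner = piece.value if isinstance(piece, ast.FormattedValue) else piece
                value = infer(inner)
                if value is None:
                    return None
                parts.append(value)
            return "".join(parts)
        return None

    return [
        infer(node.args[0])
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and node.args
    ]


print(resolve_first_args('spark.table(f"foo_{some_table_name}")'))  # [None]
print(resolve_first_args("some_table_name = 'bar'\nspark.table(f'foo_{some_table_name}')"))  # ['foo_bar']
```

In the first call the variable is never assigned, so nothing can be inferred; in the second the f-string collapses to a constant.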

`catalog-api-in-shared-clusters`

spark.catalog.* functions require Databricks Runtime 14.3 LTS or above on Unity Catalog clusters in Shared access mode, so if your code has spark.catalog.tableExists("table") or spark.catalog.listDatabases(), you need to ensure that your cluster is running the correct runtime version and data security mode.
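
A version check along these lines may help when auditing clusters. This is a minimal sketch: the function name runtime_at_least is hypothetical, and it assumes the runtime version string starts with major.minor (for example "14.3.x-scala2.12", an assumed format).

```python
import re


def runtime_at_least(spark_version: str, minimum: tuple = (14, 3)) -> bool:
    """Return True when the leading major.minor version is >= `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"Unrecognised runtime version: {spark_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuples compare element by element, so (14, 3) >= (14, 3) is True.
    return (major, minor) >= minimum
```

The check covers only the runtime version; the data security mode of the cluster still has to be verified separately.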

`changed-result-format-in-uc`

Calls to functions such as spark.catalog.listTables() return a list of <catalog>.<database>.<table> instead of <database>.<table>. So if you have code like this:

for table in spark.catalog.listTables():
    do_stuff_with_table(table)

you need to make sure that do_stuff_with_table can handle the new format.
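
One way to make such a function tolerant of both formats is to normalise the name first. The sketch below assumes the name arrives as a dotted string; split_table_name and the hive_metastore default are illustrative assumptions, and quoted identifiers that contain dots are not handled.

```python
def split_table_name(full_name: str, default_catalog: str = "hive_metastore") -> tuple:
    """Split a dotted table name into (catalog, database, table)."""
    parts = full_name.split(".")
    if len(parts) == 3:
        # New format: <catalog>.<database>.<table>
        return tuple(parts)
    if len(parts) == 2:
        # Old format: <database>.<table>; fall back to the assumed default catalog.
        return (default_catalog, parts[0], parts[1])
    raise ValueError(f"Expected 2 or 3 name parts, got {len(parts)}: {full_name!r}")
```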

`direct-filesystem-access-in-sql-query`

Direct filesystem access is deprecated in Unity Catalog. DBFS is no longer supported, so if you have code like this:

df = spark.sql("SELECT * FROM parquet.`/mnt/foo/path/to/parquet.file`")

you need to change it to use UC tables.

`direct-filesystem-access`

Direct filesystem access is deprecated in Unity Catalog. DBFS is no longer supported, so if you have code like this:

display(spark.read.csv('/mnt/things/data.csv'))

or this:

display(spark.read.csv('s3://bucket/folder/data.csv'))

you need to change it to use UC tables or UC volumes.
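
Files in UC volumes are addressed under /Volumes/<catalog>/<schema>/<volume>/. As a small sketch, a helper can build such paths in one place; the function name volume_path and the catalog, schema, and volume names used in the usage note are hypothetical.

```python
def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a path of the form /Volumes/<catalog>/<schema>/<volume>/<parts...>."""
    segments = [catalog, schema, volume, *parts]
    # Strip stray slashes so that segments such as "/folder/" join cleanly.
    return "/Volumes/" + "/".join(segment.strip("/") for segment in segments)
```

Once the data has been copied into a volume, the result can be passed to a reader, for example spark.read.csv(volume_path("main", "raw", "things", "data.csv")).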

`dependency-not-found`

This message indicates that the linter has found a dependency, such as a Python source file or a notebook, that is not available in the workspace. The user must ensure that the dependency is available in the workspace. This usually means there is an error in the user code.

`jvm-access-in-shared-clusters`

You cannot access the Spark driver JVM on Unity Catalog clusters in Shared access mode. If you have code like this:

spark._jspark._jvm.com.my.custom.Name()

or like this:

log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)

you need to change it to use Python equivalents.
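
For the logging case, a possible Python equivalent uses the standard logging module. This is a sketch, not a drop-in replacement: messages go through Python's logging on the driver rather than through log4j.

```python
import logging

# Configure the root logger; this is a no-op if handlers are already installed.
logging.basicConfig(level=logging.INFO)

# Standard-library logger in place of the log4j LogManager reached through sc._jvm.
LOGGER = logging.getLogger(__name__)
LOGGER.setLevel(logging.INFO)

LOGGER.info("test")
```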

`legacy-context-in-shared-clusters`

SparkContext (sc) is not supported on Unity Catalog clusters in Shared access mode. Rewrite it using SparkSession (spark). Example code that triggers this message:

df = spark.createDataFrame(sc.emptyRDD(), schema)

or this:

sc.parallelize([1, 2, 3])
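
The two examples above can often be expressed with SparkSession alone. The sketch below passes the session in as a parameter; the helper names and the "value" column name are hypothetical, and the result is a DataFrame rather than an RDD, so downstream code may need adjusting.

```python
def empty_dataframe(spark, schema):
    """Possible replacement for spark.createDataFrame(sc.emptyRDD(), schema)."""
    # An empty list plus an explicit schema avoids the need for an empty RDD.
    return spark.createDataFrame([], schema)


def numbers_dataframe(spark, values):
    """Possible replacement for sc.parallelize(values): one row per value."""
    # Each value becomes a single-column row in a column named "value".
    return spark.createDataFrame([(value,) for value in values], ["value"])
```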

`not-supported`

Installing eggs is no longer supported on Databricks Runtime 14.0 or higher.

`notebook-run-cannot-compute-value`

The path for dbutils.notebook.run cannot be computed, so the notebook path needs adjusting. Automated code analysis cannot tell where the notebook is located, so you need to simplify code like this:

b = some_function()
dbutils.notebook.run(b)

to something like this:

a = "./leaf1.py"
dbutils.notebook.run(a)

`python-parse-error`

This is a generic message indicating that the Python code could not be parsed. The user must manually check the Python code.

`python-udf-in-shared-clusters`

applyInPandas requires DBR 14.3 LTS or above on Unity Catalog clusters in Shared access mode. Example:

df.groupby("id").applyInPandas(subtract_mean, schema="id long, v double").show()

Arrow UDFs require DBR 14.3 LTS or above on Unity Catalog clusters in Shared access mode.

from pyspark.sql.functions import udf

@udf(returnType='int', useArrow=True)
def arrow_slen(s):
    return len(s)

It is not possible to register a Java UDF from Python code on Unity Catalog clusters in Shared access mode. Use a %scala cell to register the Scala UDF using spark.udf.register. Example code that triggers this message:

spark.udf.registerJavaFunction("func", "org.example.func", IntegerType())

`rdd-in-shared-clusters`

RDD APIs are not supported on Unity Catalog clusters in Shared access mode. Use mapInArrow() or Pandas UDFs instead. Example code that triggers this message:

df.rdd.mapPartitions(myUdf)

`spark-logging-in-shared-clusters`

The Spark log level cannot be set directly from code on Unity Catalog clusters in Shared access mode. Remove the call and set the cluster Spark conf spark.log.level instead. Example code that triggers this message:

sc.setLogLevel("INFO")
setLogLevel("WARN")

Another example could be:

log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)

sc._jvm.org.apache.log4j.LogManager.getLogger(__name__).info("test")

`sql-parse-error`

This is a generic message indicating that the SQL query could not be parsed. The user must manually check the SQL query.

`sys-path-cannot-compute-value`

The path for sys.path.append cannot be computed, so the path needs adjusting. Automated code analysis cannot tell where the path is located.

`table-migrated-to-uc-{sql,python,python-sql}`

This message indicates that the linter has found a table that has been migrated to Unity Catalog. The user must ensure that the table is available in Unity Catalog.

Postfix	Explanation
sql	Table reference in SparkSQL
python	Table reference in PySpark
python-sql	Table reference in SparkSQL called from PySpark
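
As an illustration of what a manual fix amounts to, the sketch below applies a migration mapping to a query string. It is a naive regex-based rewrite, not UCX's own fixer; the MIGRATED mapping reuses the old.things example from above, and rewrite_query is a hypothetical name.

```python
import re

# Hypothetical mapping from legacy table names to their Unity Catalog names.
MIGRATED = {"old.things": "brand.new.stuff"}


def rewrite_query(query: str, mapping: dict = MIGRATED) -> str:
    """Replace whole-name occurrences of migrated tables in a SQL string."""
    for old, new in mapping.items():
        # The lookarounds stop "old.things" from matching inside longer
        # names such as "old.things_archive" or "x.old.things".
        pattern = r"(?<![\w.])" + re.escape(old) + r"(?![\w.])"
        query = re.sub(pattern, lambda _match, new=new: new, query)
    return query
```

A textual rewrite like this does not understand SQL, so string literals and comments that happen to contain a table name are rewritten as well.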

`to-json-in-shared-clusters`

toJson() is not available on Unity Catalog clusters in Shared access mode. Use toSafeJson() on DBR 13.3 LTS or above to get a subset of command context information. Example code that triggers this message:

dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
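
Where the safe subset is sufficient, the call can be wrapped as in the sketch below. The helper name safe_context is hypothetical, dbutils is passed in explicitly, and the sketch assumes toSafeJson() returns a JSON string; which keys are present is not specified here.

```python
import json


def safe_context(dbutils) -> dict:
    """Return the subset of command context exposed by toSafeJson() as a dict."""
    context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    # Assumption: toSafeJson() yields a JSON document as a string.
    return json.loads(context.toSafeJson())
```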

`unsupported-magic-line`

This message indicates that the code could not be analysed by UCX. The user must check the code manually.
