
Libraries & UDFs

Partners can distribute their integrations as Python, Scala, or R libraries, which can be installed on Databricks clusters using %pip install, library attachments, or cluster policies.
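For example, a customer can run a notebook-scoped install of a published library directly in a Databricks Python notebook. The distribution name isvname-datatool below is a hypothetical placeholder for your own package:

# Databricks Python notebook cell: notebook-scoped installation.
# Replace the hypothetical distribution name with your published package.
%pip install isvname-datatool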

For setup and packaging details, see the sections below.

Telemetry based on library names

Databricks attributes library usage based on the top-level package or module/class name. Sub-class names aren't needed.

To ensure clear and consistent attribution, define the root package using the format:

<isv-name>.<product-name>

Databricks telemetry attributes all submodules imported from this package under the same identifier.

Example

If your library's top-level package is named isvname.datatool, all imports such as:

from isvname.datatool import client

will be attributed to isvname.datatool in Databricks telemetry. This ensures unambiguous tracking of ISV library usage without requiring additional configuration.

Package naming best practices

Follow these guidelines when naming your library package:

| Guideline | Recommendation |
| --- | --- |
| Use your company name | The top-level namespace should be your ISV name (for example, acme). |
| Include product name | Add the product as a sub-package (for example, acme.dataloader). |
| Avoid generic names | Don't use common terms like utils, tools, or helpers as top-level packages. |
| Be consistent | Use the same naming convention across all your Databricks integrations. |
| Lowercase only | Python package names should be lowercase, with underscores if needed. |
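As an illustration, a consistent namespace keeps every product attributable to your company. A minimal sketch, assuming hypothetical acme.dataloader and acme.connector packages:

# Both imports are attributed to the acme namespace, one product per sub-package.
from acme.dataloader import Client
from acme.connector import sources

# Avoid generic top-level packages: usage of a package named "utils"
# can't be attributed to your company.
# from utils import helpers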
important

The package name you choose is permanent for attribution purposes. Coordinate with the Databricks Partner Engineering team before publishing to ensure your namespace is properly registered.

Python package configuration

Configure your package name in pyproject.toml or setup.py to ensure proper attribution.

pyproject.toml (recommended)

[project]
name = "isvname-datatool"
version = "1.0.0"
description = "ISV Data Tool for Databricks"

[tool.setuptools.packages.find]
where = ["src"]

# Package structure: src/isvname/datatool/

setup.py (legacy)

from setuptools import setup, find_packages

setup(
    name="isvname-datatool",
    version="1.0.0",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
)

The resulting project layout:

my-library/
├── pyproject.toml
├── src/
│   └── isvname/
│       └── datatool/
│           ├── __init__.py
│           ├── client.py
│           └── transforms.py
└── tests/
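
After building and installing the distribution (for example, with python -m build and a pip install of the resulting wheel), you can verify that the importable top-level package matches the identifier you want attributed. A minimal sketch, assuming the hypothetical isvname.datatool package from the example above is installed:

import importlib

# Import the package dynamically and confirm the attribution identifier.
pkg = importlib.import_module("isvname.datatool")
print(pkg.__name__)   # expected: isvname.datatool
print(pkg.__file__)   # path to the installed package's __init__.py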

Scala library configuration

For Scala libraries, use the standard Maven/SBT group ID and artifact ID structure:

// build.sbt
organization := "com.isvname"
name := "datatool"
version := "1.0.0"

The fully qualified class name (e.g., com.isvname.datatool.Client) is used for attribution.

How attribution works

When your library runs on a Databricks cluster, the platform captures:

  1. Import statements - The top-level package name from Python imports
  2. Class instantiation - The fully qualified class name for Scala/Java
  3. UDF registration - The module path used when registering user-defined functions

This data flows to Databricks system tables, enabling usage tracking without any runtime configuration by customers.

note

Attribution for libraries differs from other integration types. There's no User-Agent string to configure—the package/class name itself serves as the identifier.

User-defined functions (UDFs)

When registering UDFs, the function's module path is captured for attribution:

from pyspark.sql.functions import udf
from isvname.datatool import transforms

# This UDF will be attributed to isvname.datatool
my_udf = udf(transforms.custom_transform)
spark.udf.register("isv_transform", my_udf)
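
Once registered, the UDF can be called from SQL or the DataFrame API; usage is still attributed to the isvname.datatool module path. A brief usage sketch, assuming a hypothetical events table with a value column:

# Call the registered UDF from SQL (hypothetical table and column names).
result_df = spark.sql("SELECT isv_transform(value) AS transformed FROM events")

# Or apply the same wrapped function through the DataFrame API.
df = spark.table("events").withColumn("transformed", my_udf("value"))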

What's next