
Libraries & UDFs

Partners can distribute their integrations as Python, Scala, or R libraries, which can be installed on Databricks clusters using %pip install, library attachments, or cluster policies.
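For example, a customer can run a notebook-scoped install of a published library directly in a Databricks Python notebook. The distribution name isvname-datatool below is a hypothetical placeholder for your own package:

# Databricks Python notebook cell: notebook-scoped installation.
# Replace the hypothetical distribution name with your published package.
%pip install isvname-datatool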

For setup and packaging details, see the sections below.

Telemetry based on library names

Databricks attributes library usage based on the top-level package or module/class name. Sub-class names aren't needed.

To ensure clear and consistent attribution, define the root package using the format:

<isv-name>.<product-name>

Databricks telemetry attributes all submodules imported from this package under the same identifier.

Example

If your library's top-level package is named isvname.datatool, all imports such as:

from isvname.datatool import client

will be attributed to isvname.datatool in Databricks telemetry. This ensures unambiguous tracking of ISV library usage without requiring additional configuration.

Package naming best practices

Follow these guidelines when naming your library package:

| Guideline | Recommendation |
| --- | --- |
| Use your company name | The top-level namespace should be your ISV name (for example, acme). |
| Include product name | Add the product as a sub-package (for example, acme.dataloader). |
| Avoid generic names | Don't use common terms like utils, tools, or helpers as top-level packages. |
| Be consistent | Use the same naming convention across all your Databricks integrations. |
| Lowercase only | Python package names should be lowercase, with underscores if needed. |
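As an illustration, a consistent namespace keeps every product attributable to your company. A minimal sketch, assuming hypothetical acme.dataloader and acme.connector packages:

# Both imports are attributed to the acme namespace, one product per sub-package.
from acme.dataloader import Client
from acme.connector import sources

# Avoid generic top-level packages: usage of a package named "utils"
# can't be attributed to your company.
# from utils import helpers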
important

The package name you choose is permanent for attribution purposes. Coordinate with the Databricks Partner Engineering team before publishing to ensure your namespace is properly registered.

Python package configuration

Configure your package name in pyproject.toml or setup.py to ensure proper attribution.

pyproject.toml (recommended)

[project]
name = "isvname-datatool"
version = "1.0.0"
description = "ISV Data Tool for Databricks"

[tool.setuptools.packages.find]
where = ["src"]

# Package structure: src/isvname/datatool/

setup.py (legacy)

from setuptools import setup, find_packages

setup(
    name="isvname-datatool",
    version="1.0.0",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
)

The resulting project layout:

my-library/
├── pyproject.toml
├── src/
│   └── isvname/
│       └── datatool/
│           ├── __init__.py
│           ├── client.py
│           └── transforms.py
└── tests/
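
After building and installing the distribution (for example, with python -m build and a pip install of the resulting wheel), you can verify that the importable top-level package matches the identifier you want attributed. A minimal sketch, assuming the hypothetical isvname.datatool package from the example above is installed:

import importlib

# Import the package dynamically and confirm the attribution identifier.
pkg = importlib.import_module("isvname.datatool")
print(pkg.__name__)   # expected: isvname.datatool
print(pkg.__file__)   # path to the installed package's __init__.py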

Scala library configuration

For Scala libraries, use the standard Maven/SBT group ID and artifact ID structure:

// build.sbt
organization := "com.isvname"
name := "datatool"
version := "1.0.0"

The fully qualified class name (e.g., com.isvname.datatool.Client) is used for attribution.

How attribution works

When your library runs on a Databricks cluster, the platform captures:

  1. Import statements - The top-level package name from Python imports
  2. Class instantiation - The fully qualified class name for Scala/Java
  3. UDF registration - The module path used when registering user-defined functions

This data flows to Databricks system tables, enabling usage tracking without any runtime configuration by customers.

note

Attribution for libraries differs from other integration types. There's no User-Agent string to configure—the package/class name itself serves as the identifier.

User-defined functions (UDFs)

When registering UDFs, the function's module path is captured for attribution:

from pyspark.sql.functions import udf
from isvname.datatool import transforms

# This UDF will be attributed to isvname.datatool
my_udf = udf(transforms.custom_transform)
spark.udf.register("isv_transform", my_udf)
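
Once registered, the UDF can be called from SQL or the DataFrame API; usage is still attributed to the isvname.datatool module path. A brief usage sketch, assuming a hypothetical events table with a value column:

# Call the registered UDF from SQL (hypothetical table and column names).
result_df = spark.sql("SELECT isv_transform(value) AS transformed FROM events")

# Or apply the same wrapped function through the DataFrame API.
df = spark.table("events").withColumn("transformed", my_udf("value"))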

What's next