Libraries & UDFs
Partners can distribute their integrations as Python, Scala, or R libraries, which can be installed on Databricks clusters using %pip install, library attachments, or cluster policies.
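For example, a partner library published to PyPI can be installed from a notebook cell; the package name below is a placeholder that matches the examples later on this page:
%pip install isvname-datatool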
Setup and packaging details are covered in the sections that follow.
Telemetry based on library names
Databricks attributes library usage based on the top-level package name (Python) or the module and class name (Scala/Java); submodule and subclass names aren't needed.
To ensure clear and consistent attribution, define the root package using the format:
<isv-name>.<product-name>
Databricks telemetry attributes all submodules imported from this package under the same identifier.
Example
If your library's top-level package is named isvname.datatool, all imports such as:
from isvname.datatool import client
are attributed to isvname.datatool in Databricks telemetry. This ensures unambiguous tracking of ISV library usage without requiring additional configuration.
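Other import styles resolve to the same top-level package and are attributed identically; the submodule names below mirror the example layout shown later on this page:
import isvname.datatool
from isvname.datatool import client
from isvname.datatool.transforms import custom_transform  # all attributed to isvname.datatool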
Package naming best practices
Follow these guidelines when naming your library package:
| Guideline | Recommendation |
|---|---|
| Use your company name | The top-level namespace should be your ISV name (e.g., acme) |
| Include product name | Add the product as a sub-package (e.g., acme.dataloader) |
| Avoid generic names | Don't use common terms like utils, tools, or helpers as top-level packages |
| Be consistent | Use the same naming convention across all your Databricks integrations |
| Lowercase only | Python package names should be lowercase with underscores if needed |
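Applied to the placeholder names from the table, a namespaced import is immediately traceable to the ISV, while a generic one is not:
# Recommended: company namespace plus product sub-package
import acme.dataloader    # attributed to acme.dataloader

# Avoid: a generic top-level name carries no ISV identity
import utils              # ambiguous; cannot be attributed to a specific ISV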
The package name you choose is permanent for attribution purposes. Coordinate with the Databricks Partner Engineering team before publishing to ensure your namespace is properly registered.
Python package configuration
Configure your package name in pyproject.toml or setup.py to ensure proper attribution.
pyproject.toml (recommended)
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "isvname-datatool"
version = "1.0.0"
description = "ISV Data Tool for Databricks"

[tool.setuptools.packages.find]
where = ["src"]

# Package structure: src/isvname/datatool/
setup.py (legacy)
from setuptools import setup, find_packages

setup(
    name="isvname-datatool",
    version="1.0.0",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
)
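After building and installing the distribution, a quick check confirms that the import name, which is what telemetry attribution is based on, resolves to the intended isvname.datatool namespace rather than the hyphenated distribution name:
import importlib

# The distribution is published as "isvname-datatool", but attribution is
# based on the import name, i.e. the top-level package isvname.datatool.
mod = importlib.import_module("isvname.datatool")
print(mod.__name__)  # isvname.datatool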
Recommended directory structure
my-library/
├── pyproject.toml
├── src/
│   └── isvname/
│       └── datatool/
│           ├── __init__.py
│           ├── client.py
│           └── transforms.py
└── tests/
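A minimal __init__.py for this layout can re-export the public API so that customers always import from the attributed isvname.datatool namespace; the Client class and custom_transform function shown here are illustrative:
# src/isvname/datatool/__init__.py
"""ISV Data Tool for Databricks."""
from isvname.datatool.client import Client
from isvname.datatool.transforms import custom_transform

__all__ = ["Client", "custom_transform"]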
Scala library configuration
For Scala libraries, use the standard Maven/SBT group ID and artifact ID structure:
// build.sbt
organization := "com.isvname"
name := "datatool"
version := "1.0.0"
The fully qualified class name (e.g., com.isvname.datatool.Client) is used for attribution.
How attribution works
When your library runs on a Databricks cluster, the platform captures:
- Import statements - The top-level package name from Python imports
- Class instantiation - The fully qualified class name for Scala/Java
- UDF registration - The module path used when registering user-defined functions
This data flows to Databricks system tables, enabling usage tracking without any runtime configuration by customers.
Attribution for libraries differs from other integration types. There's no User-Agent string to configure—the package/class name itself serves as the identifier.
User-defined functions (UDFs)
When registering UDFs, the function's module path is captured for attribution:
from pyspark.sql.functions import udf
from isvname.datatool import transforms
# This UDF will be attributed to isvname.datatool
my_udf = udf(transforms.custom_transform)
spark.udf.register("isv_transform", my_udf)
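For completeness, a minimal transforms.py backing the example above could look like the following sketch; the transformation itself is purely illustrative:
# src/isvname/datatool/transforms.py
def custom_transform(value):
    """Uppercase a string value; None passes through unchanged."""
    return value.upper() if value is not None else None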
What's next
- Configure Lakebase: Set up telemetry for PostgreSQL-compatible clients. See Lakebase Integrations.
- Review all integration types: See the complete list of supported integrations.
- Review User-Agent format: Understand the required format and guidelines.