Installation Notes

The data generator framework can be installed into your Databricks environment by one of the following methods:

  • Installing from the PyPI package

  • Installing and building directly from the Databricks Labs GitHub repository

  • Installing the Python wheel file into your environment

Installing from PyPI

To install the dbldatagen package from PyPI, add a cell to your notebook with the following code:

%pip install dbldatagen

This installs the PyPI package. It works in regular notebooks, in Delta Live Tables pipeline notebooks, and on the Databricks Community Edition.

If you are working from the command line, use the following command to install the package in your environment:

pip install dbldatagen
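After either install, it can be useful to confirm which version ended up in the environment. The following is a minimal sketch that uses only the standard library's importlib.metadata; the helper name installed_version is illustrative and is not part of dbldatagen.

```python
from importlib import metadata
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report whether dbldatagen is visible to the current Python environment.
version = installed_version("dbldatagen")
if version is None:
    print("dbldatagen is not installed in this environment")
else:
    print(f"dbldatagen {version} is installed")
```

In a notebook, run this in a cell after the %pip install cell, since %pip may restart the Python interpreter.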

Installing from Databricks Labs repository source

When developing in the Databricks notebook environment, you can use a notebook-scoped library install to install and build from the source in the Databricks Labs GitHub repository.

To do this, add and execute the following cell at the start of your notebook:

%pip install git+https://github.com/databrickslabs/dbldatagen@current

This installs a fresh build from the latest release based on the master branch. To install from a specific branch or tag instead, append the branch identifier or tag to the GitHub URL after the @ character. For example:

%pip install git+https://github.com/databrickslabs/dbldatagen@dbr_7_3_LTS_compat

The following tags can be used to pick up specific versions:

  • current - the latest build from the master branch, plus documentation changes and critical bug fixes

  • stable - the latest release from the master branch (with changes for version marking and documentation only)

  • preview - a preview build of forthcoming features (typically from the develop branch)

Note

In rare cases, if there are critical bug fixes, these will be incorporated into the stable version.
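As an illustration of how the tag or branch name is appended to the repository URL, the sketch below builds the install target for each of the tags listed above. The helper git_install_spec and the constant REPO_URL are hypothetical names used only for this example; the URL pattern itself is the one shown in the %pip commands.

```python
# Repository URL as used in the %pip examples in these notes.
REPO_URL = "https://github.com/databrickslabs/dbldatagen"


def git_install_spec(ref: str = "current") -> str:
    """Build the pip install target for a branch or tag of the repository."""
    ref = ref.strip()
    if not ref:
        raise ValueError("a branch name or tag is required")
    return f"git+{REPO_URL}@{ref}"


# Print the notebook command for each documented tag.
for tag in ("current", "stable", "preview"):
    print(f"%pip install {git_install_spec(tag)}")
```

The printed lines can be pasted into the first cell of a notebook; the same function accepts a branch name such as dbr_7_3_LTS_compat.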

Installing older releases

Prior to the availability of PyPI releases, the release process included Python wheels in the GitHub release assets.

For these older releases, you can download a specific Python wheel directly from the GitHub releases.

The complete set of available releases can be accessed here.

You can install a specific wheel using either %pip install or the manual method.

To install a Python wheel from a specific release with pip, use the following syntax:

%pip install https://github.com/databrickslabs/dbldatagen/releases/download/v021/dbldatagen-0.2.1-py3-none-any.whl

Replace the reference to the v021 wheel with the reference to the appropriate wheel as needed.

To install older releases from PyPI, include a release qualifier in the pip install command.

For example:

%pip install dbldatagen==v0.3.0

All new releases are published via PyPI only.
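Outside a notebook, the same version-pinned install can be scripted. The sketch below is one possible way to do it with the standard library, invoking pip through the current interpreter so that the package lands in the active environment. The helper names pip_install_command and install_pinned are illustrative, and the version is written as 0.3.0, without the leading v.

```python
import subprocess
import sys
from typing import List


def pip_install_command(package: str, version: str) -> List[str]:
    """Build a pip command that pins the package to an exact version."""
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}"]


def install_pinned(package: str, version: str) -> None:
    """Run the pinned install; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package, version))


# Show the command that would run; call install_pinned(...) to execute it.
print(" ".join(pip_install_command("dbldatagen", "0.3.0")))
```

Calling install_pinned("dbldatagen", "0.3.0") performs the actual install and requires network access to PyPI.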

Manual installation

Older releases may still be installed manually; however, using one of the pip-based installation methods is recommended.

The following steps outline how to install an older binary release manually:

  • Locate the wheel file in the Databricks Labs data generator releases

  • Download the wheel artifact from the releases
    • Select the desired release

    • Select the wheel artifact from the release assets

    • Download it

  • Create a library entry in the workspace
    • Create the workspace library

    • Upload the previously downloaded wheel

  • Attach the library to the cluster
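The download step in the list above can also be scripted rather than done through the browser. The following is a minimal sketch, assuming the wheel URL from the pip example earlier in these notes; the helper download_wheel is a hypothetical name, and the upload and attach steps still have to be done in the workspace.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Wheel URL taken from the %pip example in these notes; replace as needed.
WHEEL_URL = (
    "https://github.com/databrickslabs/dbldatagen/releases/download/"
    "v021/dbldatagen-0.2.1-py3-none-any.whl"
)


def download_wheel(url: str, dest_dir: str = ".") -> Path:
    """Download a wheel into dest_dir, keeping its original file name."""
    filename = Path(urlparse(url).path).name
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel URL: {url}")
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    # Stream the response to disk instead of holding it in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Example call (requires network access):
# wheel_path = download_wheel(WHEEL_URL, "downloads")
```

The original file name is preserved because installers read the package name and version from the wheel's file name.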

Additional information

See also

See the following links for more details: