.. Databricks Labs Data Generator documentation master file, created by
   sphinx-quickstart on Sun Jun 21 10:54:30 2020.

Databricks Labs Data Generator Documentation
============================================

The Databricks Labs Data Generator project provides a convenient way to
generate large volumes of synthetic data from within a Databricks notebook
(or a regular Spark application).

By defining a data generation spec, either against an existing schema or by
creating a schema on the fly, you can control how synthetic data is generated.

Because the data generator produces a PySpark data frame, it is simple to
create a view over it and expose the data to Scala or R-based Spark
applications.

Because it is installable via ``%pip install``, it can also be used in
environments such as Delta Live Tables.

.. toctree::
   :maxdepth: 1
   :caption: Getting Started

   Get Started Here
   Installation instructions
   Generating column data
   Using data ranges
   Generating text data
   Using data distributions
   Options for column specification
   Generating repeatable data
   Using streaming data
   Generating JSON and structured column data
   Generating synthetic data from existing data
   Generating Change Data Capture (CDC) data
   Using multiple tables
   Extending text generation
   Use with Delta Live Tables
   Troubleshooting data generation

.. toctree::
   :maxdepth: 1
   :caption: API

   Quick API index
   The dbldatagen package API

.. toctree::
   :maxdepth: 1
   :caption: Development

   Building and contributing
   Change log
   Build requirements

.. toctree::
   :maxdepth: 1
   :caption: License

   license

Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`