dbldatagen.datasets.basic_telematics module

class BasicTelematicsProvider[source]

Bases: NoAssociatedDatasetsMixin, DatasetProvider

Basic Telematics Dataset

This is a basic telematics dataset with time-series lat, lon, and heading values.

It takes the following options when retrieving the table:
  • random: if True, generates random data

  • rows: number of rows to generate

  • partitions: number of partitions to use

  • numDevices: number of unique device IDs

  • startTimestamp: earliest timestamp for the IoT time-series data

  • endTimestamp: latest timestamp for the IoT time-series data

  • minLat: minimum latitude

  • maxLat: maximum latitude

  • minLon: minimum longitude

  • maxLon: maximum longitude

  • generateWkt: if True, generates the well-known text (WKT) representation of the location

Because the data specification is a DataGenerator object, you can add further columns to the dataset and add constraints (when that feature is available).

Note that this dataset does not use any features that would prevent it from being used as a source for a streaming dataframe, so the flag supportsStreaming is set to True.
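The following is a minimal usage sketch, not a definitive recipe. It assumes the provider is registered under the dataset name "basic/telematics" and that the installed dbldatagen version exposes Datasets(...).get(...) for retrieving the data specification. The bounding-box values and the extra speed_kph column are purely illustrative and are not part of the dataset.

```python
def telematics_options():
    """Illustrative option values; every key appears in ALLOWED_OPTIONS."""
    return {
        "numDevices": 50,
        "startTimestamp": "2024-01-01 00:00:00",
        "endTimestamp": "2024-01-08 00:00:00",
        "minLat": 24.0,
        "maxLat": 50.0,
        "minLon": -125.0,
        "maxLon": -66.0,
        "generateWkt": True,
    }


def build_telematics_df(spark, rows=100_000, partitions=4, streaming=False):
    """Build a telematics dataframe from an existing Spark session."""
    # Imported lazily so this module loads without Spark or dbldatagen installed.
    import dbldatagen as dg

    # Assumption: "basic/telematics" is the registered name of this provider.
    data_spec = dg.Datasets(spark, "basic/telematics").get(
        rows=rows, partitions=partitions, **telematics_options()
    )

    # The result is a DataGenerator, so further columns can be added.
    # "speed_kph" is a hypothetical column used only for illustration.
    data_spec = data_spec.withColumn(
        "speed_kph", "float", minValue=0.0, maxValue=130.0, random=True
    )

    # The dataset supports streaming, so a streaming dataframe can be requested.
    return data_spec.build(withStreaming=streaming)
```

Calling build_telematics_df(spark) with an active Spark session should return a batch dataframe; passing streaming=True requests a streaming one instead.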

ALLOWED_OPTIONS = ['numDevices', 'startTimestamp', 'endTimestamp', 'minLat', 'maxLat', 'minLon', 'maxLon', 'generateWkt', 'random']
COLUMN_COUNT = 6
DEFAULT_END_TIMESTAMP = '2024-02-01 00:00:00'
DEFAULT_MAX_LAT = 90.0
DEFAULT_MAX_LON = 180.0
DEFAULT_MIN_LAT = -90.0
DEFAULT_MIN_LON = -180.0
DEFAULT_NUM_DEVICES = 1000
DEFAULT_START_TIMESTAMP = '2024-01-01 00:00:00'
MAX_DEVICE_ID = 9223372036854775807
MIN_DEVICE_ID = 1000000
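As a rough illustration of how these constants fit together, the sketch below merges caller-supplied options with the documented defaults and rejects names that are not in ALLOWED_OPTIONS. It is a standalone approximation and not the library's own validation code: resolve_options is a hypothetical helper, and the False defaults for generateWkt and random are assumptions, since this page does not state them.

```python
ALLOWED_OPTIONS = ["numDevices", "startTimestamp", "endTimestamp", "minLat",
                   "maxLat", "minLon", "maxLon", "generateWkt", "random"]

# Defaults copied from the class constants documented above.
DEFAULTS = {
    "numDevices": 1000,
    "startTimestamp": "2024-01-01 00:00:00",
    "endTimestamp": "2024-02-01 00:00:00",
    "minLat": -90.0,
    "maxLat": 90.0,
    "minLon": -180.0,
    "maxLon": 180.0,
    "generateWkt": False,  # assumption: default not stated on this page
    "random": False,       # assumption: default not stated on this page
}


def resolve_options(**options):
    """Hypothetical helper: validate option names and apply defaults."""
    unknown = sorted(set(options) - set(ALLOWED_OPTIONS))
    if unknown:
        raise ValueError(f"Unsupported options: {unknown}")

    resolved = {**DEFAULTS, **options}

    # Basic sanity checks on the bounding box.
    if resolved["minLat"] > resolved["maxLat"]:
        raise ValueError("minLat must not exceed maxLat")
    if resolved["minLon"] > resolved["maxLon"]:
        raise ValueError("minLon must not exceed maxLon")
    return resolved
```

Because the check is a plain membership test, option names are case-sensitive in this sketch: generateWkt is accepted, while generateWKT is rejected.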
getTableGenerator(sparkSession, *, tableName=None, rows=-1, partitions=-1, **options)[source]

Gets the data generator instance that will produce the named table.

Parameters:
  • sparkSession – Spark session to use

  • tableName – Name of table to provide

  • rows – Number of rows requested

  • partitions – Number of partitions requested

  • autoSizePartitions – Whether to automatically size the partitions from the number of rows

  • options – Options passed to generate the table

Returns:

DataGenerator instance to generate the table if successful; raises an error otherwise

Implementors of the individual data providers are responsible for sizing partitions for the datasets based on the number of rows and columns. The number of partitions can be computed based on the number of rows and columns using the autoComputePartitions method.
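To make the sizing idea concrete, here is a hypothetical heuristic for resolving the partition count when the caller leaves partitions at its -1 sentinel. It is not the formula used by autoComputePartitions, which this page does not describe; the function name resolve_partitions, the cells_per_partition threshold, and the minimum of 4 partitions are all assumptions chosen for illustration.

```python
import math

COLUMN_COUNT = 6  # matches the class constant documented above


def resolve_partitions(rows, columns=COLUMN_COUNT, partitions=-1,
                       cells_per_partition=5_000_000, min_partitions=4):
    """Hypothetical sizing rule: one partition per block of rows * columns cells."""
    # An explicit positive request from the caller always wins.
    if partitions is not None and partitions > 0:
        return partitions

    if rows <= 0 or columns <= 0:
        raise ValueError("rows and columns must be positive to size partitions")

    # Scale with the total number of generated cells, with a small floor.
    return max(min_partitions, math.ceil(rows * columns / cells_per_partition))
```

Under these assumed thresholds, small requests fall back to the floor, while larger ones grow in proportion to rows times columns.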