dbldatagen.datasets.basic_telematics module
- class BasicTelematicsProvider[source]
Bases: NoAssociatedDatasetsMixin, DatasetProvider
Basic Telematics Dataset
This is a basic telematics dataset with time-series lat, lon, and heading values.
- It takes the following options when retrieving the table:
random: if True, generates random data
rows : number of rows to generate
partitions: number of partitions to use
numDevices: number of unique device IDs
startTimestamp: earliest timestamp for IoT time series data
endTimestamp: latest timestamp for IoT time series data
minLat: minimum latitude
maxLat: maximum latitude
minLon: minimum longitude
maxLon: maximum longitude
generateWkt: if True, generates the well-known text (WKT) representation of the location
As the data specification is a DataGenerator object, you can add further columns to the dataset and add constraints (when the feature is available).
Note that this dataset does not use any features that would prevent it from being used as a source for a streaming DataFrame, so the flag supportsStreaming is set to True.
- ALLOWED_OPTIONS = ['numDevices', 'startTimestamp', 'endTimestamp', 'minLat', 'maxLat', 'minLon', 'maxLon', 'generateWkt', 'random']
- COLUMN_COUNT = 6
- DEFAULT_END_TIMESTAMP = '2024-02-01 00:00:00'
- DEFAULT_MAX_LAT = 90.0
- DEFAULT_MAX_LON = 180.0
- DEFAULT_MIN_LAT = -90.0
- DEFAULT_MIN_LON = -180.0
- DEFAULT_NUM_DEVICES = 1000
- DEFAULT_START_TIMESTAMP = '2024-01-01 00:00:00'
- MAX_DEVICE_ID = 9223372036854775807
- MIN_DEVICE_ID = 1000000
- getTableGenerator(sparkSession, *, tableName=None, rows=-1, partitions=-1, **options)[source]
Gets a data generation instance that will produce the named table
- Parameters:
sparkSession – Spark session to use
tableName – Name of table to provide
rows – Number of rows requested
partitions – Number of partitions requested
autoSizePartitions – Whether to automatically size the partitions from the number of rows
options – Options passed to generate the table
- Returns:
DataGenerator instance to generate the table if successful; throws an error otherwise
Implementors of the individual data providers are responsible for sizing partitions for the datasets based on the number of rows and columns. The number of partitions can be computed based on the number of rows and columns using the autoComputePartitions method.
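The partition-sizing responsibility described above can be illustrated with a toy heuristic. This is a hypothetical sketch, not the actual autoComputePartitions implementation: it assumes a target number of row-column cells per partition and clamps the result to a fixed range.

```python
def compute_partitions(rows, columns, cells_per_partition=5_000_000,
                       min_partitions=4, max_partitions=200):
    """Hypothetical partition-sizing heuristic (not dbldatagen's actual formula).

    Targets roughly `cells_per_partition` row-column cells per partition,
    clamped to the range [min_partitions, max_partitions].
    """
    cells = rows * columns
    partitions = (cells + cells_per_partition - 1) // cells_per_partition  # ceiling division
    return max(min_partitions, min(max_partitions, partitions))

# Small datasets fall back to the minimum partition count.
print(compute_partitions(1_000, 6))         # → 4
# Larger datasets scale with rows * columns until capped.
print(compute_partitions(100_000_000, 6))   # → 120
```

The clamping keeps tiny tables from producing a single oversized partition and huge tables from producing an unmanageable number of small ones; the actual thresholds used by the library may differ.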