dbldatagen.utils module
This module defines the DataGenError exception class and utility functions.
These are meant for internal use only.
- exception DataGenError(msg, baseException=None)
Bases:
Exception
Exception class to represent data generation errors
- Parameters:
msg – message related to error
baseException – underlying exception, if any, that caused the issue
- coalesce_values(*args)
For a supplied list of arguments, returns the first argument that does not have the value None
- Parameters:
args – variable list of arguments which are evaluated
- Returns:
First argument in list that evaluates to a non-None value
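The contract described above can be illustrated with a minimal sketch. The name `coalesce_values_sketch` is a hypothetical stand-in written for this example, not the library's implementation; it assumes only the documented behavior (return the first argument that is not None).

```python
def coalesce_values_sketch(*args):
    """Return the first argument that is not None, or None if there is none."""
    for value in args:
        # Test identity against None so that falsy values such as 0 or ""
        # still count as valid results.
        if value is not None:
            return value
    return None


print(coalesce_values_sketch(None, 0, 5))      # 0
print(coalesce_values_sketch(None, None))      # None
```

Note that the test is against None rather than truthiness, so `0` is returned ahead of `5` in the first call.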
- deprecated(message='')
Define a deprecated decorator without dependencies on 3rd party libraries
Note: there is a 3rd party library called deprecated that provides this feature, but the goal is to depend only on packages already used in the Databricks runtime.
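One way such a dependency-free decorator could be written is sketched below using only the standard library. The names `deprecated_sketch`, `old_total`, and the warning text are hypothetical, and emitting a `DeprecationWarning` is an assumption; the library's own decorator may report deprecation differently.

```python
import functools
import warnings


def deprecated_sketch(message=""):
    """Decorator factory that warns when the wrapped function is called."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated. {message}".strip(),
                category=DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_sketch("Use a newer helper instead.")
def old_total(values):
    return sum(values)
```

Calling `old_total([1, 2, 3])` still returns the sum; the only added effect is the warning.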
- ensure(cond, msg='condition does not hold true')
ensure(cond, msg) => raises DataGenError(msg) if cond is not true
- Parameters:
cond – condition to test
msg – Message to add to exception if exception is raised
- Raises:
DataGenError exception if condition does not hold true
- Returns:
Does not return anything but raises exception if condition does not hold
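The guard pattern described above might look like the following sketch. `DataGenErrorSketch` and `ensure_sketch` are hypothetical stand-ins so that the example runs without the library installed; storing `baseException` as an attribute is an assumption, not something the documentation states.

```python
class DataGenErrorSketch(Exception):
    """Stand-in for DataGenError: a message plus an optional underlying exception."""

    def __init__(self, msg, baseException=None):
        super().__init__(msg)
        self.baseException = baseException  # assumed attribute, for illustration only


def ensure_sketch(cond, msg="condition does not hold true"):
    """Raise DataGenErrorSketch(msg) when cond is falsy; otherwise do nothing."""
    if not cond:
        raise DataGenErrorSketch(msg)


ensure_sketch(1 + 1 == 2)  # passes silently
try:
    ensure_sketch(len([]) > 0, "expected a non-empty list")
except DataGenErrorSketch as err:
    print(err)  # expected a non-empty list
```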
- json_value_from_path(searchPath, jsonData, defaultValue)
Get JSON value from JSON data referenced by searchPath
searchPath should be a JSON path as supported by the jmespath package (see https://jmespath.org/)
- Parameters:
searchPath – A jmespath compatible JSON search path
jsonData – The JSON data to search (string representation of the JSON data)
defaultValue – The default value to be returned if the value was not found
- Returns:
Returns the json value if present, otherwise returns the default value
- mkBoundsList(x, default)
Make a bounds list from the supplied parameter; otherwise use the default
- Parameters:
x – integer or list of 2 values that define bounds list
default – default value if x is None
- Returns:
list of form [x,y]
- split_list_matching_condition(lst, cond)
Split a list on elements that match a condition
This will find all matches of a specific condition in the list and split the list into sub lists around the element that matches this condition.
It will handle multiple matches performing splits on each match.
For example, the following code will produce the results below:
x = ['id', 'city_name', 'id', 'city_id', 'city_pop', 'id', 'city_id', 'city_pop', 'city_id', 'city_pop', 'id']
split_list_matching_condition(x, lambda el: el == 'id')
Result: `[['id'], ['city_name'], ['id'], ['city_id', 'city_pop'], ['id'], ['city_id', 'city_pop', 'city_id', 'city_pop'], ['id']]`
- Parameters:
lst – list of items to perform condition matches against
cond – lambda function or function taking single argument and returning True or False
- Returns:
list of sublists
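The splitting behavior can be sketched as a single pass that keeps a running sublist and flushes it whenever a matching element is found. The function name `split_on_condition_sketch` is hypothetical, and the logic is inferred from the documented example rather than taken from the library source.

```python
def split_on_condition_sketch(lst, cond):
    """Split lst around every element for which cond(element) is true."""
    result = []
    current = []  # non-matching elements collected since the last match
    for el in lst:
        if cond(el):
            if current:            # flush what came before the match
                result.append(current)
                current = []
            result.append([el])    # each match becomes its own sublist
        else:
            current.append(el)
    if current:                    # trailing non-matching elements
        result.append(current)
    return result


x = ['id', 'city_name', 'id', 'city_id', 'city_pop']
print(split_on_condition_sketch(x, lambda el: el == 'id'))
# [['id'], ['city_name'], ['id'], ['city_id', 'city_pop']]
```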
- strip_margins(s, marginChar)
Python equivalent of Scala stripMargins method
Takes a string (potentially multiline) and strips all chars up to and including the first occurrence of marginChar. Used to control the formatting of generated text.
strip_margins("""one
              |two
              |three""", '|')
will produce
``
one
two
three
``
- Parameters:
s – string to strip margins from
marginChar – character to strip
- Returns:
modified string
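A minimal sketch of margin stripping follows. It assumes the string is processed line by line and that lines without the margin character are left unchanged, as with Scala's `stripMargin`; the name `strip_margins_sketch` is a hypothetical stand-in, not the library's code.

```python
def strip_margins_sketch(s, marginChar):
    """On each line, drop everything up to and including the first marginChar."""
    stripped = []
    for line in s.split("\n"):
        if marginChar in line:
            # Keep only the text after the first margin character.
            line = line[line.index(marginChar) + 1:]
        stripped.append(line)  # lines without the margin character pass through
    return "\n".join(stripped)


text = strip_margins_sketch("""one
    |two
    |three""", "|")
print(text)
# one
# two
# three
```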
- system_time_millis()
Return the system time as milliseconds since the start of the epoch
- Returns:
system time millis as long
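For reference, epoch milliseconds can be obtained from the standard library as in the sketch below. The name `system_time_millis_sketch` is hypothetical, and using `time.time()` as the clock source is an assumption about how such a helper might be written.

```python
import time


def system_time_millis_sketch():
    """Return the current wall-clock time as whole milliseconds since the epoch."""
    # time.time() returns seconds as a float; scale to milliseconds and round.
    return int(round(time.time() * 1000))


start = system_time_millis_sketch()
time.sleep(0.01)
elapsed = system_time_millis_sketch() - start  # elapsed time in milliseconds
```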
- topologicalSort(sources, initial_columns=None, flatten=True)
Perform a topological sort over sources
Used to compute the order in which test data columns are generated, based on the column generation dependencies.
The column generation dependencies are based on the value of the baseColumn attribute for withColumn or withColumnSpec statements in the data generator specification.
- Parameters:
sources – list of (name, set(names of dependencies)) pairs
initial_columns – force initial_columns to be computed first
flatten – if true, flatten output list
- Returns:
list of names in dependency order separated into build phases
Note
The algorithm will give preference to retaining order of inbound sequence over modifying order to produce a lower number of build phases.
Overall the effect is that the input build order should be retained unless there are forward references.
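The ordering behavior described in the note can be sketched as a phase-based topological sort. This is an illustrative reimplementation under stated assumptions, not the library's code: the name `topological_sort_sketch` is hypothetical, a phase is closed as soon as an entry depends on a name emitted in that same phase (so later entries are not pulled ahead of it), and a `ValueError` is assumed for cyclic or missing dependencies.

```python
def topological_sort_sketch(sources, initial_columns=None, flatten=True):
    """Order names so that every name follows the names it depends on."""
    pending = [(name, set(deps)) for name, deps in sources]
    provided = list(initial_columns) if initial_columns else []
    phases = [list(provided)] if provided else []

    while pending:
        done = set(provided)
        phase, deferred = [], []
        phase_closed = False
        for name, deps in pending:
            if name in done:
                continue  # already supplied through initial_columns
            unmet = deps - done
            if not unmet and not phase_closed:
                phase.append(name)
                continue
            deferred.append((name, deps))
            # An entry waiting only on this phase's output closes the phase,
            # so later entries keep their place behind it.
            if unmet and unmet <= set(phase):
                phase_closed = True
        if len(deferred) == len(pending):
            raise ValueError(f"cyclic or missing dependency: {deferred}")
        if phase:
            phases.append(phase)
            provided.extend(phase)
        pending = deferred

    if flatten:
        return [name for phase in phases for name in phase]
    return phases


columns = [("id", set()), ("code", {"id"}), ("name", set()), ("full", {"name", "code"})]
print(topological_sort_sketch(columns))
# ['id', 'code', 'name', 'full']
print(topological_sort_sketch(columns, flatten=False))
# [['id'], ['code', 'name'], ['full']]
```

In this example `name` has no dependencies and could have been built in the first phase, but it stays behind `code` so that the input order is preserved. A forward reference such as `[("a", {"b"}), ("b", set())]` is the case where the order changes, giving `['b', 'a']`.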