dbldatagen.function_builder module

class ColumnGeneratorBuilder

Bases: object

Helper class for building functional column generators of specific forms.

classmethod mkExprChoicesFn(values, weights, seed_column, datatype)

Create a SQL expression that selects one of the given values according to its weight.

The generated expression has the form:

case
   when rnd_column <= weight1 then value1
   when rnd_column <= weight2 then value2
   ...
   when rnd_column <= weightN then valueN
   else valueN
end

The expression is based on the probability distribution computed for the values.

From Python 3.6 onwards, random.choices could be used instead, but that Python version is not guaranteed on all Databricks distributions.
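For comparison, the plain-Python equivalent of a weighted selection looks roughly like the sketch below. It uses the standard library's random.choices on an in-memory list; the values, weights, and seed are illustrative assumptions and are not taken from dbldatagen.

```python
import random

# Illustrative values and weights (assumptions, not from the library).
values = ["a", "b", "c"]
weights = [1, 2, 7]

# A seeded generator keeps the sample reproducible between runs.
rng = random.Random(42)

# Draw 10 items; "c" is expected most often because it has the largest weight.
sample = rng.choices(values, weights=weights, k=10)
print(sample)
```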

Parameters:
  • values – list of values

  • weights – list of weights

  • seed_column – base column for expression

  • datatype – data type of function return value
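
An expression of the form described above could be assembled along the following lines. This is a minimal sketch, not the library's actual implementation: the function name build_weighted_case_expr and its literal-quoting helper are hypothetical, it assumes the seed column holds a uniform random value in [0, 1), it assumes the thresholds are cumulative probabilities obtained by normalizing the weights, and it omits the datatype handling.

```python
from itertools import accumulate


def build_weighted_case_expr(values, weights, seed_column):
    """Return a SQL CASE expression string that picks values by weight.

    Hypothetical helper for illustration only. Assumes `seed_column`
    names a column holding a uniform random value in [0, 1).
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")

    total = float(sum(weights))
    # Convert raw weights to cumulative probabilities, e.g. [1, 3] -> [0.25, 1.0].
    thresholds = [w / total for w in accumulate(weights)]

    def literal(v):
        # Quote strings as SQL literals; render everything else via str().
        if isinstance(v, str):
            return "'" + v.replace("'", "''") + "'"
        return str(v)

    branches = [
        f"when {seed_column} <= {t} then {literal(v)}"
        for v, t in zip(values, thresholds)
    ]
    # The else branch repeats the last value to absorb floating-point rounding.
    return "case " + " ".join(branches) + f" else {literal(values[-1])} end"


expr = build_weighted_case_expr(["a", "b"], [1, 3], "rnd")
print(expr)
```

With weights 1 and 3, the first value is chosen when the seed column is at most 0.25 and the second otherwise, so the second value appears about three times as often.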