dbldatagen.constraints.constraint module

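This module defines the Constraint class, the base class for constraints, together with the NoFilterMixin and NoPrepareTransformMixin helper classes.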

class Constraint(supportsStreaming=False)

Bases: ABC

Constraint object: base class for predefined and custom constraints

This class is meant for internal use only.

SUPPORTED_OPERATORS = ['<', '>', '>=', '!=', '==', '=', '<=', '<>']
property filterExpression

Return the filter expression as an instance of type Column that evaluates to True or non-True

static mkCombinedConstraintExpression(constraintExpressions)

Generate a SQL expression that combines multiple constraints using AND

Parameters:

constraintExpressions – list of PySpark SQL Column constraint expression objects

Returns:

combined constraint expression as a PySpark SQL Column object (or None if there are no valid expressions)
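The combination can be pictured as a left fold with logical AND. The sketch below is a plain-Python stand-in, not the library code: it models each constraint expression as a predicate over a row dictionary instead of a PySpark Column, and it assumes that `None` entries count as invalid expressions and are skipped. The function name `combineWithAnd` is hypothetical.

```python
from functools import reduce


def combineWithAnd(constraintExpressions):
    """Hypothetical stand-in for mkCombinedConstraintExpression using row predicates."""
    # Assumption: None entries are treated as invalid and ignored.
    valid = [expr for expr in (constraintExpressions or []) if expr is not None]
    if not valid:
        return None
    # Left fold: ((e1 AND e2) AND e3) ...
    return reduce(lambda left, right: (lambda row: left(row) and right(row)), valid)


positive = lambda row: row["amount"] > 0
small = lambda row: row["amount"] < 100

combined = combineWithAnd([positive, None, small])
print(combined({"amount": 50}))   # True
print(combined({"amount": 150}))  # False
print(combineWithAnd([]))         # None
```

With real PySpark Column objects, the equivalent fold would be `reduce(lambda a, b: a & b, valid)`, because `&` on two Columns builds a logical AND expression.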

abstract prepareDataGenerator(dataGenerator)

Prepare the data generator to generate data that matches the constraint

This method may modify the data generation rules to meet the constraint

Parameters:

dataGenerator – Data generation object that will generate the dataframe

Returns:

modified or unmodified data generator
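To illustrate changing the generation rules rather than filtering rows afterwards, the sketch below uses a plain dictionary of column ranges as a hypothetical stand-in for the data generator; the real DataGenerator API is not used here. The hypothetical `UpperBoundConstraint` tightens the upper bound of one column's range so that every generated value already satisfies the constraint, then returns the generator as the contract above requires.

```python
class UpperBoundConstraint:
    """Hypothetical constraint: values of `column` must be <= `limit`."""

    def __init__(self, column, limit):
        self.column = column
        self.limit = limit

    def prepareDataGenerator(self, dataGenerator):
        # Stand-in generator: {column_name: (min_value, max_value)}.
        # A shallow copy is enough because the range tuples are immutable,
        # and it leaves the caller's spec untouched.
        prepared = dict(dataGenerator)
        low, high = prepared[self.column]
        # Tighten the range so every generated value meets the constraint.
        prepared[self.column] = (low, min(high, self.limit))
        return prepared


spec = {"amount": (0, 500), "qty": (1, 10)}
prepared = UpperBoundConstraint("amount", 100).prepareDataGenerator(spec)
print(prepared)  # {'amount': (0, 100), 'qty': (1, 10)}
print(spec)      # {'amount': (0, 500), 'qty': (1, 10)}
```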

property supportsStreaming

Return True if the constraint supports streaming dataframes

abstract transformDataframe(dataGenerator, dataFrame)

Transform the dataframe so that its data conforms to the constraint, if possible

This method should not modify the dataGenerator, but may modify the dataframe

Parameters:
  • dataGenerator – Data generation object that generated the dataframe

  • dataFrame – generated dataframe

Returns:

modified or unmodified Spark dataframe

The default transformation returns the dataframe unmodified
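The contract above (leave the generator alone, return a possibly modified dataframe) can be sketched with a list of row dictionaries standing in for a Spark DataFrame. The hypothetical `NonNegativeConstraint` clamps negative values to zero; it does not touch `dataGenerator` and returns a new list rather than mutating its input.

```python
class NonNegativeConstraint:
    """Hypothetical constraint: values of `column` must be >= 0."""

    def __init__(self, column):
        self.column = column

    def transformDataframe(self, dataGenerator, dataFrame):
        # dataGenerator is accepted for interface compatibility but not modified.
        # Build new rows instead of mutating the input rows.
        return [
            {**row, self.column: max(row[self.column], 0)}
            for row in dataFrame
        ]


rows = [{"id": 1, "delta": -5}, {"id": 2, "delta": 3}]
result = NonNegativeConstraint("delta").transformDataframe(None, rows)
print(result)  # [{'id': 1, 'delta': 0}, {'id': 2, 'delta': 3}]
print(rows)    # [{'id': 1, 'delta': -5}, {'id': 2, 'delta': 3}]
```

With PySpark the same idea would typically be expressed with `withColumn`, which likewise returns a new DataFrame instead of altering the original.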

class NoFilterMixin

Bases: object

Mixin class to indicate that a constraint has no filter expression

Intended to be used in implementation of the concrete constraint classes.

Use of the mixin class is optional, but when it is combined with the Constraint class through multiple inheritance, it provides a default implementation of the _generateFilterExpression method that satisfies the abstract method requirement of the Constraint class.

When using mixins, place the mixin class first in the list of base classes.
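The advice to place the mixin first follows from Python's method resolution order (MRO). The self-contained sketch below uses hypothetical stand-ins (`BaseConstraint`, `FilterlessMixin`) rather than the dbldatagen classes, and declares only the filter hook abstract to isolate the effect. When the mixin comes first, its concrete `_generateFilterExpression` is found before the abstract one and the subclass can be instantiated; when it comes last, the abstract method still wins and instantiation raises `TypeError`.

```python
from abc import ABC, abstractmethod


class BaseConstraint(ABC):
    """Stand-in for Constraint with only the filter hook declared abstract."""

    @abstractmethod
    def _generateFilterExpression(self):
        ...


class FilterlessMixin:
    """Stand-in for NoFilterMixin: a concrete hook that yields no filter."""

    def _generateFilterExpression(self):
        return None


class MixinFirst(FilterlessMixin, BaseConstraint):
    pass


class MixinLast(BaseConstraint, FilterlessMixin):
    pass


print([cls.__name__ for cls in MixinFirst.__mro__])
# ['MixinFirst', 'FilterlessMixin', 'BaseConstraint', 'ABC', 'object']
print(MixinFirst()._generateFilterExpression())  # None

try:
    MixinLast()
except TypeError as err:
    # The abstract method found first in the MRO keeps the class abstract.
    print("cannot instantiate MixinLast:", err)
```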

class NoPrepareTransformMixin

Bases: object

Mixin class to indicate that a constraint requires no data generator preparation or dataframe transformation

Intended to be used in implementation of the concrete constraint classes.

Use of the mixin class is optional, but when it is combined with the Constraint class through multiple inheritance, it provides default implementations of the prepareDataGenerator and transformDataframe methods that satisfy the abstract method requirements of the Constraint class.

When using mixins, place the mixin class first in the list of base classes.
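A typical use is a constraint that only contributes a filter: the mixin supplies pass-through prepare and transform steps, and the subclass implements just the filter hook. The sketch below mirrors that shape with hypothetical stand-ins (`BaseConstraint`, `PassThroughMixin`, `PositiveFilter`) rather than the dbldatagen classes. It also assumes, for illustration only, that the `filterExpression` property delegates to `_generateFilterExpression`, and it models the filter as a Python predicate over row dictionaries instead of a PySpark Column.

```python
from abc import ABC, abstractmethod


class BaseConstraint(ABC):
    """Stand-in for Constraint with the three hooks declared abstract."""

    def __init__(self, supportsStreaming=False):
        self._supportsStreaming = supportsStreaming

    @property
    def supportsStreaming(self):
        return self._supportsStreaming

    @property
    def filterExpression(self):
        # Assumption: the property delegates to the subclass hook.
        return self._generateFilterExpression()

    @abstractmethod
    def _generateFilterExpression(self):
        ...

    @abstractmethod
    def prepareDataGenerator(self, dataGenerator):
        ...

    @abstractmethod
    def transformDataframe(self, dataGenerator, dataFrame):
        ...


class PassThroughMixin:
    """Stand-in for NoPrepareTransformMixin: both steps return their input unchanged."""

    def prepareDataGenerator(self, dataGenerator):
        return dataGenerator

    def transformDataframe(self, dataGenerator, dataFrame):
        return dataFrame


class PositiveFilter(PassThroughMixin, BaseConstraint):
    """Hypothetical filter-only constraint: keep rows where `column` > 0."""

    def __init__(self, column):
        super().__init__(supportsStreaming=True)
        self.column = column

    def _generateFilterExpression(self):
        return lambda row: row[self.column] > 0


constraint = PositiveFilter("amount")
rows = [{"amount": -1}, {"amount": 0}, {"amount": 7}]
rows = constraint.transformDataframe(None, rows)  # unchanged by the mixin
kept = [row for row in rows if constraint.filterExpression(row)]
print(kept)                          # [{'amount': 7}]
print(constraint.supportsStreaming)  # True
```

Because the mixin is listed first, its concrete methods shadow the abstract declarations, so `PositiveFilter` only has to implement the filter hook to become instantiable.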

prepareDataGenerator(dataGenerator)

Prepare the data generator to generate data that matches the constraint

This method may modify the data generation rules to meet the constraint

Parameters:

dataGenerator – Data generation object that will generate the dataframe

Returns:

modified or unmodified data generator

transformDataframe(dataGenerator, dataFrame)

Transform the dataframe so that its data conforms to the constraint, if possible

This method should not modify the dataGenerator, but may modify the dataframe

Parameters:
  • dataGenerator – Data generation object that generated the dataframe

  • dataFrame – generated dataframe

Returns:

modified or unmodified Spark dataframe

The default transformation returns the dataframe unmodified