dbldatagen.schema_parser module
This module defines the SchemaParser class
- class SchemaParser[source]
Bases:
object
SchemaParser class
Creates a PySpark SQL datatype from a string
- classmethod columnTypeFromString(type_string)[source]
Generate a Spark SQL data type from a string
- Allowable options for the type_string parameter are:
string, varchar, char, nvarchar
int, integer
bigint, long
bool, boolean
smallint, short
binary
tinyint, byte
date
timestamp, datetime
double, float
decimal, decimal(p), decimal(p, s), or number(p, s)
map<type1, type2> where type1 and type2 are type definitions of the form accepted by the parser
array<type1> where type1 is a type definition of the form accepted by the parser
struct<a:binary, b:int, c:float>
Type definitions may be nested recursively - for example, the following are valid type definitions:
* array<array<int>>
* struct<a:array<int>, b:int, c:float>
* map<string, struct<a:array<int>, b:int, c:float>>
- Parameters:
type_string – String representation of a SQL type, such as ‘integer’
- Returns:
Spark SQL type
- classmethod columnsReferencesFromSQLString(sql_string, filterItems=None)[source]
Generate a list of possible column references from a SQL string
This method finds all candidate references to SQL column ids in the string.
To avoid the overhead of a full SQL parser, the implementation simply looks for possible field names.
Further improvements may eliminate some common syntax, but in its current form, reserved words will also be returned as possible column references.
So any use of this method must not assume that all possible references are valid column references.
- Parameters:
sql_string – String representation of SQL expression
filterItems – if provided, restrict results to those appearing in the listed items
- Returns:
list of possible column references
- classmethod getTypeDefinitionParser()[source]
Define a pyparsing-based parser for Spark SQL type definitions
- Allowable constructs for the generated type parser are:
string, varchar, char, nvarchar
int, integer
bigint, long
bool, boolean
smallint, short
binary
tinyint, byte
date
timestamp, datetime
double, float
decimal, decimal(p), decimal(p, s), or number(p, s)
map<type1, type2> where type1 and type2 are type definitions of the form accepted by the parser
array<type1> where type1 is a type definition of the form accepted by the parser
struct<a:binary, b:int, c:float>
Type definitions may be nested recursively - for example, the following are valid type definitions:
* array<array<int>>
* struct<a:array<int>, b:int, c:float>
* map<string, struct<a:array<int>, b:int, c:float>>
- Returns:
parser
See the pyparsing package for details of how the parser mechanism works: https://pypi.org/project/pyparsing/