
databricks.labs.dqx.io

read_input_data

def read_input_data(spark: SparkSession,
                    input_config: InputConfig) -> DataFrame

Reads input data from the specified location in the specified format.

Arguments:

  • spark - SparkSession
  • input_config - InputConfig with source location/table name, format, and options

Returns:

DataFrame with the data read from the input source
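A minimal usage sketch. The `InputConfig` stand-in below is a hypothetical dataclass mirroring only the fields this page mentions (location, format, options); the real class ships with dqx, and the commented-out call assumes a live `SparkSession`.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for dqx's InputConfig, mirroring only the fields
# mentioned above (location, format, options); the real class ships with dqx.
@dataclass
class InputConfig:
    location: str            # table name or path, e.g. "catalog.schema.table"
    format: str = "delta"    # input format
    options: dict = field(default_factory=dict)  # extra reader options

input_config = InputConfig(
    location="main.sales.orders",
    format="delta",
    options={"versionAsOf": "3"},
)

# With a live SparkSession you would then read:
# df = read_input_data(spark, input_config)
```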

save_dataframe_as_table

def save_dataframe_as_table(df: DataFrame,
                            output_config: OutputConfig) -> StreamingQuery | None

Helper method to save a DataFrame to a Delta table.

Arguments:

  • df - The DataFrame to save
  • output_config - OutputConfig with the output table name, write mode, and options

Returns:

StreamingQuery handle if the DataFrame is streaming, None otherwise
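A minimal usage sketch. The `OutputConfig` stand-in below is a hypothetical dataclass mirroring only the fields this page mentions (table name, write mode, options); the real class ships with dqx, and the commented-out call assumes a DataFrame is in hand.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for dqx's OutputConfig, mirroring only the fields
# mentioned above (table name, write mode, options); the real class ships with dqx.
@dataclass
class OutputConfig:
    location: str           # target Delta table, e.g. "catalog.schema.table"
    mode: str = "append"    # write mode
    options: dict = field(default_factory=dict)  # extra writer options

output_config = OutputConfig(location="main.sales.orders_clean", mode="overwrite")

# With a DataFrame in hand you would then write:
# handle = save_dataframe_as_table(df, output_config)
# Per the docs, `handle` is a StreamingQuery for a streaming DataFrame
# and None for a batch DataFrame.
```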

is_one_time_trigger

def is_one_time_trigger(trigger: dict[str, Any] | None) -> bool

Checks if a trigger is a one-time trigger that should wait for completion.

Arguments:

  • trigger - Trigger configuration dict

Returns:

True if the trigger is 'once' or 'availableNow', False otherwise

get_reference_dataframes

def get_reference_dataframes(
    spark: SparkSession,
    reference_tables: dict[str, InputConfig] | None = None
) -> dict[str, DataFrame] | None

Gets reference DataFrames from the provided reference table configurations.

Arguments:

  • spark - SparkSession
  • reference_tables - A dictionary mapping of reference table names to their input configurations.

Examples:

reference_tables = {
    "reference_table_1": InputConfig(location="db.schema.table1", format="delta"),
    "reference_table_2": InputConfig(location="db.schema.table2", format="delta"),
}

Returns:

A dictionary mapping reference table names to their DataFrames.
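The presumed behaviour is a simple lookup loop: read each configured reference table into a DataFrame, and return None when no configuration is given. The sketch below is an assumption, not dqx's actual code; `read_fn` stands in for `read_input_data` so the logic can be shown without a live SparkSession.

```python
# Hypothetical sketch: map each reference table name to the DataFrame
# produced by reading its input configuration.
def get_reference_dataframes_sketch(read_fn, reference_tables=None):
    if not reference_tables:
        return None  # no reference tables configured
    return {name: read_fn(cfg) for name, cfg in reference_tables.items()}

# Demonstrate with a stub reader that just echoes the config:
refs = get_reference_dataframes_sketch(
    lambda cfg: f"df({cfg})",
    {"reference_table_1": "db.schema.table1"},
)
print(refs)  # {'reference_table_1': 'df(db.schema.table1)'}
```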