Skip to main content

databricks.labs.dqx.io

read_input_data

def read_input_data(spark: SparkSession,
input_config: InputConfig) -> DataFrame

Reads input data from the specified location and format.

Arguments:

  • spark - SparkSession
  • input_config - InputConfig with source location/table name, format, and options

Returns:

DataFrame with values read from the input data

save_dataframe_as_table

def save_dataframe_as_table(df: DataFrame, output_config: OutputConfig)

Helper method to save a DataFrame to a Delta table.

Arguments:

  • df - The DataFrame to save
  • output_config - Output table name, write mode, and options

get_reference_dataframes

def get_reference_dataframes(
spark: SparkSession,
reference_tables: dict[str, InputConfig] | None = None
) -> dict[str, DataFrame] | None

Get reference DataFrames from the provided reference tables configuration.

Arguments:

  • spark - SparkSession
  • reference_tables - A dictionary mapping of reference table names to their input configurations.

Examples:

reference_tables = {
"reference_table_1": InputConfig(location="db.schema.table1", format="delta"),
"reference_table_2": InputConfig(location="db.schema.table2", format="delta")
}

Returns:

A dictionary mapping reference table names to their DataFrames.