databricks.labs.dqx.llm.llm_utils
get_check_function_definitions
def get_check_function_definitions(
custom_check_functions: dict[str, Callable] | None = None
) -> list[dict[str, str]]
A utility function to get the definition of all check functions. This function is primarily used to generate a prompt for the LLM to generate check functions.
If provided, the function will use the custom check functions to resolve the check function. If not provided, the function will use only the built-in check functions.
Arguments:
custom_check_functions- A dictionary of custom check functions.
Returns:
list[dict]- A list of dictionaries, each containing the definition of a check function.
get_required_check_functions_definitions
def get_required_check_functions_definitions(
custom_check_functions: dict[str, Callable] | None = None
) -> list[dict[str, str]]
Extract only required function information (name and doc).
Returns:
list[dict[str, str]]: A list of dictionaries containing the required fields for each check function.
get_required_summary_stats
def get_required_summary_stats(
summary_stats: dict[str, Any]) -> dict[str, Any]
Filters the summary statistics to include only mean, min, and max values, which provide sufficient information for LLM-based rule generation while reducing token usage. Converts all values to JSON-serializable format.
Arguments:
summary_stats- Dictionary containing summary statistics for each column.
Returns:
dict- A dictionary containing the required fields for each summary stats with JSON-serializable values.
create_optimizer_training_set
def create_optimizer_training_set(
custom_check_functions: dict[str, Callable] | None = None
) -> list[dspy.Example]
Get quality check training examples for the dspy optimizer.
Arguments:
custom_check_functions- A dictionary of custom check functions.
Returns:
list[dspy.Example]- A list of dspy.Example objects created from training examples.
create_optimizer_training_set_with_stats
def create_optimizer_training_set_with_stats(
custom_check_functions: dict[str, Callable] | None = None
) -> list[dspy.Example]
Get quality check training examples using data summary statistics for the dspy optimizer.
Arguments:
custom_check_functions- A dictionary of custom check functions.
Returns:
list[dspy.Example]- A list of dspy.Example objects created from training examples with stats.
get_column_metadata
def get_column_metadata(spark: SparkSession, input_config: InputConfig) -> str
Get the column metadata for a given table.
Arguments:
input_configInputConfig - Input configuration for the table.sparkSparkSession - The Spark session used to access the table.
Returns:
str- A JSON string containing the column metadata with columns wrapped in a "columns" key.