databricks.labs.dqx.llm.llm_engine

DQLLMEngine Objects

class DQLLMEngine()

High-level interface for LLM-based data quality rule generation.

This class acts as a facade, exposing a simple interface to the more complex underlying LLM system.

__init__

def __init__(model_config: LLMModelConfig,
             spark: SparkSession | None = None,
             custom_check_functions: dict[str, Callable] | None = None)

Initialize the LLM engine.

The constructor configures the DSPy model once per worker process with base settings.

Arguments:

model_config - Configuration for the LLM model.
spark - Optional Spark session. If None, a new session is created.
custom_check_functions - Optional custom check functions to include.
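The constructor call can be sketched as follows, assuming the DQX package and its LLM dependencies are installed and that an LLMModelConfig instance has already been built elsewhere (its fields are not described in this section). The helper name build_engine is hypothetical; only the module path, class name, and keyword arguments come from the reference above.

```python
from typing import Any, Callable, Dict, Optional


def build_engine(model_config: Any,
                 custom_check_functions: Optional[Dict[str, Callable]] = None):
    """Create a DQLLMEngine (build_engine is a hypothetical helper name)."""
    # Imported lazily so this sketch can be loaded without DQX installed.
    from databricks.labs.dqx.llm.llm_engine import DQLLMEngine

    # spark=None lets the engine create its own session, as documented above.
    return DQLLMEngine(
        model_config=model_config,
        spark=None,
        custom_check_functions=custom_check_functions,
    )
```

Pass an existing SparkSession instead of None when one is already available, so that no new session is created.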

detect_business_rules_with_llm

def detect_business_rules_with_llm(
    user_input: str = "",
    schema_info: str = "",
    summary_stats: dict[str, Any] | None = None
) -> dspy.primitives.prediction.Prediction

Detect DQX rules from a natural-language request, optionally using a table schema or summary statistics.

If schema_info is empty (the default), the schema is inferred automatically from user_input before the rules are generated.

Arguments:

user_input - Optional natural language description of data quality requirements.
schema_info - Optional JSON string containing table schema. If empty (default), triggers schema inference.
summary_stats - Optional dictionary containing summary statistics of the input data.

Returns:

A Prediction object containing:

quality_rules: The generated DQ rules
reasoning: Explanation of the rules
guessed_schema_json: The inferred schema (if schema was inferred)
assumptions_bullets: Assumptions made (if schema was inferred)
schema_info: The final schema used (if schema was inferred)
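The fields listed above can be read defensively, since three of them are documented as present only when the schema was inferred. The following is a minimal sketch: collect_rule_output is a hypothetical helper, and a SimpleNamespace with placeholder values stands in for a real Prediction so the example runs without an LLM. It assumes the Prediction exposes its fields as attributes and raises AttributeError for missing ones.

```python
from types import SimpleNamespace
from typing import Any, Dict


def collect_rule_output(prediction: Any) -> Dict[str, Any]:
    """Copy the documented Prediction fields into a plain dict."""
    output = {
        "quality_rules": prediction.quality_rules,
        "reasoning": prediction.reasoning,
    }
    # Documented as present only when the schema was inferred.
    for name in ("guessed_schema_json", "assumptions_bullets", "schema_info"):
        value = getattr(prediction, name, None)
        if value is not None:
            output[name] = value
    return output


# Stand-in for a real Prediction; the values are placeholders, not real output.
fake_prediction = SimpleNamespace(
    quality_rules="[]",
    reasoning="Placeholder reasoning for illustration.",
)
print(sorted(collect_rule_output(fake_prediction)))
```

With a real engine, the prediction would come from a call such as engine.detect_business_rules_with_llm(user_input="..."), using the keyword arguments from the signature above.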

detect_primary_keys_with_llm

def detect_primary_keys_with_llm(table: str) -> dict[str, Any]

Detect primary keys using LLM-based analysis.

This method analyzes the table's schema and metadata to identify primary key columns.

Arguments:

table - The table name to analyze.

Returns:

A dictionary containing the primary key detection result with the following keys:

table: The table name
success: Whether detection was successful
primary_key_columns: List of detected primary key columns (if successful)
confidence: Confidence level (high/medium/low)
reasoning: LLM reasoning for the selection
has_duplicates: Whether duplicates were found (if validation performed)
duplicate_count: Number of duplicate combinations (if validation performed)
error: Error message (if failed)
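Because several keys are conditional (on success, on failure, or on validation having run), callers may want to read the result with dict.get rather than direct indexing. The sketch below is one way to do that; describe_primary_key_result is a hypothetical helper, and the sample dictionaries (including the table name) are invented for illustration and follow only the key names listed above.

```python
from typing import Any, Dict


def describe_primary_key_result(result: Dict[str, Any]) -> str:
    """Turn a primary key detection result into a one-line summary."""
    table = result.get("table", "<unknown>")
    if not result.get("success"):
        # "error" is documented as present only on failure.
        return f"{table}: detection failed ({result.get('error', 'no error message')})"

    columns = ", ".join(result.get("primary_key_columns", []))
    summary = f"{table}: key=({columns}), confidence={result.get('confidence', 'n/a')}"
    # Duplicate information is documented as present only if validation ran.
    if result.get("has_duplicates"):
        summary += f", {result.get('duplicate_count', 0)} duplicate combinations"
    return summary


# Illustrative inputs only; real values come from detect_primary_keys_with_llm.
ok_result = {
    "table": "main.sales.orders",
    "success": True,
    "primary_key_columns": ["order_id"],
    "confidence": "high",
    "has_duplicates": False,
    "duplicate_count": 0,
}
failed_result = {"table": "main.sales.orders", "success": False, "error": "table not found"}

print(describe_primary_key_result(ok_result))
print(describe_primary_key_result(failed_result))
```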
