databricks.labs.dqx.llm.llm_engine
DQLLMEngine Objects
class DQLLMEngine()
High-level interface for LLM-based data quality rule generation.
This class acts as a facade, providing a simple interface to the more complex underlying LLM system.
__init__
def __init__(model_config: LLMModelConfig,
spark: SparkSession | None = None,
custom_check_functions: dict[str, Callable] | None = None)
Initialize the LLM engine.
The constructor configures the DSPy model once, then creates the components that rely on this global configuration.
Arguments:
- model_config: Configuration for the LLM model.
- spark: Optional Spark session. If None, a new session is created.
- custom_check_functions: Optional custom check functions to include.
detect_business_rules_with_llm
def detect_business_rules_with_llm(
user_input: str,
schema_info: str = "") -> dspy.primitives.prediction.Prediction
Detect DQX rules from a natural language request, with an optional schema.
If schema_info is empty (the default), the schema is inferred from user_input before the rules are generated.
Arguments:
- user_input: Natural language description of data quality requirements.
- schema_info: Optional JSON string containing the table schema. If empty (default), triggers schema inference.
Returns:
A Prediction object containing:
- quality_rules: The generated DQ rules
- reasoning: Explanation of the rules
- guessed_schema_json: The inferred schema (if schema was inferred)
- assumptions_bullets: Assumptions made (if schema was inferred)
- schema_info: The final schema used (if schema was inferred)
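The field list above can be read defensively, since three of the fields are documented as present only when the schema was inferred. The following is a minimal sketch, not part of the library: the helper name `summarize_rule_detection` is hypothetical, `types.SimpleNamespace` stands in for the returned Prediction so the snippet runs without an LLM, and the placeholder strings do not reflect the real rule or schema format. It assumes the documented fields are readable as attributes.

```python
from types import SimpleNamespace
from typing import Any

# Fields documented as present only when the schema was inferred.
OPTIONAL_FIELDS = ("guessed_schema_json", "assumptions_bullets", "schema_info")


def summarize_rule_detection(prediction: Any) -> dict[str, Any]:
    """Collect the documented fields, tolerating the optional ones (hypothetical helper)."""
    summary: dict[str, Any] = {
        "quality_rules": prediction.quality_rules,
        "reasoning": prediction.reasoning,
    }
    for name in OPTIONAL_FIELDS:
        # getattr with a default avoids an AttributeError when a field is absent.
        value = getattr(prediction, name, None)
        if value is not None:
            summary[name] = value
    summary["schema_was_inferred"] = "guessed_schema_json" in summary
    return summary


# Stand-in for a result produced with an explicit schema_info argument.
with_schema = SimpleNamespace(
    quality_rules="<rules placeholder>",
    reasoning="<reasoning placeholder>",
)

# Stand-in for a result where the schema had to be inferred first.
inferred = SimpleNamespace(
    quality_rules="<rules placeholder>",
    reasoning="<reasoning placeholder>",
    guessed_schema_json="<schema placeholder>",
    assumptions_bullets="<assumptions placeholder>",
    schema_info="<schema placeholder>",
)

summary_with_schema = summarize_rule_detection(with_schema)
summary_inferred = summarize_rule_detection(inferred)
```

In real use the argument would be the object returned by detect_business_rules_with_llm rather than a stand-in.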
detect_primary_keys_with_llm
def detect_primary_keys_with_llm(table: str) -> dict[str, Any]
Detect primary keys using LLM-based analysis.
This method analyzes the table schema and metadata to identify primary key columns.
Arguments:
table- The table name to analyze.
Returns:
A dictionary containing the primary key detection result with the following keys:
- table: The table name
- success: Whether detection was successful
- primary_key_columns: List of detected primary key columns (if successful)
- confidence: Confidence level (high/medium/low)
- reasoning: LLM reasoning for the selection
- has_duplicates: Whether duplicates were found (if validation performed)
- duplicate_count: Number of duplicate combinations (if validation performed)
- error: Error message (if failed)
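Because several keys in the result are conditional, a caller may want to check them before trusting the detected columns. The sketch below is one possible way to do that, under stated assumptions: the helper name `usable_primary_key` is hypothetical, the sample dictionaries and the table name `main.sales.orders` are made up for illustration, and rejecting a key when duplicates were found is a design choice of this example, not documented library behavior.

```python
from typing import Any, Optional


def usable_primary_key(result: dict[str, Any]) -> Optional[list[str]]:
    """Return the detected key columns, or None if the result should not be trusted.

    Hypothetical helper: relies only on the keys documented above.
    """
    if not result.get("success"):
        # A failed detection carries an "error" message instead of columns.
        return None
    if result.get("has_duplicates"):
        # Validation found duplicate combinations, so the key is not unique.
        return None
    return list(result.get("primary_key_columns", []))


# Made-up sample results shaped like the documented dictionary.
detected = {
    "table": "main.sales.orders",
    "success": True,
    "primary_key_columns": ["order_id"],
    "confidence": "high",
    "reasoning": "<reasoning placeholder>",
    "has_duplicates": False,
    "duplicate_count": 0,
}
with_duplicates = {**detected, "has_duplicates": True, "duplicate_count": 3}
failed = {"table": "main.sales.orders", "success": False, "error": "<error placeholder>"}

detected_key = usable_primary_key(detected)
duplicate_key = usable_primary_key(with_duplicates)
failed_key = usable_primary_key(failed)
```

Using `.get` for has_duplicates and primary_key_columns keeps the helper safe when validation was not performed and those keys are missing.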