databricks.labs.dqx.anomaly.model_config
Model configuration and record structure for row anomaly detection.
Single responsibility: model record structure (dataclasses) and config identity (compute_config_hash). Persistence lives in model_registry.
ModelIdentity Objects
@dataclass
class ModelIdentity()
Core model identification (5 fields).
TrainingMetadata Objects
@dataclass
class TrainingMetadata()
Training configuration and metrics (7 fields).
FeatureEngineering Objects
@dataclass
class FeatureEngineering()
Feature engineering metadata (5 fields).
SegmentationConfig Objects
@dataclass
class SegmentationConfig()
Segmentation configuration (5 fields).
AnomalyModelRecord Objects
@dataclass
class AnomalyModelRecord()
Registry record for a trained anomaly model, built by composition.
Composed of 4 focused components, each under the 16-attribute limit:
- identity: Core model identification (5 fields)
- training: Training configuration and metrics (6 fields)
- features: Feature engineering metadata (5 fields)
- segmentation: Segmentation configuration (5 fields)
Stored as nested structs in Delta tables (no flattening needed).
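A minimal sketch of the composition pattern described above. All field names below are hypothetical and chosen only for illustration; the real dataclasses define their own fields, and the real write path to Delta is not shown here. The point is that `dataclasses.asdict` keeps the nesting, so each component stays a self-contained group that maps naturally onto a nested struct column instead of being flattened into one wide record.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


# Hypothetical fields for illustration only; not the library's actual schema.
@dataclass
class ModelIdentity:
    model_name: str
    model_uri: str


@dataclass
class TrainingMetadata:
    columns: list[str]
    training_rows: int


@dataclass
class FeatureEngineering:
    feature_names: list[str] = field(default_factory=list)


@dataclass
class SegmentationConfig:
    segment_by: list[str] | None = None


@dataclass
class AnomalyModelRecord:
    # Each component groups related attributes, keeping every class small.
    identity: ModelIdentity
    training: TrainingMetadata
    features: FeatureEngineering
    segmentation: SegmentationConfig


record = AnomalyModelRecord(
    identity=ModelIdentity(model_name="orders_model", model_uri="models:/orders_model/1"),
    training=TrainingMetadata(columns=["amount", "quantity"], training_rows=1000),
    features=FeatureEngineering(feature_names=["amount", "quantity"]),
    segmentation=SegmentationConfig(segment_by=["region"]),
)

# asdict recurses into nested dataclasses, producing a nested dict
# (one sub-dict per component) rather than a flat mapping.
nested = asdict(record)
```

Adding a new attribute then means touching only the relevant component, which is how each class can stay under a per-class attribute limit.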
compute_config_hash
def compute_config_hash(columns: list[str],
segment_by: list[str] | None) -> str
Generate stable hash of model configuration.
Arguments:
- columns - List of column names used for training.
- segment_by - List of columns used for segmentation, or None.
Returns:
16-character hex string (first 16 chars of SHA256 hash)
Notes:
This hash uniquely identifies a model configuration based on:
- Sorted list of columns (order-independent)
- Sorted list of segment_by columns (order-independent)
It is used for collision detection when the same model_name is reused with a different configuration.
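A minimal sketch of an order-independent configuration hash and the collision check it enables. The serialization is an assumption: this sketch encodes the sorted lists as a JSON object before hashing, which may differ from the actual `compute_config_hash`. `compute_config_hash_sketch` and `check_config_collision` are hypothetical names used only for illustration. Truncating the SHA256 hex digest to 16 characters keeps 64 bits of the hash.

```python
from __future__ import annotations

import hashlib
import json


def compute_config_hash_sketch(columns: list[str], segment_by: list[str] | None) -> str:
    """Hypothetical stand-in for compute_config_hash (serialization is assumed)."""
    # Sorting makes the hash independent of the order in which columns are given.
    payload = {
        "columns": sorted(columns),
        "segment_by": sorted(segment_by) if segment_by is not None else None,
    }
    # sort_keys gives a canonical JSON string for the same logical config.
    canonical = json.dumps(payload, sort_keys=True)
    # First 16 hex characters of the SHA256 digest.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def check_config_collision(
    model_name: str,
    stored_hash: str | None,
    columns: list[str],
    segment_by: list[str] | None,
) -> None:
    """Hypothetical check: refuse to reuse model_name for a different configuration."""
    new_hash = compute_config_hash_sketch(columns, segment_by)
    if stored_hash is not None and stored_hash != new_hash:
        raise ValueError(
            f"Model '{model_name}' already exists with config hash {stored_hash}; "
            f"new configuration hashes to {new_hash}."
        )
```

For example, `["b", "a"]` and `["a", "b"]` produce the same hash, so retraining with reordered columns is not flagged, while adding or removing a column (or changing `segment_by`) changes the hash and triggers the error.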