databricks.labs.dqx.anomaly.training_strategies
Training strategy pattern for row anomaly detection.
Provides a common interface so that different anomaly detection algorithms can be plugged in. IsolationForest is the only strategy currently implemented; further algorithms can be added by implementing AnomalyTrainingStrategy.
Uses dependency injection for the model registry, enabling:
- Consistent registration path with EnsembleTrainer
- Easy mocking/testing
- Potential for alternative backends
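The benefits listed above follow from passing the registry in through the constructor. A minimal sketch of the idea, using stand-in classes rather than the real DQX types: `RegistryProtocol`, `InMemoryRegistry`, `SketchStrategy`, and the `register` method are hypothetical names chosen for illustration, since the actual `ModelRegistryBase` interface is not shown in this section.

```python
from __future__ import annotations

from typing import Any, Protocol


class RegistryProtocol(Protocol):
    """Hypothetical stand-in for ModelRegistryBase; the real interface may differ."""

    def register(self, model_name: str, model: Any) -> str: ...


class InMemoryRegistry:
    """Test double that records registrations instead of calling a real backend."""

    def __init__(self) -> None:
        self.models: dict[str, Any] = {}

    def register(self, model_name: str, model: Any) -> str:
        self.models[model_name] = model
        return f"memory://{model_name}"


class SketchStrategy:
    """Hypothetical strategy that only demonstrates constructor injection."""

    def __init__(self, registry: RegistryProtocol | None = None) -> None:
        # The documented class defaults to MLflow/Unity Catalog; this sketch
        # falls back to the in-memory double so that it runs offline.
        self.registry = registry if registry is not None else InMemoryRegistry()

    def train(self, model_name: str) -> str:
        model = {"kind": "placeholder"}  # stands in for a fitted model
        return self.registry.register(model_name, model)


fake = InMemoryRegistry()
uri = SketchStrategy(registry=fake).train("orders_anomaly")
print(uri)  # memory://orders_anomaly
```

Because the strategy only talks to whatever registry it was given, a test can inspect `fake.models` afterwards without touching MLflow.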
AnomalyTrainingStrategy Objects
class AnomalyTrainingStrategy(ABC)
Training strategy interface for row anomaly models.
Implement this interface to add new anomaly detection algorithms. Uses dependency injection for the model registry.
__init__
def __init__(registry: ModelRegistryBase | None = None) -> None
Initialize strategy with optional registry.
Arguments:
registry - Model registry to use. Defaults to MLflow/Unity Catalog.
train
@abstractmethod
def train(train_df: DataFrame, val_df: DataFrame, columns: list[str],
params: AnomalyParams, model_name: str, *,
allow_ensemble: bool) -> TrainingResult
Train an anomaly detection model.
Arguments:
train_df - Training DataFrame
val_df - Validation DataFrame
columns - Feature columns to use
params - Training parameters
model_name - Name for registered model
allow_ensemble - Whether to allow ensemble training
Returns:
TrainingResult with model URI, metrics, and metadata
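To illustrate what implementing this interface might look like, here is a sketch of a toy z-score strategy. It is not part of DQX. Everything in it is a stand-in so that it runs without Spark or the library installed: `Rows` replaces the Spark DataFrame, `SketchParams` replaces `AnomalyParams`, `SketchResult` replaces `TrainingResult` (its field names are assumed from the "model URI, metrics, and metadata" description), and `SketchStrategyBase` mirrors only the documented `train` signature.

```python
from __future__ import annotations

import statistics
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any

Rows = list[dict[str, float]]  # hypothetical stand-in for a Spark DataFrame


@dataclass
class SketchParams:
    """Hypothetical stand-in for AnomalyParams."""
    threshold: float = 3.0


@dataclass
class SketchResult:
    """Hypothetical stand-in for TrainingResult; field names are assumed."""
    model_uri: str
    metrics: dict[str, float]
    metadata: dict[str, Any] = field(default_factory=dict)


class SketchStrategyBase(ABC):
    """Mirrors the documented train() signature with stand-in types."""

    @abstractmethod
    def train(self, train_df: Rows, val_df: Rows, columns: list[str],
              params: SketchParams, model_name: str, *,
              allow_ensemble: bool) -> SketchResult: ...


class ZScoreStrategy(SketchStrategyBase):
    """Toy algorithm: flag a row if any feature lies beyond `threshold` std devs."""

    def train(self, train_df, val_df, columns, params, model_name, *,
              allow_ensemble):
        # "Fit": per-column mean and population standard deviation.
        stats = {
            col: (statistics.fmean(r[col] for r in train_df),
                  statistics.pstdev(r[col] for r in train_df))
            for col in columns
        }
        # "Validate": count validation rows with at least one extreme feature.
        flagged = sum(
            any(std > 0 and abs(row[col] - mean) / std > params.threshold
                for col, (mean, std) in stats.items())
            for row in val_df
        )
        return SketchResult(
            model_uri=f"memory://{model_name}",
            metrics={"val_flagged_fraction": flagged / len(val_df)},
            metadata={"columns": columns, "ensemble": False},
        )


train_rows = [{"amount": 10.0}, {"amount": 12.0}, {"amount": 11.0}, {"amount": 9.0}]
val_rows = [{"amount": 10.0}, {"amount": 100.0}]
result = ZScoreStrategy().train(
    train_rows, val_rows, ["amount"], SketchParams(), "zscore_demo",
    allow_ensemble=False,
)
print(result.metrics)  # {'val_flagged_fraction': 0.5}
```

A real implementation would subclass `AnomalyTrainingStrategy` directly, accept Spark DataFrames, and register the fitted model through the injected registry rather than building a URI by hand.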
IsolationForestTrainingStrategy Objects
class IsolationForestTrainingStrategy(AnomalyTrainingStrategy)
IsolationForest training strategy (default).
Uses sklearn's IsolationForest algorithm with optional ensemble training. Both single-model and ensemble paths use the same ModelRegistryBase abstraction.
train
def train(train_df: DataFrame, val_df: DataFrame, columns: list[str],
params: AnomalyParams, model_name: str, *,
allow_ensemble: bool) -> TrainingResult
Train IsolationForest model(s).
If allow_ensemble is true and params.ensemble_size > 1, trains an ensemble. Otherwise, trains a single model using the registry abstraction.
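The branching rule described here can be sketched in isolation. This shows only the documented condition, not the library's implementation: `SketchParams` and `choose_training_path` are hypothetical names, and only the `ensemble_size` attribute is taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class SketchParams:
    """Hypothetical stand-in for AnomalyParams; only ensemble_size comes from the docs."""
    ensemble_size: int = 1


def choose_training_path(params: SketchParams, *, allow_ensemble: bool) -> str:
    """Return the path that the documented rule would select."""
    if allow_ensemble and params.ensemble_size > 1:
        return "ensemble"
    return "single"


print(choose_training_path(SketchParams(ensemble_size=5), allow_ensemble=True))   # ensemble
print(choose_training_path(SketchParams(ensemble_size=5), allow_ensemble=False))  # single
print(choose_training_path(SketchParams(ensemble_size=1), allow_ensemble=True))   # single
```

Both conditions must hold for ensemble training, so a caller can force the single-model path by passing `allow_ensemble=False` regardless of `ensemble_size`.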