Skip to main content

databricks.labs.dqx.anomaly.drift

Drift detection and warnings for row anomaly models.

Compares current data distribution against baseline statistics to detect significant changes; provides check/warn helpers for scoring pipelines.

DriftResult Objects

@dataclass
class DriftResult()

Results from drift detection.

compute_drift_score

def compute_drift_score(df: DataFrame,
columns: list[str],
baseline_stats: dict[str, dict[str, float]],
threshold: float = 3.0) -> DriftResult

Compute drift score comparing current data to baseline statistics.

Arguments:

  • df - Current DataFrame to check for drift.
  • columns - Columns to check for drift.
  • baseline_stats - Baseline statistics from training data.
  • threshold - Drift score threshold (default 3.0 = 3 std deviations).

Returns:

DriftResult with detection status and details.

prepare_drift_df

def prepare_drift_df(
df: DataFrame, columns: list[str],
record: AnomalyModelRecord) -> tuple[DataFrame, list[str]]

Prepare drift DataFrame and columns aligned to training baseline stats.

check_segment_drift

def check_segment_drift(segment_df: DataFrame, columns: list[str],
segment_model: AnomalyModelRecord,
drift_threshold: float | None,
drift_threshold_value: float) -> None

Check and warn about data drift in a segment.

check_and_warn_drift

def check_and_warn_drift(df: DataFrame, columns: list[str],
record: AnomalyModelRecord, model_name: str,
drift_threshold: float | None,
drift_threshold_value: float) -> None

Check for data drift and issue warning if detected.