databricks.labs.dqx.anomaly.explainability

SHAP-based explainability for row anomaly detection.

Provides contribution formatting and computation for scoring pipelines, plus TreeSHAP-based feature contribution analysis for reporting and messages. Requires the 'anomaly' extras: pip install databricks-labs-dqx[anomaly]

format_shap_contributions

def format_shap_contributions(
    shap_values: np.ndarray, valid_indices: np.ndarray, num_rows: int,
    engineered_feature_cols: list[str]) -> list[dict[str, float | None]]

Format SHAP values into contribution dictionaries.
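The signature suggests that SHAP rows computed only for valid rows are mapped back onto all num_rows input rows. The sketch below illustrates that alignment under stated assumptions: rows missing from valid_indices get a dictionary of None values, and no normalization is applied. The name align_contributions_sketch is hypothetical and this is not the library's implementation.

```python
def align_contributions_sketch(shap_values, valid_indices, num_rows, feature_cols):
    """Map SHAP rows back to their original row positions (illustrative only)."""
    # Assumption: rows that were not scored get None for every feature.
    result = [{col: None for col in feature_cols} for _ in range(num_rows)]
    # shap_values[i] is assumed to belong to the input row valid_indices[i].
    for shap_row, row_idx in zip(shap_values, valid_indices):
        result[int(row_idx)] = {
            col: float(value) for col, value in zip(feature_cols, shap_row)
        }
    return result


# Three input rows, of which only rows 0 and 2 were scored.
aligned = align_contributions_sketch(
    shap_values=[[0.2, -0.1], [0.5, 0.3]],
    valid_indices=[0, 2],
    num_rows=3,
    feature_cols=["amount", "quantity"],
)
```

Plain lists are used here for brevity; the same loop also works when shap_values and valid_indices are NumPy arrays.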

compute_shap_values

def compute_shap_values(
    model_local: Any, feature_matrix: pd.DataFrame,
    engineered_feature_cols: list[str]) -> tuple[np.ndarray, np.ndarray]

Compute SHAP values for a model and feature matrix.

format_contributions_map

def format_contributions_map(contributions_map: dict[str, float | None] | None,
                             top_n: int) -> str

Format contributions map as string for top N contributors.

Arguments:

  • contributions_map - Dictionary mapping feature names to contribution values (0-100 range)
  • top_n - Number of top contributors to include

Returns:

Formatted string such as "amount (85%), quantity (10%), discount (5%)". Returns an empty string if contributions_map is None or empty.

Example:

>>> format_contributions_map(dict(amount=85.0, quantity=10.0), 2)
'amount (85%), quantity (10%)'
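To make the documented behavior concrete, here is a minimal re-implementation sketch that reproduces the example above. It is not the library's code: the function name is hypothetical, and both the handling of None values (skipped) and the ordering (largest magnitude first) are assumptions.

```python
def format_contributions_sketch(contributions_map, top_n):
    """Illustrative stand-in for format_contributions_map."""
    # Documented behavior: None or empty input yields an empty string.
    if not contributions_map:
        return ""
    # Assumption: features whose contribution is None are skipped.
    items = [
        (name, value)
        for name, value in contributions_map.items()
        if value is not None
    ]
    # Assumption: contributors are ranked by magnitude, largest first.
    items.sort(key=lambda pair: abs(pair[1]), reverse=True)
    # Values are already percentages (0-100 range), so no scaling is needed.
    return ", ".join(f"{name} ({value:.0f}%)" for name, value in items[:top_n])


summary = format_contributions_sketch(
    {"quantity": 10.0, "amount": 85.0, "discount": 5.0}, top_n=2
)
```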

create_optimal_tree_explainer

def create_optimal_tree_explainer(tree_model: Any) -> Any

Create TreeSHAP explainer for the given tree model.

Uses SHAP's TreeExplainer, which provides efficient SHAP value computation for tree-based models via optimized C++ implementations.

Arguments:

  • tree_model - Trained tree-based model (e.g., IsolationForest)

Returns:

Configured SHAP TreeExplainer

compute_contributions_for_matrix

def compute_contributions_for_matrix(
    model_local: Any, feature_matrix: np.ndarray,
    columns: list[str]) -> list[dict[str, float | None]]

Compute normalized SHAP contributions for a feature matrix.

compute_feature_contributions

def compute_feature_contributions(model_uri: str, df: DataFrame,
                                  columns: list[str]) -> DataFrame

Compute per-row feature contributions using TreeSHAP.

TreeSHAP provides exact feature attributions from the IsolationForest model, showing which features contributed most to each anomaly score.

Arguments:

  • model_uri - MLflow model URI from which the sklearn IsolationForest model is loaded.
  • df - DataFrame with data to explain.
  • columns - Feature columns used for training.

Returns:

The input DataFrame with an additional 'anomaly_contributions' map column containing normalized SHAP values (absolute contributions summing to 1.0 per row).
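The normalization described above (absolute contributions summing to 1.0 per row) can be sketched for a single row as follows. This is an illustration of the arithmetic only: the function name is hypothetical, and returning zeros when every SHAP value is zero is an assumption, not documented behavior.

```python
def normalize_abs_contributions(shap_row, columns):
    """Scale the absolute SHAP values of one row so that they sum to 1.0."""
    magnitudes = [abs(value) for value in shap_row]
    total = sum(magnitudes)
    if total == 0.0:
        # Assumption: with no attribution at all, every feature gets 0.0.
        return {col: 0.0 for col in columns}
    return {col: mag / total for col, mag in zip(columns, magnitudes)}


# The sign is dropped: only the size of each attribution matters here.
row = normalize_abs_contributions([-0.3, 0.1, 0.1], ["amount", "quantity", "discount"])
```

In this example the magnitudes are 0.3, 0.1 and 0.1, so amount accounts for about 0.6 of the total and the other two features for about 0.2 each.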

add_top_contributors_to_message

def add_top_contributors_to_message(df: DataFrame,
                                    threshold: float,
                                    top_n: int = 3) -> DataFrame

Enhance error messages with top feature contributors from SHAP values.

Arguments:

  • df - DataFrame with anomaly_score and anomaly_contributions.
  • threshold - Score threshold for anomalies.
  • top_n - Number of top contributors to include in message.

Returns:

DataFrame with enhanced messages including top contributing features.
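The idea can be illustrated on plain Python dictionaries instead of a Spark DataFrame. Everything specific in this sketch is an assumption: the "message" key, the "Top contributors:" wording, the rule that a row is anomalous when anomaly_score >= threshold, and contribution values given as percentages (0-100). It is not the library's implementation.

```python
def add_top_contributors_sketch(rows, threshold, top_n=3):
    """Append the top contributing features to messages of anomalous rows."""
    enhanced = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        contributions = row.get("anomaly_contributions") or {}
        # Assumption: features without a contribution value are ignored.
        known = [(name, v) for name, v in contributions.items() if v is not None]
        # Assumption: a row is anomalous when its score reaches the threshold.
        if row["anomaly_score"] >= threshold and known:
            ranked = sorted(known, key=lambda pair: pair[1], reverse=True)[:top_n]
            top = ", ".join(f"{name} ({value:.0f}%)" for name, value in ranked)
            # Hypothetical message layout, for illustration only.
            new_row["message"] = f"{row['message']} Top contributors: {top}"
        enhanced.append(new_row)
    return enhanced


rows = [
    {
        "anomaly_score": 0.9,
        "anomaly_contributions": {"amount": 85.0, "quantity": 10.0, "discount": 5.0},
        "message": "Anomaly detected.",
    },
    {
        "anomaly_score": 0.2,
        "anomaly_contributions": {"amount": 40.0, "quantity": 60.0},
        "message": "OK.",
    },
]
result = add_top_contributors_sketch(rows, threshold=0.5, top_n=2)
```

Only the first row exceeds the threshold, so only its message is extended; the second row passes through unchanged.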