databricks.labs.dqx.anomaly.explainability
SHAP-based explainability for row anomaly detection.
Provides contribution formatting and computation for scoring pipelines, plus TreeSHAP-based feature contribution analysis for reporting and messages. Requires the 'anomaly' extras: pip install databricks-labs-dqx[anomaly]
format_shap_contributions
def format_shap_contributions(
shap_values: np.ndarray, valid_indices: np.ndarray, num_rows: int,
engineered_feature_cols: list[str]) -> list[dict[str, float | None]]
Format SHAP values into contribution dictionaries.
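The docstring does not say how explained rows are mapped back to output rows. The following is a minimal sketch under stated assumptions, not the library's implementation:

- `valid_indices[i]` is the original row position of `shap_values[i]`.
- Rows that were not explained get `None` for every feature.
- Raw SHAP values are kept as-is. Whether the real function also normalizes them is not stated here.

The name `format_shap_contributions_sketch` is hypothetical.

```python
from __future__ import annotations

import numpy as np


def format_shap_contributions_sketch(
    shap_values: np.ndarray,
    valid_indices: np.ndarray,
    num_rows: int,
    engineered_feature_cols: list[str],
) -> list[dict[str, float | None]]:
    """Scatter SHAP rows back to their original row positions (illustrative only)."""
    # Assumption: rows that were not explained get None for every feature.
    results = [{col: None for col in engineered_feature_cols} for _ in range(num_rows)]
    # Assumption: shap_values[i] holds the attributions for original row valid_indices[i].
    for shap_row, row_idx in zip(shap_values, valid_indices):
        results[int(row_idx)] = {
            col: float(value) for col, value in zip(engineered_feature_cols, shap_row)
        }
    return results
```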
compute_shap_values
def compute_shap_values(
model_local: Any, feature_matrix: pd.DataFrame,
engineered_feature_cols: list[str]) -> tuple[np.ndarray, np.ndarray]
Compute SHAP values for a model and feature matrix.
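The return pair `(shap_values, valid_indices)` could be produced as in this sketch. It assumes that a row is "valid" when none of its `engineered_feature_cols` values are missing, and that the model is explained with `shap.TreeExplainer`. The `explainer` parameter and the `_sketch` name are hypothetical. They let the filtering logic be exercised with a stand-in object instead of a fitted model.

```python
from __future__ import annotations

from typing import Any

import numpy as np
import pandas as pd


def compute_shap_values_sketch(
    model_local: Any,
    feature_matrix: pd.DataFrame,
    engineered_feature_cols: list[str],
    explainer: Any = None,
) -> tuple[np.ndarray, np.ndarray]:
    """Explain only rows with complete features; return (shap_values, valid_indices)."""
    features = feature_matrix[engineered_feature_cols]
    # Assumption: "valid" means no missing value in any engineered feature.
    valid_mask = features.notna().all(axis=1).to_numpy()
    valid_indices = np.flatnonzero(valid_mask)
    if valid_indices.size == 0:
        return np.empty((0, len(engineered_feature_cols))), valid_indices
    if explainer is None:
        import shap  # deferred: only needed when no explainer is injected

        explainer = shap.TreeExplainer(model_local)
    shap_values = np.asarray(explainer.shap_values(features.to_numpy()[valid_mask]))
    return shap_values, valid_indices
```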
format_contributions_map
def format_contributions_map(contributions_map: dict[str, float | None] | None,
top_n: int) -> str
Format contributions map as string for top N contributors.
Arguments:
contributions_map - Dictionary mapping feature names to contribution values (0-100 range).
top_n - Number of top contributors to include.
Returns:
A formatted string such as "amount (85%), quantity (10%), discount (5%)", or an empty string if contributions_map is None or empty.
Example:
>>> format_contributions_map(dict(amount=85.0, quantity=10.0), 2)
'amount (85%), quantity (10%)'
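One way to reproduce the documented behaviour is sketched below. It drops `None` entries, sorts the remaining contributions largest first, keeps `top_n`, and renders each as a whole-number percentage. Skipping `None` values and rounding to whole percent are assumptions inferred from the example output. The `_sketch` name is hypothetical.

```python
from __future__ import annotations


def format_contributions_map_sketch(
    contributions_map: dict[str, float | None] | None, top_n: int
) -> str:
    """Render the top_n largest contributions as 'name (NN%)' joined by ', '."""
    if not contributions_map:
        return ""
    # Assumption: features without a value are skipped rather than shown.
    present = [(name, value) for name, value in contributions_map.items() if value is not None]
    present.sort(key=lambda item: item[1], reverse=True)
    return ", ".join(f"{name} ({value:.0f}%)" for name, value in present[:top_n])
```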
create_optimal_tree_explainer
def create_optimal_tree_explainer(tree_model: Any) -> Any
Create TreeSHAP explainer for the given tree model.
Uses SHAP's TreeExplainer, which provides efficient SHAP value computation for tree-based models via optimized C++ implementations.
Arguments:
tree_model - Trained tree-based model (e.g., IsolationForest).
Returns:
Configured SHAP TreeExplainer
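As described, the wrapper amounts to a `shap.TreeExplainer` call. The sketch below (hypothetical name) defers the `shap` import so that a module containing it can still be imported when the 'anomaly' extras are not installed. That deferral is a design assumption, not something the reference above states.

```python
from typing import Any


def create_tree_explainer_sketch(tree_model: Any) -> Any:
    """Wrap a fitted tree ensemble (e.g. IsolationForest) in shap.TreeExplainer."""
    import shap  # deferred so importing this module does not require the extras

    return shap.TreeExplainer(tree_model)
```

For example, with `model = IsolationForest(random_state=0).fit(X)`, calling `create_tree_explainer_sketch(model).shap_values(X)` should return one row of per-feature attributions for each row of `X`.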
compute_contributions_for_matrix
def compute_contributions_for_matrix(
model_local: Any, feature_matrix: np.ndarray,
columns: list[str]) -> list[dict[str, float | None]]
Compute normalized SHAP contributions for a feature matrix.
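According to the return description of compute_feature_contributions, "normalized" means absolute SHAP values rescaled to sum to 1.0 per row. Below is a vectorized NumPy sketch of that step. The function name is hypothetical, and leaving all-zero rows at zero (instead of dividing by zero) is an assumption.

```python
import numpy as np


def normalize_contributions_sketch(shap_values: np.ndarray) -> np.ndarray:
    """Turn raw SHAP values (rows x features) into per-row shares of |SHAP| summing to 1."""
    magnitudes = np.abs(shap_values)
    totals = magnitudes.sum(axis=1, keepdims=True)
    # Assumption: rows whose SHAP values are all zero stay all zero instead of becoming NaN.
    safe_totals = np.where(totals == 0.0, 1.0, totals)
    return magnitudes / safe_totals
```

Using absolute values means a feature that pushes the score down counts as much as one that pushes it up. The shares therefore describe influence, not direction.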
compute_feature_contributions
def compute_feature_contributions(model_uri: str, df: DataFrame,
columns: list[str]) -> DataFrame
Compute per-row feature contributions using TreeSHAP.
TreeSHAP provides exact feature attributions from the IsolationForest model, showing which features contributed most to each anomaly score.
Arguments:
model_uri - MLflow model URI used to load the sklearn IsolationForest.
df - DataFrame with the data to explain.
columns - Feature columns used for training.
Returns:
The input DataFrame with an additional 'anomaly_contributions' map column containing normalized SHAP values (absolute contributions summing to 1.0 per row).
add_top_contributors_to_message
def add_top_contributors_to_message(df: DataFrame,
threshold: float,
top_n: int = 3) -> DataFrame
Enhance error messages with top feature contributors from SHAP values.
Arguments:
df - DataFrame with anomaly_score and anomaly_contributions columns.
threshold - Score threshold for anomalies.
top_n - Number of top contributors to include in the message.
Returns:
DataFrame with enhanced messages including top contributing features.
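The exact message wording and whether a score equal to threshold counts as anomalous are not documented here. The following is a hypothetical per-row sketch of the idea, not the library's behaviour. It makes three assumptions:

- Rows scoring below threshold get no message.
- Contributions are fractions summing to 1.0, as produced by compute_feature_contributions, and are scaled to percentages for display.
- The top top_n features are appended to the message.

In a Spark pipeline, logic like this would be applied per row, for example through a UDF or equivalent column expressions.

```python
from __future__ import annotations


def build_anomaly_message_sketch(
    anomaly_score: float | None,
    contributions: dict[str, float | None] | None,
    threshold: float,
    top_n: int = 3,
) -> str | None:
    """Hypothetical message: None for non-anomalous rows, else score plus top features."""
    # Assumption: a score at or above the threshold is treated as anomalous.
    if anomaly_score is None or anomaly_score < threshold:
        return None
    message = f"Anomaly score {anomaly_score:.3f} (threshold {threshold})"
    ranked = sorted(
        ((name, value) for name, value in (contributions or {}).items() if value is not None),
        key=lambda item: item[1],
        reverse=True,
    )[:top_n]
    if not ranked:
        return message
    # Assumption: contributions are fractions (sum to 1.0), shown as whole percentages.
    top = ", ".join(f"{name} ({value * 100:.0f}%)" for name, value in ranked)
    return f"{message}; top contributors: {top}"
```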