databricks.labs.dqx.anomaly.explainability
SHAP-based explainability for row anomaly detection.
Provides contribution formatting and computation for scoring pipelines, plus TreeSHAP-based feature contribution analysis for reporting and messages.
Requires the 'anomaly' extras:
pip install databricks-labs-dqx[anomaly]
format_shap_contributions
def format_shap_contributions(
shap_values: np.ndarray, valid_indices: np.ndarray, num_rows: int,
engineered_feature_cols: list[str]) -> list[dict[str, float | None]]
Format SHAP values into contribution dictionaries.
compute_shap_values
def compute_shap_values(
model_local: Any, feature_matrix: pd.DataFrame,
engineered_feature_cols: list[str]) -> tuple[np.ndarray, np.ndarray]
Compute SHAP values for a model and feature matrix.
severity_from_scores
def severity_from_scores(
scores: np.ndarray,
quantile_points: list[tuple[float, float]]) -> np.ndarray
Map raw anomaly scores to severity percentiles via piecewise linear interpolation.
Numpy counterpart of add_severity_percentile_column (same quantile points, same clamping at both ends) for use inside scoring UDFs.
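The interpolation and clamping described above can be sketched with np.interp, which interpolates linearly between knots and returns the end values for inputs outside the knot range. This is a minimal illustration, not the library's implementation: severity_sketch is a hypothetical name, and the (percentile, score) ordering of each pair in quantile_points is an assumption, since the tuple layout is not documented here.

```python
import numpy as np


def severity_sketch(scores: np.ndarray, quantile_points: list[tuple[float, float]]) -> np.ndarray:
    """Hypothetical stand-in for severity_from_scores.

    Assumption: each pair is (percentile, score) and the pairs are
    sorted by ascending score.
    """
    percentiles = np.array([p for p, _ in quantile_points], dtype=float)
    score_knots = np.array([s for _, s in quantile_points], dtype=float)
    # np.interp is piecewise linear between knots and clamps to the
    # first/last percentile for scores outside the knot range.
    return np.interp(scores, score_knots, percentiles)


# Illustrative quantile points only; real values come from the trained model.
points = [(0.0, 0.30), (50.0, 0.40), (100.0, 0.60)]
severity = severity_sketch(np.array([0.10, 0.35, 0.50, 0.90]), points)
print(severity)  # roughly 0, 25, 75, 100
```

Scores below the first knot and above the last knot land on the lowest and highest percentile respectively, which is the clamping the description refers to.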
compute_gated_shap_contributions
def compute_gated_shap_contributions(
model_local: Any, feature_matrix: pd.DataFrame,
engineered_feature_cols: list[str], scores: np.ndarray,
quantile_points: list[tuple[float, float]] | None,
threshold: float | None) -> list[dict[str, float | None] | None]
Compute SHAP contributions only for rows whose severity reaches the anomaly threshold.
TreeSHAP costs an order of magnitude more than scoring itself, and contributions are only
surfaced for anomalous rows, so computing SHAP for the typically tiny anomalous subset
instead of every row removes most of the contributions cost. Rows below the threshold get
None (a null map). When quantile_points or threshold is unavailable, SHAP is
computed for all rows (previous behaviour).
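The gating idea can be sketched as follows, assuming severity has already been derived from the scores. The names gated_sketch and fake_explainer are hypothetical, and the explainer is a stand-in callable rather than TreeSHAP, so the example only shows the control flow: explain the rows at or above the threshold, leave None elsewhere, and fall back to explaining every row when gating inputs are missing.

```python
from typing import Callable, Optional

import numpy as np


def gated_sketch(
    features: np.ndarray,
    severity: Optional[np.ndarray],
    threshold: Optional[float],
    explain_rows: Callable[[np.ndarray], list],
) -> list:
    """Hypothetical sketch of threshold-gated explanation."""
    # Fallback: without severity or a threshold, explain every row.
    if severity is None or threshold is None:
        return list(explain_rows(features))

    mask = severity >= threshold
    results: list = [None] * len(features)
    if mask.any():
        # The expensive call only sees the anomalous subset.
        explained = explain_rows(features[mask])
        for row_index, contribution in zip(np.flatnonzero(mask), explained):
            results[int(row_index)] = contribution
    return results


explained_row_counts = []


def fake_explainer(rows: np.ndarray) -> list:
    """Stand-in for the costly SHAP step; records how many rows it received."""
    explained_row_counts.append(len(rows))
    return [{"amount": 1.0} for _ in rows]


features = np.arange(8, dtype=float).reshape(4, 2)
severity = np.array([10.0, 99.5, 42.0, 97.0])
gated = gated_sketch(features, severity, 95.0, fake_explainer)
print(gated)
```

With a threshold of 95.0, only the second and fourth rows are passed to the explainer; the other two positions stay None.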
format_contributions_map
def format_contributions_map(contributions_map: dict[str, float | None] | None,
top_n: int) -> str
Format contributions map as string for top N contributors.
Arguments:
contributions_map- Dictionary mapping feature names to contribution values (0-100 range)
top_n- Number of top contributors to include
Returns:
Formatted string like "amount (85%), quantity (10%), discount (5%)".
Empty string if contributions_map is None or empty.
Example:
>>> format_contributions_map(dict(amount=85.0, quantity=10.0), 2)
'amount (85%), quantity (10%)'
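A minimal sketch of how such a formatter could work is shown here. format_contributions_sketch is a hypothetical name, and two details are assumptions not confirmed by the documentation: entries whose value is None are skipped, and contributors are ranked by descending value.

```python
from typing import Optional


def format_contributions_sketch(
    contributions_map: Optional[dict], top_n: int
) -> str:
    """Hypothetical re-implementation for illustration only."""
    if not contributions_map:
        # Covers both None and an empty dictionary.
        return ""
    # Assumption: None values carry no contribution and are dropped.
    scored = [(name, value) for name, value in contributions_map.items() if value is not None]
    # Assumption: largest contribution first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return ", ".join(f"{name} ({value:.0f}%)" for name, value in scored[:top_n])


message = format_contributions_sketch(
    {"quantity": 10.0, "amount": 85.0, "discount": 5.0}, 2
)
print(message)  # amount (85%), quantity (10%)
```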
create_optimal_tree_explainer
def create_optimal_tree_explainer(tree_model: Any) -> Any
Create TreeSHAP explainer for the given tree model.
Uses SHAP's TreeExplainer, which provides efficient SHAP value computation for tree-based models via optimized C++ implementations.
Arguments:
tree_model- Trained tree-based model (e.g., IsolationForest)
Returns:
Configured SHAP TreeExplainer
compute_contributions_for_matrix
def compute_contributions_for_matrix(
model_local: Any, feature_matrix: np.ndarray,
columns: list[str]) -> list[dict[str, float | None]]
Compute normalized SHAP contributions for a feature matrix.
compute_feature_contributions
def compute_feature_contributions(model_uri: str, df: DataFrame,
columns: list[str]) -> DataFrame
Compute per-row feature contributions using TreeSHAP.
TreeSHAP provides exact feature attributions from the IsolationForest model, showing which features contributed most to each anomaly score.
Arguments:
model_uri- MLflow model URI to load sklearn IsolationForest.
df- DataFrame with data to explain.
columns- Feature columns used for training.
Returns:
DataFrame with additional 'anomaly_contributions' map column containing normalized SHAP values (absolute contributions summing to 1.0 per row).
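The normalization mentioned in the return value (absolute contributions summing to 1.0 per row) can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the library's code: normalize_abs_sketch is a hypothetical name, the input stands in for raw per-row SHAP values, and assigning 0.0 to every feature when a row's total is zero is an assumption.

```python
import numpy as np


def normalize_abs_sketch(shap_values: np.ndarray, feature_names: list) -> list:
    """Hypothetical sketch: turn raw SHAP values into per-row shares."""
    abs_values = np.abs(np.asarray(shap_values, dtype=float))
    totals = abs_values.sum(axis=1, keepdims=True)
    # Assumption: rows whose absolute values sum to zero get 0.0 shares
    # instead of triggering a division by zero.
    shares = np.divide(
        abs_values, totals, out=np.zeros_like(abs_values), where=totals > 0
    )
    return [dict(zip(feature_names, map(float, row))) for row in shares]


raw = np.array([[-3.0, 1.0], [0.0, 0.0]])
contributions = normalize_abs_sketch(raw, ["amount", "quantity"])
print(contributions)
```

Taking absolute values first means a feature that pushes the score strongly in either direction still counts as a large contributor.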
add_top_contributors_to_message
def add_top_contributors_to_message(df: DataFrame,
threshold: float,
top_n: int = 3) -> DataFrame
Enhance error messages with top feature contributors from SHAP values.
Arguments:
df- DataFrame with anomaly_score and anomaly_contributions.
threshold- Score threshold for anomalies.
top_n- Number of top contributors to include in message.
Returns:
DataFrame with enhanced messages including top contributing features.