databricks.labs.dqx.anomaly.profiler
Auto-discovery logic for row anomaly detection.
Analyzes DataFrames to recommend columns and segments suitable for anomaly detection using on-the-fly heuristics.
AnomalyProfile Objects
@dataclass
class AnomalyProfile()
Auto-discovery results for row anomaly detection.
column_types
NEW: maps column -> type category
unsupported_columns
NEW: columns that cannot be used
auto_discover_columns
def auto_discover_columns(df: DataFrame) -> AnomalyProfile
Auto-discover columns and segments for row anomaly detection.
Analyzes the DataFrame using on-the-fly heuristics to recommend suitable columns and segmentation strategy.
Column selection criteria:
- Numeric types (int, long, float, double, decimal)
- stddev > 0 (has variance)
- null_rate < 50%
- Exclude: timestamps, IDs (detected by name patterns)
Segment selection criteria:
- Categorical types (string, int with low cardinality)
- Distinct values: 2-50 (inclusive)
- null_rate < 10%
- At least 1000 rows per segment (warn if violated)
Arguments:
df- DataFrame to analyze.
Returns:
AnomalyProfile with recommendations and warnings.