databricks.labs.dqx.profiler.profiler_runner
ProfilerRunner Objects
class ProfilerRunner()
Runs the DQX profiler on the input data and saves the generated checks and profile summary stats.
run
def run(run_config_name: str, input_config: InputConfig,
profiler_config: ProfilerConfig, product: str,
install_folder: str) -> None
Run the DQX profiler for the given run configuration and save the generated checks and profile summary stats.
Arguments:
run_config_name - Name of the run configuration (used in storage config).
input_config - Input data configuration.
profiler_config - Profiler configuration.
product - Product name for the installation (used in storage config).
install_folder - Installation folder path (used in storage config).
Returns:
None. The generated checks and profile summary statistics are saved rather than returned.
run_for_patterns
def run_for_patterns(patterns: list[str], exclude_patterns: list[str],
profiler_config: ProfilerConfig, checks_location: str,
install_folder: str, product: str,
max_parallelism: int) -> None
Run the DQX profiler for the given table patterns and save the generated checks and profile summary stats.
Arguments:
patterns - List of table patterns to profile (e.g. ["catalog.schema.table*"]).
exclude_patterns - List of table patterns to exclude from profiling (e.g. ["*output", "*quarantine"]).
profiler_config - Profiler configuration.
checks_location - Delta table to save the generated checks, otherwise an absolute directory.
install_folder - Installation folder path.
product - Product name for the installation.
max_parallelism - Maximum number of parallel threads to use for profiling.
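The interplay of patterns, exclude_patterns, and max_parallelism can be illustrated with a minimal standalone sketch. It does not call DQX. It assumes shell-style wildcard matching (via the standard library's fnmatch) and a plain thread pool, and the helpers select_tables and profile_all, as well as the table names, are hypothetical. The actual matching and scheduling inside run_for_patterns may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from fnmatch import fnmatchcase


def select_tables(tables, patterns, exclude_patterns):
    """Keep tables that match any include pattern and no exclude pattern.

    Assumption: patterns use shell-style wildcards, as in the documented examples.
    """
    return [
        name for name in tables
        if any(fnmatchcase(name, p) for p in patterns)
        and not any(fnmatchcase(name, p) for p in exclude_patterns)
    ]


def profile_all(tables, profile_one, max_parallelism):
    """Apply profile_one to every table using at most max_parallelism threads."""
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        # Executor.map yields results in input order, so zip pairs them correctly.
        return dict(zip(tables, pool.map(profile_one, tables)))


# Hypothetical table names, for illustration only.
tables = [
    "main.sales.orders",
    "main.sales.orders_output",
    "main.sales.customers",
    "main.hr.staff",
]

selected = select_tables(tables, ["main.sales.*"], ["*output", "*quarantine"])

# Stand-in for real profiling work: returns a small dict per table.
results = profile_all(selected, lambda name: {"table": name}, max_parallelism=2)
```

Here the exclude patterns take precedence over the include patterns, so main.sales.orders_output is dropped even though it matches main.sales.*.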
save
def save(checks: list[dict], summary_stats: dict[str, Any],
storage_config: BaseChecksStorageConfig,
profile_summary_stats_file: str) -> None
Save the generated checks and profile summary statistics to the specified files.
Arguments:
checks - The generated checks.
summary_stats - The profile summary statistics.
storage_config - Configuration for where to save the checks.
profile_summary_stats_file - The file to save the profile summary statistics to.
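To make the shape of the saved data concrete, the following standalone sketch writes a list of check dictionaries and a summary-statistics dictionary to two local JSON files and reads them back. It is not the DQX implementation: save_locally is a hypothetical helper, the local paths replace storage_config, JSON is an assumed serialization format, and the contents of the check and statistics dictionaries are illustrative only.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_locally(checks: list[dict], summary_stats: dict[str, Any],
                 checks_file: Path, profile_summary_stats_file: Path) -> None:
    """Write checks and summary stats to two separate JSON files (hypothetical helper)."""
    checks_file.write_text(json.dumps(checks, indent=2))
    profile_summary_stats_file.write_text(json.dumps(summary_stats, indent=2))


# Illustrative payloads; real profiler output may be structured differently.
checks = [
    {
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"column": "id"}},
    }
]
summary_stats = {"id": {"count": 100, "count_null": 0}}

with tempfile.TemporaryDirectory() as tmp:
    checks_path = Path(tmp) / "checks.json"
    stats_path = Path(tmp) / "profile_summary_stats.json"
    save_locally(checks, summary_stats, checks_path, stats_path)

    # Read both files back to confirm the round trip preserves the data.
    loaded_checks = json.loads(checks_path.read_text())
    loaded_stats = json.loads(stats_path.read_text())
```

Keeping the checks and the summary statistics in separate files mirrors the signature above, where the checks destination and profile_summary_stats_file are passed independently.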