
databricks.labs.dqx.profiler.profiler_runner

ProfilerRunner Objects

class ProfilerRunner()

Runs the DQX profiler on the input data and saves the generated checks and profile summary stats.

run

def run(run_config_name: str, input_config: InputConfig,
        profiler_config: ProfilerConfig, product: str,
        install_folder: str) -> None

Run the DQX profiler for the given run configuration and save the generated checks and profile summary stats.

Arguments:

  • run_config_name - Name of the run configuration (used in storage config).
  • input_config - Input data configuration.
  • profiler_config - Profiler configuration.
  • product - Product name for the installation (used in storage config).
  • install_folder - Installation folder path (used in storage config).

Returns:

None (the signature is annotated -> None). The generated checks and profile summary statistics are saved rather than returned.

run_for_patterns

def run_for_patterns(patterns: list[str], exclude_patterns: list[str],
                     profiler_config: ProfilerConfig, checks_location: str,
                     install_folder: str, product: str,
                     max_parallelism: int) -> None

Run the DQX profiler for the given table patterns and save the generated checks and profile summary stats.

Arguments:

  • patterns - List of table patterns to profile (e.g. ["catalog.schema.table*"]).
  • exclude_patterns - List of table patterns to exclude from profiling (e.g. ["*output", "*quarantine"]).
  • profiler_config - Profiler configuration.
  • checks_location - Where to save the generated checks: a Delta table name, or otherwise an absolute directory path.
  • install_folder - Installation folder path.
  • product - Product name for the installation.
  • max_parallelism - Maximum number of parallel threads to use for profiling.
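The patterns are described only by example above. As a minimal sketch, assuming shell-style wildcards with the semantics of Python's fnmatch (the matching DQX actually performs may differ), the include/exclude selection and the bounded thread pool implied by max_parallelism could look like the following. resolve_tables, profile_table and run_for_patterns_sketch are hypothetical names, and profile_table returns placeholder results instead of calling the real profiler.

```python
from concurrent.futures import ThreadPoolExecutor
from fnmatch import fnmatch


def resolve_tables(
    all_tables: list[str], patterns: list[str], exclude_patterns: list[str]
) -> list[str]:
    """Hypothetical helper: keep tables matching any include pattern and no exclude pattern."""
    selected = []
    for table in all_tables:
        included = any(fnmatch(table, p) for p in patterns)
        excluded = any(fnmatch(table, p) for p in exclude_patterns)
        if included and not excluded:
            selected.append(table)
    return selected


def profile_table(table: str) -> tuple[str, list[dict], dict]:
    """Hypothetical stand-in for profiling one table; returns placeholder checks and stats."""
    checks = [{"table": table, "check": "placeholder"}]
    summary_stats = {"table": table}
    return table, checks, summary_stats


def run_for_patterns_sketch(
    all_tables: list[str],
    patterns: list[str],
    exclude_patterns: list[str],
    max_parallelism: int,
) -> dict[str, tuple[list[dict], dict]]:
    """Profile every selected table, with at most max_parallelism tables in flight at once."""
    tables = resolve_tables(all_tables, patterns, exclude_patterns)
    results: dict[str, tuple[list[dict], dict]] = {}
    # The pool size caps concurrency; executor.map yields results in input order.
    with ThreadPoolExecutor(max_workers=max_parallelism) as executor:
        for table, checks, stats in executor.map(profile_table, tables):
            results[table] = (checks, stats)
    return results
```

With the tables catalog.schema.table1, catalog.schema.table1_output, catalog.schema.table2_quarantine and catalog.other.table3, the example patterns ["catalog.schema.table*"] and exclusions ["*output", "*quarantine"] select only catalog.schema.table1 in this sketch: the first pattern skips the other catalog schema, and the exclusions drop the output and quarantine tables.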

save

def save(checks: list[dict], summary_stats: dict[str, Any],
         storage_config: BaseChecksStorageConfig,
         profile_summary_stats_file: str) -> None

Save the generated checks to the location described by storage_config, and the profile summary statistics to profile_summary_stats_file.

Arguments:

  • checks - The generated checks.
  • summary_stats - The profile summary statistics.
  • storage_config - Configuration for where to save the checks.
  • profile_summary_stats_file - The file to save the profile summary statistics to.
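The docstring does not say how the output is laid out. A minimal sketch, assuming a file-based destination and JSON serialization (the library may use other formats such as YAML, and checks can also go to a Delta table as described for checks_location), could look like the following. DirectoryStorageSketch and save_sketch are hypothetical names standing in for a BaseChecksStorageConfig implementation and save, and the summary statistics file is assumed to live in the same folder as the checks.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class DirectoryStorageSketch:
    """Hypothetical file-based storage config: checks are written to folder/checks_file."""
    folder: str
    checks_file: str = "checks.json"


def save_sketch(
    checks: list[dict],
    summary_stats: dict[str, Any],
    storage: DirectoryStorageSketch,
    profile_summary_stats_file: str,
) -> None:
    """Write checks and summary statistics as two JSON files in the same folder (assumption)."""
    folder = Path(storage.folder)
    folder.mkdir(parents=True, exist_ok=True)
    # Checks go to the file named by the storage config.
    (folder / storage.checks_file).write_text(json.dumps(checks, indent=2))
    # default=str lets values such as dates in min/max statistics serialize as strings.
    (folder / profile_summary_stats_file).write_text(
        json.dumps(summary_stats, indent=2, default=str)
    )
```

Keeping the checks destination inside a storage config object, as the real signature does, lets the same save call target a directory or a table; this sketch covers only the directory case.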