
databricks.labs.dqx.telemetry

log_telemetry

def log_telemetry(ws: WorkspaceClient, key: str, value: str) -> None

Record telemetry in the Databricks workspace by setting the given key/value pair as extra user-agent information on the workspace client.

Arguments:

  • ws - WorkspaceClient
  • key - telemetry key to log
  • value - telemetry value to log
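A minimal sketch of a direct call, assuming default SDK authentication; the key/value pair is illustrative, not a documented telemetry key:

from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.telemetry import log_telemetry

ws = WorkspaceClient()  # resolves credentials from the environment or a config profile
log_telemetry(ws, "feature", "manual_run")  # hypothetical key/value pair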

telemetry_logger

def telemetry_logger(key: str, value: str, workspace_client_attr: str = "ws") -> Callable

Decorator to log telemetry for method calls. By default, it expects the class of the decorated method to have a "ws" attribute holding the workspace client.

Usage:

@telemetry_logger("telemetry_key", "telemetry_value")  # uses the "ws" attribute for the workspace client by default
@telemetry_logger("telemetry_key", "telemetry_value", "my_ws_client")  # custom workspace client attribute

Arguments:

  • key - Telemetry key to log
  • value - Telemetry value to log
  • workspace_client_attr - Name of the workspace client attribute on the class (defaults to "ws")
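Building on the usage above, a minimal class-level sketch; the class name, method name, and key/value strings are hypothetical:

from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.telemetry import telemetry_logger

class QualityRunner:  # hypothetical class
    def __init__(self, ws: WorkspaceClient):
        self.ws = ws  # attribute name the decorator looks up by default

    @telemetry_logger("feature", "apply_checks")  # hypothetical key/value pair
    def apply_checks(self) -> None:
        ...  # telemetry is logged via self.ws when this method is called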

log_dataframe_telemetry

def log_dataframe_telemetry(ws: WorkspaceClient, spark: SparkSession, df: DataFrame)

Log telemetry information about a Spark DataFrame to the Databricks workspace, including:

  • Number of input tables and non-table inputs
  • Whether the DataFrame is streaming
  • Whether running in a Delta Live Tables (DLT) pipeline

Arguments:

  • ws - WorkspaceClient
  • spark - SparkSession
  • df - DataFrame to analyze

Returns:

None
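A usage sketch, assuming an attached Spark session and default SDK authentication; the table name is hypothetical:

from databricks.sdk import WorkspaceClient
from pyspark.sql import SparkSession
from databricks.labs.dqx.telemetry import log_dataframe_telemetry

spark = SparkSession.builder.getOrCreate()
ws = WorkspaceClient()

df = spark.read.table("main.default.sales")  # hypothetical table
log_dataframe_telemetry(ws, spark, df)  # records input counts, streaming flag, and DLT flag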

count_tables_in_spark_plan

def count_tables_in_spark_plan(df: DataFrame) -> int

Count the number of tables referenced in a DataFrame's Spark execution plan.

This function analyzes the Analyzed Logical Plan section of the Spark execution plan to identify table references (via SubqueryAlias nodes). File-based DataFrames and in-memory DataFrames will return 0.

Arguments:

  • df - The Spark DataFrame to analyze

Returns:

The number of distinct tables found in the execution plan. Returns 0 if the plan cannot be retrieved or contains no table references.
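A sketch of the documented behavior; the table names are hypothetical, and the expected counts follow from the description above (in-memory inputs yield 0, each distinct referenced table counts once):

from pyspark.sql import SparkSession
from databricks.labs.dqx.telemetry import count_tables_in_spark_plan

spark = SparkSession.builder.getOrCreate()

# In-memory DataFrame: no table references in the analyzed plan
df_memory = spark.createDataFrame([(1, "a")], ["id", "value"])
print(count_tables_in_spark_plan(df_memory))  # expected: 0

# Join of two table-backed DataFrames (hypothetical tables)
df_joined = spark.read.table("main.default.orders").join(
    spark.read.table("main.default.customers"), "customer_id"
)
print(count_tables_in_spark_plan(df_joined))  # expected: 2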

is_dlt_pipeline

def is_dlt_pipeline(spark: SparkSession) -> bool

Determine if the current Spark session is running within a Databricks Delta Live Tables (DLT) pipeline.

Arguments:

  • spark - The SparkSession to check

Returns:

True if running in a DLT pipeline, False otherwise
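A minimal sketch that branches on the result, e.g. to adjust behavior inside a pipeline:

from pyspark.sql import SparkSession
from databricks.labs.dqx.telemetry import is_dlt_pipeline

spark = SparkSession.builder.getOrCreate()

if is_dlt_pipeline(spark):
    print("Running inside a DLT pipeline")
else:
    print("Running outside DLT")  # e.g. interactive notebook or standard job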