databricks.labs.dqx.telemetry
log_telemetry
def log_telemetry(ws: WorkspaceClient, key: str, value: str) -> None
Record telemetry information in the Databricks workspace by setting extra user-agent info on the WorkspaceClient.
Arguments:
ws - WorkspaceClient
key - telemetry key to log
value - telemetry value to log
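A minimal usage sketch (the key/value strings are illustrative, not required names):

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.telemetry import log_telemetry

ws = WorkspaceClient()  # authenticates from the environment

# Attaches the key/value pair to the client's user agent extra info.
log_telemetry(ws, "dqx_feature", "quality_checks")
```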
telemetry_logger
def telemetry_logger(key: str, value: str, workspace_client_attr: str = "ws") -> Callable
Decorator to log telemetry for method calls. By default, it expects the decorated method's class to have a "ws" attribute holding the workspace client.
Usage:
@telemetry_logger("telemetry_key", "telemetry_value")  # Uses the "ws" attribute for the workspace client by default
@telemetry_logger("telemetry_key", "telemetry_value", "my_ws_client")  # Custom attribute
Arguments:
key - Telemetry key to log
value - Telemetry value to log
workspace_client_attr - Name of the workspace client attribute on the class (defaults to "ws")
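A sketch of both forms on hypothetical classes (class and method names are illustrative):

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.telemetry import telemetry_logger

class Profiler:
    def __init__(self, ws: WorkspaceClient):
        self.ws = ws  # default attribute name the decorator looks for

    @telemetry_logger("telemetry_key", "telemetry_value")
    def profile(self) -> None:
        ...  # telemetry is logged via self.ws when profile() is called

class CustomProfiler:
    def __init__(self, client: WorkspaceClient):
        self.my_ws_client = client  # custom attribute name

    @telemetry_logger("telemetry_key", "telemetry_value", "my_ws_client")
    def profile(self) -> None:
        ...
```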
log_dataframe_telemetry
def log_dataframe_telemetry(ws: WorkspaceClient, spark: SparkSession, df: DataFrame)
Log telemetry information about a Spark DataFrame to the Databricks workspace, including:
- Number of input tables and non-table inputs
- Whether the DataFrame is streaming
- Whether running in a Delta Live Tables (DLT) pipeline
Arguments:
ws - WorkspaceClient
spark - SparkSession
df - DataFrame to analyze
Returns:
None
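A minimal usage sketch (the table name is illustrative):

```python
from databricks.sdk import WorkspaceClient
from pyspark.sql import SparkSession
from databricks.labs.dqx.telemetry import log_dataframe_telemetry

ws = WorkspaceClient()
spark = SparkSession.builder.getOrCreate()

df = spark.read.table("main.default.sales")
log_dataframe_telemetry(ws, spark, df)  # records input counts, streaming flag, DLT flag
```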
count_tables_in_spark_plan
def count_tables_in_spark_plan(df: DataFrame) -> int
Count the number of tables referenced in a DataFrame's Spark execution plan.
This function analyzes the Analyzed Logical Plan section of the Spark execution plan to identify table references (via SubqueryAlias nodes). File-based and in-memory DataFrames return 0.
Arguments:
df - The Spark DataFrame to analyze
Returns:
The number of distinct tables found in the execution plan. Returns 0 if the plan cannot be retrieved or contains no table references.
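For illustration, a plausible sketch of the SubqueryAlias-counting approach; the helper name and the non-public _jdf plan access are assumptions, not the library's actual implementation:

```python
import re
from pyspark.sql import DataFrame

def count_subquery_aliases(df: DataFrame) -> int:
    # Reach the JVM QueryExecution through the non-public _jdf bridge and
    # render the analyzed logical plan as text (classic PySpark only; this
    # bridge is unavailable under Spark Connect).
    try:
        plan = df._jdf.queryExecution().analyzed().toString()
    except Exception:
        return 0  # plan cannot be retrieved
    # Each table reference typically appears as a "SubqueryAlias <name>" node.
    return len(set(re.findall(r"SubqueryAlias (\S+)", plan)))
```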
is_dlt_pipeline
def is_dlt_pipeline(spark: SparkSession) -> bool
Determine if the current Spark session is running within a Databricks Delta Live Tables (DLT) pipeline.
Arguments:
spark - The SparkSession to check
Returns:
True if running in a DLT pipeline, False otherwise
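For illustration, one plausible heuristic; the Spark conf key is an assumption, not necessarily the check this function performs:

```python
from pyspark.sql import SparkSession

def looks_like_dlt(spark: SparkSession) -> bool:
    # Heuristic sketch: DLT pipelines expose pipeline metadata in the Spark
    # conf. The key below is an assumption, not necessarily what DQX reads.
    try:
        return spark.conf.get("pipelines.id", None) is not None
    except Exception:
        return False
```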