
Profiler Guide

Attention:

The Profiler is currently an experimental feature in Lakebridge. Please report feedback or issues through GitHub issues.

Overview

The Lakebridge Profiler extracts and analyzes metadata from database systems to give you insight into your source environment. It helps you understand system configurations, resource utilization, query patterns, and performance metrics, which supports migration planning.

Key capabilities:

  • Database Metadata Extraction: Captures schema information, table structures, and object definitions
  • Performance Analytics: Collects query execution metrics and resource utilization data
  • Workload Analysis: Profiles active queries and identifies optimization opportunities
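
To make metadata extraction more concrete, the sketch below captures the same kinds of information named above (schema objects, table structures, and object definitions) from a SQLite database using Python's built-in sqlite3 module. It is an illustration under stated assumptions, not the profiler's code: SQLite is only a stand-in for a source system, the Lakebridge profiler reads the catalogs of the supported source systems listed below, and the function name extract_metadata is hypothetical.

```python
import sqlite3


def extract_metadata(conn: sqlite3.Connection) -> dict:
    """Hypothetical helper: capture object definitions and table structures.

    SQLite stands in for the source system here; a real source exposes the
    same kind of information through its own catalog views.
    """
    metadata = {"objects": [], "columns": {}}
    # sqlite_master lists each schema object with the DDL that created it.
    rows = conn.execute(
        "SELECT type, name, sql FROM sqlite_master ORDER BY type, name"
    ).fetchall()
    for obj_type, name, ddl in rows:
        if name.startswith("sqlite_"):
            continue  # skip SQLite's internal objects
        metadata["objects"].append(
            {"type": obj_type, "name": name, "definition": ddl}
        )
        if obj_type == "table":
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            cols = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            metadata["columns"][name] = [(col[1], col[2]) for col in cols]
    return metadata


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    demo.execute("CREATE VIEW big_orders AS SELECT * FROM orders WHERE amount > 100")
    print(extract_metadata(demo)["columns"])
    # {'orders': [('id', 'INTEGER'), ('amount', 'REAL')]}
```

Keeping the original DDL text alongside the column list mirrors the split between "table structures" and "object definitions": the columns are easy to compare, while the DDL preserves details such as view logic.
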

Prerequisites

Each source system has its own prerequisites, both for establishing the connection and for the metrics that are collected. Refer to the documentation for each supported system listed below.

Supported Source Systems

  • Azure Synapse
  • Microsoft SQL Server

Configure Profiler

Before running the profiler, you need to configure the connection details for your source system.

Run the following command to configure the profiler. It prompts you to select the source system and then to provide the connection details specific to that source:

databricks labs lakebridge configure-database-profiler

Execute Profiler

Once configured, run the profiler to extract metadata and performance metrics from your source system. To list the available flags, run:

databricks labs lakebridge execute-database-profiler --help

Output:

Profile the source system database

Usage:
databricks labs lakebridge execute-database-profiler [flags]

Flags:
  -h, --help                 help for execute-database-profiler
      --source-tech string   (Optional) The technology/platform of the sources to Profile

Global Flags:
      --debug            enable debug logging
  -o, --output type      output type: text or json (default text)
  -p, --profile string   ~/.databrickscfg profile
  -t, --target string    bundle target to use (if applicable)

The profiler will:

  1. Connect to your source system using the configured credentials
  2. Execute the profiling pipeline to extract metadata and metrics
  3. Store the results in the configured output location
  4. Generate a summary report of the profiling execution
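
The four steps above can be sketched as a single function. This is a conceptual illustration only, not Lakebridge's implementation: it again uses Python's built-in sqlite3 module as a stand-in source, and the function name run_profile, the JSON snapshot format, and the profile_<timestamp>.json file naming are all hypothetical.

```python
import json
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def run_profile(source: sqlite3.Connection, output_dir: Path) -> dict:
    """Hypothetical end-to-end profiling run against a SQLite stand-in source."""
    # Step 1 (connect) happens in the caller, which passes an open connection.
    # Step 2: extract a minimal metric, the number of schema objects per type.
    rows = source.execute(
        "SELECT type, COUNT(*) FROM sqlite_master "
        "WHERE name NOT LIKE 'sqlite%' GROUP BY type"
    ).fetchall()
    counts = {obj_type: n for obj_type, n in rows}

    # Step 3: store the results as a new timestamped snapshot so that repeated
    # runs never overwrite earlier ones.
    taken_at = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    output_dir.mkdir(parents=True, exist_ok=True)
    snapshot = output_dir / f"profile_{taken_at}.json"
    snapshot.write_text(json.dumps({"taken_at": taken_at, "object_counts": counts}))

    # Step 4: return a short summary of the run.
    return {
        "snapshot": str(snapshot),
        "object_counts": counts,
        "total_objects": sum(counts.values()),
    }


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE t1 (a INTEGER)")
    src.execute("CREATE VIEW v1 AS SELECT a FROM t1")
    with tempfile.TemporaryDirectory() as out:
        print(run_profile(src, Path(out))["total_objects"])  # 2
```

Writing each run to its own timestamped file, rather than updating one file in place, is what makes it possible to compare runs taken at different times.
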

Tip:

You can run the profiler multiple times to capture different time periods or updated configurations. Each run creates a timestamped snapshot of your source environment.

Publish Profiler Summary Dashboard

Visualize your profiler results as a Lakeview dashboard deployed directly to your Databricks workspace. See the full guide: Profiler Summary Dashboard.