# Profiler Guide
The Profiler is currently an experimental feature in Lakebridge. For feedback or issues, reach out via GitHub Issues.
## Overview
The Lakebridge Profiler is designed to extract and analyze metadata from database systems, providing insights into your source environment. The profiler helps you understand system configurations, resource utilization, query patterns, and performance metrics to aid in migration planning.
Key capabilities:
- Database Metadata Extraction: Captures schema information, table structures, and object definitions
- Performance Analytics: Collects query execution metrics and resource utilization data
- Workload Analysis: Profiles active queries and identifies optimization opportunities
Each source system has different prerequisites, both for connectivity and for the metrics that can be collected. Refer to the per-system details via the links below.
## Supported Source Systems
| Source Platform | Configuration Status |
|---|---|
| Azure Synapse | ✅ |
## Configure Profiler
Before running the profiler, you need to configure the connection details for your source system.
Run the following command; it prompts you to select the source system and to provide connection details specific to that source:
```
databricks labs lakebridge configure-database-profiler
```
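The `--profile` global flag shown in the help output below also applies here. If your workspace credentials live in a named profile of `~/.databrickscfg`, a minimal sketch (assuming a profile called `my-profile` exists) is:

```
# Hypothetical: run the configuration prompts using a named ~/.databrickscfg profile
databricks labs lakebridge configure-database-profiler --profile my-profile
```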
## Execute Profiler
Once configured, run the profiler to extract metadata and performance metrics from your source system:
```
databricks labs lakebridge execute-database-profiler --help
```

Output:

```
Profile the source system database

Usage:
  databricks labs lakebridge execute-database-profiler [flags]

Flags:
  -h, --help                 help for execute-database-profiler
      --source-tech string   (Optional) The technology/platform of the sources to Profile

Global Flags:
      --debug            enable debug logging
  -o, --output type      output type: text or json (default text)
  -p, --profile string   ~/.databrickscfg profile
  -t, --target string    bundle target to use (if applicable)
```
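For example, to profile an Azure Synapse source that has already been configured, pass the documented `--source-tech` flag:

```
databricks labs lakebridge execute-database-profiler --source-tech synapse
```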
The profiler will:
- Connect to your source system using the configured credentials
- Execute the profiling pipeline to extract metadata and metrics
- Store the results in the configured output location
- Generate a summary report of the profiling execution
The profiler can be run multiple times to capture different time periods or updated configurations. Each execution will create a timestamped snapshot of your source environment.
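Because each execution creates a timestamped snapshot, recurring runs can be scripted. A minimal sketch, assuming the Databricks CLI is on the PATH and the profiler is already configured for a Synapse source, is a crontab entry such as:

```
# Hypothetical crontab entry: re-profile the Synapse source every Monday at 02:00,
# appending CLI output to a local log file
0 2 * * 1 databricks labs lakebridge execute-database-profiler --source-tech synapse >> "$HOME/profiler_runs.log" 2>&1
```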
## Publish Profiler Summary Dashboard
Upload a summary of a profiler run as a dashboard to your Databricks workspace using the `create-profiler-dashboard` command.
### Description
The `create-profiler-dashboard` command converts a profiler extract file into a dashboard and deploys it to a Databricks workspace.
This lets you quickly visualize and explore profiler results without additional setup in a SQL or BI tool.
Note: This command is part of the experimental profiler workflow and is subject to change in future versions.
### Syntax
```
databricks labs lakebridge create-profiler-dashboard \
  --extract-file <path-to-local-extract-file> \
  --source-tech <source-system-name> \
  --uc-volume <uc-volume-path> \
  [--catalog-name <uc-catalog-name>] \
  [--schema-name <uc-schema-name>]
```
### Options
`--extract-file` (example: `--extract-file ./output/profiler_results.db`)

Description: Specifies the local file path to an extract file containing the profiler results. This file is produced automatically by a successful profiler run with `databricks labs lakebridge execute-database-profiler`.

`--source-tech` (example: `--source-tech synapse`)

Description: Specifies the source system technology that was profiled. This value is used to load the matching profiler dashboard template.

`--uc-volume` (example: `--uc-volume /Volumes/lakebridge_profiler/profiler_runs`)

Description: The path of the Unity Catalog (UC) volume to which the extract file will be uploaded.

`--catalog-name` (example: `--catalog-name lakebridge_profiler`)

Description: [OPTIONAL] The catalog in which the extract data will be loaded as Delta tables. If not provided, the command uses the default catalog `lakebridge_profiler`.

`--schema-name` (example: `--schema-name profiler_runs`)

Description: [OPTIONAL] The schema in which the extract data will be loaded as Delta tables. If not provided, the command uses the default schema `profiler_runs`.
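Because `--catalog-name` and `--schema-name` are optional, a minimal invocation can rely on the default catalog and schema (`lakebridge_profiler` and `profiler_runs`):

```
databricks labs lakebridge create-profiler-dashboard \
  --extract-file ./output/profile_output.db \
  --source-tech synapse \
  --uc-volume /Volumes/lakebridge_profiler/profiler_runs
```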
### Example
The following example deploys a profiler summary dashboard for an Azure Synapse profiler run:
```
databricks labs lakebridge create-profiler-dashboard \
  --extract-file ./output/profile_output.db \
  --source-tech synapse \
  --uc-volume /Volumes/lakebridge_profiler/profiler_runs \
  --catalog-name lakebridge_profiler \
  --schema-name profiler_runs
```
Result:
```
Profiler extract file was uploaded successfully.
Dashboard created at: https://<databricks-workspace-url>/sql/dashboards/<dashboard-id>
```
### Output
When the command executes, the following actions take place:
- The profiler extract file is uploaded to the specified UC volume.
- A Databricks job for ingesting the extract file is deployed to the Databricks workspace and immediately executed.
- The profiler extract results are converted to Delta tables in the workspace catalog and schema.
- A Databricks dashboard summarizing the profiler results is deployed to the workspace.
- A workspace URL for accessing the dashboard is returned.
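After the ingest job finishes, you can verify the Delta tables that were created. A minimal sketch, assuming the default catalog and schema and a recent unified Databricks CLI (where `databricks tables list` takes the catalog and schema as positional arguments):

```
# List the Delta tables created from the profiler extract
# (assumes the defaults: catalog lakebridge_profiler, schema profiler_runs)
databricks tables list lakebridge_profiler profiler_runs
```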