Profiler Guide
The Profiler is currently an experimental feature in Lakebridge. For feedback or issues, reach out via GitHub issues.
Overview
The Lakebridge Profiler is designed to extract and analyze metadata from database systems, providing insights into your source environment. The profiler helps you understand system configurations, resource utilization, query patterns, and performance metrics to aid in migration planning.
Key capabilities:
- Database Metadata Extraction: Captures schema information, table structures, and object definitions
- Performance Analytics: Collects query execution metrics and resource utilization data
- Workload Analysis: Profiles active queries and identifies optimization opportunities
Each source system has its own prerequisites, both for connecting and for the metrics that can be collected. Refer to the system-specific details linked below.
Supported Source Systems
| Source Platform | Configuration Status |
|---|---|
| Azure Synapse | ✅ |
| Microsoft SQL Server | ✅ |
Configure Profiler
Before running the profiler, you need to configure the connection details for your source system.
Run the following command, which prompts you to select the source system and enter the connection details specific to that source:
databricks labs lakebridge configure-database-profiler
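Because the configure command runs through the Databricks CLI, it can help to confirm that the CLI is on your PATH first. The following is a minimal sketch, not part of Lakebridge itself; `have_cli` is a hypothetical helper name introduced here for illustration, and the script only prints guidance rather than starting the interactive configuration.

```shell
# Hypothetical helper: succeeds if the named executable is found on PATH.
have_cli() {
  command -v "$1" >/dev/null 2>&1
}

if have_cli databricks; then
  # The configure step is interactive, so it is only suggested here, not run.
  echo "databricks CLI found; next run: databricks labs lakebridge configure-database-profiler"
else
  echo "databricks CLI not found on PATH; install it before configuring the profiler" >&2
fi
```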
Execute Profiler
Once configured, run the execute-database-profiler command to extract metadata and performance metrics from your source system. To list the available options, pass --help:
databricks labs lakebridge execute-database-profiler --help
Output:
Profile the source system database

Usage:
  databricks labs lakebridge execute-database-profiler [flags]

Flags:
  -h, --help                 help for execute-database-profiler
      --source-tech string   (Optional) The technology/platform of the sources to Profile

Global Flags:
      --debug            enable debug logging
  -o, --output type      output type: text or json (default text)
  -p, --profile string   ~/.databrickscfg profile
  -t, --target string    bundle target to use (if applicable)
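The flags in the help output can be combined in a single invocation. The sketch below is illustrative rather than an official recipe: the `--source-tech` value `synapse` and the profile name `DEFAULT` are assumptions (check which values your installed version accepts), and `build_profiler_cmd` is a hypothetical helper. By default the script only prints the command; set `RUN=1` to execute it.

```shell
# Sketch: assemble an execute-database-profiler invocation from the flags
# shown in the help output. The default values are illustrative assumptions.
SOURCE_TECH="${SOURCE_TECH:-synapse}"   # assumed value; confirm for your version
PROFILE="${PROFILE:-DEFAULT}"           # assumed ~/.databrickscfg profile name

# Hypothetical helper: prints the command line for the given settings.
# Arguments: 1 = source tech (may be empty), 2 = profile (may be empty),
#            3 = "1" to add --debug.
build_profiler_cmd() {
  profiler_cmd="databricks labs lakebridge execute-database-profiler"
  if [ -n "$1" ]; then
    profiler_cmd="$profiler_cmd --source-tech $1"
  fi
  if [ -n "$2" ]; then
    profiler_cmd="$profiler_cmd --profile $2"
  fi
  if [ "${3:-0}" = "1" ]; then
    profiler_cmd="$profiler_cmd --debug"
  fi
  echo "$profiler_cmd"
}

CMD="$(build_profiler_cmd "$SOURCE_TECH" "$PROFILE" "${DEBUG:-0}")"
echo "$CMD"

# Execute only when explicitly requested, so a dry run has no side effects.
if [ "${RUN:-0}" = "1" ]; then
  $CMD
fi
```

Printing the command before running it makes it easy to review which profile and source system a run will target.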
The profiler will:
- Connect to your source system using the configured credentials
- Execute the profiling pipeline to extract metadata and metrics
- Store the results in the configured output location
- Generate a summary report of the profiling execution
You can run the profiler multiple times to capture different time periods or updated configurations. Each execution creates a timestamped snapshot of your source environment.
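When profiling repeatedly, it may be useful to keep the console output of each run alongside the profiler's own snapshot. The following is a minimal sketch under stated assumptions: the log directory `./profiler-logs` and the helper `run_stamp` are hypothetical names chosen here, and this log is separate from whatever the profiler writes to its configured output location. The script prints the log path by default and only runs the profiler when `RUN=1` is set.

```shell
# Sketch: record the console output of each profiler run in its own log file.
LOG_DIR="${LOG_DIR:-./profiler-logs}"   # hypothetical location for run logs

# Hypothetical helper: UTC timestamp that is safe to use in a file name.
run_stamp() {
  date -u +%Y%m%dT%H%M%SZ
}

LOG_FILE="$LOG_DIR/profiler-$(run_stamp).log"
echo "log file for this run: $LOG_FILE"

# Execute only when explicitly requested.
if [ "${RUN:-0}" = "1" ]; then
  mkdir -p "$LOG_DIR"
  # Capture stdout and stderr while still showing them on the terminal.
  databricks labs lakebridge execute-database-profiler 2>&1 | tee "$LOG_FILE"
fi
```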
Publish Profiler Summary Dashboard
Visualize your profiler results as a Lakeview dashboard deployed directly to your Databricks workspace. See the full guide: Profiler Summary Dashboard.