BigQuery Profiler Details

Prerequisites
Configure Connection to BigQuery
Execute the profiler

Prerequisites

1. Download

gcloud CLI (https://cloud.google.com/sdk/docs/install)
No driver install required for the BigQuery profiler

2. Authenticate to GCP

The profiler authenticates with Application Default Credentials (ADC). The simplest setup is:

gcloud auth application-default login

For more options, please review the docs.

3. Required IAM permissions on the target BigQuery project(s)

The executing identity needs the following permissions. roles/bigquery.admin covers all of them; for least-privilege deployments grant the discrete permissions below.

bigquery.jobs.create
bigquery.jobs.list
bigquery.datasets.get
bigquery.datasets.getIamPolicy
bigquery.tables.get
bigquery.tables.getData
bigquery.routines.list
bigquery.readSessions.create
bigquery.reservations.list                  # Editions customers only
bigquery.reservationAssignments.list        # Editions customers only
bigquery.capacityCommitments.list           # Editions customers only

4. INFORMATION_SCHEMA / region-scoped views accessed

Profiler aspect	Region-scoped view	Required permission
Job history (workload, timeline, fulfillment, jobs-by-reservation)	`region-{region}.INFORMATION_SCHEMA.JOBS_BY_PROJECT`, `JOBS_TIMELINE`	`bigquery.jobs.list` (job-owner sees own jobs; admin sees all)
Table storage	`region-{region}.INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT`	`bigquery.tables.get` + `bigquery.tables.getData`
Streaming inserts	`region-{region}.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT`	`bigquery.jobs.list`
Write API	`region-{region}.INFORMATION_SCHEMA.WRITE_API_TIMELINE_BY_PROJECT`	`bigquery.jobs.list`
Reservation timeline	`region-{region}.INFORMATION_SCHEMA.RESERVATIONS_TIMELINE`, `RESERVATION_CHANGES`	`bigquery.reservations.list`
Capacity commitments	`region-{region}.INFORMATION_SCHEMA.CAPACITY_COMMITMENT_CHANGES_BY_PROJECT`	`bigquery.capacityCommitments.list` (admin project only)

Attention:

Skip the reservation/commitment permissions if the customer is on Pay-as-you-go pricing rather than BigQuery Editions. The corresponding SQL files will return empty results — harmless. You can also set exclude_reservations_data: true during configuration to skip those queries entirely (saves BigQuery slot time; the corresponding DuckDB tables are still created as empty stubs so the output schema stays consistent).

Attention:

Some queries (commitments, capacity commitments) only return data when run against the customer's administrative project — the one that owns the BigQuery Edition slot reservations. For non-admin projects they return empty.

Configure Connection to BigQuery

databricks labs lakebridge configure-database-profiler

Please select the source system you want to configure
[0] bigquery
[1] mssql
[2] synapse
Enter a number between 0 and 2: 0

(local | env)
local means values are read as plain text
env means values are read from environment variables, and fall back to plain text if not variable is not found

Enter secret vault type (local | env)
[0] env
[1] local
Enter a number between 0 and 1: 1
Please provide BigQuery connection settings:

Enter BigQuery project and region pairs (Format: comma-separated project.region. Example: my-proj-a.us, my-proj-b.eu-west-1): customer-prod-1.us, customer-admin.us
Enter lookback window in days to profile (default: 180):
Enter max parallel SQLs per (project, region) iteration (default: 8):
Please configure profiler settings:
Exclude reservations and commitments data? (default: no):
Exclude streaming and write API summary? (default: no):

The resulting ~/.databricks/labs/lakebridge/.credentials.yml looks like:

secret_vault_type: local
secret_vault_name: null
bigquery:
  pairs:
    - project: customer-prod-1
      region: us
    - project: customer-admin
      region: us
  profiler:
    profiling_window_days: 180
    max_parallel_sqls: 8
    redact_query_text: true
    exclude_reservations_data: false
    exclude_streaming_metrics: false

Execute the profiler

databricks labs lakebridge execute-database-profiler --source-tech bigquery

The pipeline runs bq_metadata_extract — connects to BigQuery and runs 16 vendored SQL queries serially across the configured (project, region) pairs and in parallel within each iteration (max_parallel_sqls workers). Results land in ~/.databricks/labs/lakebridge_profilers/bigquery_assessment/profiler_extract.db (DuckDB).

The final DuckDB contains 12 analysis tables by default (one per analysis_type).

Setting exclude_reservations_data: true skips the 6 reservation/commitment BigQuery queries — the corresponding DuckDB tables are still created (empty) to keep the output schema consistent.

Back to Configure Profiler

Prerequisites​

1. Download​

2. Authenticate to GCP​

3. Required IAM permissions on the target BigQuery project(s)​

4. INFORMATION_SCHEMA / region-scoped views accessed​

Configure Connection to BigQuery​

Execute the profiler​