Skip to main content

Reconcile Configuration Reference

This page covers the full configuration schema for Lakebridge Reconcile. For setup steps and prerequisites, see the Reconcile Guide.


Report Types

report typedescriptionkey outputs
schemaReconcile schema of source and target — validate that data types are the same or compatibleschema_comparison, schema_difference
rowReconcile data at row level (hash value comparison). Use when there are no join columns.missing_in_src, missing_in_tgt
dataReconcile data at row and column level using join_columns to identify per-row, per-column mismatchesmismatch_data, missing_in_src, missing_in_tgt, threshold_mismatch, mismatch_columns
allCombination of data + schemaAll outputs above

See Reconcile Data Flow Examples for visualizations of each report type.


Config File Naming

recon_config_<DATA_SOURCE>_<UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG>_<REPORT_TYPE>.json

Place the file in .lakebridge/ in your Databricks workspace home folder.

The filename must match the case of <UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG> exactly. For example, if the source connection is conn-oracle-prod, the filename is recon_config_oracle_conn-oracle-prod_data.json.

Examples by source:

recon_config_snowflake_sample_data_all.json
source:
dialect: snowflake
catalog: sample_data
schema: default
uc_connection_name: example_connection_snowflake
target:
catalog: migrated
schema: migrated
report_type: all

TABLE Config Schema

@dataclass
class Table:
source_name: str
target_name: str
aggregates: list[Aggregate] | None = None
join_columns: list[str] | None = None
jdbc_reader_options: JdbcReaderOptions | None = None
select_columns: list[str] | None = None
drop_columns: list[str] | None = None
column_mapping: list[ColumnMapping] | None = None
transformations: list[Transformation] | None = None
column_thresholds: list[ColumnThresholds] | None = None
filters: Filters | None = None
table_thresholds: list[TableThresholds] | None = None

Configuration Parameters

ParameterTypeRequiredDescription
source_namestringYesSource table name
target_namestringYesTarget table name
join_columnslist[string]NoPrimary key columns for row-level joins
aggregateslist[Aggregate]NoAggregate reconciliation rules (see Aggregates)
jdbc_reader_optionsobjectNoJDBC read parallelization (see JDBC Reader Options)
select_columnslist[string]NoColumns to include in reconciliation
drop_columnslist[string]NoColumns to exclude from reconciliation
column_mappinglist[ColumnMapping]NoSource-to-target column name mapping
transformationslist[Transformation]NoColumn-level SQL transformation expressions
column_thresholdslist[ColumnThresholds]NoAcceptable variance per column
table_thresholdslist[TableThresholds]NoAcceptable mismatch rate at table level
filtersFiltersNoWHERE-clause filters for source and/or target

JDBC Reader Options

Use JDBC reader options to parallelize reads from large tables.

@dataclass
class JdbcReaderOptions:
num_partitions: int
partition_column: str
lower_bound: str
upper_bound: str
fetchsize: int = 0
FieldTypeRequiredDescription
num_partitionsintYesNumber of Spark partitions for parallel reads
partition_columnstringYesColumn used for partitioning (choose high-cardinality column for composite keys)
lower_boundstringYesMinimum value for partitioning
upper_boundstringYesMaximum value for partitioning
fetchsizeintNo (default: 100)Rows per JDBC round-trip

Column Mapping

Map source column names to target column names when they differ.

@dataclass
class ColumnMapping:
source_name: str
target_name: str

Drop Columns

Exclude specific columns from reconciliation (for data or all report types).

"drop_columns": ["comment", "audit_timestamp"]

Transformations

Apply SQL expressions to source or target columns before comparison. Use this when data types or formats differ between systems.

@dataclass
class Transformation:
column_name: str
source: str
target: str | None = None
Handling Nulls

When you provide a transformation expression, Reconcile uses it as-is. You must handle nulls explicitly in the expression. Use coalesce(..., '_null_recon_') to avoid null mismatches.

Timestamp Columns

Convert timestamps to Unix epoch strings for cross-platform comparison:

Transformation(
column_name='UPDATE_TIMESTAMP',
source="coalesce(cast(EXTRACT(epoch_millisecond FROM UPDATE_TIMESTAMP) as string), '_null_recon_')",
target="coalesce(cast(unix_millis(UPDATE_TIMESTAMP) as string), '_null_recon_')"
)

Column Thresholds

Allow a bounded variance between source and target column values.

"column_thresholds": [
{
"column_name": "price",
"lower_bound": "-5%",
"upper_bound": "5%",
"type": "float"
}
]
FieldTypeRequiredDescription
column_namestringYesColumn to apply the threshold to
lower_boundstringYesLower bound of acceptable difference
upper_boundstringYesUpper bound of acceptable difference
typestringYesColumn type (supports SQLGlot DataType.NUMERIC_TYPES and DataType.TEMPORAL_TYPES)

Table Thresholds

Allow a bounded mismatch rate at the table level.

"table_thresholds": [
{
"lower_bound": "0%",
"upper_bound": "5%",
"model": "mismatch"
}
]

Bounds must be non-negative, and lower bound must not exceed upper bound. Only "mismatch" is supported for model.


Filters

Apply WHERE-clause filters to source and/or target before reconciliation.

"filters": {
"source": "lower(dept_name) = 'finance'",
"target": "lower(department_name) = 'finance'"
}
note

Filters must use dialect-specific SQL expressions — they are applied directly without any transformation by Reconcile.


Aggregates Reconciliation

Reconcile aggregate metrics (MIN, MAX, COUNT, SUM, AVG, etc.) between source and target instead of comparing raw rows.

Supported Functions

MIN, MAX, COUNT, SUM, AVG, MEAN, MODE, STDDEV, VARIANCE, MEDIAN

Aggregate Config

@dataclass
class Aggregate:
agg_columns: list[str]
type: str
group_by_columns: list[str] | None = None
note

select_columns and drop_columns are ignored for aggregate reconciliation. column_mapping, transformations, jdbc_reader_options, and filters are applied.


Complete Examples

Standard Reconcile Config

{
"tables": [
{
"source_name": "product_prod",
"target_name": "product",
"jdbc_reader_options": {
"num_partitions": 10,
"partition_column": "p_id",
"lower_bound": "0",
"upper_bound": "10000000"
},
"join_columns": ["p_id"],
"drop_columns": ["comment"],
"column_mapping": [
{ "source_name": "p_id", "target_name": "product_id" },
{ "source_name": "p_name", "target_name": "product_name" }
],
"transformations": [
{
"column_name": "creation_date",
"source": "creation_date",
"target": "to_date(creation_date,'yyyy-mm-dd')"
}
],
"column_thresholds": [
{ "column_name": "price", "lower_bound": "-50", "upper_bound": "50", "type": "float" }
],
"table_thresholds": [
{ "lower_bound": "0%", "upper_bound": "5%", "model": "mismatch" }
],
"filters": {
"source": "p_id > 0",
"target": "product_id > 0"
}
}
]
}

Aggregates Reconcile Config

{
"tables": [
{
"source_name": "product_prod",
"target_name": "product",
"join_columns": ["p_id"],
"aggregates": [
{ "type": "MIN", "agg_columns": ["discount"], "group_by_columns": ["p_id"] },
{ "type": "AVG", "agg_columns": ["discount"], "group_by_columns": ["p_id"] },
{ "type": "MAX", "agg_columns": ["p_id"], "group_by_columns": ["creation_date"] },
{ "type": "SUM", "agg_columns": ["p_id"] }
],
"jdbc_reader_options": {
"num_partitions": 10,
"partition_column": "p_id",
"lower_bound": "0",
"upper_bound": "10000000"
}
}
]
}

Key Considerations
  1. Column names are always converted to lowercase.
  2. Case insensitivity and collation are not currently supported.
  3. Always reference source column names in all configs, except transformations and filters — those use dialect-specific SQL applied directly.
  4. When no user transformation is provided, Reconcile applies default transformations based on the source data type.