Reconcile Configuration Reference

This page covers the full configuration schema for Lakebridge Reconcile. For setup steps and prerequisites, see the Reconcile Guide.

Report Types

report type	description	key outputs
`schema`	Reconcile schema of source and target — validate that data types are the same or compatible	`schema_comparison`, `schema_difference`
`row`	Reconcile data at row level (hash value comparison). Use when there are no join columns.	`missing_in_src`, `missing_in_tgt`
`data`	Reconcile data at row and column level using `join_columns` to identify per-row, per-column mismatches	`mismatch_data`, `missing_in_src`, `missing_in_tgt`, `threshold_mismatch`, `mismatch_columns`
`all`	Combination of `data` + `schema`	All outputs above

See Reconcile Data Flow Examples for visualizations of each report type.

Config File Naming

recon_config_<DATA_SOURCE>_<UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG>_<REPORT_TYPE>.json

Place the file in .lakebridge/ in your Databricks workspace home folder.

The filename must match the case of <UNITY_CATALOG_CONNECTION_NAME_OR_CATALOG> exactly. For example, if the source connection is conn-oracle-prod, the filename is recon_config_oracle_conn-oracle-prod_data.json.

Examples by source:

Snowflake
Oracle
MS SQL Server (incl. Synapse)
Teradata
Databricks

recon_config_snowflake_sample_data_all.json
source:
    dialect: snowflake
    catalog: sample_data
    schema: default
    uc_connection_name: example_connection_snowflake
target:
    catalog: migrated
    schema: migrated
report_type: all

recon_config_oracle_orc_data.json
source:
    dialect: oracle
    uc_connection_name: example_connection_oracle
target:
    catalog: migrated
    schema: migrated
report_type: data

recon_config_tsql_silver_data.json
source:
    dialect: mssql
    uc_connection_name: example_connection_mssql
target:
    catalog: migrated
    schema: migrated
report_type: data

recon_config_teradata_teradata_sandbox_all.json
source:
    dialect: teradata
    catalog: DBC
    schema: lakebridge_test
    uc_connection_name: teradata_sandbox
target:
    catalog: migrated
    schema: migrated
report_type: all

Required: install a hash UDF on the Teradata source

Teradata does not provide a portable cryptographic hash function in pure SQL, so you must install your own hash UDF on the Teradata source before running any row, data, or all report (the schema report type does not need it). Any deterministic hash (SHA-256, MD5, etc.) works as long as the source and target sides produce the same string for the same input.

You must also tell Lakebridge how to call the UDF by setting hash_expression_overrides.source on the recon config (see Hash Expression).

Refer to the Teradata documentation for how to create and register UDFs on your release and platform. For a real-world example, Teradata publishes an MD5 message-digest UDF you can install as-is or adapt: MD5 Message Digest UDF.

recon_config_databricks_hms_schema.json
source:
    dialect: databricks
    catalog: hive_metastore
    schema: hr
target:
    catalog: migrated
    schema: migrated
  ...
report_type: data

TABLE Config Schema

Python
JSON

@dataclass
class Table:
    source_name: str
    target_name: str
    aggregates: list[Aggregate] | None = None
    join_columns: list[str] | None = None
    jdbc_reader_options: JdbcReaderOptions | None = None
    select_columns: list[str] | None = None
    drop_columns: list[str] | None = None
    column_mapping: list[ColumnMapping] | None = None
    transformations: list[Transformation] | None = None
    column_thresholds: list[ColumnThresholds] | None = None
    filters: Filters | None = None
    table_thresholds: list[TableThresholds] | None = None

{
    "source_name": "[SOURCE_NAME]",
    "target_name": "[TARGET_NAME]",
    "aggregates": null,
    "join_columns": ["COLUMN_NAME_1", "COLUMN_NAME_2"],
    "jdbc_reader_options": null,
    "select_columns": null,
    "drop_columns": null,
    "column_mapping": null,
    "transformation": null,
    "column_thresholds": null,
    "filters": null,
    "table_thresholds": null
}

Configuration Parameters

Parameter	Type	Required	Description
`source_name`	string	Yes	Source table name
`target_name`	string	Yes	Target table name
`join_columns`	list[string]	No	Primary key columns for row-level joins
`aggregates`	list[Aggregate]	No	Aggregate reconciliation rules (see Aggregates)
`jdbc_reader_options`	object	No	JDBC read parallelization (see JDBC Reader Options)
`select_columns`	list[string]	No	Columns to include in reconciliation
`drop_columns`	list[string]	No	Columns to exclude from reconciliation
`column_mapping`	list[ColumnMapping]	No	Source-to-target column name mapping
`transformations`	list[Transformation]	No	Column-level SQL transformation expressions
`column_thresholds`	list[ColumnThresholds]	No	Acceptable variance per column
`table_thresholds`	list[TableThresholds]	No	Acceptable mismatch rate at table level
`filters`	Filters	No	WHERE-clause filters for source and/or target

JDBC Reader Options

Use JDBC reader options to parallelize reads from large tables.

Python
JSON

@dataclass
class JdbcReaderOptions:
    num_partitions: int
    partition_column: str
    lower_bound: str
    upper_bound: str
    fetchsize: int = 0

"jdbc_reader_options": {
    "num_partitions": 10,
    "partition_column": "id",
    "lower_bound": "1",
    "upper_bound": "100000",
    "fetchsize": 0
}

Field	Type	Required	Description
`num_partitions`	int	Yes	Number of Spark partitions for parallel reads
`partition_column`	string	Yes	Column used for partitioning (choose high-cardinality column for composite keys)
`lower_bound`	string	Yes	Minimum value for partitioning
`upper_bound`	string	Yes	Maximum value for partitioning
`fetchsize`	int	No (default: 100)	Rows per JDBC round-trip

Column Mapping

Map source column names to target column names when they differ.

Python
JSON

@dataclass
class ColumnMapping:
    source_name: str
    target_name: str

"column_mapping": [
  {
    "source_name": "dept_id",
    "target_name": "department_id"
  }
]

Drop Columns

Exclude specific columns from reconciliation (for data or all report types).

"drop_columns": ["comment", "audit_timestamp"]

Transformations

Apply SQL expressions to source or target columns before comparison. Use this when data types or formats differ between systems.

Python
JSON

@dataclass
class Transformation:
    column_name: str
    source: str
    target: str | None = None

"transformations": [
  {
    "column_name": "unit_price",
    "source": "coalesce(cast(cast(unit_price as decimal(38,10)) as string), '_null_recon_')",
    "target": "coalesce(cast(format_number(cast(unit_price as decimal(38, 10)), 10) as string), '_null_recon_')"
  }
]

Handling Nulls

When you provide a transformation expression, Reconcile uses it as-is. You must handle nulls explicitly in the expression. Use coalesce(..., '_null_recon_') to avoid null mismatches.

Timestamp Columns

Convert timestamps to Unix epoch strings for cross-platform comparison:

Transformation(
    column_name='UPDATE_TIMESTAMP',
    source="coalesce(cast(EXTRACT(epoch_millisecond FROM UPDATE_TIMESTAMP) as string), '_null_recon_')",
    target="coalesce(cast(unix_millis(UPDATE_TIMESTAMP) as string), '_null_recon_')"
)

Hash Expression

Override the row-hash function with your own SQL. Each value is raw SQL containing a single {} placeholder that Lakebridge replaces with the concatenated hash input.

source is required when the block is set: the source-side expression Lakebridge will emit on the source dialect.
target defaults to sha2({}, 256) (the target is always Databricks); override it if you use a non-SHA-256 hash.

The block itself is optional for most dialects, omit it to use the source dialect's default. For Teradata sources the block is mandatory, since Teradata has no out-of-the-box cryptographic hash.

Python
YAML

@dataclass
class HashExpressionOverrides:
    source: str
    target: str = "sha2({}, 256)"

hash_expression_overrides:
  source: "my_sha256({})"
  target: "sha2({}, 256)"

Same hash on both sides

The source and target expressions must produce the same string for the same logical row — that is what the comparison query joins on. Lakebridge wraps both sides with LOWER(...) automatically, so case alone won't cause mismatches; the bytes produced by the hash itself must match across systems.

Column Thresholds

Allow a bounded variance between source and target column values.

"column_thresholds": [
  {
    "column_name": "price",
    "lower_bound": "-5%",
    "upper_bound": "5%",
    "type": "float"
  }
]

Field	Type	Required	Description
`column_name`	string	Yes	Column to apply the threshold to
`lower_bound`	string	Yes	Lower bound of acceptable difference
`upper_bound`	string	Yes	Upper bound of acceptable difference
`type`	string	Yes	Column type (supports SQLGlot `DataType.NUMERIC_TYPES` and `DataType.TEMPORAL_TYPES`)

Table Thresholds

Allow a bounded mismatch rate at the table level.

"table_thresholds": [
  {
    "lower_bound": "0%",
    "upper_bound": "5%",
    "model": "mismatch"
  }
]

Bounds must be non-negative, and lower bound must not exceed upper bound. Only "mismatch" is supported for model.

Filters

Apply WHERE-clause filters to source and/or target before reconciliation.

"filters": {
  "source": "lower(dept_name) = 'finance'",
  "target": "lower(department_name) = 'finance'"
}

note

Filters must use dialect-specific SQL expressions — they are applied directly without any transformation by Reconcile.

Aggregates Reconciliation

Reconcile aggregate metrics (MIN, MAX, COUNT, SUM, AVG, etc.) between source and target instead of comparing raw rows.

Supported Functions

MIN, MAX, COUNT, SUM, AVG, MEAN, MODE, STDDEV, VARIANCE, MEDIAN

Aggregate Config

Python
JSON

@dataclass
class Aggregate:
    agg_columns: list[str]
    type: str
    group_by_columns: list[str] | None = None

{
    "type": "MIN",
    "agg_columns": ["discount"],
    "group_by_columns": ["p_id"]
}

note

select_columns and drop_columns are ignored for aggregate reconciliation. column_mapping, transformations, jdbc_reader_options, and filters are applied.

Complete Examples

Standard Reconcile Config

{
  "tables": [
    {
      "source_name": "product_prod",
      "target_name": "product",
      "jdbc_reader_options": {
        "num_partitions": 10,
        "partition_column": "p_id",
        "lower_bound": "0",
        "upper_bound": "10000000"
      },
      "join_columns": ["p_id"],
      "drop_columns": ["comment"],
      "column_mapping": [
        { "source_name": "p_id", "target_name": "product_id" },
        { "source_name": "p_name", "target_name": "product_name" }
      ],
      "transformations": [
        {
          "column_name": "creation_date",
          "source": "creation_date",
          "target": "to_date(creation_date,'yyyy-mm-dd')"
        }
      ],
      "column_thresholds": [
        { "column_name": "price", "lower_bound": "-50", "upper_bound": "50", "type": "float" }
      ],
      "table_thresholds": [
        { "lower_bound": "0%", "upper_bound": "5%", "model": "mismatch" }
      ],
      "filters": {
        "source": "p_id > 0",
        "target": "product_id > 0"
      }
    }
  ]
}

Aggregates Reconcile Config

{
  "tables": [
    {
      "source_name": "product_prod",
      "target_name": "product",
      "join_columns": ["p_id"],
      "aggregates": [
        { "type": "MIN", "agg_columns": ["discount"], "group_by_columns": ["p_id"] },
        { "type": "AVG", "agg_columns": ["discount"], "group_by_columns": ["p_id"] },
        { "type": "MAX", "agg_columns": ["p_id"], "group_by_columns": ["creation_date"] },
        { "type": "SUM", "agg_columns": ["p_id"] }
      ],
      "jdbc_reader_options": {
        "num_partitions": 10,
        "partition_column": "p_id",
        "lower_bound": "0",
        "upper_bound": "10000000"
      }
    }
  ]
}

Key Considerations

Column names are always converted to lowercase.
Case insensitivity and collation are not currently supported.
Always reference source column names in all configs, except transformations and filters — those use dialect-specific SQL applied directly.
When no user transformation is provided, Reconcile applies default transformations based on the source data type.

Report Types​

Config File Naming​

TABLE Config Schema​

Configuration Parameters​

JDBC Reader Options​

Column Mapping​

Drop Columns​

Transformations​

Hash Expression​

Column Thresholds​

Table Thresholds​

Filters​

Aggregates Reconciliation​

Supported Functions​

Aggregate Config​

Complete Examples​

Standard Reconcile Config​

Aggregates Reconcile Config​