
Configuration

ImpulseConfig configures everything about a report: the silver-layer input tables, the gold-layer output location, container-level filters, the query-engine solver, incremental processing, and which container columns get surfaced into the gold-layer measurement dimension. Configuration is defined as JSON (or an equivalent Python dictionary) and validated using Pydantic models. The canonical schema lives in src/impulse_reporting/config/config_parser.py.

Quick example

{
  "source": {
    "container_metrics_table": "my_catalog.silver.container_metrics",
    "channel_metrics_table": "my_catalog.silver.channel_metrics",
    "channels_uri": "my_catalog.silver.channels",
    "container_tags_table": "my_catalog.silver.container_tags",
    "channel_tags_table": "my_catalog.silver.channel_tags"
  },
  "unity_sink": {
    "catalog": "my_catalog",
    "schema": "gold",
    "table_prefix": "my_report"
  },
  "query_engine": {
    "solver": "DefaultSolver",
    "data_type": "RAW"
  },
  "container_filters": {
    "tag_filters": [
      [
        { "tag_name": "uut_id", "comparator": "==", "value": "ABC123", "cast_type": "string" }
      ]
    ],
    "metric_filters": [
      [
        { "column_name": "start_dt", "comparator": ">=", "value": "2025-04-27T05:20:54.000Z", "value_type": "timestamp" }
      ]
    ]
  },
  "measurement_dimensions": ["container_id", "vehicle_key", "start_ts", "stop_ts"]
}

A configuration is passed to Report either as a Python dict (config=...) or as a JSON file path (config_path=...). Sinkless mode is also supported — see Sinkless reports.
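As a minimal sketch of the two input forms, the snippet below builds a configuration as a Python dict, writes it to a JSON file, and reads it back to show that both carry the same information. It uses only the standard library. The file name is arbitrary, and the commented Report(...) calls merely restate the two keyword arguments named above; the import path of Report is not given on this page, so it is left out.

```python
import json
import tempfile
from pathlib import Path

# Configuration as a Python dict (same shape as the JSON in the quick example).
config = {
    "source": {
        "container_metrics_table": "my_catalog.silver.container_metrics",
        "channel_metrics_table": "my_catalog.silver.channel_metrics",
        "channels_uri": "my_catalog.silver.channels",
    },
    "unity_sink": {
        "catalog": "my_catalog",
        "schema": "gold",
        "table_prefix": "my_report",
    },
}

# The equivalent JSON file; "report_config.json" is an arbitrary name.
config_path = Path(tempfile.mkdtemp()) / "report_config.json"
config_path.write_text(json.dumps(config, indent=2))

# Reading the file back yields the same dictionary.
loaded = json.loads(config_path.read_text())

# Per the text above, either form can then be handed to Report:
#   Report(config=config)
#   Report(config_path=str(config_path))
```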


source

Maps the silver-layer input tables.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| container_metrics_table | str | Yes | Full Unity Catalog path. Container metadata (timestamps, duration). |
| channel_metrics_table | str | Yes | Full Unity Catalog path. Channel-level statistics. |
| channels_uri | str | Yes | Full Unity Catalog path. Time-series sample data. |
| container_tags_table | str | No | Full Unity Catalog path. Container EAV tags. |
| channel_tags_table | str | No | Full Unity Catalog path. Channel EAV tags. |
| channel_mapping_table | str | No | Full Unity Catalog path. Logical-to-physical channel alias table. Required when using QueryBuilder.channel_with_alias(). In reporting mode the resolved alias-to-physical-channel mapping is materialized to the gold-layer channel_mapping_resolution_dimension. |
| unit_conversion_table | str | No | Full Unity Catalog path. Per-unit-family conversion factors. When configured together with a channel_mapping_table whose rows carry source_unit / target_unit columns, aliased selectors auto-convert values from source to target unit during solve(). |

A container_tags_table is required to use tag-based container filters; a channel_tags_table is required to select channels by tag rather than by columns on channel_metrics.


unity_sink

Defines where gold-layer tables are written.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| catalog | str | Yes | Target catalog name. |
| schema | str | Yes | Target schema name. |
| table_prefix | str | Yes | Prefix for all generated table names. |

Output tables are named {table_prefix}_{entity} (e.g. my_report_histogram_fact).
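The naming rule can be sketched as a small helper. The function name gold_table_name is hypothetical, and composing the fully qualified name as catalog.schema.table is an assumption based on Unity Catalog's three-level namespace rather than something this page states.

```python
def gold_table_name(sink: dict, entity: str) -> str:
    """Compose a table name from a unity_sink section (illustrative helper)."""
    # {table_prefix}_{entity}, as described above.
    table = f"{sink['table_prefix']}_{entity}"
    # Assumption: tables are addressed as catalog.schema.table.
    return f"{sink['catalog']}.{sink['schema']}.{table}"


sink = {"catalog": "my_catalog", "schema": "gold", "table_prefix": "my_report"}
name = gold_table_name(sink, "histogram_fact")
print(name)  # my_catalog.gold.my_report_histogram_fact
```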

Sinkless reports

unity_sink is optional. When omitted, the report runs in sinkless mode: determine_report() still computes events, aggregations, and container dimensions and exposes them on the report object, but persist_results() becomes a no-op. Useful for ad-hoc analysis, notebooks, and tests where writing to Unity Catalog is not desired.


container_filters (optional)

Restricts the set of processed containers. Filters are expressed in disjunctive normal form (OR of ANDs): each inner list is AND-combined, the outer list is OR-combined.

Two independent filter families:

  • tag_filters — applied on container_tags_table (EAV key/value model).
  • metric_filters — applied on container_metrics_table (columnar model).

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| tag_filters | list[list[TagFilter]] | [] | Tag-based filter groups (DNF). |
| metric_filters | list[list[MetricFilter]] | [] | Metric-based filter groups (DNF). |

TagFilter

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| tag_name | str | Yes | Tag key to filter on. |
| comparator | str | Yes | One of ==, !=, >, >=, <, <=. |
| value | any | Yes | Expected value. Must match cast_type. |
| cast_type | str | No | string (default), int, double, or timestamp (ISO-format string). |
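To make the OR-of-ANDs semantics concrete, here is a minimal pure-Python sketch that evaluates tag_filters against the tags of a single container. It is not the framework's implementation. The function name, treating an empty filter list as "match everything", and treating a missing tag as a non-match are all assumptions made for illustration.

```python
import operator
from datetime import datetime

COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def _parse_ts(text):
    # Replace a trailing "Z" so older Python versions can parse the string.
    return datetime.fromisoformat(str(text).replace("Z", "+00:00"))


CASTS = {"string": str, "int": int, "double": float, "timestamp": _parse_ts}


def matches_tag_filters(tags: dict, tag_filters: list) -> bool:
    """tags maps tag name -> raw (string) value for one container."""
    if not tag_filters:
        return True  # assumption: no filter groups means nothing is excluded

    def holds(f):
        if f["tag_name"] not in tags:
            return False  # assumption: a missing tag never matches
        cast = CASTS[f.get("cast_type", "string")]
        compare = COMPARATORS[f["comparator"]]
        return compare(cast(tags[f["tag_name"]]), cast(f["value"]))

    # Outer list is OR-combined; each inner list is AND-combined.
    return any(all(holds(f) for f in group) for group in tag_filters)


tag_filters = [
    [   # group 1: uut_id == "ABC123" AND run >= 3
        {"tag_name": "uut_id", "comparator": "==", "value": "ABC123"},
        {"tag_name": "run", "comparator": ">=", "value": 3, "cast_type": "int"},
    ],
    [   # group 2: site == "lab"
        {"tag_name": "site", "comparator": "==", "value": "lab"},
    ],
]

print(matches_tag_filters({"uut_id": "ABC123", "run": "5"}, tag_filters))  # True
print(matches_tag_filters({"uut_id": "ABC123", "run": "2"}, tag_filters))  # False
print(matches_tag_filters({"uut_id": "XYZ", "site": "lab"}, tag_filters))  # True
```

A container passes as soon as any one group is fully satisfied, which is why the third call matches even though its uut_id fails group 1.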

MetricFilter

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| column_name | str | Yes | Column on container_metrics_table to filter on (e.g. start_dt, stop_dt). When solver_config.container_metrics.column_name_mapping is set, this refers to the internal name (after renaming). |
| comparator | str | Yes | One of ==, !=, >, >=, <, <=. |
| value | any | Yes | Expected value. Must match value_type when provided. |
| value_type | str | No | When provided, validates/converts the value: string, int, double, timestamp. |

query_engine (optional)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| solver | str | "DefaultSolver" | "DefaultSolver" adapts to the silver layer: it selects channels from a narrow EAV channel_tags table when source.channel_tags_table is set and otherwise from columns on channel_metrics; it filters containers via a narrow EAV container_tags table or, when source.container_tags_table is omitted, a wide-only container_metrics. "DeltaSolver" and "KeyValueStoreSolver" are deprecated aliases that resolve to DefaultSolver. |
| data_type | str | "RLE" | "RLE" (intervals [tstart, tend)) or "RAW" (raw timestamps; converted to RLE before aggregation). |
| drop_implausible_data | bool | false | When true, drops rows of the channels table where is_plausible = false. Requires data_type = "RAW"; combining it with "RLE" raises a validation error. |
| batch_size | int | 500 | Maximum number of selectors solved per batch. |
| solver_config | SolverConfig | null | Per-table column mappings, per-table equality filters, and project scoping. Set project_id to scope reads by project. It is applied to container_tags (if configured), container_metrics, and channel_mapping (if configured), so it works in both narrow EAV and wide-only data models. Omit it when you don't need project scoping. See Solver column mappings and filters. |
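The defaults and the drop_implausible_data constraint from the table can be sketched with a plain dataclass. QueryEngineSketch is a hypothetical stand-in, not the real Pydantic model from config_parser.py, and it raises a plain ValueError where the framework would raise its own validation error.

```python
from dataclasses import dataclass


@dataclass
class QueryEngineSketch:
    """Illustrative stand-in for the query_engine section (not the real model)."""
    solver: str = "DefaultSolver"
    data_type: str = "RLE"
    drop_implausible_data: bool = False
    batch_size: int = 500

    def __post_init__(self):
        if self.data_type not in ("RLE", "RAW"):
            raise ValueError(f"unknown data_type: {self.data_type!r}")
        # drop_implausible_data is only valid together with RAW data.
        if self.drop_implausible_data and self.data_type != "RAW":
            raise ValueError('drop_implausible_data requires data_type = "RAW"')


ok = QueryEngineSketch(data_type="RAW", drop_implausible_data=True)

try:
    QueryEngineSketch(data_type="RLE", drop_implausible_data=True)
except ValueError as err:
    print(err)  # drop_implausible_data requires data_type = "RAW"
```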

If query_engine is omitted, the default is DefaultSolver with data_type = "RLE".
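For readers unfamiliar with the two data types, the sketch below shows one plausible way to turn RAW samples into RLE intervals [tstart, tend). It is an illustration under stated assumptions, not the framework's conversion: each sample is assumed to hold until the next sample, consecutive equal values are merged into one run, and the last interval is closed at a caller-supplied stop time.

```python
def raw_to_rle(samples, stop):
    """Convert sorted (timestamp, value) samples to (tstart, tend, value) runs.

    Intervals are half-open: tstart is included, tend is excluded.
    """
    intervals = []
    for i, (t, value) in enumerate(samples):
        # Assumption: a sample is valid until the next one (or until stop).
        tend = samples[i + 1][0] if i + 1 < len(samples) else stop
        if intervals and intervals[-1][2] == value and intervals[-1][1] == t:
            # Same value continues without a gap: extend the previous run.
            intervals[-1] = (intervals[-1][0], tend, value)
        else:
            intervals.append((t, tend, value))
    return intervals


samples = [(0, 1.0), (1, 1.0), (2, 3.0), (3, 1.0)]
rle = raw_to_rle(samples, stop=4)
print(rle)  # [(0, 2, 1.0), (2, 3, 3.0), (3, 4, 1.0)]
```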


Solver column mappings and filters

The framework references columns by a fixed set of internal names (e.g. container_id, channel_id, tstart, tend, value). When your silver-layer tables use different physical names, declare the mapping in solver_config so the solver renames each table's columns at read time.

SolverConfig has one section per silver table. Each section is a TableConfig with two fields:

  • column_name_mapping (dict[str, str]): { "physical_column": "internal_column" }. The mapping is applied once, when the table is read. All downstream processing (filters, joins, aggregations) uses the internal names.
  • filters (dict[str, str]): equality filters applied after renaming. Keys are internal column names; values are literals to match. Useful for project/toolbox scoping where a single value should always be enforced.
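The order of operations (rename first, then filter on internal names) can be sketched with plain Python rows standing in for a table. The read_table helper is hypothetical and only illustrates the two TableConfig fields; the solver's actual read path is not shown on this page.

```python
def read_table(rows, column_name_mapping=None, filters=None):
    """Apply a TableConfig-like rename, then equality filters, to a list of rows."""
    mapping = column_name_mapping or {}
    filters = filters or {}
    # 1) Rename physical columns to internal names; unmapped columns keep their name.
    renamed = [
        {mapping.get(col, col): val for col, val in row.items()}
        for row in rows
    ]
    # 2) Apply equality filters, keyed by internal (post-rename) column names.
    return [
        row for row in renamed
        if all(row.get(col) == expected for col, expected in filters.items())
    ]


container_tags_rows = [
    {"entity_id": "c1", "parent_id": "my_parent_id", "key": "uut_id", "value": "ABC123"},
    {"entity_id": "c2", "parent_id": "other", "key": "uut_id", "value": "XYZ"},
]

result = read_table(
    container_tags_rows,
    column_name_mapping={"entity_id": "container_id"},
    filters={"parent_id": "my_parent_id"},
)
print(result)
# [{'container_id': 'c1', 'parent_id': 'my_parent_id', 'key': 'uut_id', 'value': 'ABC123'}]
```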

Top-level fields on SolverConfig:

  • project_id (str, optional): Project scope. When set, the solver applies an equality filter on the project_id column (after column-name mapping) of every table it reads that carries one — container_tags (if configured), container_metrics, and channel_mapping (if configured). Omit it if you don't need project-level scoping; the solver does not require it.

Per-table sections (each a TableConfig):

| Section | When it applies | Typical mappings |
| --- | --- | --- |
| container_tags | when container_tags_table is configured | entity_id → container_id, custom EAV key/value columns |
| container_metrics | always | Custom container_id column, custom timestamp columns |
| channel_tags | when channel_tags_table is configured | Tag key/value column renames |
| channel_metrics | always | Custom channel_id column, custom value/timestamp columns |
| channel_mapping | when channel_mapping_table is configured | Alias-table column renames; priority column; optional join_keys for non-default alias-resolution composite keys |
| channels | always | RLE column renames (tstart/tend/value) |
| unit_conversion | when unit_conversion_table is configured | Unit-conversion table column renames (unit, group_id, conversion_factor) |

Internal column names that mappings can target:

| Internal name | Description |
| --- | --- |
| container_id | Container identifier |
| channel_id | Channel identifier |
| tstart, tend | Sample interval start/end on the channels table (RLE) |
| start_ts, stop_ts | Measurement start/stop epoch timestamps on the container_metrics table; referenced by ContainerEvent to derive event-fact start/end |
| value | Sample value (or attribute value on the EAV tag table) |
| key | Attribute key on the EAV container_tags table |
| priority | Tie-breaker column on the channel_mapping table |
| project_id | Project scoping column |
| parent_id | Parent/scope identifier |
| source_channel | Source-channel identifier on the channel_mapping table |
| data_key | Data-key identifier (default present on both channel_mapping and channel_metrics) |
| channel_alias | Alias identifier on the channel_mapping table |
| channel_name | Channel-name identifier on the channel_metrics table |
| source_unit, target_unit | Source/target unit columns on the channel_mapping table |
| unit | Unit name column on the unit_conversion table |
| group_id | Unit-family identifier on the unit_conversion table |
| conversion_factor | Per-unit factor on unit_conversion; also the per-channel factor name carried into the solve UDF |

Feature support

DefaultSolver consumes every section of solver_config: per-table column_name_mapping, per-table filters, top-level project_id, and the channel_mapping / unit_conversion sections. Sections for tables you do not configure (e.g. channel_tags, channel_mapping) are simply unused.

Example: DefaultSolver with renamed columns and per-table filters

"query_engine": {
  "solver": "DefaultSolver",
  "solver_config": {
    "project_id": "my_project",
    "container_tags": {
      "column_name_mapping": {"entity_id": "container_id"},
      "filters": {"parent_id": "my_parent_id"}
    },
    "container_metrics": {
      "column_name_mapping": {"start_dt": "tstart", "stop_dt": "tend"}
    },
    "channel_metrics": {
      "column_name_mapping": {}
    },
    "channel_mapping": {
      "column_name_mapping": {},
      "filters": {"toolbox_id": "my_toolbox"}
    },
    "channels": {
      "column_name_mapping": {}
    }
  }
}

Sections you don't customize can be omitted; defaults are an empty mapping and no filters.

Unit conversion (optional)

Set source.unit_conversion_table and extend channel_mapping with source_unit / target_unit columns to have aliased selectors auto-convert values from source to target unit during solve(). Direct selectors via query.channel(...) always return raw values, even on a channel that an aliased sibling converts: conversion is a property of the alias, not of the channel. See unit_conversion for the table schema.

"source": {
  "container_metrics_table": "my_catalog.silver.container_metrics",
  "channel_metrics_table": "my_catalog.silver.channel_metrics",
  "channels_uri": "my_catalog.silver.channels",
  "channel_mapping_table": "my_catalog.silver.channel_mapping",
  "unit_conversion_table": "my_catalog.silver.unit_conversion"
},
"query_engine": {
  "solver": "DefaultSolver",
  "solver_config": {
    "unit_conversion": {
      "column_name_mapping": {}
    }
  }
}
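The sketch below illustrates how a per-channel factor could be derived from a unit_conversion table and applied to aliased values. The semantics are an assumption: conversion_factor is taken to express one unit in terms of its family's base unit, so the source-to-target factor is the ratio of the two. The page does not state the framework's exact formula, and the channel_factor helper and the sample rows are hypothetical.

```python
# Hypothetical unit_conversion rows. ASSUMPTION: conversion_factor expresses
# one unit in terms of its family's base unit (metres and seconds here).
unit_conversion = [
    {"unit": "m", "group_id": "length", "conversion_factor": 1.0},
    {"unit": "km", "group_id": "length", "conversion_factor": 1000.0},
    {"unit": "s", "group_id": "time", "conversion_factor": 1.0},
]


def channel_factor(source_unit, target_unit, table):
    """Return the multiplier that converts source_unit values to target_unit."""
    by_unit = {row["unit"]: row for row in table}
    src, tgt = by_unit[source_unit], by_unit[target_unit]
    if src["group_id"] != tgt["group_id"]:
        raise ValueError(
            f"{source_unit} and {target_unit} belong to different unit families"
        )
    return src["conversion_factor"] / tgt["conversion_factor"]


factor = channel_factor("km", "m", unit_conversion)
converted = [value * factor for value in [1.5, 2.0]]
print(factor, converted)  # 1000.0 [1500.0, 2000.0]
```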

Alias-resolution join keys (optional)

DefaultSolver.filter_aliased_channel_metrics joins channel_mapping to channel_metrics to resolve aliased selectors. The default composite key is (source_channel, channel_name) + (data_key, data_key). Override channel_mapping.join_keys to change the arity or column choice. For example, use a single-column join when data_key is not part of the channel identity in your silver layout:

"solver_config": {
  "channel_mapping": {
    "join_keys": [
      {"mapping_col": "source_channel", "metrics_col": "channel_name"}
    ]
  }
}

Each mapping_col / metrics_col is an internal name (the name as the solver sees the column after column_name_mapping has been applied on the respective table). The two sides of a pair are independent, so the same column can carry different names on the two tables. For instance, a layout where the data-key column has different physical names on the two tables has two equivalent paths:

# Path 1: rename both physical columns to the same internal name; the
# default join_keys then works unchanged.
"solver_config": {
  "channel_mapping": {
    "column_name_mapping": {"mapping_data_key": "data_key"}
  },
  "channel_metrics": {
    "column_name_mapping": {"metrics_data_key": "data_key"}
  }
}

# Path 2: leave the physical names as-is and reference them directly.
"solver_config": {
  "channel_mapping": {
    "join_keys": [
      {"mapping_col": "source_channel", "metrics_col": "channel_name"},
      {"mapping_col": "mapping_data_key", "metrics_col": "metrics_data_key"}
    ]
  }
}
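To show what the join_keys pairs mean, here is a small pure-Python sketch of a composite-key join between mapping rows and metrics rows. It assumes an inner join and uses a hypothetical resolve_aliases helper with made-up sample rows; the real filter_aliased_channel_metrics may differ in details such as priority handling.

```python
DEFAULT_JOIN_KEYS = [
    {"mapping_col": "source_channel", "metrics_col": "channel_name"},
    {"mapping_col": "data_key", "metrics_col": "data_key"},
]


def resolve_aliases(mapping_rows, metrics_rows, join_keys=DEFAULT_JOIN_KEYS):
    """Inner-join mapping rows to metrics rows on every configured column pair."""
    resolved = []
    for m in mapping_rows:
        for c in metrics_rows:
            # A pair of rows matches only if all key pairs are equal.
            if all(m[k["mapping_col"]] == c[k["metrics_col"]] for k in join_keys):
                resolved.append(
                    {"channel_alias": m["channel_alias"], "channel_id": c["channel_id"]}
                )
    return resolved


mapping_rows = [{"channel_alias": "speed", "source_channel": "VehSpd", "data_key": "A"}]
metrics_rows = [
    {"channel_id": 1, "channel_name": "VehSpd", "data_key": "A"},
    {"channel_id": 2, "channel_name": "VehSpd", "data_key": "B"},
]

# Default two-column key: only the row with the same data_key matches.
print(resolve_aliases(mapping_rows, metrics_rows))
# [{'channel_alias': 'speed', 'channel_id': 1}]

# Single-column override: data_key is ignored, so both rows match.
single_key = [{"mapping_col": "source_channel", "metrics_col": "channel_name"}]
print(resolve_aliases(mapping_rows, metrics_rows, single_key))
# [{'channel_alias': 'speed', 'channel_id': 1}, {'channel_alias': 'speed', 'channel_id': 2}]
```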

query.channel(...) and query.channel_with_alias(...) kwargs are column references against the post-column_name_mapping schema. If you override join_keys (or skip renames) so that the solver sees a column under a non-default name, the same name must be used as the kwarg. Example: if join_keys references metrics_col: "my_chan_name" and the column is not renamed via column_name_mapping, call query.channel(my_chan_name=...). The internal-name properties on SolverConfig exist primarily to remove magic strings from the solver code; the user-facing contract is "kwarg name == column name as the solver sees it".

When to use what

  • solver_config.<table>.column_name_mapping — your silver-layer column is named differently from the framework's internal name (e.g. entity_id instead of container_id).
  • container_filters.tag_filters / metric_filters — choose which containers participate in this particular run (supports comparators, OR/AND combinations, and type casting). Refer to internal column names when solver_config rewrites them.

incremental (optional)

Incremental processing reuses results from prior runs for unchanged definitions and reprocesses only containers that are new or have been updated in silver. See the Report reference for mode-resolution rules and what counts as a definition change.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Turns incremental processing on. |
| silver_last_modified_column | str | "timestamp" | Silver-side column used to detect container updates. |
| gold_last_modified_column | str | "_created_at" | Gold-side column used to detect prior-run freshness. |
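The role of the two columns can be sketched as a comparison between silver and gold rows. This is a simplified illustration, not the framework's mode-resolution logic (see the Report reference for that). It assumes a container is reprocessed when it has no gold row yet or when its silver last-modified value is newer than the latest gold value; the containers_to_process helper and the sample rows are hypothetical.

```python
from datetime import datetime


def containers_to_process(silver_rows, gold_rows,
                          silver_col="timestamp", gold_col="_created_at"):
    """Return container_ids that are new or updated since the last gold write."""
    # Latest gold-side timestamp per container.
    gold_seen = {}
    for row in gold_rows:
        cid = row["container_id"]
        gold_seen[cid] = max(gold_seen.get(cid, row[gold_col]), row[gold_col])

    todo = []
    for row in silver_rows:
        processed_at = gold_seen.get(row["container_id"])
        # New container, or silver changed after the prior run wrote gold.
        if processed_at is None or row[silver_col] > processed_at:
            todo.append(row["container_id"])
    return todo


silver_rows = [
    {"container_id": "c1", "timestamp": datetime(2025, 1, 1)},  # unchanged
    {"container_id": "c2", "timestamp": datetime(2025, 1, 5)},  # updated
    {"container_id": "c3", "timestamp": datetime(2025, 1, 2)},  # new
]
gold_rows = [
    {"container_id": "c1", "_created_at": datetime(2025, 1, 3)},
    {"container_id": "c2", "_created_at": datetime(2025, 1, 3)},
]

todo = containers_to_process(silver_rows, gold_rows)
print(todo)  # ['c2', 'c3']
```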

measurement_dimensions (optional)

List of container_metrics column names to surface into the gold-layer measurement_dimension table. Names are matched after solver_config.container_metrics.column_name_mapping has been applied — i.e. these are the internal (post-mapping) column names, not the physical silver column names. Each name passes through to gold verbatim, so the configured name is also the gold column name.

Any column present in your post-mapping container_metrics DataFrame is a valid entry — there is no closed allow-list. Typical choices include container_id, uut_id, project, vehicle_key, file_name, file_path, start_ts, stop_ts, and environment, but any column your silver schema carries (under its internal name) is fair game.

container_id is part of the default list and is recommended for any real-world config: it is the upsert key used by incremental processing and the join key between the measurement dimension and the event-fact tables. If you override measurement_dimensions you take full ownership of what ends up in gold — the framework does not inject container_id for you. Omit it only if you know the consequences for downstream joins and incremental runs.

Default:

[
  "container_id",
  "start_ts",
  "stop_ts"
]

If any listed column is not present in the post-mapping container_metrics DataFrame when the report runs, the run fails fast with a ValueError naming the missing columns.
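The fail-fast behaviour can be sketched as a simple membership check against the post-mapping column names. The helper name and the error message wording are hypothetical; only the ValueError type and the fact that the missing columns are named come from the text above.

```python
DEFAULT_DIMENSIONS = ["container_id", "start_ts", "stop_ts"]


def select_measurement_dimensions(available_columns, measurement_dimensions=None):
    """Validate configured dimensions against post-mapping column names."""
    dims = DEFAULT_DIMENSIONS if measurement_dimensions is None else measurement_dimensions
    missing = [col for col in dims if col not in available_columns]
    if missing:
        # Hypothetical message; the real error text may differ.
        raise ValueError(f"measurement_dimensions not found in container_metrics: {missing}")
    return list(dims)


# Column names as the solver sees them after column_name_mapping.
columns = ["container_id", "start_ts", "stop_ts", "vehicle_key"]

print(select_measurement_dimensions(columns, ["container_id", "vehicle_key"]))
# ['container_id', 'vehicle_key']

try:
    # A physical (pre-mapping) name is not visible at this point.
    select_measurement_dimensions(columns, ["container_id", "my_measurement_id"])
except ValueError as err:
    print(err)
```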

Worked example: physical name differs from internal name

Suppose your silver container_metrics table has a physical column my_measurement_id (no container_id). Map it to the internal name in solver_config, then reference the internal name in measurement_dimensions:

{
  "query_engine": {
    "solver": "DefaultSolver",
    "solver_config": {
      "container_metrics": {
        "column_name_mapping": { "my_measurement_id": "container_id" }
      }
    }
  },
  "measurement_dimensions": ["container_id", "start_ts", "stop_ts"]
}

The gold measurement_dimension table will have columns container_id, start_ts, stop_ts. Listing "my_measurement_id" in measurement_dimensions here would fail — by the time the framework selects the dimensions, the column has already been renamed to container_id.

Migration note (pre-0.1): earlier versions exposed a fixed enum that renamed two silver columns on the way to gold (project → project_id, file_path → source_file_path). The rename has been removed; if you previously listed "project_id" or "source_file_path", list "project" and "file_path" instead. The default list also shrank: if you relied on the old eight-column default, add the columns you want explicitly.