Data Model

Start with three tables

DefaultSolver needs only three tables: container_metrics, channel_metrics, and channels. The tag tables (container_tags, channel_tags) and the channel_mapping / unit_conversion tables are fully optional add-ons, used only when configured — see the Query Engine table requirements.

The rest of this page documents the full shape. Landing your data in it during ingest is the simplest path (see the Ingestion guide); if you can't reshape, a SolverConfig can remap column names, or you can implement a custom solver.

Impulse follows the Databricks medallion architecture.

Raw measurement files are ingested into the lakehouse in the Bronze layer. They are then processed and transformed into a normalized Silver layer. The Gold layer contains the final analytics results in a star schema optimized for querying and reporting.

All layers are stored as Delta tables in Unity Catalog, which makes them easy to govern, secure, and query for various personas across the organization.


Silver Layer (Input)

Only three tables are required: container_metrics, channel_metrics, and channels. The two tag tables (container_tags, channel_tags) are fully optional: add them only when you want tag-based container filtering or EAV channel selection.

| Table | Required? | Purpose |
| --- | --- | --- |
| container_metrics | Yes | One row per measurement container with timestamps, duration, and channel count. |
| channel_metrics | Yes | Pre-computed statistics per channel (min, max, mean, percentiles, sample count). Also carries channel-selection columns (e.g. channel_name) in the wide model. |
| channels | Yes | Time-series sample data, either as raw (timestamp, value) samples or as run-length-encoded intervals [tstart, tend). |
| container_tags | Optional | Key-value metadata tags for containers (e.g. vehicle_key, project_id). |
| channel_tags | Optional | Key-value metadata tags per channel (e.g. channel_name, brand, model). |
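
The channels table can hold either raw samples or run-length-encoded intervals. As a rough illustration of how the two representations relate, the sketch below collapses consecutive equal-valued samples into half-open [tstart, tend) intervals. The function name rle_encode, the tuple layouts, and the end_time parameter are assumptions made for this example; they are not part of Impulse, and the actual encoding used at ingest may differ.

```python
from typing import List, Tuple

Sample = Tuple[float, float]            # (timestamp, value)
Interval = Tuple[float, float, float]   # (tstart, tend, value)


def rle_encode(samples: List[Sample], end_time: float) -> List[Interval]:
    """Collapse runs of equal values into half-open [tstart, tend) intervals.

    Assumes samples are sorted by timestamp and that each value holds until
    the next sample (or until end_time for the last sample).
    """
    intervals: List[Interval] = []
    for i, (ts, value) in enumerate(samples):
        next_ts = samples[i + 1][0] if i + 1 < len(samples) else end_time
        if intervals and intervals[-1][2] == value and intervals[-1][1] == ts:
            # Same value as the previous run: extend it instead of starting a new one.
            tstart = intervals[-1][0]
            intervals[-1] = (tstart, next_ts, value)
        else:
            intervals.append((ts, next_ts, value))
    return intervals


# Hypothetical "Engine RPM" samples, one per second.
raw = [(0.0, 800.0), (1.0, 800.0), (2.0, 1200.0), (3.0, 1200.0), (4.0, 900.0)]
encoded = rle_encode(raw, end_time=5.0)
```

Five raw samples become three intervals here, which is why the interval form tends to suit signals that change rarely.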

Channels are selected either from an EAV channel_tags table (e.g. channel_name = "Engine RPM") or directly from columns on channel_metrics — in both cases by signal metadata rather than fixed column positions, so the same schema supports arbitrary signal sets across projects.
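
To make the two selection styles concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the Delta tables. The table names and the channel_name example come from this page; the tag_key and tag_value column names and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Wide model: the selection column lives directly on channel_metrics.
CREATE TABLE channel_metrics (container_id INTEGER, channel_id INTEGER, channel_name TEXT);
-- EAV model: one row per (channel, tag); tag_key/tag_value are assumed names.
CREATE TABLE channel_tags (container_id INTEGER, channel_id INTEGER, tag_key TEXT, tag_value TEXT);

INSERT INTO channel_metrics VALUES (1, 10, 'Engine RPM'), (1, 11, 'Vehicle Speed');
INSERT INTO channel_tags VALUES
    (1, 10, 'channel_name', 'Engine RPM'),
    (1, 11, 'channel_name', 'Vehicle Speed');
""")

# Selection from a column on channel_metrics.
wide = conn.execute(
    "SELECT container_id, channel_id FROM channel_metrics WHERE channel_name = ?",
    ("Engine RPM",),
).fetchall()

# Selection from the EAV channel_tags table.
eav = conn.execute(
    "SELECT container_id, channel_id FROM channel_tags "
    "WHERE tag_key = ? AND tag_value = ?",
    ("channel_name", "Engine RPM"),
).fetchall()
```

Both queries resolve to the same (container_id, channel_id) pair, so downstream logic does not need to know which model supplied the metadata.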

See the Silver Layer ER Diagram for table relationships. For background on the design, see the Databricks blog post on revolutionizing car measurement data storage and analysis.


Gold Layer (Output)

The Gold layer uses a star schema with fact and dimension tables. All table names are prefixed with a configurable table_prefix (e.g. my_report_histogram_fact).

Fact tables

| Table | Grain | Description |
| --- | --- | --- |
| event_instance_fact | One row per event instance per container | Materialized time windows where an event condition holds. |
| histogram_fact | One row per bin per container | 1D histogram bin values, duration-weighted. |
| histogram2d_fact | One row per (x, y) bin per container | 2D histogram bin values, duration-weighted. |
| stats_aggregator_fact | One row per signal per event instance | Descriptive statistics (min, max, mean, median). |
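
This page does not spell out how duration weighting is computed. One plausible reading, sketched below, is that each bin accumulates the time the signal spent inside it rather than a count of samples. The function name, the (tstart, tend, value) interval layout, and the half-open bin convention are assumptions for this example, not a description of the Impulse implementation.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Interval = Tuple[float, float, float]  # (tstart, tend, value)


def duration_weighted_histogram(
    intervals: Sequence[Interval], bin_edges: Sequence[float]
) -> List[float]:
    """Sum the time spent in each bin; bin i covers [bin_edges[i], bin_edges[i+1])."""
    totals = [0.0] * (len(bin_edges) - 1)
    for tstart, tend, value in intervals:
        idx = bisect_right(bin_edges, value) - 1
        if 0 <= idx < len(totals):
            # Weight by interval duration rather than counting one per sample.
            totals[idx] += tend - tstart
        # Values outside the bin range are ignored in this sketch.
    return totals


# Hypothetical run-length-encoded "Engine RPM" intervals, in seconds.
intervals = [(0.0, 2.0, 800.0), (2.0, 4.0, 1200.0), (4.0, 5.0, 900.0)]
hist = duration_weighted_histogram(intervals, bin_edges=[0.0, 1000.0, 2000.0])
```

With these inputs the signal spends 3 seconds in the first bin and 2 seconds in the second. A sample-count histogram would give a different picture whenever sampling is irregular.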

Dimension tables

| Table | Description |
| --- | --- |
| measurement_dimension | Container metadata selected from container_metrics via config. |
| event_dimension | Event definitions (name, TSAL expression, required channels). |
| histogram_dimension | Histogram metadata (bins, signal info, units). |
| histogram2d_dimension | 2D histogram metadata (axes, bins, signal info, units). |
| stats_aggregator_dimension | Statistics metadata (channel names, aggregation labels). |

Join pattern

Fact and dimension tables are connected through three key columns:

  • container_id -- links all fact tables to measurement_dimension
  • event_id -- links event_instance_fact, histogram_fact, and histogram2d_fact to event_dimension
  • visual_id -- links each aggregation fact table to its corresponding dimension table

stats_aggregator_fact additionally joins to event_instance_fact via event_instance_id, enabling per-interval breakdowns.
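
The join pattern above can be sketched with an in-memory SQLite database standing in for the Gold Delta tables. The key columns (container_id, event_id, visual_id) and the my_report_ prefix come from this page; every other column (vehicle_key, event_name, histogram_name, bin_index, bin_value) and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_report_measurement_dimension (container_id INTEGER, vehicle_key TEXT);
CREATE TABLE my_report_event_dimension (event_id INTEGER, event_name TEXT);
CREATE TABLE my_report_histogram_dimension (visual_id INTEGER, histogram_name TEXT);
CREATE TABLE my_report_histogram_fact (
    container_id INTEGER, event_id INTEGER, visual_id INTEGER,
    bin_index INTEGER, bin_value REAL
);

INSERT INTO my_report_measurement_dimension VALUES (1, 'VEH-001');
INSERT INTO my_report_event_dimension VALUES (7, 'full_recording');
INSERT INTO my_report_histogram_dimension VALUES (3, 'rpm_histogram');
INSERT INTO my_report_histogram_fact VALUES (1, 7, 3, 0, 3.0), (1, 7, 3, 1, 2.0);
""")

# Resolve each fact row against all three dimensions via the key columns.
rows = conn.execute("""
SELECT m.vehicle_key, e.event_name, h.histogram_name, f.bin_index, f.bin_value
FROM my_report_histogram_fact AS f
JOIN my_report_measurement_dimension AS m ON m.container_id = f.container_id
JOIN my_report_event_dimension AS e ON e.event_id = f.event_id
JOIN my_report_histogram_dimension AS h ON h.visual_id = f.visual_id
ORDER BY f.bin_index
""").fetchall()
```

The same three joins should carry over to Spark SQL against the real tables, with the actual table_prefix and column names substituted.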


Key Concepts

| Concept | Definition | Tables |
| --- | --- | --- |
| Container | A single measurement recording (e.g. one test drive). Identified by container_id. | container_metrics, container_tags |
| Channel | A time-series signal within a container (e.g. "Engine RPM"). Identified by (container_id, channel_id). | channels, channel_metrics, channel_tags |
| Event | A time window of interest, defined by a condition or spanning the full recording. | event_dimension, event_instance_fact |
| Aggregation | A computation over channel data within event windows (histogram, 2D histogram, or statistics). | *_fact, *_dimension |