tempo - Time Series Utilities for Data Teams Using Databricks

The purpose of this project is to make time series manipulation with Spark simpler. Operations covered under this package include AS OF joins, rolling statistics with user-specified window lengths, featurization of time series using lagged values, and Delta Lake optimization on time and partition fields.
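To make the AS OF join semantics concrete, here is a minimal plain-Python sketch that is independent of Spark and is not tempo's implementation. It assumes the common definition: each left-hand row is matched with the most recent right-hand row for the same key whose timestamp is at or before the left row's timestamp. The function name `as_of_join` and the sample trade and quote data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def as_of_join(left, right):
    """Match each left row with the latest right row at or before its timestamp.

    Both inputs are lists of (key, timestamp, value) tuples. This is an
    illustrative sketch of the semantics only, not tempo's implementation.
    """
    # Group right-hand rows by key and sort each group by timestamp.
    by_key = defaultdict(list)
    for key, ts, value in right:
        by_key[key].append((ts, value))
    for rows in by_key.values():
        rows.sort(key=lambda row: row[0])

    joined = []
    for key, ts, value in left:
        rows = by_key.get(key, [])
        timestamps = [row[0] for row in rows]
        # Index of the first right row strictly after ts; the match sits just before it.
        idx = bisect_right(timestamps, ts)
        match = rows[idx - 1][1] if idx > 0 else None
        joined.append((key, ts, value, match))
    return joined


# Hypothetical sample data: trades joined to the prevailing quote.
trades = [("AAPL", 10, 100.0), ("AAPL", 25, 101.0), ("MSFT", 5, 50.0)]
quotes = [("AAPL", 8, 99.5), ("AAPL", 20, 100.5), ("MSFT", 7, 49.0)]
result = as_of_join(trades, quotes)
```

A left row with no earlier right row for its key (the MSFT trade here) gets `None` as its match. On a cluster the same idea is applied per partition key at scale; the User Guide describes the actual tempo method and its options.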

Time series analysis on Spark and Photon with tempo is highly performant for historical workloads. We are simplifying the common functions to make development easier.

Tempo is very easy to use:

from pyspark.sql.functions import col
from tempo import TSDF

# `spark` is the SparkSession that a Databricks notebook provides.
phone_accel_df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("dbfs:/home/tempo/Phones_accelerometer")
    .withColumn("event_ts", (col("Arrival_Time").cast("double") / 1000).cast("timestamp"))
    .withColumn("x", col("x").cast("double"))
    .withColumn("y", col("y").cast("double"))
    .withColumn("z", col("z").cast("double"))
    .withColumn("event_ts_dbl", col("event_ts").cast("double"))
)

# Wrap the DataFrame in a TSDF, using event_ts as the time column and User as the partition column.
phone_accel_tsdf = TSDF(phone_accel_df, ts_col="event_ts", partition_cols=["User"])
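
The `event_ts` expression in the snippet above divides `Arrival_Time` by 1000 before casting to a timestamp, which suggests the raw column holds epoch milliseconds; that is an assumption about the dataset rather than something stated here. A minimal plain-Python sketch of the same conversion, using a hypothetical sample value:

```python
from datetime import datetime, timezone

# Hypothetical raw value, assumed to be milliseconds since the Unix epoch.
arrival_time_ms = 1424696633908

# Dividing by 1000 yields epoch seconds with a fractional part, which is
# the unit Spark uses when a numeric value is cast to a timestamp.
epoch_seconds = arrival_time_ms / 1000

# Build a timezone-aware timestamp (UTC chosen here for reproducibility).
event_ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
```

Skipping the division would treat the millisecond value as seconds and place the event tens of thousands of years in the future, so the unit of the raw column is worth checking before building a TSDF on it.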


Tempo can be installed with pip:

$ pip install dbl-tempo

Alternatively, you can grab the latest source code from GitHub:

$ pip install -e "git+https://github.com/databrickslabs/tempo.git#egg=dbl-tempo&subdirectory=python"


The User Guide is the place to go to learn how to use the library and accomplish common tasks.

The API Reference documentation provides API-level documentation.

Who uses tempo?

tempo is one of the most popular packages for Spark-based time series analysis. It is actively maintained by engineers and field experts, who regularly add new features ranging from time series pre-processing to time series analytics and machine learning.


Tempo is made available under the Databricks License. For more details, see LICENSE.

We happily welcome contributions; please see Contributing for details.