tempo - Time Series Utilities for Data Teams Using Databricks
The purpose of this project is to make time series manipulation with Spark simpler. Operations covered under this package include AS OF joins, rolling statistics with user-specified window lengths, featurization of time series using lagged values, and Delta Lake optimization on time and partition fields.
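As a rough illustration of what the first three operations compute, the following pure-Python sketch applies them to small in-memory lists of (timestamp, value) rows for a single series. It is not tempo's implementation or API. Tempo performs these operations on Spark DataFrames within each partition, whereas asof_join, rolling_mean and lagged_features here are hypothetical helper names. The window conventions (an inclusive [ts - window, ts] range, and lags that exclude the current row) and the trades/quotes rows are assumptions made for the sketch.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

# Hypothetical row shape: (timestamp in seconds, value). Inputs must be sorted by timestamp.
Row = Tuple[float, float]


def asof_join(left: List[Row], right: List[Row]) -> List[Tuple[float, float, Optional[float]]]:
    """Attach to each left row the latest right value whose timestamp is <= the left timestamp."""
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        i = bisect_right(right_ts, ts)  # count of right rows with timestamp <= ts
        joined.append((ts, value, right[i - 1][1] if i > 0 else None))
    return joined


def rolling_mean(rows: List[Row], window: float) -> List[Tuple[float, float]]:
    """Mean of the values whose timestamps fall in [ts - window, ts]; assumes window >= 0."""
    ts_list = [ts for ts, _ in rows]
    result = []
    for ts, _ in rows:
        lo = bisect_left(ts_list, ts - window)  # first row inside the window
        hi = bisect_right(ts_list, ts)  # one past the last row at or before ts
        in_window = [v for _, v in rows[lo:hi]]  # never empty: the current row is included
        result.append((ts, sum(in_window) / len(in_window)))
    return result


def lagged_features(rows: List[Row], n: int) -> List[Tuple[float, List[float]]]:
    """Attach the values of up to n preceding rows (oldest first) to each row."""
    values = [v for _, v in rows]
    return [(ts, values[max(0, j - n):j]) for j, (ts, _) in enumerate(rows)]


# Hypothetical example data
trades = [(1.0, 100.0), (3.0, 101.0), (7.0, 99.0)]
quotes = [(0.5, 99.5), (2.0, 100.5), (6.0, 98.5)]
print(asof_join(trades, quotes))  # [(1.0, 100.0, 99.5), (3.0, 101.0, 100.5), (7.0, 99.0, 98.5)]
print(rolling_mean(trades, window=2.0))  # [(1.0, 100.0), (3.0, 100.5), (7.0, 99.0)]
print(lagged_features(trades, n=2))  # [(1.0, []), (3.0, [100.0]), (7.0, [100.0, 101.0])]
```

A left row with no earlier right row gets None in the AS OF join, which mirrors the usual "no prior match" outcome of such joins.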
Time series analysis on Spark and Photon with tempo is highly performant for historical data. We are simplifying the common time series functions to make development easier.
Tempo is easy to use. The following example, written for a Databricks notebook, loads accelerometer readings from CSV and wraps them in a TSDF:
from pyspark.sql.functions import col
from tempo import *
# Read the CSV and cast columns; Arrival_Time is divided by 1000 and cast to a timestamp
phone_accel_df = (
    spark.read.format("csv").option("header", "true")
    .load("dbfs:/home/tempo/Phones_accelerometer")
    .withColumn("event_ts", (col("Arrival_Time").cast("double") / 1000).cast("timestamp"))
    .withColumn("x", col("x").cast("double"))
    .withColumn("y", col("y").cast("double"))
    .withColumn("z", col("z").cast("double"))
    .withColumn("event_ts_dbl", col("event_ts").cast("double"))
)
# Treat each User's readings as a separate time series ordered by event_ts
phone_accel_tsdf = TSDF(phone_accel_df, ts_col="event_ts", partition_cols=["User"])
display(phone_accel_tsdf)
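The event_ts step divides Arrival_Time by 1000 before casting to a timestamp, which suggests that Arrival_Time holds epoch milliseconds (an assumption about this dataset). Passing partition_cols=["User"] together with ts_col="event_ts" asks tempo to treat each user's readings as a separate series ordered by event_ts. The pure-Python sketch below mimics those two steps on a few hypothetical rows without Spark or tempo. The names to_event_ts and split_series are illustrative only and not part of tempo, and UTC is used here purely for readability.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List


def to_event_ts(arrival_time_ms: str) -> datetime:
    """Hypothetical helper: convert an epoch-milliseconds string (as read from CSV) to a UTC datetime."""
    return datetime.fromtimestamp(float(arrival_time_ms) / 1000, tz=timezone.utc)


def split_series(rows: List[dict], ts_col: str, partition_col: str) -> Dict[str, List[dict]]:
    """Hypothetical helper: group rows by the partition column, then sort each group by timestamp."""
    series: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        series[row[partition_col]].append(row)
    return {key: sorted(group, key=lambda r: r[ts_col]) for key, group in series.items()}


# Hypothetical raw CSV rows: every field arrives as a string, as with header-only CSV reads
raw = [
    {"User": "a", "Arrival_Time": "1700000000300", "x": "-0.56"},
    {"User": "b", "Arrival_Time": "1700000000200", "x": "0.12"},
    {"User": "a", "Arrival_Time": "1700000000100", "x": "-0.60"},
]
# Add the derived timestamp and cast x to float, like the withColumn calls above
rows = [{**r, "event_ts": to_event_ts(r["Arrival_Time"]), "x": float(r["x"])} for r in raw]
by_user = split_series(rows, ts_col="event_ts", partition_col="User")
print([r["Arrival_Time"] for r in by_user["a"]])  # ['1700000000100', '1700000000300']
```

Grouping before sorting reflects why a partition column matters: ordering, lags and windows are evaluated within one user's series rather than across all users.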
Installing
Tempo can be installed with pip:
$ pip install dbl-tempo
Alternatively, you can install the latest source code directly from GitHub:
$ pip install -e "git+https://github.com/databrickslabs/tempo.git#egg=dbl-tempo&subdirectory=python"
Note that the Scala version of Tempo is deprecated and no longer in development.
Usage
The User Guide is the place to go to learn how to use the library and accomplish common tasks.
The API Reference documentation provides API-level documentation.
Who uses tempo?
Tempo is one of the most popular packages for Spark-based time series analysis. It is actively maintained by engineers and field experts, who continually add new features ranging from time series pre-processing to time series analytics and machine learning.
License
Tempo is made available under the Databricks License. For more details, see LICENSE.
Docs on GitHub Pages
Tempo’s Sphinx documentation is hosted on GitHub Pages and is integrated with the project’s GitHub Actions.
The documentation tracks the latest version published on PyPI, so features and functionality available only when installing tempo directly from the GitHub repository may not be documented here.
For support with those features, please reach out to the Tempo team.
Contributing
We happily welcome contributions; please see Contributing for details.