A tool to predict the future

Newswise – Whether someone is trying to predict tomorrow’s weather, forecast future stock prices, identify missed retail sales opportunities, or estimate a patient’s risk of developing a disease, they will probably need to interpret time series data, a collection of observations recorded over time.

Making predictions using time series data typically requires several data processing steps and the use of complex machine learning algorithms, which have such a steep learning curve that they are not easily accessible to non-experts.

To make these powerful tools more user-friendly, MIT researchers developed a system that directly integrates prediction functionality on top of an existing time-series database. Their streamlined interface, which they call tspDB (Time Series Prediction Database), does all the complex modeling behind the scenes so a non-expert can easily generate a prediction in just seconds.

The new system is more accurate and more efficient than state-of-the-art deep learning methods when performing two tasks: predicting future values and filling in missing data points.

One of the reasons for tspDB’s success is that it incorporates a novel time-series prediction algorithm, says electrical engineering and computer science (EECS) graduate student Abdullah Alomar, author of a recent research paper in which he and his co-authors describe the algorithm. This algorithm is particularly good at making predictions on multivariate time series data, which is data that has more than one time-dependent variable. In a weather database, for example, temperature, dew point, and cloud cover each depend on their past values.

The algorithm also estimates the volatility of a multivariate time series to provide the user with a level of confidence for their predictions.
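As a generic illustration of what a volatility estimate buys you (the article does not detail tspDB’s actual API, so the function name and the Gaussian ±1.96σ rule below are assumptions), an estimated standard deviation is what turns a point forecast into an interval:

```python
def confidence_interval(forecast, sigma, z=1.96):
    """Turn a point forecast and an estimated standard deviation
    (the volatility) into an approximate 95% confidence interval.
    A time-varying sigma yields a time-varying interval."""
    return (forecast - z * sigma, forecast + z * sigma)
```

A forecast of 10.0 with estimated volatility 2.0 becomes the interval (6.08, 13.92); as the volatility estimate grows, the band widens to reflect the extra uncertainty.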

“Even as time series data gets more and more complex, this algorithm can effectively capture any time series structure out there. It feels like we have found the right lens to look at the model complexity of time series data,” says senior author Devavrat Shah, the Andrew and Erna Viterbi Professor in EECS and a member of the Institute for Data, Systems, and Society and of the Laboratory for Information and Decision Systems.

Joining Alomar and Shah on the paper is lead author Anish Agarwal, a former EECS graduate student who is currently a postdoctoral fellow at the Simons Institute at the University of California at Berkeley. The research will be presented at the ACM SIGMETRICS conference.

Adapting a new algorithm

Shah and his collaborators have been working on the problem of interpreting time series data for years, adapting different algorithms and integrating them into tspDB as they built the interface.

About four years ago, they discovered a particularly powerful classical algorithm, called singular spectrum analysis (SSA), that imputes and forecasts single time series. Imputation is the process of replacing missing values or correcting past values. While this algorithm required manual selection of parameters, the researchers suspected it could enable their interface to make effective predictions using time series data. In previous work, they removed this need for manual intervention in the algorithmic implementation.

The algorithm for a single time series transformed it into a matrix and used matrix estimation procedures. The main intellectual challenge was how to adapt it to use multiple time series. After a few years of struggle, they realized the answer was something very simple: “stack” the matrices for each individual time series, treat them as one large matrix, and then apply the single-time-series algorithm to it.
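The “stack and reuse” idea can be sketched in a few lines of NumPy. This is a toy illustration under simplifying assumptions (non-overlapping Page matrices, mean-fill of missing entries, a hard-rank truncated SVD as the matrix estimation step), not the authors’ implementation; `page_matrix` and `mssa_impute` are hypothetical helper names:

```python
import numpy as np

def page_matrix(series, L):
    """Reshape a univariate series into an L x K matrix whose
    columns are consecutive, non-overlapping length-L segments."""
    K = len(series) // L
    return series[:L * K].reshape(K, L).T

def mssa_impute(series_list, L, rank):
    """Stack each series' matrix side by side into one large matrix,
    fill missing (NaN) entries with the mean of the observed ones,
    then project onto the top-`rank` singular subspace -- the same
    single-series matrix-estimation recipe, applied to the stack."""
    stacked = np.hstack([page_matrix(s, L) for s in series_list])
    mask = np.isnan(stacked)
    filled = np.where(mask, np.nanmean(stacked), stacked)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    s[rank:] = 0.0                      # keep only the top `rank` modes
    return U @ np.diag(s) @ Vt          # low-rank estimate of the stack
```

Because the two series share structure, the low-rank projection over the stacked matrix recovers an entry that is missing from one series using information from both, which is exactly what stacking buys over running SSA on each series separately.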

This naturally uses information both across multiple time series and across time, which they describe in their new paper.

The new paper also discusses an interesting alternative, in which, instead of transforming the multivariate time series into a large matrix, it is treated as a three-dimensional tensor. A tensor is a multidimensional array, or grid, of numbers. This established a promising connection between the classical field of time series analysis and the growing field of tensor estimation, Alomar says.

“The variant of mSSA that we introduced actually captures all of this beautifully. So not only does it provide the most likely estimate, but also a time-varying confidence interval,” says Shah.

The simpler, the better

They tested the adapted mSSA against other state-of-the-art algorithms, including deep learning methods, on real-world time-series datasets with inputs drawn from the power grid, traffic patterns, and financial markets.

Their algorithm outperformed all the others at imputation, and it outperformed all but one of the other algorithms at predicting future values. The researchers also demonstrated that their modified version of mSSA can be applied to any type of time series data.

“One of the reasons I think it works so well is that the model captures a lot of time series dynamics, but at the end of the day it’s still a simple model. When you’re working with something simple like this, instead of a neural network that can easily overfit the data, you can actually perform better,” says Alomar.

mSSA’s impressive performance is what makes tspDB so efficient, Shah explains. Now, their goal is to make this algorithm accessible to everyone.

When a user installs tspDB on top of an existing database, they can execute a prediction query with just a few keystrokes in about 0.9 milliseconds, compared to 0.5 milliseconds for a standard search query. Confidence intervals are also designed to help non-experts make a more informed decision by incorporating the degree of forecast uncertainty into their decision-making.

For example, the system could allow a non-expert to predict future stock prices with high accuracy in just a few minutes, even if the time series data set contains missing values.

Now that the researchers have shown why mSSA works so well, they are targeting new algorithms that can be integrated into tspDB. One of these algorithms uses the same model to automatically enable change point detection, so if the user believes their time series will change its behavior at some point, the system will automatically detect that change and incorporate it into its predictions.

They also want to continue collecting feedback from current tspDB users to see how they can improve the functionality and usability of the system, Shah said.

“Our interest at the highest level is to make tspDB a success as a widely usable open source system. Time series data is very important, and it’s a nice concept to build prediction functionality directly into the database. It’s never been done before, so we want to make sure the world uses it,” he says.

