Time series clustering

Blog post

Data & Cloud Services

Dr. Simon Raschke

2023

Improved aggregation during planning

Do you work in supply chain management and regularly have to estimate your customers' purchasing behavior, and with it your warehouse or production capacities? You have no doubt heard of forecasting methods such as NHiTS, DeepAR or ARIMA from the data science toolbox. These methods attempt to predict the coming period based on historical data and external influencing factors. However, to keep planning workable and to cope with the often enormous amounts of data, items are usually grouped and planned together, for example by product group.

However, the difficulty with such an approach is often that items and products do not sell similarly simply because they have similar master data. The question of whether a yellow shirt sells similarly to a black shirt is not answered by the fact that they are both shirts. This blog post deals with a possible approach to this problem.

Another tool from the field of data science is clustering. This refers to a variety of methods that attempt to identify similarities and differences between data points or time series and assign them to groups. To do so, they use and combine characteristics of the data that are not immediately visible to the human eye. Clustering algorithms also make it possible to put the large amount of available data to use. At the end of such a clustering process, each data point has been assigned to a cluster, and further insights can be gained from the clustering results. Planning and forecasting can also be improved with the help of this new aggregation level, as items with similar sales behavior can now be viewed together.

Figure 1: Raw data of several time series.

We can take a look at the clustering process for time series using an example. Figure 1 shows some time series that look rather chaotic as raw data. It is not immediately clear how to proceed if you try to group these time series or extrapolate them into the future. The simplest approach would be to take each time series separately and carry out forecasting individually. The disadvantage of this method is the small amount of data on which such algorithms can be trained. It would be better to have several time series that could be used together for forecasting. This is where time series clustering comes into play.

In principle, two methods can be used for clustering time series.

1. Clustering based on time series characteristics

One approach to time series clustering in the supply chain is to apply the KMeans algorithm to features extracted from the time series. Statistical properties such as mean, standard deviation, trend and seasonal patterns are used to represent each time series as a set of numerical features.

Using KMeans for clustering allows us to combine similar time series and form clusters that exhibit certain behavioral patterns. This can be useful in inventory planning, for example, to identify similar products and develop an optimal inventory strategy for each cluster.

It is important to note that the choice of features has a significant impact on the effectiveness of the clustering process. It is therefore advisable to consult domain experts to identify the features that are relevant to the specific supply chain problem.
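The feature-based approach can be sketched as follows. This is a minimal illustration and not the setup used for the figures in this post: the synthetic demand histories, the four chosen features, the assumed season length of 12 periods and the choice of three clusters are all assumptions made for the example. It uses scikit-learn's `KMeans` and `StandardScaler`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def extract_features(series, season_length=12):
    """Describe one series by mean, standard deviation, trend slope and seasonality."""
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, deg=1)  # linear trend
    detrended = series - (slope * t + intercept)
    # Seasonality proxy (an assumption of this sketch): correlation of the
    # detrended series with itself, shifted by one season length.
    seasonal = np.corrcoef(detrended[:-season_length], detrended[season_length:])[0, 1]
    return np.array([series.mean(), series.std(), slope, seasonal])


# Synthetic demand histories over 48 periods: seasonal, trending and flat items.
rng = np.random.default_rng(0)
t = np.arange(48)
seasonal_items = [50 + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 48) for _ in range(5)]
trending_items = [20 + 1.5 * t + rng.normal(0, 1, 48) for _ in range(5)]
flat_items = [30 + rng.normal(0, 1, 48) for _ in range(5)]
all_series = seasonal_items + trending_items + flat_items

features = np.array([extract_features(s) for s in all_series])
# Put all features on a comparable scale so that no single one dominates.
scaled = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(labels)
```

Scaling the features before clustering matters here because KMeans works with Euclidean distances: without it, a feature with large absolute values such as the mean would outweigh the trend and seasonality features.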

2. Dynamic time warping

Dynamic time warping (DTW) is a technique for finding similar patterns in time series, even if they have different lengths or are shifted in time. In time series clustering, the similarity between two time series is calculated by aligning their points so that the total distance between matched points is as small as possible; a point in one series may be matched to several points in the other. This allows us to recognize patterns that a rigid point-by-point comparison, as used by conventional clustering algorithms, might miss.
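To make the idea more tangible, here is a minimal sketch of the textbook dynamic-programming formulation of DTW, using only NumPy. The function name `dtw_distance`, the absolute difference as point-wise cost and the sine test signals are choices made for this illustration; production libraries typically add optimizations such as window constraints.

```python
import numpy as np


def dtw_distance(a, b):
    """Cumulative cost of the cheapest monotonic alignment between series a and b."""
    n, m = len(a), len(b)
    # cost[i, j]: cheapest alignment of the first i points of a with the first j points of b
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the best alignment that advanced in a, in b, or in both.
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


t = np.linspace(0, 2 * np.pi, 50)
original = np.sin(t)
shifted = np.sin(t - 0.8)                          # same pattern, offset in time
shorter = np.sin(np.linspace(0, 2 * np.pi, 35))    # same pattern, fewer points

pointwise = float(np.abs(original - shifted).sum())  # rigid point-by-point comparison
warped = dtw_distance(original, shifted)             # comparison after alignment
print(pointwise, warped)
print(dtw_distance(original, shorter))               # works for unequal lengths too
```

For the shifted curve, the warped distance comes out smaller than the rigid point-by-point sum, because DTW is allowed to match each point with a neighbor that lies slightly earlier or later in time.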

In our example, dynamic time warping was used in conjunction with the KMeans clustering algorithm to recognize clusters in the chaotic data. As Figure 2 shows, there are three fundamentally distinguishable progressions, each drawn in a different color.

Figure 2: Raw data of several time series, colored according to their respective cluster affiliation.
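A grouping like the one in Figure 2 can be approximated with very little code. Note that the sketch below deliberately swaps in a simpler technique: instead of KMeans with DTW, it runs a k-medoids-style loop on a matrix of pairwise DTW distances, so that it needs nothing beyond NumPy. The synthetic waves, ramps and bumps, the function names and the deterministic initialization are assumptions of this example. Libraries such as tslearn offer a DTW-based k-means (`TimeSeriesKMeans` with `metric="dtw"`) for real projects.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook DTW with absolute difference as point-wise cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def cluster_by_dtw(series, k, n_iter=20):
    """Group series with a k-medoids loop on pairwise DTW distances."""
    n = len(series)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])

    # Deterministic start: the most central series, then repeatedly the series
    # that is farthest from all medoids chosen so far.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))

    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        # New medoid per cluster: the member with the smallest total distance
        # to the other members (assumes the series are pairwise distinct).
        updated = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            within = dist[np.ix_(members, members)].sum(axis=1)
            updated.append(int(members[np.argmin(within)]))
        if updated == medoids:
            break
        medoids = updated
    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids


# Three synthetic shapes, each with random time offsets and a little noise.
rng = np.random.default_rng(0)
t = np.arange(40)
waves = [2 * np.sin(2 * np.pi * t / 20 + rng.uniform(0, 1)) + rng.normal(0, 0.1, 40) for _ in range(4)]
ramps = [np.clip((t - rng.uniform(0, 8)) / 5, 0, 6) + rng.normal(0, 0.1, 40) for _ in range(4)]
bumps = [5 * np.exp(-0.5 * ((t - rng.uniform(15, 25)) / 3) ** 2) + rng.normal(0, 0.1, 40) for _ in range(4)]
series = waves + ramps + bumps

labels, medoids = cluster_by_dtw(series, k=3)
print(labels)
```

Using medoids, i.e. actual series as cluster representatives, avoids having to average time series under DTW, which keeps the sketch short.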

It is no longer possible to make such a classification with the naked eye when dealing with the volumes of data that usually occur in planning departments. As we have now identified similar time series, patterns emerge in the respective clusters. The algorithm has recognized these for us and assigned them accordingly.

Figure 3: Raw data of several time series in gray, divided into the 3 clusters. The colored curves show the centroid of the respective cluster, i.e. its mean time series.
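The colored curves in Figure 3 summarize each cluster by a mean curve. A minimal way to compute such a curve, assuming all series have the same length and the cluster labels are already known, is a point-by-point average per cluster. The function name and the toy numbers below are hypothetical; DTW-based variants of k-means usually compute a warping-aware average instead of this plain mean.

```python
import numpy as np


def cluster_centroids(series, labels):
    """Return one point-by-point mean curve per cluster label."""
    series = np.asarray(series, dtype=float)
    labels = np.asarray(labels)
    return {int(c): series[labels == c].mean(axis=0) for c in np.unique(labels)}


# Hypothetical toy data: four short series already assigned to two clusters.
series = [[1, 2, 3], [3, 4, 5], [10, 10, 10], [12, 8, 10]]
labels = [0, 0, 1, 1]

centroids = cluster_centroids(series, labels)
print(centroids[0].tolist())  # [2.0, 3.0, 4.0]
print(centroids[1].tolist())  # [11.0, 9.0, 10.0]
```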

For further planning purposes, we can now ignore which items or products are brought together in such a cluster. Perhaps a black shirt sells in a similar way to a gold-colored bangle, and without this method we would not have realized that these products with fundamentally different master data behave so similarly. The findings from this process can now be put to good use.

Time series clustering can therefore save us a lot of work in the planning process:

Instead of planning individual items, we can now concentrate on planning clusters and connect forecasting to this process.

Forecasting algorithms can be trained on a large number of similar time series, which can significantly increase their accuracy.

We gain previously unknown insights into the product range and can recognize correlations that are not obvious and do not emerge directly from the master data of the articles.
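The second point, training one model on all time series of a cluster, can be illustrated with a deliberately simple stand-in. The sketch below does not use NHiTS, DeepAR or ARIMA; it fits a plain linear autoregressive model with least squares on the pooled history of a synthetic cluster. The function names, the 12 lags and the generated data are assumptions made for this example.

```python
import numpy as np


def make_windows(series, n_lags):
    """Turn one series into (lagged inputs, next value) training pairs."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y


def fit_pooled_model(cluster_series, n_lags):
    """Fit one linear autoregressive model on the windows of all series in a cluster."""
    parts = [make_windows(s, n_lags) for s in cluster_series]
    X = np.vstack([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    X = np.hstack([X, np.ones((len(X), 1))])  # intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast_next(history, coef):
    """One-step-ahead forecast from the last n_lags observations."""
    n_lags = len(coef) - 1
    return float(history[-n_lags:] @ coef[:-1] + coef[-1])


# Synthetic cluster: five items with the same seasonal pattern but different phases.
rng = np.random.default_rng(0)
t = np.arange(36)
cluster = [50 + 10 * np.sin(2 * np.pi * t / 12 + rng.uniform(0, 2 * np.pi)) + rng.normal(0, 0.5, 36)
           for _ in range(5)]

# Hold out the last observation of every item and train on the rest.
train = [s[:-1] for s in cluster]
coef = fit_pooled_model(train, n_lags=12)
errors = [abs(forecast_next(s[:-1], coef) - s[-1]) for s in cluster]
mean_abs_error = float(np.mean(errors))
print(mean_abs_error)
```

With 12 lags, a single item of this length would contribute only 23 training windows for 13 coefficients. Pooling the five items of the cluster raises this to 115 windows, which is the effect described in the list above.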

If you are also having difficulties mastering the flood of data in your planning process, please contact us. We look forward to the exchange.

Blog post author

Dr. Simon Raschke

Senior Cloud Solutions Architect

celver AG

Dr. Simon Raschke is a Senior Cloud Solutions Architect at celver AG with a background in natural sciences. He is particularly interested in projects involving complex data relationships in the context of modern cloud technologies. His focus is on generating meaningful, business-relevant information from data of different types.
