CAQL has recently been augmented with two new forecasting methods: the slope method and the regression method.
Both methods are functionally very similar and can be applied in similar situations. Either one can be used to forecast linear or exponential trends, like disk usage or user growth. As we will see, their parameters and output values are also very similar. Because the slope method is simpler and more performant, it should be the default choice in these scenarios. Neither method is suitable for forecasting periodic data like page views.
While the idea behind the forecasting methods is rather simple, the emitted output values can be non-obvious to interpret. This post should give you all the information needed to get you started.
The slope forecasting method fits a line through a window of historic data and projects a forecast based on the data in that window. The line passes through two points: the most recent data point, `y[t]`, and a second pivot point that lies `model_duration` in the past, `y[t - model_duration]`. This line is then used to extrapolate the data into the future by an amount of time equal to the specified `forecast_duration`. The forecasted value can be read off the graph as indicated in Figure 1.
A mathematical formula for the forecasted value `F[t]` can be given as follows:
F[t] = y[t] + (y[t] - y[t - model_duration]) * forecast_duration / model_duration
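To make the formula concrete, here is a minimal Python sketch of the same calculation. It is not CAQL's implementation: the function name `slope_forecast` is hypothetical, and it assumes evenly spaced samples, so both durations are given as a number of samples rather than as a time span.

```python
from typing import Sequence


def slope_forecast(y: Sequence[float], model_steps: int, forecast_steps: int) -> float:
    """Two-point slope forecast, mirroring
    F[t] = y[t] + (y[t] - y[t - model_duration]) * forecast_duration / model_duration.

    Assumption: samples are evenly spaced, so durations are counted in samples.
    """
    if model_steps <= 0:
        raise ValueError("model_steps must be positive")
    if len(y) <= model_steps:
        raise ValueError("need more than model_steps samples")
    latest = y[-1]               # y[t]
    pivot = y[-1 - model_steps]  # y[t - model_duration]
    return latest + (latest - pivot) * forecast_steps / model_steps


# Disk usage in GB, one sample per minute, growing by 2 GB per minute.
disk_usage = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
forecast = slope_forecast(disk_usage, model_steps=5, forecast_steps=10)
print(forecast)  # 130.0
```

Here the series grew by 10 GB over the 5-sample model window, so the forecast 10 samples ahead adds another 20 GB to the latest value.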
As in the slope method, the regression method uses a line to forecast the data into the future. Instead of only using two points for the forecast, the regression method uses a regression line that fits all samples that were recorded during the most recent `model_duration`. Again, we use this line to extrapolate the data by `forecast_duration` into the future and arrive at the forecasted value, cf. Figure 2.
By default, the regression line is recomputed once per model duration, that is, each time a period equal to the specified `model_duration` has passed. This is in contrast to the slope method, which is updated with every new sample.
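The regression idea can be sketched in a few lines of Python using an ordinary least-squares fit. This is an illustration under stated assumptions, not CAQL's implementation: the function name `regression_forecast` is hypothetical, samples are assumed to be evenly spaced (durations are counted in samples), and the line is refitted on every call rather than once per model duration.

```python
from typing import Sequence


def regression_forecast(y: Sequence[float], model_steps: int, forecast_steps: int) -> float:
    """Fit a least-squares line to the last model_steps samples and extrapolate it.

    Assumption: samples are evenly spaced, so durations are counted in samples.
    """
    if model_steps < 2:
        raise ValueError("model_steps must be at least 2 to fit a line")
    if len(y) < model_steps:
        raise ValueError("need at least model_steps samples")
    window = list(y[-model_steps:])
    n = len(window)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(window) / n
    # Ordinary least squares: slope = Sxy / Sxx.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, window))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # The most recent sample sits at x = n - 1; the target lies forecast_steps beyond it.
    return intercept + slope * (n - 1 + forecast_steps)


samples = [1.0, 3.0, 5.0, 7.0, 9.0]
print(regression_forecast(samples, model_steps=5, forecast_steps=3))  # 15.0
```

Because every sample in the window contributes to the fit, a single outlier at either end of the window shifts the forecast less than it would in the two-point slope method.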
Both slope and regression forecasting accept the following parameters.
`forecast_duration` is the target duration to forecast into the future.
`model_duration` is an optional second parameter that specifies the time duration to base the model on. A large model duration gives a more stable estimate, but it also slows down the uptake of new trends. The model duration should reflect the amount of time over which you expect trends to be stable. Typical settings range from 10M to 2h.
By setting `model = "exponential"`, you can use an exponential curve for the forecast instead of a straight line. This can be used to forecast exponential trends, like user growth.
The optional `step` parameter allows you to set an explicit update interval for the model.
For the regression model, the minimal step duration is the `model_duration`. For alerting, you want to keep this parameter low (the default value). A high step value can be used in graphs to make the interpolated lines visible (cf. Figure 3).
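To illustrate what an exponential model means for the slope method, the sketch below applies the two-point idea to growth ratios instead of differences, which is equivalent to drawing the straight line through the logarithm of the data. This is one common way to extrapolate exponential trends and is only an assumption about the general approach, not a description of how CAQL computes it. The function name `exponential_slope_forecast` is hypothetical, and durations are again counted in evenly spaced samples.

```python
from typing import Sequence


def exponential_slope_forecast(
    y: Sequence[float], model_steps: int, forecast_steps: int
) -> float:
    """Two-point forecast on a log scale.

    The growth factor observed over the model window is carried forward
    proportionally over the forecast window:
        F = y[t] * (y[t] / y[t - model]) ** (forecast / model)

    Assumption: all values are strictly positive and evenly spaced.
    """
    if model_steps <= 0:
        raise ValueError("model_steps must be positive")
    if len(y) <= model_steps:
        raise ValueError("need more than model_steps samples")
    latest = y[-1]
    pivot = y[-1 - model_steps]
    if latest <= 0 or pivot <= 0:
        raise ValueError("exponential model requires positive values")
    return latest * (latest / pivot) ** (forecast_steps / model_steps)


# User count doubling with every sample.
users = [100.0, 200.0, 400.0, 800.0]
print(exponential_slope_forecast(users, model_steps=2, forecast_steps=2))  # approximately 3200
```

A linear slope forecast on the same data would predict 800 + 600 = 1400 users, which shows how much a straight line can underestimate a compounding trend.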
Further details about parameters and invocation can be found in the CAQL Manual.