This important study compares the performance of statistical and judgmental extrapolation methods for a set of artificial time series. It provides an evaluation of the ability of naive judges to identify patterns in the data. By naive, I mean that the judges have no domain expertise.
The use of artificial data makes it easier to draw conclusions about the features of a time series that affect the relative accuracy of statistical and judgmental extrapolations. Sanders used ten artificial series to simulate monthly data. They included stationary, seasonal (additive), trend-and-seasonal, and step-function series. In each case there were low-noise and high-noise versions, the latter having a standard deviation five times that of the low-noise version. Each series consisted of 48 months for calibration, and forecasts were required for a 12-month holdout period.
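To make the design concrete, here is a minimal sketch of how such artificial monthly series might be constructed. The levels, seasonal amplitude, trend slope, step size, and noise standard deviations are my own illustrative assumptions, not the parameters Sanders actually used; only the 48/12 calibration-holdout split and the five-to-one noise ratio come from the study.

```python
import math
import random

def make_series(kind, noise_sd, n=60, seed=0):
    """Generate one artificial monthly series of the kinds described:
    'stationary', 'seasonal', 'trend_seasonal', or 'step'.
    All numeric parameters here are illustrative assumptions."""
    rng = random.Random(seed)
    series = []
    for t in range(n):
        level = 100.0
        if kind in ("seasonal", "trend_seasonal"):
            level += 20.0 * math.sin(2 * math.pi * t / 12)  # additive seasonality
        if kind == "trend_seasonal":
            level += 0.5 * t                                # linear trend
        if kind == "step" and t >= n // 2:
            level += 30.0                                   # step change mid-series
        series.append(level + rng.gauss(0, noise_sd))
    return series

# Low-noise and high-noise versions: the latter's noise standard
# deviation is five times the former's.
low = make_series("seasonal", noise_sd=2.0)
high = make_series("seasonal", noise_sd=10.0)

# 48 months for calibration, 12 months held out for forecasting.
calibration, holdout = low[:48], low[48:]
```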
The judgmental forecasts were provided by 38 business students enrolled in an elective undergraduate forecasting course. Each student provided forecasts for two of the series, receiving the data in tables and graphs. One week after making their judgmental forecasts, the students received statistically based forecasts along with their initial judgmental forecasts and were asked to revise the statistical forecasts. For the statistical models, simple exponential smoothing (no trend) was used for the stationary series, Winters exponential smoothing for the series containing seasonality, and trended exponential smoothing for the non-seasonal series.
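As a rough illustration of two of these statistical benchmarks, here is a minimal sketch of simple (no-trend) exponential smoothing and trended (Holt) exponential smoothing. The smoothing constants and initializations are my own illustrative assumptions, and the Winters seasonal variant is omitted for brevity.

```python
def simple_exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing (no trend): the forecast for every
    future horizon is the final smoothed level. Alpha is illustrative."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def holt(series, alpha=0.3, beta=0.1, horizons=12):
    """Holt's trended exponential smoothing (no seasonality); returns
    forecasts for 1..horizons steps ahead. Alpha and beta are illustrative."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizons + 1)]
```

On a noise-free linear series the Holt forecasts simply extend the line, which is why the trended model suits the non-seasonal trended series while the no-trend model suits the stationary ones.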
The MAPE (Mean Absolute Percentage Error) was used as the criterion of forecast accuracy. This was an unfortunate choice given the relatively small number of forecasts (12 horizons per series, and about eight judgmental forecasts per horizon for each series), because the MAPE is sensitive to outliers. The MdAPE (Median Absolute Percentage Error) would have been better, and the MdRAE (Median Relative Absolute Error) perhaps better still [Armstrong and Collopy (1992)].
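For readers unfamiliar with these criteria, here is a minimal sketch of the three measures. The definitions are standard; the benchmark series in the MdRAE is commonly a random-walk (no-change) forecast, though any benchmark forecast can be supplied.

```python
from statistics import median

def mape(actual, forecast):
    """Mean Absolute Percentage Error: the mean of the absolute errors,
    each expressed as a percentage of the actual value."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mdape(actual, forecast):
    """Median Absolute Percentage Error: the median APE, which is far
    less sensitive than the mean to a single outlying error."""
    return 100 * median(abs((a - f) / a) for a, f in zip(actual, forecast))

def mdrae(actual, forecast, benchmark):
    """Median Relative Absolute Error: each absolute error is scaled by
    the absolute error of a benchmark forecast (often a random walk),
    and the median of these ratios is reported."""
    return median(abs(a - f) / abs(a - b)
                  for a, f, b in zip(actual, forecast, benchmark))
```

With only a dozen forecasts per series, a single large percentage error can dominate the MAPE; the medians above are robust to exactly that problem, which is the basis of the criticism here.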
Here are some conclusions:
It seems premature to draw many conclusions about forecasting practice. The situation that Sanders is studying differs in some significant ways from the real world. First, the judges had no domain expertise. Second, the model remained constant from the historical to the forecast period. Third, the judges received well-organized data. Fourth, the judges made independent judgmental forecasts prior to seeing any statistical forecasts. And fifth, there were no political considerations that might bias the forecasts. However, from the viewpoint of a systematic study of forecasting methods, these are advantages. The experimental control allows the researcher to isolate the effects of various factors. We can learn much from such studies using artificially constructed series.