Reviews of Important Papers on Forecasting, 1985-1995
Review of:

Nada R. Sanders (1992), ‘Accuracy of judgmental forecasts: A comparison’, Omega, 20, 353–364.

This important study compares the performance of statistical and judgmental extrapolation methods for a set of artificial time series. It provides an evaluation of the ability of naive judges to identify patterns in the data. By naive, I mean that the judges have no domain expertise.

The use of artificial data makes it easier to draw conclusions about the features of a time series that affect the relative forecast accuracy of statistical and judgmental extrapolations. Sanders used ten artificial series to simulate monthly data. They included stationary, seasonal (additive), trend-and-seasonal, and step-function series. In each case there was a low-noise and a high-noise version, with the latter having a standard deviation five times that of the low-noise version. Each series consisted of 48 months for calibration, and forecasts were required for a 12-month holdout period.
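The structure of these artificial series can be sketched as follows. This is only an illustration: the base level, trend slope, seasonal amplitude, step location, and the low-noise standard deviation are assumptions of mine, since the review specifies only the qualitative structure and the 5:1 high-to-low noise ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CAL, N_HOLD = 48, 12            # 48 months for calibration, 12-month holdout
N = N_CAL + N_HOLD
t = np.arange(N)

LOW_SD = 10.0                      # illustrative; the paper states only that
HIGH_SD = 5 * LOW_SD               # high-noise SD = 5 x low-noise SD

def noise(sd):
    return rng.normal(0.0, sd, N)

month = t % 12
seasonal = 20.0 * np.sin(2 * np.pi * month / 12)   # additive seasonal pattern

# Low-noise versions of the four structures described in the review.
series = {
    "stationary":     100.0 + noise(LOW_SD),
    "seasonal":       100.0 + seasonal + noise(LOW_SD),
    "trend_seasonal": 100.0 + 1.5 * t + seasonal + noise(LOW_SD),
    "step":           np.where(t < N_CAL // 2, 100.0, 150.0) + noise(LOW_SD),
}
```

Swapping `LOW_SD` for `HIGH_SD` yields the high-noise counterparts.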

The judgmental forecasts were provided by 38 business students enrolled in an elective undergraduate forecasting course. Each student provided forecasts for two of the series, receiving the data in both tables and graphs. One week after making their initial judgmental forecasts, the students received statistically based forecasts along with those initial judgmental forecasts and were asked to revise the statistical forecasts. For the statistical models, simple exponential smoothing (no trend) was used for the stationary series, Winters' exponential smoothing was used for the series that contained seasonality, and trended exponential smoothing was used for the trended non-seasonal series.
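Simple exponential smoothing, the method used for the stationary series, can be sketched in a few lines. The smoothing weight of 0.3 is an illustrative choice, not a value reported in the study; a flat forecast is issued for all 12 holdout horizons, as the method has no trend term.

```python
def ses_forecast(y, alpha=0.3, horizon=12):
    """Simple exponential smoothing (no trend): a level-only recursion.

    alpha is the smoothing weight in (0, 1); 0.3 is illustrative only.
    Returns a flat forecast repeated for every horizon.
    """
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * horizon
```

The trended (Holt) and Winters variants add a trend term and a seasonal term, respectively, to the same recursive scheme.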

The MAPE (Mean Absolute Percentage Error) was used as the accuracy criterion. This was an unfortunate choice given the relatively small number of forecasts (12 horizons per series, and about eight judgmental forecasts for each horizon of each series). The MdAPE (Median Absolute Percentage Error) would have been better, and the MdRAE (Median Relative Absolute Error) might have been better yet (Armstrong and Collopy, 1992).
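For reference, the three error measures can be computed as follows. This is a minimal sketch: the MdRAE scales each absolute error by the corresponding random-walk (naive) error before taking the median, so it is less distorted by a few extreme percentage errors than the MAPE.

```python
import statistics

def ape(actual, forecast):
    """Absolute percentage errors, in percent."""
    return [abs(a - f) / abs(a) * 100 for a, f in zip(actual, forecast)]

def mape(actual, forecast):
    return statistics.mean(ape(actual, forecast))

def mdape(actual, forecast):
    return statistics.median(ape(actual, forecast))

def mdrae(actual, forecast, rw_forecast):
    """Median Relative Absolute Error: each error is scaled by the
    corresponding random-walk error before taking the median."""
    rae = [abs(a - f) / abs(a - rw)
           for a, f, rw in zip(actual, forecast, rw_forecast)]
    return statistics.median(rae)
```

With only about a dozen errors per series, a single outlier can dominate the mean, which is why the median-based measures are preferable here.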

Here are some conclusions:

  1. In general, judgmental forecasts were less accurate than those from statistical methods. This was true for all except the low-noise step function.
  2. Judgmental forecasters were more accurate than statistical methods in forecasting series with step functions, as long as the noise level was not high. On these low-noise series, the statistical methods were less accurate than the random walk (naive) forecast, while the judgmental forecasters were slightly better.
  3. Sanders concludes that ‘judgmental forecast revision [of statistical extrapolations] may have value for low-noise series.’ But rather than a revision of statistical forecasts, one might view this as another test of combining, because the judgmental and statistical forecasts were prepared independently. Of course, the combining was then done by judgment. (It might have been useful also to examine a mechanical average of the two forecasts, to provide a cleaner test of combining.) Note that while this procedure of obtaining independent judgmental and extrapolation forecasts is desirable, it is seldom done in business. So it might be useful to compare this approach with the traditional one of first looking at a statistical forecast and then revising it. Sanders' approach is excellent for addressing such questions.
  4. For stationary series, simple exponential smoothing (without trend) was substantially more accurate than the random walk, and this superiority increased as the noise increased. While not surprising, it is reassuring to see this result. It has important implications for practice.
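The result in point 4 is easy to reproduce on synthetic data. A minimal sketch, with illustrative parameters of my own choosing (mean 100, noise SD 25, smoothing weight 0.2), comparing rolling one-step-ahead errors rather than the study's 12-month holdout:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stationary series: constant mean plus noise (illustrative parameters).
mean, sd, n = 100.0, 25.0, 200
y = mean + rng.normal(0.0, sd, n)

def ses_level(series, alpha=0.2):
    """Final smoothed level; it serves as the next forecast."""
    level = series[0]
    for obs in series[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

ses_err, rw_err = [], []
for t in range(48, n - 1):                     # rolling one-step-ahead forecasts
    ses_err.append(abs(y[t + 1] - ses_level(y[: t + 1])))
    rw_err.append(abs(y[t + 1] - y[t]))        # random walk: forecast = last value

ses_mae, rw_mae = np.mean(ses_err), np.mean(rw_err)
```

Because the series is stationary, the random walk chases pure noise (its error variance is roughly twice the noise variance), while smoothing with a small weight approximates the underlying mean, so `ses_mae` comes out well below `rw_mae`.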

It seems premature to draw many conclusions about forecasting practice. The situation that Sanders studies differs in some significant ways from the real world. First, the judges had no domain expertise. Second, the underlying model remained constant from the historical to the forecast period. Third, the judges received well-organized data. Fourth, the judges made independent judgmental forecasts prior to seeing any statistical forecasts. And fifth, there were no political considerations that might bias the forecasts. From the viewpoint of a systematic study of forecasting methods, however, these are advantages: the experimental control allows the researcher to isolate individual factors. We can learn much from such studies using artificially constructed series.


Armstrong, J. Scott and Fred Collopy (1992), ‘Error measures for generalizing about forecasting methods: Empirical comparisons’, International Journal of Forecasting, 8, 69-80.