forecastingprinciples.com: Reviews of Important Papers on Forecasting, 1985-1995

Review of:

D. Bunn and G. Wright (1991), "Interaction of Judgmental and Statistical Forecasting Methods: Issues and Analysis", Management Science, 37, 501-518.

[Review written with Fred Collopy]

Bunn and Wright review the extensive literature on the use of judgment in forecasting, examining 138 articles in a narrative review. They present their review using the following structure:

(1) Comparisons

(a) unadjusted or subjectively adjusted quantitative models?
(b) quantitative models or judgmental forecasts?

(2) Experiments

(a) experts or non-experts?
(b) laboratory or real world?
(c) subjects informed about the series?
(d) tasks repetitive?
(e) subjects received feedback?
(f) estimates used as targets, budgets, or forecasts?

(3) Methods

(a) best practice used?
        (i) well-specified quantitative model?
        (ii) defensible judgmental forecasts with an audit trail?
(b) individual or aggregate judgment?
(c) holistic or decompositional judgment?
(d) graphics used to support judgment?

As is evident from this structure, Bunn and Wright cover a broad range of issues in judgmental and statistical forecasting. We briefly summarize their conclusions about two important issues: how judgment can be integrated into models, and whether judgment is more accurate than statistical models of judgment under certain conditions.

Integrating judgment into quantitative models

In discussing the integration of judgment into quantitative models, Bunn and Wright identify four gateways: variable selection (where judgment seems to be useful), model specification (where there are conflicting beliefs among the schools of forecasting researchers), parameter estimation (where promising theoretical results have failed to improve practice), and data analysis (which remains heavily judgmental and poses challenges for researchers).

Judgment plays a role in determining which models to use, how to combine them, and how to adjust them. Bunn and Wright note that there has been little research on these issues. They identify two guidelines for combining forecasts: (1) use models based upon different assumptions or data, and (2) avoid models with positively correlated forecast errors.
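To make the second guideline concrete, here is a minimal sketch (ours, not Bunn and Wright's; the data, the function name, and the 0.5 correlation threshold are illustrative assumptions) that combines two forecasts with equal weights after checking whether their errors are positively correlated:

    import numpy as np

    def combine_forecasts(actuals, forecasts_a, forecasts_b, corr_warn=0.5):
        # Equal-weight combination of two forecast series, with a check on
        # whether their errors are positively correlated (illustrative threshold).
        actuals = np.asarray(actuals, dtype=float)
        errors_a = np.asarray(forecasts_a, dtype=float) - actuals
        errors_b = np.asarray(forecasts_b, dtype=float) - actuals
        corr = np.corrcoef(errors_a, errors_b)[0, 1]
        if corr > corr_warn:
            print(f"Warning: error correlation is {corr:.2f}; the models may "
                  "share assumptions or data, so combining adds little.")
        return (np.asarray(forecasts_a, dtype=float)
                + np.asarray(forecasts_b, dtype=float)) / 2.0

    # Hypothetical hold-out data: actual values plus forecasts from an
    # extrapolation model and a judgmental forecaster.
    actuals = [100, 104, 110, 108, 115]
    extrapolation = [98, 105, 107, 111, 113]
    judgmental = [103, 101, 112, 106, 118]
    print(combine_forecasts(actuals, extrapolation, judgmental))

Strongly correlated errors suggest that the two models rest on the same assumptions or data, so combining them offers little gain in accuracy; this is why the two guidelines work together.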

Experts versus models of experts

Bunn and Wright conclude that when experts use an explicit judgmental process to make real forecasts, the resulting forecasts are more accurate than those produced by models of these experts. They claim that this is because experts use additional information, particularly through the recognition of unusual events, so-called ‘broken leg’ cues. This contrasts with the early bootstrapping literature’s evidence that the model of the judge outperforms the judge. While we suspect that Bunn and Wright may be correct, we do not believe that they adequately support this position.

In concluding that experts have outperformed bootstrapping models ‘in real world studies,’ Bunn and Wright cite Johnson (1988), Blattberg and Hoch (1990), Chalos (1985), and Libby (1975). Johnson (1988) studied two prediction tasks: in one, the experts did not do as well as the bootstrapping model; in the other, the experts were slightly more accurate for situations having ‘broken leg’ cues, but this difference was not significant. Johnson’s conclusion about his study (personal communication) is that it did not show that bootstrapping was more accurate, although the bootstrapping model was as accurate for situations with broken leg cues. Blattberg and Hoch (1990) did conclude that experts were more accurate than bootstrapping models, and Hoch (personal communication) believes that experts are often superior to their bootstrapping models; however, Blattberg and Hoch (1990) did not report data on this issue. Chalos (1985) was poorly designed to address this issue because his quantitative model was inadequate and the judges used more information than was available to the quantitative model. Libby published a revised version of his study in the following year [Libby (1976)], and Goldberg’s (1976) re-analysis showed that good forecasting practice produced a bootstrapping model that was superior for these data.

Another benefit of the Bunn and Wright review, then, is that it helps to identify areas for further research. Research on the conditions under which experts are superior to bootstrapping should be of substantial value.
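For readers unfamiliar with the term, a bootstrapping model in this literature is simply a statistical model of the judge: the expert’s own forecasts are regressed on the available cues, and the fitted model then makes the forecasts. The sketch below (our illustration, with made-up cases, cue values, and forecasts) shows how a comparison between the expert and the model of the expert is typically set up:

    import numpy as np

    # Hypothetical data: each row describes a case by two cues (e.g., financial
    # ratios); we also have the expert's forecast and the actual outcome.
    cues = np.array([[0.8, 1.2], [0.5, 2.0], [1.1, 0.7], [0.9, 1.5],
                     [0.4, 1.8], [1.3, 0.6], [0.7, 1.1], [1.0, 0.9]])
    expert_forecast = np.array([62.0, 48.0, 75.0, 60.0, 45.0, 80.0, 58.0, 70.0])
    actual = np.array([65.0, 50.0, 72.0, 63.0, 47.0, 78.0, 55.0, 71.0])

    # Judgmental bootstrapping: regress the expert's own forecasts on the cues
    # to obtain a linear model of the judge.
    X = np.column_stack([np.ones(len(cues)), cues])
    coefs, _, _, _ = np.linalg.lstsq(X, expert_forecast, rcond=None)
    model_forecast = X @ coefs

    # Compare the accuracy of the expert and of the model of the expert.
    mae_expert = np.mean(np.abs(expert_forecast - actual))
    mae_model = np.mean(np.abs(model_forecast - actual))
    print(f"MAE, expert: {mae_expert:.2f}; MAE, model of expert: {mae_model:.2f}")

The model of the judge removes the expert’s inconsistency; the question raised above is whether, with real data, the expert’s access to ‘broken leg’ cues outweighs that advantage.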

This difference in the interpretation of these studies makes us wonder whether applying meta-analysis to this body of literature would have led to different conclusions on other issues as well. Wanous, Sullivan and Malinak (1989) suggest that one benefit of applying meta-analysis in lieu of a traditional review is that it reduces subjectivity in the interpretation of the findings. Cooper and Rosenthal (1980) showed that meta-analysis yields better-supported conclusions than does a traditional review.

A comment on the study as the unit of analysis

Given that the study is the basic unit of analysis in review papers, we offer a brief comment on the references in Bunn and Wright. Eichorn and Yankauer (1987) concluded that published papers frequently contain incorrect references; 31% of the references in their sample had at least one error. Evans, Nadjari and Burchell (1990) found that almost half of the 150 references they examined from three journals contained errors. We examined a convenience sample of 59 of the 138 references in Bunn and Wright and found a total of 31 errors. Most of these errors were trivial, but 18 citations contained bothersome errors such as an incorrect year, a misspelled author’s name (Geurts is the correct spelling), a wrong title (‘Predicting Nuclear Incidents’ was cited as ‘Predicting Nuclear Accidents’ for Chow and Oliver, 1988), page numbers that contained none of the actual pages, and missing or incorrect volume numbers (for example, one entry referred to volume 227 of Management Science). Except for one case, in which they referred to a study about forecasting annual earnings but provided an incorrect reference [the correct reference is Armstrong (1983)], none of the errors was serious enough to prevent the ultimate location of a paper. We suggest, therefore, that authors check the references in the page proofs against the original papers. We doubt that zero defects is an economically feasible goal (and we would be surprised if the references in this note were perfect), but much could be done to improve quality in this area.

Conclusions

Bunn and Wright suggest that, despite the valuable work carried out to date, their conclusions are ‘very speculative’, and they believe that much applied research needs to be done to facilitate the interaction of statistical and judgmental methods. It would be an understatement to say that we agree; this is what we have tried to accomplish in our work on rule-based forecasting [Collopy and Armstrong (1992)]. Their paper is an important step towards this goal: it provides a framework and makes it easier for researchers to locate and use relevant research. It should be a well-used paper.

References

Armstrong, J.S., 1983, "Relative accuracy of judgmental and extrapolative methods in forecasting annual earnings," Journal of Forecasting, 2, 437-447.

Blattberg, R.C. and S.J. Hoch, 1990, "Database models and managerial intuition: 50% model + 50% manager," Management Science, 36, 887-899.

Chalos, P., 1985, "The superior performance of loan review committees," Journal of Commercial Bank Lending, 68, 60-66.

Chow, T. and R.M. Oliver, 1988, "Predicting nuclear incidents," Journal of Forecasting, 7, 49-61.

Collopy, F. and J.S. Armstrong, 1992, "Rule-based forecasting: Development and validation of an expert systems approach to combining time series extrapolations," Management Science, forthcoming.

Cooper, H.M. and R. Rosenthal, 1980, "Statistical versus traditional procedures for summarizing research findings," Psychological Bulletin, 87, 442-449.

Eichorn, P. and A. Yankauer, 1987, "Do authors check their references? A survey of accuracy of references in three public health journals," American Journal of Public Health, 77, 1011-1012.

Evans, J.T., H.I. Nadjari and S.A. Burchell, 1990, "Quotational and reference accuracy in surgical journals: A continuing peer review problem," Journal of the American Medical Association, 263, 1353-1354.

Goldberg, L.R., 1976, "Man versus model of man: Just how conflicting is that evidence?" Organizational Behavior and Human Performance, 16, 13-22.

Johnson, E.J., 1988, "Expertise and decision under uncertainty: Performance and process," in: M.T.H. Chi, R. Glaser and M.J. Farr, eds, The Nature of Expertise (Erlbaum, Hillsdale, NJ).

Libby, R., 1975, "Accounting ratios and the prediction of failure: Some behavioral evidence," Journal of Accounting Research, 13, 150-161.

Libby, R., 1976, "Man versus model of man: Some conflicting evidence," Organizational Behavior and Human Performance, 16, 1-12.

Wanous, J.P., S.E. Sullivan and J. Malinak, 1989, "The role of judgment calls in meta-analysis," Journal of Applied Psychology, 74, 259-264.