[Review written with Fred Collopy]
Bunn and Wright review the extensive literature on the use of judgment in forecasting, using a narrative review to examine 138 articles. Their review covers a broad range of issues in judgmental and statistical forecasting. We briefly summarize their conclusions about two important issues: how judgment can be integrated into quantitative models, and whether judgment is more accurate than statistical models of judgment under certain conditions.
Integrating judgment into quantitative models
In discussing the integration of judgment into quantitative models, Bunn and Wright identify four gateways: variable selection (where judgment seems to be useful), model specification (where there are conflicting beliefs among the schools of forecasting researchers), parameter estimation (where promising theoretical results have failed to improve practice), and data analysis (which remains heavily judgmental and poses challenges for researchers).
Judgment plays a role in determining which models to use, how to combine them, and how to adjust them. Bunn and Wright note that there has been little research on these issues. They identify two guidelines for combining: (1) use models based upon different assumptions or data, and (2) avoid models with positive correlations between forecast errors.
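As an illustrative sketch of these two guidelines, the following Python example shows how combining two forecasts whose errors are negatively correlated can outperform either component forecast. The numbers, weights, and variable names are our own invented assumptions for illustration, not data or code from Bunn and Wright:

```python
# Illustrative only: the actuals and forecasts below are invented numbers.
actual     = [100.0, 110.0, 120.0, 130.0, 140.0]
forecast_a = [ 98.0, 113.0, 118.0, 133.0, 138.0]  # model built on one set of assumptions
forecast_b = [103.0, 107.0, 123.0, 127.0, 143.0]  # model built on different assumptions

errors_a = [f - a for f, a in zip(forecast_a, actual)]
errors_b = [f - a for f, a in zip(forecast_b, actual)]

def mean(xs):
    return sum(xs) / len(xs)

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Guideline (2): prefer combinations whose component forecast errors
# are not positively correlated.
error_corr = correlation(errors_a, errors_b)

# Equal-weight combination of the two forecasts.
combined = [(fa + fb) / 2.0 for fa, fb in zip(forecast_a, forecast_b)]

mae_a = mean([abs(e) for e in errors_a])                              # 2.4
mae_b = mean([abs(e) for e in errors_b])                              # 3.0
mae_combined = mean([abs(f - a) for f, a in zip(combined, actual)])   # 0.3
```

In this contrived case the component errors are negatively correlated, so the equal-weight combination has a much smaller mean absolute error than either model alone; had the errors been strongly positively correlated, combining would have offered little gain.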
Experts versus models of experts
Bunn and Wright conclude that when experts use an explicit judgmental process to make real forecasts, the resulting forecasts are more accurate than those produced by models of these experts. They claim that this is because experts use additional information, particularly through the recognition of unusual events, which have been called "broken leg" cues. This contrasts with the early bootstrapping literature's evidence that the model of the judge outperforms the judge. While we suspect that Bunn and Wright may be correct, we do not believe that they adequately support such a position. In concluding that experts have outperformed bootstrapping models in real-world studies, Bunn and Wright cite Johnson (1988), Blattberg and Hoch (1990), Chalos (1985), and Libby (1975). Johnson (1988) studied two prediction tasks: in one task the experts did not do as well as the bootstrapping model; in the other task the experts were slightly more accurate for situations having broken leg cues, but this difference was not significant. Johnson's conclusion about his study (personal communication) is that it did not show that bootstrapping was more accurate, although it was as accurate for situations with broken leg cues. Blattberg and Hoch (1990) did conclude that experts were more accurate than bootstrapping models, and Hoch (personal communication) believes that experts are often superior to their bootstrapping models. But Blattberg and Hoch (1990) did not report data on this issue. Chalos (1985) was poorly designed to address this issue because his quantitative model was inadequate and his judges used more information than was used by the quantitative model. Libby published a revised version of his study in the following year [Libby (1976)], and Goldberg's (1976) re-analysis of these data showed that good forecasting practice produced a bootstrapping model that was superior for these data.
Another benefit of the Bunn and Wright review, then, is that it helps to identify areas for further research. Research on the conditions under which experts are superior to bootstrapping should be of substantial value.
This difference in interpretation of the studies makes us wonder whether the application of meta-analysis to this body of literature would have led to different conclusions on other issues as well. Wanous, Sullivan and Malinak (1989) suggest that one benefit of applying meta-analysis in lieu of a traditional review is that it reduces subjectivity in the interpretation of findings. Cooper and Rosenthal (1980) showed that meta-analysis yields better-supported conclusions than does a traditional review.
A comment on the study as the unit of analysis
Given that the study is the basic unit of analysis in review papers, we make a brief comment on references in Bunn and Wright. Eichorn and Yankauer (1987) concluded that published papers frequently contain incorrect references; 31% of the references in their sample had at least one error. Evans, Nadjari and Burchell (1990) found that almost half of the 150 references from three journals contained errors. We examined a convenience sample of 59 of the 138 references in Bunn and Wright and found a total of 31 errors. Most of these errors were trivial, but 18 citations contained bothersome errors such as an incorrect year, a misspelled author's name (Geurts is the correct spelling), a wrong title ("Predicting Nuclear Incidents" was cited as "Predicting Nuclear Accidents" for Chow and Oliver, 1988), page numbers that contained none of the actual pages, and missing or incorrect volume numbers (for example, one entry referred to volume 227 of Management Science). Except for one case, where they referred to a study about forecasting annual earnings and provided an incorrect reference [Armstrong (1983) is the correct reference], none of the errors was serious enough to prevent the ultimate location of a paper. We therefore suggest that authors check the references in the page proofs against the original papers. We doubt that zero defects is an economically feasible goal (and we would be surprised if the references in this note were perfect), but much could be done to improve quality in this area.
Bunn and Wright suggest that, despite the valuable work carried out to date, their conclusions are very speculative, and they believe that much applied research needs to be done to facilitate an interaction of statistical and judgmental methods. It would be an understatement to say that we agree. This is what we have tried to accomplish in our work on rule-based forecasting [Collopy and Armstrong (1992)]. Their paper provides an important step towards this goal by providing a framework, and by making it easier for researchers to locate and use relevant research. It should be a well-used paper.