Forecasting the future of election forecasting

“Can we please be done with polls now?” tweeted Molly Jong-Fast on the day after the U.S. presidential election. “Ban election forecasts, or at least ignore them” read the subhead on a widely circulated piece in Slate.

Over the course of a week-long slog in which much of the world livestreamed the counting of in-person, mail-in, drop-off, and provisional ballots, the most heated of hot takes about the apparent uselessness of pre-election polls and forecasts began to look premature. As results came into sharper focus, the election map began to resemble something that looked a whole lot like what the polls had predicted—albeit by much closer margins in some key states.

To be sure, those margins were noteworthy—especially given how closely they replicated similar errors in the 2016 results. After that race, thoughtful leaders in the opinion research world convened a task force to diagnose what might have contributed to “large, problematic errors” in many of these same places. The task force concluded, reassuringly, that despite many complex changes in modes and methods, the science of survey research remained largely sound. (On average, polls are, in fact, as accurate as ever.)

Many likely culprits were singled out for that election: historically large numbers of undecided and third-party voters, too few pollsters weighting samples for response bias by education, a lack of state-level polls late in the race when events caused significant movement toward Donald Trump. So far, none appears applicable to this cycle. Even well-respected state-level polls from the New York Times/Siena College and the Washington Post/ABC News were off by double digits, making it hard to shake the feeling that the entire industry may be dealing with more fundamental, structural problems. Are conventional methods disproportionately failing to reach the disaffected, distrusting voters most drawn to Trump’s brand of right-wing populism, underestimating their likelihood of turning out to vote, or both? Knowing more about the campaigns’ internal polls, which may have used different assumptions, along with final updated voter files, will eventually help answer some of these questions.

It should be underscored that some polls did hold up exceptionally well. On the eve of the election, venerated Iowa pollster Ann Selzer released results that previewed the eventual outcome almost exactly, showing Republicans holding on to a much larger share of voters than almost any other reputable pollster found. Her results were often explained away as an outlier; the New York Times told its anxious readers to “put it in perspective” as “only one poll.” Such advice may be eminently reasonable, but this year the flaws in that approach have been laid bare.

“In averages we trust” has become a mantra for many political journalists trying to avoid overemphasizing any single error-prone survey estimate. But averaging polls only helps correct for random sampling-induced error; it does not help if results are collectively skewed due to faulty methods, modeling assumptions, or “herding.” In a polling landscape where best practices are legitimately in flux and where grifters and opportunists seek headlines to gin up publicity and exposure, averaging may lead to over-confidence about election outcomes that are in fact a lot more variable than suggested by sampling error alone.
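The statistical point can be sketched in a toy simulation. The numbers below are purely illustrative assumptions, not estimates of any real race or real polling error: every simulated poll shares the same systematic skew, so averaging more of them shrinks the random noise but converges on the wrong answer.

```python
import random

random.seed(42)

TRUE_MARGIN = 2.0   # hypothetical true margin, in points
SHARED_BIAS = 3.0   # hypothetical industry-wide skew shared by all polls
SAMPLING_SD = 3.0   # per-poll random sampling error

def run_polls(n_polls):
    # Each poll draws fresh sampling noise, but all carry the same bias.
    return [TRUE_MARGIN + SHARED_BIAS + random.gauss(0, SAMPLING_SD)
            for _ in range(n_polls)]

for n in (1, 10, 1000):
    avg = sum(run_polls(n)) / n
    print(f"average of {n:4d} polls: {avg:5.2f}  (true margin: {TRUE_MARGIN})")
```

As the number of polls grows, the average settles near 5 points—the true margin plus the shared bias—rather than the true 2 points. Averaging buys precision, not accuracy, when the errors are correlated.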

“You don’t really have many reporters, in my experience, who are doing the work of looking at the methodology,” Ann Selzer pointed out in an interview with the Columbia Journalism Review five years ago. Doing that work will require greater transparency from public pollsters—perhaps even a push for pre-registration—and more attention paid by journalists and polling aggregators to individual pollsters’ methods instead of lumping results together. (Correcting for black box “house effects” or past predictiveness may not be nearly enough.)

Election forecasters themselves will no doubt resist calls for any kind of reckoning of their own. They will point to the final tabulation as proof of having appropriately handicapped the flawed underlying data they were provided. After all, as Nate Silver reminded readers on the eve of the election, his 89% forecast of a Biden victory only ever indicated that “it’s a fine line between a landslide and a nail-biter,” and the result, predictably, was somewhere in between.

But that of course is the problem. Forecasters must grapple with whether they seek to be anything other than a diversion for obsessed hobbyists. I don’t fault them for emphasizing how uncertain their predictions actually are. But there is something deeply unsatisfying about this form of prognosticating punditry, which simultaneously claims superior decimal-point precision while humbly insisting that its declarations should only ever be treated impressionistically. As long as the main takeaway really isn’t much more than ¯\_(ツ)_/¯, the future of election forecasting may not be bright.