THE POLL DANCE: Nate Silver on Predicting Elections
Nate Silver turned a love of baseball into a breakout career in forecasting player performance, inventing his own statistical system for gaming the game. Then he turned his methodical eye on politics, shocking the media-polling industrial complex by correctly calling 49 of 50 states in the 2008 presidential election. He’s now the house polling analyst for the New York Times, bringing a rigorous and gnomic level of intensity to his Five Thirty Eight blog. And his well-timed new book The Signal and the Noise comes out this week, attempting to unspool the science and art of trying and/or failing to predict the future of most anything. We talked with Silver about how the 2012 race looks so far, how it might all turn out, and how this could be the year when all the predictions (including his own) go very, very wrong.
You noted recently that there had been over 400 national polls since June 1. Is more data always good in this context?
If you’re doing it right, then yes. If you’re doing it wrong, it can get you in trouble. We’re getting to the point now that instead of having three polls released per day, you’re getting about twenty. So that means if you’re an Obama supporter, you can pick up the three data points you like the most and tweet about those and tell stories about those. And if you’re a Romney supporter, you can dismiss the Obama polls as outliers and take the three that you like the most. People who aren’t committed to perceiving the signal — and who just want to create noise in a partisan environment — can get themselves in trouble in that respect.
We use all the polls in our analysis at Five Thirty Eight, but we weight them differently — a formula based on how well they’ve done, and if they’ve used industry standard methods. They get rewards both for performance and for qualitative measures. I don’t want to use my own judgment on a day-to-day basis — I don’t want to have to say this poll is right, and another one’s wrong. I want to set up rules for engaging with this data. It takes a lot of time to think about what those rules should be. But once you do that, it makes you more objective. If there’s one data point that drives the news narrative — one poll, or one story — and there are six data points that are buried because they don’t fit the current structure of what the media is saying about the race, you’re still going to give those other data points a fair shake. It’s tricky, but the more I can automate this stuff, the better. Because sometimes we’ll come out with an analysis that doesn’t fit what the conventional wisdom holds.
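Silver describes a weighted average rather than a simple one: every poll counts, but better-performing pollsters count for more. A minimal sketch of that idea, with invented weights — this is not FiveThirtyEight’s actual formula, just an illustration of weighting by track record and methodology:

```python
def weighted_average(polls):
    """Each poll is (margin, weight): margin is the candidate's lead
    in points, weight reflects the pollster's past accuracy and
    whether it uses industry-standard methods."""
    total_weight = sum(w for _, w in polls)
    return sum(m * w for m, w in polls) / total_weight

# Hypothetical polls of the same race:
polls = [
    (3.0, 0.9),   # highly rated live-interview poll, Obama +3
    (-1.0, 0.4),  # lower-rated robopoll, Romney +1
    (2.0, 0.7),   # mid-rated poll, Obama +2
]
print(round(weighted_average(polls), 2))  # 1.85
```

The point of the automation Silver mentions is visible here: the outlier still contributes, but its lower weight keeps one data point from driving the whole estimate.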
Are there pollsters or polling firms who reliably produce questionable data?
There are some terrible polling firms. But if you’re slanting your polls toward one party or another, we can detect that. It’s not too hard statistically, and people know it intuitively. You can run a test to say that, on average, the polls from, say, Rasmussen Reports are two or three points more Republican-leaning than the average poll. Or the polls from Public Policy Polling are two points more Democratic-leaning. So you can de-bias them that way and translate things to a common scale.
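The de-biasing Silver describes — measuring each firm’s average lean relative to the average poll and subtracting it — can be sketched in a few lines. The firms and numbers below are hypothetical examples, not measured house effects:

```python
from collections import defaultdict

def house_effects(readings):
    """readings: list of (firm, margin) for the same race.
    Returns each firm's average deviation from the overall mean,
    i.e. its estimated house effect in points."""
    overall = sum(m for _, m in readings) / len(readings)
    by_firm = defaultdict(list)
    for firm, margin in readings:
        by_firm[firm].append(margin)
    return {f: sum(ms) / len(ms) - overall for f, ms in by_firm.items()}

def debias(readings):
    """Subtract each firm's house effect, putting all polls
    on a common scale."""
    effects = house_effects(readings)
    return [(f, m - effects[f]) for f, m in readings]

# Illustrative data: FirmA runs ~2 points Republican-leaning,
# FirmB ~2 points Democratic-leaning.
readings = [("FirmA", -1.0), ("FirmA", -3.0), ("FirmB", 3.0), ("FirmB", 1.0)]
print(house_effects(readings))  # {'FirmA': -2.0, 'FirmB': 2.0}
```

After `debias`, both firms’ polls cluster around the same underlying margin, which is the “common scale” Silver refers to.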
You also remarked that “90% of ‘game-changing’ gaffes are less important in retrospect than they seem in the moment.” This feels right intellectually, but is that an apt generalization or an actual statistic you’ve researched?
I guess that’s a semi-fake stat. But I know from covering campaigns that a lot of these things don’t pan out, in part because voters do actually care about the economy and big-picture issues, and not some daily controversy necessarily. For example, it was thought that Romney’s handling of Afghanistan was poor and was going to doom his campaign, but the polls didn’t really move against him at all. But even if it’s 90% likely that “game-changers” don’t matter, there’s the 10% that are the exceptions. I’ve found it’s just very hard to know what those 10% are.
In the 2000 campaign, Al Gore developed a reputation for exaggeration … maybe it was fair and maybe it wasn’t. If you look at the case where he supposedly said he invented the internet — which he never actually said verbatim — that was one line in an interview with Wolf Blitzer that no one even noticed until a few days later. And then it kind of blew up after that, and it did affect the narrative of the race. So things that are viewed as being hugely significant at the time turn out not to be, and also vice versa.
Are there big feuds or rivalries in the world of political polling?
There’s always been a feud between traditionalists — using live interviewers to call people — versus robocalling, where you have an automated script that will dial and call you. I used to have a little more sympathy for the robopolls because they’re so simple … press 1 for Romney, press 2 for Obama, there’s not much more to it than that. But now I see robopolls turning out more and more strange data, partly because they’re not allowed by law to call cell phones in most circumstances. So they’re missing a whole big chunk of the electorate who don’t use landlines, and it goes against the notion of a scientific sample. So I’ve started to side more with the traditionalists.
It’s worthwhile to get good high-quality polling data, but people aren’t doing it much anymore — when you have media budgets being cut, when you have pressure from austerity. Also it’s more expensive to conduct a live poll now because people don’t pick up their phone as much. You might have to call someone five times because we don’t have that kind of culture anymore, where you sit by your landline, and if a stranger calls, of course you answer your phone. It just doesn’t really occur in many households anymore. On the other hand, you have Google doing more with surveys, and other internet companies trying to see if that could be a reasonable solution. Internet polling is getting there, but it hasn’t really been tested very much yet.
You can combine polls with other types of data. Fundraising data tells you a lot — who’s actually translating their infrastructure into getting in touch with enthusiastic voters, getting contributions. That really is hard data. It all has to be reported to the FEC. There’s not any issue of calling the wrong people, or weighting that data. We’ve found that especially in races where you don’t have much polling, or where the polls don’t agree with one another, looking at those non-polling factors helps.
It’s a little early to know exactly what we can learn from social media metrics. I think the way we’ll be looking at this stuff will be very different in four years, in eight years, in twelve years. For right now, we’re kind of in an awkward adolescent age … we’re out of the classical innocent era of our youth where you could just call someone on the phone. But we’re not sure what the substitute for that is yet.
We’re eventually going to have an election — and it could be this one — where the polls are way off in one direction. We could go into Election Night thinking Obama’s going to win by three, and he loses by five. Or wins by ten! It’s my nightmare. Our forecasts basically do as well as the polls do. So if we believe Obama has a 95% chance to win — say, if he’s up by a few points on Election Eve — that 5% is still going to happen some of the time. In the polling industry, it’s getting harder and harder to do polls the right way, and with pollsters that might use more questionable techniques, maybe that 5% is going to come up sooner rather than later.
A main focus of your new book is uncertainty, in particular how being honest about the level of uncertainty can make a prediction more accurate. How would that apply to something like the presidential race?
Everything we say about the election is couched in terms of probabilities. You need to have a way to quantify the relative importance of a piece of information. Since the conventions, we’ve had Obama go from about a 70% chance to win, moving up to 80%, then down to 75%. If you’re comfortable with probability, it gives you a way to measure things along a spectrum. The mainstream coverage went from “this race is too close to call” to “Romney is doomed.” But being realistic about it, if you look at websites where you can bet on these things and at bookmakers’ odds, then Obama’s odds went up a little bit from 70% to 75%, but it’s not a night-and-day difference. The problem is that narratives — including journalistic narratives — want everything to be a game-changer.
Including the primary campaigns, these election cycles last a year, year-and-a-half now. You get so used to having days when there’s not really any news, that when there is some semblance of news, everything gets magnified in importance, and the volume’s always turned up louder than it should be. But if you have your volume turned up too loud in the truly important moments, you lose the top of that frequency curve. The things that really are important in the campaign sometimes don’t get enough coverage.
In the 2008 elections, you successfully predicted 49 of 50 states’ choice for president, the only exception being Indiana. What message would you like to send to the voters of Indiana for the 2012 election?
Maybe they should have had a recount! I think the turnout was 110% in Gary, Indiana. If I had to say this year, I wouldn’t predict 49 as the number of states we’d get right. The over-under’s probably 45. You hope you get lucky on some of the close calls.
— Chris Mohney