Because I need to talk about something other than the Sox:




Regular readers will note the inclusion of a couple new trackers today: CQ Politics' electoral map and the Washington Post electoral map by Chris Cillizza. Adding these two maps means today's totals are not exactly comparable to the previous numbers, though the difference is small. Without them, the tossups-allocated number would have gone up a bit more and the tossups-excluded number a bit less.
The popular vote numbers have gone down by over a point; this is mostly due to a correction in the Iowa markets that brings them more in line with the rest of our numbers (from Obama +11.2 to +7.6). I'm not a huge fan of the Iowa markets (though others are), simply because the low cap on investment amounts makes investors less risk-averse than they might otherwise be if more money were involved. There was also a marginal decline in the national vote trackers, but nothing to really note.
The win percentage figure continues to go up, but this is mostly due to a big increase in Obama's win percentage from the FiveThirtyEight figure. This may be the last straw for me--there's definitely something wrong with Nate's model.
My specific problem with the FiveThirtyEight model gets a little technical, but bear with me. FiveThirtyEight assumes that as the election gets closer, the polls will get tighter. In effect, whoever is in the lead now is predicted to still be in the lead on election day, but by a smaller margin. The simulations then run within this framework, allowing for some margin of error in the polls.
What this doesn't account for, however, is that such a simplistic prediction gives an undue advantage to the person who's ahead, because in the model the leader retains the lead by a smaller margin and any other variation is random. Since Nate is a baseball statistician by trade, I'll give a proper analogy. If his model were predicting the results of a baseball game, he would be assuming that whichever team was ahead at a given moment had run into a string of good luck, and more likely than not the game would tighten by the end. Whether that's true or not, it doesn't help much in predicting how each at-bat will play out. When it comes to that, his model assumes that the variation will be random between at-bats.
But we know that's not true. When a pitcher is throwing well, he throws well to batter after batter, but if he's off, he's off to everyone. The same works for polls. If a candidate improves one day, there's likely a reason for it, and that reason is likely to carry over to the next few days as well. So a proper adjustment for tracking the national mood between now and election day would be to model each day's electoral change individually, with every day's prediction factoring in the previous few days, in each run of the simulation. The regression-to-the-mean assumption could still be factored in, but in a way that strikes a balance between allowing for trends, compensating for lucky polls, and accounting for potential diminishing returns (since each new supporter gets the candidate further from his base, and is thus harder to convert).
This sounds complicated, but it's really perfect for a straightforward time series analysis. Find the autocorrelation between days, test for the general trend (which Nate assumes to be asymptotic to 50/50), and see to what degree the absolute margin affects the day-to-day variation. We have plenty of data to work with--we could use this year's polling data, or data from previous years, which runs straight through to the election. After establishing the parameters of the model, we then set it to work simulating electoral outcomes.
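To make the idea concrete, here is a minimal sketch of that kind of simulation in Python. It is not FiveThirtyEight's method, and every number in it is a placeholder I made up for illustration: `phi` stands in for the day-to-day autocorrelation, `pull` for the drift back toward 50/50 (which grows with the margin, a crude stand-in for diminishing returns), and `sigma` for daily noise. In a real analysis all three would be estimated from polling data.

```python
import random


def simulate_margin(start, days, phi, pull, sigma, rng):
    """Simulate the leader's margin (in points) one day at a time.

    Each day's change carries over a fraction `phi` of yesterday's change
    (momentum), is pulled back toward a tied race in proportion to the
    current margin (`pull`), and gets fresh random noise (`sigma`).
    """
    margin, prev_change = start, 0.0
    for _ in range(days):
        change = phi * prev_change - pull * margin + rng.gauss(0.0, sigma)
        margin += change
        prev_change = change
    return margin


def win_probability(start, days, phi, pull, sigma, runs=20000, seed=0):
    """Share of simulated races in which the current leader is still ahead."""
    rng = random.Random(seed)
    wins = sum(
        simulate_margin(start, days, phi, pull, sigma, rng) > 0
        for _ in range(runs)
    )
    return wins / runs


# Hypothetical inputs: a 5-point lead with 30 days to go.
independent = win_probability(5.0, 30, phi=0.0, pull=0.02, sigma=0.5)
persistent = win_probability(5.0, 30, phi=0.5, pull=0.02, sigma=0.5)
print(f"no day-to-day carryover:   {independent:.3f}")
print(f"with day-to-day carryover: {persistent:.3f}")
```

With these made-up parameters, letting each day's movement carry into the next widens the range of election-day outcomes, so the leader's simulated win probability comes out lower than when every day is an independent draw. How much lower depends entirely on the fitted parameters.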
With that sort of correction, I'd expect FiveThirtyEight to forecast the probability of victory at a much more reasonable level. As it stands now, the prediction of a 94.9% chance of an Obama victory is ridiculous, and it points to the biggest flaw in the FiveThirtyEight model: things happen, and it has no way other than completely random variation to account for them.
</rant>
One last bit of housekeeping: Gallup introduced an alternate likely voter model today. Pollster and (sigh) FiveThirtyEight have commentary. I tried to find the internals (the breakdown of the samples by demographics) and failed, but from Mark Blumenthal's report on Pollster, it seems the standard LV model undercounts the youth demographic when compared to past elections--an assumption that is absurd in this election, given the increased youth turnout rates in the primary. Frankly, I don't like either model. Excluding past voting history is a foolish throwing-away of information. Instead, Gallup should factor into its standard model when the voter registered, using data from past elections as a guide, as well as whether they've been contacted by a campaign. The closer to the election the voter registered (all else being equal), the more likely she is to vote, since enthusiasm may wane over time, and voter contact is a very strong predictor of turnout. This would allow Gallup to factor in past voting behavior while still accounting for the dramatic increase in mobilization this election cycle.
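As a rough illustration of what I mean (and emphatically not Gallup's actual model), here is a sketch of a turnout score that keeps vote history and adds registration timing and campaign contact. The field names, the weights, and the 90-day decay are all hypothetical placeholders; real coefficients would have to be fit to validated turnout data from past elections.

```python
import math
from dataclasses import dataclass


@dataclass
class Respondent:
    voted_last: bool            # voted in the previous election
    voted_before_last: bool     # voted in the election before that
    days_since_registration: int  # days between registering and election day
    contacted_by_campaign: bool
    stated_intent: float        # self-reported likelihood of voting, 0 to 1


# Hypothetical weights, for illustration only. Not estimated from any data.
WEIGHTS = {
    "intercept": -2.0,
    "voted_last": 1.5,
    "voted_before_last": 1.0,
    "recent_registration": 0.8,
    "contacted": 0.7,
    "stated_intent": 1.5,
}


def turnout_probability(r: Respondent) -> float:
    """Logistic turnout score combining vote history with mobilization signals."""
    # Recency is near 1 for someone who just registered and decays toward 0
    # for long-standing registrants (the 90-day scale is an assumption).
    recency = math.exp(-r.days_since_registration / 90.0)
    z = (
        WEIGHTS["intercept"]
        + WEIGHTS["voted_last"] * r.voted_last
        + WEIGHTS["voted_before_last"] * r.voted_before_last
        + WEIGHTS["recent_registration"] * recency
        + WEIGHTS["contacted"] * r.contacted_by_campaign
        + WEIGHTS["stated_intent"] * r.stated_intent
    )
    return 1.0 / (1.0 + math.exp(-z))


# A newly registered, contacted first-time voter vs. a habitual voter.
new_voter = Respondent(False, False, 20, True, 0.9)
habitual = Respondent(True, True, 3000, False, 0.9)
print(round(turnout_probability(new_voter), 2), round(turnout_probability(habitual), 2))
```

The point of the design is that a first-time voter no longer scores as if she had nothing going for her: recent registration and campaign contact raise her score without throwing away what past voting tells us about everyone else.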
As I teach my American Politics undergrads, the best predictor of whether someone will vote is whether they voted last time, and the second best predictor is whether they voted the time before that. But being mobilized by a highly-efficient and effective campaign apparatus is also a clear predictor of turnout behavior, and Gallup's likely voter model does not account for this in any way beyond the voter's stated intention.