Predicting the Past

PoliStat | Nov. 2, 2018, 2:35 p.m.

Can we use our model to predict the 2014 House of Representatives Election?

    It’s all well and good to create a model, but if we don’t know whether it works, what’s it even worth? We decided to test our model by predicting the 2014 election so we could see whether what we created is actually viable. We chose 2014 because it was the most recent midterm election; we figured this would be better than 2016 because 2016 was a presidential election year, so turnout was different, and House races were probably also affected by down-ballot voting from people who turn out only for the presidential race and vote for the rest of their party’s ticket.

    Because of time constraints, we had to get rid of some aspects of the model. Gathering four-year-old polls for every district is just not practical, so the only polls our model accounted for were national polls gathered in the month prior to the election. Most of the individual district polls were buried under years of newer polls and were tedious to dig out, but it turns out that these polls did not really matter (more about that later). Because we weren’t using district polls, we also had to throw out Blairvoyance, since Blairvoyance is based on polls.

    We used data from the 2012 election to model the 2014 turnout and the National Mood of the time. Our current model pulls data from the past two election cycles, 2016 and 2014; for this test, however, we only used data from the single previous cycle, 2012. Because many states were redistricted after 2010 and the total number of districts changed, we decided not to include 2010 data in the modified model.

    In our test, we retained the same overall algorithm and method. Because of our many changes, we had to modify the code slightly. We changed the parts of the code that took in the past two elections so that they only took in one election, and we took out the lines of code that required poll results and Blairvoyance.

    The modified model we used for 2014 worked pretty well. Our model predicted the Democrats would win 192 seats, and in the real world they won 194. We also used the log loss function to measure the uncertainty in the model. To do this, we calculated the log loss in each district using the formula ln(Result×Chance + (1−Result)(1−Chance)), where Result is 0 for a Democratic loss and 1 for a Democratic win, and Chance is the probability of the Democratic candidate winning the House seat as predicted by our model. We then take the negative of the average log loss across the 435 districts to determine the average uncertainty in our model: −(1/435)Σ(log loss) = 0.3805. Ideally, the average log loss of a model should be under −ln(0.5) ≈ 0.693; otherwise our model is worse than just giving the Democrats a 50% chance to win in each district, which is pretty boring. Luckily, our average log loss is less than −ln(0.5), so the uncertainty in our predictions is relatively low.
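The averaging step above can be sketched in a few lines of Python. The district data here is made up purely for illustration (our real run used all 435 districts); the sign convention matches the formula above, negating the log of the probability assigned to the actual outcome.

```python
import math

def log_loss(result, chance):
    """Negative log of the probability the model assigned to what actually
    happened. result: 1 if the Democrat won, 0 otherwise. chance: the
    model's predicted probability of a Democratic win."""
    return -math.log(result * chance + (1 - result) * (1 - chance))

# Hypothetical (result, predicted chance) pairs standing in for real districts.
districts = [(1, 0.9), (0, 0.2), (1, 0.55), (0, 0.7)]

# Average log loss across districts; lower means less uncertainty.
avg = sum(log_loss(r, c) for r, c in districts) / len(districts)

# Compare against the "coin flip" baseline of -ln(0.5).
print(round(avg, 4), avg < -math.log(0.5))
```

A model that confidently picks the right winner (like the 0.9 district here) contributes very little loss, while a confident miss (the 0.7 district that the Democrat lost) is penalized heavily, which is why a few badly wrong districts can drag the average up.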

    Because our modified model predicted the past relatively well, it raises the question of whether district polls really matter. The current model places a relatively high value on a district’s polls, and if a district has none, we use Blairvoyance to act in their place. The polls do add value, in a way, because they reflect the current feelings of the nation (if the poll is done well). Based on the national polls we gathered, the general stance of the U.S. population was leaning Republican, which closely aligns with our fundamentals. So, since the fundamental data reflected the general national mood, it makes sense that our modified model predicts the 2014 election relatively well.

    Overall, we predicted the vote share for 39 districts incorrectly. 19 of these districts wrongly favored Democratic candidates, while the other 20 wrongly favored Republican candidates. Our largest error was in Texas 13, where we predicted the Democratic candidate to have 63.1% of the vote share, but in reality they only received 13.18%. So what went wrong? It is possible that in this case any district polls or Blairvoyance input would have made a difference. Another interesting district to note is Nebraska 2, which is also a toss-up district for the 2018 House election. We predicted the Democratic candidate to win 34.70% of the vote share, but they won 51.8% of the vote, taking the district. However, it makes sense that our model underpredicted the vote share for the 2014 election, because Nebraska 2 has historically voted for Republican Representatives. In fact, 2014 was one of the few times that a Democratic candidate defeated a Republican incumbent there. Maybe if we had district polls for this district, we would have detected the mood shift, leading to a more accurate result.

    The 2014 House of Representatives Election was a rough time for the Democrats. They lost seats in the House and failed to gain the majority. Even though our model for the 2018 House elections favors the Democrats, through our tests we can see that it is capable of predicting a Republican win from the past.

--CH, ES