What if We Threw a Neural Network Over the Whole Thing?

PoliStat | Nov. 3, 2018, 5:51 p.m.

An Alternative Approach to Election Modelling

Introduction and Overview

    Machine learning, and neural networks in particular, is a popular technique in many fields that need to generate predictions. While traditional statistical models require making assumptions and searching by hand for patterns or correlations between data points, a neural network leaves that to a computer. I have no doubt that the traditional statistical models have already determined the most significant variables in election predictions; my hope is that a neural network can find and use previously unknown patterns to generate even more accurate predictions. For example, ORACLE and the neural network both use incumbency and national mood as inputs. However, while ORACLE may reflect trends found in incumbency and trends found in national mood, it does not consider any relationship between them. A neural network would consider the possibility of a relationship, not only between these two inputs, but also between them and every other input.

    Now, what is a neural network? To put it simply, imagine the human brain. The brain is a bundle of neurons, each connected to a set of other neurons, which are themselves connected to more neurons. Information comes into this collection of neurons, perhaps from the eyes or the nose, and is turned into an electrical impulse. This impulse travels along the connections between the neurons, sometimes weakening and sometimes strengthening depending on its importance to the neuron it is passing through. After the impulse has traveled through whatever neurons it needs to pass, the brain gives you its prediction: for example, that you are seeing an apple. The same logic applies to neural networks. A neural network is a collection of numbers, each called a node, which is analogous to a neuron. A tensor (think vector) of numbers is given to the network, manipulated by the collection of numbers in the network, and turned into a different tensor of numbers that represents the answer. Since the numbers in the collection differ in magnitude, certain information is weighted more heavily than other information, which is similar to how our neurons weaken and strengthen the electrical impulses passing through them. All in all, a neural network is essentially an attempt to transpose a brain into a computer.
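
    To make the "tensor in, tensor out" idea concrete, here is a minimal sketch of a tiny network turning two input numbers into one output number. It is only an illustration, not the model's code: the weights are hand-picked, the layer sizes are arbitrary, and the use of tanh to squash each node's value is an assumption made for the example.

```python
import math

def forward(inputs, layers):
    """Pass a list of numbers through each layer of the network in turn."""
    activations = inputs
    for weights, biases in layers:
        # Each node sums its weighted inputs, adds a bias,
        # then squashes the result into (-1, 1) with tanh.
        activations = [
            math.tanh(sum(w * a for w, a in zip(node_weights, activations)) + b)
            for node_weights, b in zip(weights, biases)
        ]
    return activations

# Hypothetical, hand-picked numbers: 2 inputs -> 2 hidden nodes -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]

output = forward([1.0, 0.5], layers)
print(output)
```

    Larger weights let an input move the output more, which is the "weighted more heavily" behavior described above.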

    Of course, the network does not simply pop into existence, ready to be used for whatever application it is needed for. Initially, a neural network is a collection of pseudo-randomly generated numbers that have no meaning to the problem. The user must have a set of data with correct answers with which to “train” the network. This is known as supervised training; while there are other training methods, supervised training is the method used for my network. The network is given an input from this training set, generates an output, and then compares the output with the real answer. This comparison results in a “cost”, which is determined by a chosen cost function (think log-loss, root mean square error, mean square error, etc.). The network’s goal is to minimize the “cost” of its prediction, but to do that it must be able to change the numbers in its collection. Therefore, with each comparison, the network shifts the numbers in its collection by a small amount determined by an optimizer (the concept of optimizers is a complex aside that you are welcome to look into, but it isn’t required to understand the model). The network then runs another comparison with the shifted numbers to see if they generate a less “costly” prediction. We try to give the network different data each time so that it optimizes the numbers for any input. This way, when it sees an unknown input, it can still generate an accurate prediction. This is also why the training data should be representative of the inputs that the network may receive. It is important to note that the accuracy of the network is therefore determined by how well it does on data that it has not seen before.
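
    The training loop described above can be sketched in a few lines. This is a toy illustration only, not the model's actual code: it fits a single weight and bias to made-up data, uses mean squared error as the cost, and uses plain gradient descent as a stand-in for the optimizer. The function names, learning rate, step count, and data are all assumptions chosen for the example.

```python
import random

def mean_squared_error(predictions, targets):
    """The 'cost': average squared gap between prediction and real answer."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def train(xs, ys, steps=2000, learning_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Start from pseudo-random numbers that mean nothing to the problem yet.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    n = len(xs)
    for _ in range(steps):
        predictions = [w * x + b for x in xs]
        # Slope of the cost with respect to each number in the "collection".
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(predictions, ys)) / n
        # Shift each number by a small amount in the direction that lowers the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up training data whose "correct answers" follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = train(xs, ys)
final_cost = mean_squared_error([w * x + b for x in xs], ys)
print(w, b, final_cost)
```

    A real network repeats the same idea over many more numbers, and an optimizer such as Adam adjusts the size of each shift instead of using one fixed learning rate.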

    My model specifically uses a variation on the traditional neural network described above, called Long Short-Term Memory (LSTM). While the concepts remain the same in terms of design and training, the benefit of an LSTM is that it can maintain dynamic memory. A problem with normal neural networks is that they base their predictions entirely on the values their collection of numbers was set to during training. In a sense, a normal neural network doesn’t use “experience” when it generates its predictions. Take, for example, a network trying to predict the word “French” in the sentence “In France, they speak __”. To be able to do this, it would have to know that the subject “France” appeared earlier in the sentence. A normal neural network, however, would generate a prediction based only on the word “speak”, and perhaps it has been trained to believe that the word “German” follows “speak” most often. Therefore, the network would most likely predict the sentence “In France, they speak German”. An LSTM would have learned while training that the subject of the sentence is an important piece of information to keep, and would store it in memory. Specifically, it takes the previous set of inputs and chooses which to weigh more heavily, which to weigh less heavily, and which to forget completely. For the House elections, I believed that using an LSTM would allow the network to consider not only the most likely prediction based on the numbers but also historical electoral trends.
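
    As a rough sketch of how an LSTM decides what to keep and what to forget, here is one time step of a single-unit LSTM cell written out by hand using the standard LSTM gate equations. It is not the network used for this model: the scalar input, the parameter names in the dictionary, and the parameter values are all hypothetical and chosen only to show the memory persisting.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with a scalar input."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: how much old memory to keep
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: how much new information to store
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate: how much memory to reveal
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value to write into memory
    c = f * c_prev + i * g       # updated memory (cell state)
    h = o * math.tanh(c)         # output for this time step
    return h, c

# Hypothetical parameters: a forget gate biased toward "keep" (bf = 3.0).
params = {
    "wf": 0.0, "uf": 0.0, "bf": 3.0,
    "wi": 2.0, "ui": 0.0, "bi": -1.0,
    "wo": 0.0, "uo": 0.0, "bo": 0.0,
    "wg": 2.0, "ug": 0.0, "bg": 0.0,
}

# One informative input followed by three empty ones.
h, c = 0.0, 0.0
cell_states = []
for x in [1.0, 0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c, params)
    cell_states.append(c)
print(cell_states)
```

    With these values the memory written at the first step fades only slowly over the later steps, which is the behavior that lets an LSTM remember “France” by the time it reaches “speak”.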

Using a Neural Network for Elections

    My model takes five data points in its input tensor: partisanship, whether there is a Republican incumbent, whether there is a Democratic incumbent, the national swing, and the presidential approval rating expressed as if the president were a Democrat (so for a Republican president the value is 1 minus their current approval rating). Its training set is all House elections since 1952, and it attempts to predict the Democratic margin, the Democratic vote percentage, and the Republican vote percentage. The model is trained over 50,000 iterations and then asked to generate a prediction for each district based on the data we collected for this election cycle. It updates its numbers using Adam (an optimizer that combines ideas from the AdaGrad and RMSProp algorithms) and attempts to minimize the log-loss.
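
    The five inputs listed above could be assembled along the following lines. This is a hedged sketch rather than the model's actual preprocessing: the function name, the "D"/"R" party codes, the 0/1 encoding of incumbency, and the use of fractions between 0 and 1 for approval are all assumptions made for illustration.

```python
def build_input(partisanship, incumbent_party, national_swing,
                approval, president_party):
    """Build the five-number input for one district (hypothetical encoding)."""
    # Incumbency becomes two separate 0/1 flags; an open seat sets both to 0.
    rep_incumbent = 1.0 if incumbent_party == "R" else 0.0
    dem_incumbent = 1.0 if incumbent_party == "D" else 0.0
    # Express approval as if the president were a Democrat:
    # a Republican president's rating is flipped to 1 - approval.
    adjusted_approval = approval if president_party == "D" else 1.0 - approval
    return [partisanship, rep_incumbent, dem_incumbent,
            national_swing, adjusted_approval]

# Made-up district: Republican incumbent, Republican president at 42% approval.
example = build_input(
    partisanship=0.03,
    incumbent_party="R",
    national_swing=0.05,
    approval=0.42,
    president_party="R",
)
print(example)
```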

Testing the Neural Network's Effectiveness

    To determine how effective the model was, I broke the training set up into two parts. Part one was the data the network would be trained with, which was a pseudo-randomly selected 80% of the original training data. The remaining 20% of the data (part two) was used as the validation set, to give an accuracy benchmark for the model. The model never sees the validation set while it is being trained, so we may treat the set as “unknown input data” and assume that the model’s accuracy on it depends entirely on the function the network has fit. On the 2053 pseudo-randomly selected districts in the validation set, the network correctly predicted who would win (the Democratic or the Republican candidate) in 82.12% of them. Looking at the graphs of the expected vs. predicted Democratic vote percentages and the expected vs. predicted Democratic margins, we can see that the model has fit the data extremely well.
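
    The split and the accuracy check described above can be sketched as follows. The helper names, the fixed seed, and the toy margins are assumptions for illustration, and a predicted winner is taken to be "correct" when the predicted and actual Democratic margins have the same sign.

```python
import random

def split_train_validation(rows, validation_fraction=0.2, seed=0):
    """Pseudo-randomly hold out a fraction of the rows for validation."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]

def winner_accuracy(predicted_margins, actual_margins):
    """Share of districts where the predicted winner matches the real one."""
    correct = sum(
        1 for p, a in zip(predicted_margins, actual_margins)
        if (p > 0) == (a > 0)   # positive margin means a Democratic win
    )
    return correct / len(actual_margins)

rows = list(range(100))         # stand-in for 100 historical district results
train_rows, validation_rows = split_train_validation(rows)
print(len(train_rows), len(validation_rows))

# Made-up margins: the third district is called for the wrong party.
accuracy = winner_accuracy([0.1, -0.2, 0.3, -0.1], [0.2, -0.1, -0.3, -0.4])
print(accuracy)
```

    Only the held-out rows are used to score the model, which is what makes the reported accuracy a measure of performance on unseen data.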

Neural Network vs. ORACLE and SEER

    Based purely on the five inputs given above and the voting patterns it has found, the neural network predicts that the Democrats will win 217 seats in the House this election cycle. Now, how does that compare to our statistical models, ORACLE and SEER?

    Interestingly enough, even though the neural network predicts a Republican win by one seat, it is on average 4.52% more in favor of the Democrats this election cycle than ORACLE is. That being said, the average difference in the predictions is only 4.52%: the neural network and ORACLE are extremely close to each other numerically. However, our ORACLE model gives the Democrats a much better chance of winning more House seats than the neural network predicts. This may be because ORACLE takes polls into account, which my neural network does not. It could also be that the neural network is giving the Democrats higher percentages in solid Republican districts and not as big an advantage in battleground districts, and therefore not increasing the number of seats. In the graph below, ORACLE is orange.
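
    The last explanation above, that a model can be more favorable to the Democrats on average and still award them fewer seats, can be checked with a toy example. The numbers below are invented purely for illustration and are not predictions from the neural network, ORACLE, or SEER; the function names and the rule that a Democratic vote share above 0.5 wins the seat are also assumptions.

```python
def mean_difference(shares_a, shares_b):
    """Average of (a - b) across districts."""
    return sum(a - b for a, b in zip(shares_a, shares_b)) / len(shares_a)

def seats_won(dem_shares):
    """Count districts where the Democratic share is above one half."""
    return sum(1 for share in dem_shares if share > 0.5)

# Invented Democratic vote shares for five districts.
# Model A is more generous in two safe Republican seats
# but falls just short in two battleground seats.
model_a = [0.30, 0.35, 0.49, 0.48, 0.70]
model_b = [0.20, 0.25, 0.52, 0.51, 0.70]

average_gap = mean_difference(model_a, model_b)
print(average_gap, seats_won(model_a), seats_won(model_b))
```

    Here model A is more Democratic on average yet wins fewer seats, because its extra support sits in districts that are lost either way.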

    On the other hand, the neural network matches our SEER predictions extremely well, with the average difference in predictions being a small -2.66%. However, it should be noted that the neural network predicts that the Democrats will win more seats in the House than SEER does: 217 vs. 211. This is likely because the neural network is recognizing a larger Democratic shift than SEER. In the graph below, SEER is orange.

    Overall, I believe that a neural network is more than capable of predicting future elections. From the accuracy test, we can see that the network can adequately generate a function that fits historical voting trends. From the comparisons against ORACLE and SEER, we have shown that the model, even when limited to these five inputs, can predict that there is a Democratic lean this election cycle. If we could train a second model to predict the shift that a certain set of polls would make to any given district’s percentages, then I believe we could have an extremely accurate model for predicting elections.

-W. L.