
# HEPA - Historical Error Poll Adjustment

A method of accounting for polling biases

Nov 3, 2022, 1:51 PM

We have a huge issue at hand: the polls. Our model takes polls at face value; due to time constraints, we had to assume polls were not inherently biased, but they are — polls generally tend to overpredict the Democrats. This could be one of the reasons our model is giving them a significantly higher chance of winning the senate majority.

Here’s the graph of the average poll error across the 2018 and 2020 senate cycles:

Fig. 1. Positive values mean a shift in favor of the Democrats, negative values a shift in favor of the Republicans.

A bias of over two points! Granted, we did not weight these polls by when they were conducted, so a time-weighted average of the bias would most likely be significantly lower, since polls tend to become more accurate closer to the election. Regardless, we decided this was an issue we wanted to tackle.

FiveThirtyEight’s oldest senate dataset is from 2018, so we could only use data from the last two election cycles in our calculations. We used polls only from the 2022 “seesaw states,” which we defined as states where the two parties’ win probabilities differ by no more than 40 percentage points (at most a 70% predicted win probability for one party and at least 30% for the other). Viewing our model’s values as flawed, we determined the seesaw states from FiveThirtyEight’s projected win probabilities on 10/21/22. The states were: Nevada, Georgia, Wisconsin, Arizona, Pennsylvania, New Hampshire, Ohio, and North Carolina.

While most of these have had only one senate election since 2018, both Arizona and Georgia have had two. Thus, we were working with 10 races from eight states. We calculated a weighted average of the polls for each of these races using the exact same methodology our model does. These were the results, where the error is the real value minus the poll estimate: a negative error means the polls leaned Democratic, and a positive error means they leaned Republican (note that this sign convention is the reverse of Fig. 1):

| Race | Poll estimate | Real value | Error |
| --- | --- | --- | --- |
| Georgia (Ossoff v Perdue) | 0.4791666667 | 0.506 | 0.02683333333 |
| Georgia (Warnock v Loeffler) | 0.5001141986 | 0.51 | 0.009885801436 |
| Pennsylvania | 0.5815725568 | 0.566632757 | -0.0149397998 |
| Wisconsin | 0.5616381816 | 0.5545545546 | -0.007083627003 |
| North Carolina | 0.5209400416 | 0.4905857741 | -0.03035426757 |
| Ohio | 0.5680419299 | 0.534 | -0.03404192995 |
| Arizona 2018 | 0.5010168509 | 0.512295082 | 0.01127823107 |
| Arizona 2020 | 0.5349380717 | 0.495 | -0.03993807168 |
| New Hampshire | 0.5812905298 | 0.5799180328 | -0.001372497013 |

Fig. 2. For reference, 0.5 means 50%. Thus, an error of 0.02 would be 2 percentage points.
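The Error column can be reproduced from the other two columns as the real value minus the poll estimate. Here is a minimal Python sketch of that arithmetic, with the Fig. 2 values rounded to four decimals. The names `races` and `poll_error` are illustrative and not taken from the model's code.

```python
# (race, poll_estimate, real_value), copied from Fig. 2 and rounded to 4 decimals.
# Values are Democratic two-party shares, where 0.5 means 50%.
races = [
    ("Georgia (Ossoff v Perdue)", 0.4792, 0.5060),
    ("Georgia (Warnock v Loeffler)", 0.5001, 0.5100),
    ("Pennsylvania", 0.5816, 0.5666),
    ("Wisconsin", 0.5616, 0.5546),
    ("North Carolina", 0.5209, 0.4906),
    ("Ohio", 0.5680, 0.5340),
    ("Arizona 2018", 0.5010, 0.5123),
    ("Arizona 2020", 0.5349, 0.4950),
    ("New Hampshire", 0.5813, 0.5799),
]


def poll_error(poll_estimate, real_value):
    """Signed error: negative means the polls overestimated the Democrat."""
    return real_value - poll_estimate


errors = {name: poll_error(poll, real) for name, poll, real in races}

for name, err in errors.items():
    print(f"{name:30s} {err:+.4f}")

# Simple unweighted summary of the listed races (not the weighted shift).
mean_error = sum(errors.values()) / len(errors)
dem_leaning = sum(1 for err in errors.values() if err < 0)
print(f"unweighted mean error: {mean_error:+.4f}")
print(f"races where polls leaned Democratic: {dem_leaning} of {len(errors)}")
```

Because the inputs are rounded, the printed errors can differ from the table in the last digit.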

We decided to turn these errors into a per-state shift, computed as a weighted average, that would be applied to our poll calculations. Our methodology was as follows:

1. If there was only one senate race in a given state over the past two election cycles, that race would be weighted 0.2, while the remaining 0.8 would be distributed among the other nine races.
2. If there were two senate races in a given state (Arizona and Georgia) over the past two election cycles, each race would be weighted 0.16, while the remaining 0.68 would be distributed among the other eight races.

| State | Shift |
| --- | --- |
| Georgia | -0.002073520184 |
| Pennsylvania | -0.006708297016 |
| Wisconsin | -0.005835388927 |
| North Carolina | -0.008421015656 |
| Ohio | -0.008830755921 |
| Arizona | -0.006976943337 |
| New Hampshire | -0.005200818928 |

Fig. 3
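The two weighting rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the model's actual code: the function name `state_shifts` and its parameters are hypothetical, and we assume the remaining weight is split equally among the other races. Fig. 2 lists only nine of the ten races (there is no Nevada row), so running this on the listed errors will not reproduce Fig. 3 exactly.

```python
from collections import defaultdict

# (state, error) per race, from the Error column of Fig. 2, rounded to 6 decimals.
# Assumption: only the nine listed races are included; the Nevada race is missing.
FIG2_ERRORS = [
    ("Georgia", 0.026833),
    ("Georgia", 0.009886),
    ("Pennsylvania", -0.014940),
    ("Wisconsin", -0.007084),
    ("North Carolina", -0.030354),
    ("Ohio", -0.034042),
    ("Arizona", 0.011278),
    ("Arizona", -0.039938),
    ("New Hampshire", -0.001372),
]


def state_shifts(race_errors, single_weight=0.2, double_weight=0.16):
    """Return {state: shift}, weighting a state's own race(s) more heavily.

    One race in the state: it gets `single_weight` (0.2).
    Two races in the state: each gets `double_weight` (0.16).
    The remaining weight (0.8 or 0.68) is split equally among all other races.
    """
    by_state = defaultdict(list)
    for state, err in race_errors:
        by_state[state].append(err)

    total_error = sum(err for _, err in race_errors)
    n_races = len(race_errors)

    shifts = {}
    for state, own in by_state.items():
        if len(own) not in (1, 2):
            raise ValueError(f"expected 1 or 2 races for {state}, got {len(own)}")
        per_race = single_weight if len(own) == 1 else double_weight
        remaining = 1.0 - per_race * len(own)  # 0.8 or 0.68
        others_mean = (total_error - sum(own)) / (n_races - len(own))
        shifts[state] = per_race * sum(own) + remaining * others_mean
    return shifts


shifts = state_shifts(FIG2_ERRORS)
for state, shift in shifts.items():
    print(f"{state:15s} {shift:+.6f}")
```

A state's shift is then added to its polling average, for example `adjusted = poll_average + shifts["Ohio"]`. Since all weights for a state sum to 1, the shift equals the common error whenever every race has the same error.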

Thus, after adding our calculated shift values (Fig. 3), the Democratic two-party percentage shifted down by between roughly 0.2 and 0.9 percentage points in each of the eight states. We also tried adding variance to these shifts, using the standard deviation of the errors from the previous cycles. However, this variance had a negligible effect on the results of the simulation, so we decided to remove it. The shifts were then added to the polling averages for each state, in combination with the simulation outputs: