This post is a couple of days late, since the elections were on Saturday, but well, I felt I should get to grips with the theory before making any bold statements. So for the past two days I’ve been solidly on Wikipedia, learning about standard error and margin of error, testing values and looking up probabilities in normal distribution tables.

Just when I’d more or less grokked it, I thought it might be wise to Google “poll margin of error” to make sure I was definitely doing it right. Google Suggest offers the option of adding “calculator” to the end of my query, so I do. Then I realise I’ve just spent hours learning the theory behind this neat and easy-to-use calculator. Fail :)

So, onto the post, shall we?

The question I’m going to answer may have crossed your mind recently:

“How likely is it for the exit poll to predict the wrong candidate?”

Here’s an appropriate snippet from the exit poll release:

Random sample of votes from first 24 hours of voting

The 24-hour thing complicates life, because votes from the first 24 hours may not be representative of the later voting days.

In a previous draft of this post I went into mind-numbing detail explaining how to discern the difference in voting patterns before and after the 24-hour mark and apply it proportionally. After all of that, I ended on a sentence that pretty much said: “I currently don’t know the change in voting patterns, so I’ll assume it to be 0 for the rest of the post”.

Right, job’s a good’un.

Next up: if we were getting really serious, we’d want to feed the population size from which the sample was taken into our statistical equations, that is, how many votes were cast in total in the 24-hour period for each position. In theory, though, this barely affects the result if we’re using the right methods: the statistical calculator assumes a multiplication factor of 1.0, and the one I worked out was 0.94, so that’s good enough for me.

Now we enter the part which used to be littered with calculation print-outs, equations and useful numbers, but we don’t need any of that anymore thanks to the calculator.

The next bit of useful info from the exit poll data is the sample size used, which is 100.

To give a sense of scale, my estimated number of presidential votes on the first day was 1773. In general, the larger the sample size in relation to the total vote count, the more accurate the poll.
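If you want to see how that “multiplication factor” relates to the sample and population sizes, the usual tool is the finite population correction, which I suspect is what the calculator fixes at 1.0. A quick sketch using my estimated figures (the unsquared ratio comes out around 0.94, which matches the factor I worked out; the correction itself is even closer to 1):

```python
import math

N = 1773  # my estimated number of presidential votes on day one
n = 100   # exit poll sample size

# The finite population correction scales the standard error down when
# the sample is a non-trivial fraction of the population.
ratio = (N - n) / (N - 1)
fpc = math.sqrt(ratio)

print(f"ratio = {ratio:.2f}, correction = {fpc:.2f}")
```

Either way the correction is close enough to 1 that ignoring it barely moves the final probabilities.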

*Fun fact #1:*

Given the sample size is fixed at 100 for all positions, and the presidential race is likely to get the most votes, it follows that the presidential exit poll is the least accurate of them all.

The sample size isn’t the whole story, because under STV (single transferable vote) some votes can get discarded along the way. In this case, we had 40 votes for Hutchings, 34 for Ngwena, and 26 discarded.
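As a quick sanity check, the two-way split among the votes still attached to a candidate works out like this (these are where the 54% and 46% figures come from):

```python
hutchings, ngwena, discarded = 40, 34, 26
continuing = hutchings + ngwena  # 74 votes still attached to a candidate

p_hutchings = hutchings / continuing
p_ngwena = ngwena / continuing

print(f"Hutchings {p_hutchings:.0%} vs Ngwena {p_ngwena:.0%}")
```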

Now we have that, we can enter it into the calculator, which gives us 0.59 standard deviations (strictly a z-score rather than a standard deviation). The calculator looks this up in the normal distribution for us and returns 55.5 percent. If I’ve interpreted the calculator correctly, this is the probability of landing outside the range between the two vote shares, that is, below 46% or above 54%.
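Taking the calculator’s 0.59 as a z-score (an assumption on my part, since I haven’t re-derived it from the raw counts), the normal-distribution lookup needs nothing fancier than the error function:

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 0.59  # the calculator's figure, taken on trust

outside = 2 * (1 - phi(z))  # both tails: chance of a result outside 46%-54%
one_tail = 1 - phi(z)       # single tail: chance the poll picked the wrong winner

print(f"outside the range: {outside:.1%}")                    # ~55.5%
print(f"one tail only: {one_tail:.1%}")                       # ~27.8%
print(f"confidence in a Hutchings win: {1 - one_tail:.1%}")   # ~72.2%
```

The same three numbers reappear in the fun facts below, so this one snippet covers the rest of the analysis.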

If that doesn’t mean anything to you then that’s fine, because it doesn’t mean much to me either, so I’ll have a go at rewording it. In doing so, I’ll have to assume we can apply the same analysis about the 46% mark as the calculator did about the 50% mark. That’s not strictly true, but they’re close enough for it to work.

Now the rewording: if Ngwena’s actual share ends up more than 4% above what the poll predicted for him, he has a chance of winning. Naively, 100% minus 55.5% gives 44.5% confidence that the true result fell inside the 46%–54% range, which would be our lower bound on Hutchings winning. But of the two tails that make up the 55.5%, only one (Hutchings falling below 46%) corresponds to the poll being wrong and Ngwena actually winning.

That single tail is 27.8%, giving another interesting stat.

*Fun fact #2:*

The poll was only enough to show Hutchings with a roughly 72% chance of winning.

(The reason this is an approximation is that one of the tails of the distribution will actually be a tiny bit smaller than the other.)

Lastly, we do a bit of retro-analysis:

“How likely is it for Ngwena to have won by as much as he did, just going by the exit poll?”

Ngwena actually won by 58% to 42%, a complete reversal of what the poll predicted, so the probability will be even smaller than the 27.8% above.

(Another assumption here is that the ratio of discarded votes in the real election stayed the same as in our approximated poll.)

58% against 42% is a 16% gap between the candidates, double the 8% gap found in the poll, so I’m able (I think…) to simply multiply our z-score by 2: 0.59 times 2 is 1.18.
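Under that doubling assumption (hedged, again, on 0.59 being a z-score), the retro-analysis is one more lookup in the same normal CDF:

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A 16% winning margin is double the poll's 8%, so double the z-score.
z_actual = 2 * 0.59

p_reversal = 1 - phi(z_actual)  # single tail at z = 1.18
print(f"{p_reversal:.1%}")      # ~11.9%
```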

*This gives us fun fact #3:*

Going by the poll prediction, there was only an 11.9% probability of Ngwena winning by as much as he did.

As noted earlier, this is all assuming the Monday votes are representative of those on later days, which could very well not be the case. In fact, judging by these stats I’m pretty certain the voting patterns did change over the different days. Approximately 90% certain in fact :)

In truth, randomly sampling a poll for which you have all the data is very much an exercise in pointlessness. Nonetheless, some info is better than none, and it keeps tensions at nerve-breaking peaks, so all in all I’m still in favour of having exit polls.

I’d really like to suggest that the union draw the sample from the entire data set though, it doesn’t really serve anyone to do it this way. In fact I can’t help but feel a niggle of doubt when confronted with a stat that says there is a 90% probability that this outcome shouldn’t have been predicted by the poll, and pulling the sample representatively will help to quell any accuracy doubts in future years.

If someone at *Nouse* remembers, I think it’d be a swell idea for us to release an accuracy value along with the exit poll release next year too.

Well that’s it folks, hope you found this as much fun as I did, lol. If you’re a bit of a mathematician and think I did something wrong, please use the comment box!

15 Mar ’10 at 11:07 pm

Bruno Gianelli

“I’d really like to suggest that the union draw the sample from the entire data set though, it doesn’t really serve anyone to do it this way. In fact I can’t help but feel a niggle of doubt when confronted with a stat that says there is a 90% probability that this outcome shouldn’t have been predicted by the poll, and pulling the sample representatively will help to quell any accuracy doubts in future years.”

I think you’re missing the point a little – a more accurate poll would negate the point of announcing the results at all! It’s got to leave something to chance…

15 Mar ’10 at 11:41 pm

Ali Clark

We’d lose the speculation value of how the votes changed after the first day, but the 75% accuracy is already too inaccurate for us to draw any solid conclusions anyway, even with a representative sample.

Plus, they can improve or lessen the accuracy as much as they want by increasing or decreasing the sample size (personally I think 75% is enough to keep it interesting).

I was going to add that I don’t think people will care much for extra analysis based on voting days – the exit poll is only a few hours before the results, and using an unrepresentative sample makes it needlessly fuzzy.

15 Mar ’10 at 11:52 pm

Ali Clark

And hey, if you’re reading and not a maths person I’d still love to hear opinions on why the voting (probably) went more in favour of Ngwena after Monday, maybe some extra campaigning that week?

16 Mar ’10 at 12:08 am

bill bailey

i didn’t really see tim campaigning at all which baffles me, some of the other candidates had a lot higher profile…. maybe people get pissed off with constant harassment and sea of posters and cardboard banners!

16 Mar ’10 at 11:56 am

Will Bailey

Democrats vote early

16 Mar ’10 at 12:30 pm

Sam Seaborn

Democrats and die hards

16 Mar ’10 at 12:31 pm

@Ali

One word: Fusion.

16 Mar ’10 at 12:37 pm

Will Bailey

And there was no rain.
