With yesterday’s Irish presidential election and a general election in Spain due on the 20th November, the Irish-Spanish Figure It Out team this week thought it would be interesting to look at how opinion polls are used to predict voting preferences for the whole electorate.
There are many influences which are beyond the control of the statisticians (and politicians!) and which cannot be predicted, such as a last-minute revelation about a candidate or a particularly impressive TV performance. However, in this blog we are not specifically interested in explaining the difference between polls and election results. We want to understand how well the results of a poll, typically generated by interviewing a small sample of voters, can be extrapolated to the rest of the population.
Have you ever noticed that opinion polls typically use sample sizes of around 1,000 people, and that results are usually shown with a margin of error of ±3%? How does this relate to the size of the electorate?
Three opinion polls on the Irish presidential election published within the last week or so, each using that all-important sample size of 1,000 and margin of error of ±3%, gave similar results for the leading candidates: Sean Gallagher received 40%, 40% and 38% in the three polls and Michael D. Higgins received 25%, 26% and 26%.
Now, Ireland has a population of around 4 million. You might think intuitively that in Spain, with a population of around 40 million, you would need 10,000 people in your sample to get the same margin of error. In fact, the mathematics of probability shows that, once the total population is sufficiently large, its size has almost no effect on the sample size required to achieve a particular margin of error, which is why so many surveys use a sample of about 1,000. For large populations (>100,000), a sample size of 1,000 provides a margin of error of roughly 3% at a 95% confidence level. This means that, if the same poll were repeated many times, the actual result (in the total population) would lie within ±3% of the reported survey result in about 95% of the polls.
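To make the point about population size concrete, here is a minimal sketch in Python. It assumes a simple random sample, the usual normal approximation and the worst case of a 50/50 split; the function name `margin_of_error` and the optional finite population correction are our own illustration, not something taken from the polling companies.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (normal approximation)


def margin_of_error(n, population=None, p=0.5, z=Z_95):
    """Approximate margin of error for a proportion p estimated from n respondents.

    Assumes a simple random sample. If a population size is given, the
    finite population correction is applied.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


# Round population figures used in the text above
for country, size in [("Ireland", 4_000_000), ("Spain", 40_000_000)]:
    print(f"{country}: ±{margin_of_error(1000, size):.2%}")
```

Under these assumptions both countries come out at about ±3.1%, which pollsters usually quote as ±3%. The correction for population size only changes the figure in the sixth decimal place or so.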
As we can see in the chart above, increasing the sample size does not proportionally improve the precision of the survey results. The margin of error shrinks only with the square root of the sample size, so if we were to look for a more precise result, say ±1%, the sample size would have to increase to almost 10,000 respondents, which may become prohibitively expensive.
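The same relationship can be turned around to ask how many respondents a given margin of error requires. The sketch below again assumes a simple random sample from a large population, a 95% confidence level and a 50/50 split; `required_sample_size` is a name we made up for illustration.

```python
import math


def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest sample size giving at most the requested margin of error.

    Uses n = z^2 * p * (1 - p) / margin^2, which ignores the (tiny)
    finite population correction for large electorates.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


for margin in (0.03, 0.02, 0.01):
    print(f"±{margin:.0%}: about {required_sample_size(margin)} respondents")
```

With these assumptions, ±3% needs a little over 1,000 respondents, while ±1% needs roughly nine times as many, in line with the "almost 10,000" figure quoted above.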
Size is not the only critical factor when drawing a sample from the population; the way the sample is selected will influence the accuracy of the results. For instance, if demographic categories (such as gender or age group) are to be compared, stratified sampling will be required to reflect the demographics of the population. In addition, the sample has to be randomly selected from the total population with no bias.
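One simple way to reflect the demographics of the population is proportional allocation, where each group gets a share of the interviews equal to its share of the population. The sketch below is only an illustration of that idea: the function name is ours, and the age-group shares are invented, not real census figures.

```python
def proportional_allocation(strata_shares, sample_size):
    """Split sample_size across strata in proportion to their population shares.

    Fractions are rounded with the largest-remainder method so that the
    allocations always add up to sample_size (shares must sum to 1).
    """
    raw = {name: share * sample_size for name, share in strata_shares.items()}
    allocation = {name: int(value) for name, value in raw.items()}
    leftover = sample_size - sum(allocation.values())
    # Give the remaining interviews to the strata with the largest fractional parts
    by_remainder = sorted(raw, key=lambda name: raw[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical age-group shares, invented purely for illustration
shares = {"18-34": 0.30, "35-54": 0.37, "55+": 0.33}
print(proportional_allocation(shares, 1000))
```

Within each group the interviewees would still have to be picked at random; the allocation only fixes how many interviews each group receives.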
A bad way to select the sample is highlighted by Ben Goldacre in his recent blog about sample sizes: “if you want to know about the health of the population as a whole, but you survey people in a GP waiting room, then you’re an idiot”. Bias can also be introduced by only ringing people during the daytime or only ringing landlines rather than mobiles. Results can be further skewed by assuming that the votes of the undecided voters in polls will be split evenly between candidates on election day. In addition, even with a genuinely representative sample, individual interviewees will still be chosen at random and so the views of the interviewees may not be representative of their demographic grouping. Increasing the sample size will only marginally increase the accuracy of the results, and there will always be a margin of error (and hence a “confidence interval” around each reported figure) associated with opinion polls.
So in selecting a sample, size does matter, but only up to a point. It is not just size that counts: how you do it is just as important!
Poll results (publication date):
http://www.irishtimes.com/newspaper/breaking/2011/1024/breaking2.html (Mon 24 Oct 2011)
http://www.banda.ie/ (Sun 23 Oct 2011)