Saturday, June 27, 2015

Could A UK-Style Mass Pollster Fail Happen In Australia?

Advance Summary

1. Recently all major final polls and the vast majority of analysts failed to predict the UK election correctly.

2. Random sampling error, "shy Tory effect" and late swing probably did not make major contributions to this outcome.

3. More likely causes include herding (although this has not been conclusively shown to have happened), the difficulty of estimating turnout, and the abundance of essentially non-random "panel polling" methods.

4. Australian elections are easier to poll for because almost everyone votes and because there is very little tactical voting in the Lower House.

5. Despite this there is a greater risk of high average polling errors at the next federal election because of a rapid turnover of polling ownership and methods, which will make it more difficult for pollsters to detect issues with their methods.

6. At this very early stage, the potential for Labor support to be soft even in very late polling also appears to be higher than normal.



=================================================================
A busy work schedule has kept me from writing much on here in the past few weeks, but here's a piece that's been on the back burner for a while that I've finally got around to finishing.

Fairly recently, the UK general election proved a bad one for the credibility of opinion polls. I want to discuss the chances of something similar happening at the next Australian federal election.

In the UK election, eleven final polls with a combined sample size of 34,737 voters all predicted the final primary vote margin for Great Britain (England, Scotland, Wales) to be between two points to Labour and one point to the Conservatives, with four final results of Con+1, five ties, one Labour+1 and a Labour+2.  The final votes on election day saw the Conservatives up by seven in Great Britain, and winning an outright majority, something dismissed as almost impossible by virtually all the well-known polling aggregators.  It was an opinion-poll whale-stranding: all the polls were left high and dry, but at least they were all together.  

There's quite a bit of anger about these polls, and not surprisingly.  Voters made strategic decisions about who to vote for based on the assumption they were choosing between a Labour government that might be held to ransom by the Scottish National Party and a Tory government that would be held to some kind of account by either its existing coalition partners the Liberal Democrats, or perhaps other parties as well.  Had it been apparent that the real choice was between a Tory majority and a continuation of the existing hung parliament, many voters would have voted very differently.

There have even been calls to ban polls from being conducted in the last pre-election month, because incorrect polls can cause strategic voting errors and because some voters dislike poll-driven commentary. This is a bad idea - it would result in poll-driven commentary about who is winning being replaced by biased subjective pundit-driven commentary (which is even worse). 

It's rather tempting to lay the blame for the anger felt at wrong polling on the foolish 68% to 32% rejection of the Alternative Vote (for Australian audiences, optional preferencing) referendum in 2011.  Conservatives will now be extra happy that that referendum failed, while Labour supporters could and should feel like a real bunch of wallies (I'm not sure what the English equivalent is). But even with preferential voting the same sorts of issues could arise.  In the Queensland 2015 state election there was relatively little scrutiny of a possible incoming Labor government because there was not much belief that one could happen.  Had the UK actually had the Alternative Vote at this election, most likely the primary vote shares of both major parties would have been much worse and minor parties would have won a lot more seats.

It is worth considering whether this sort of disaster could possibly happen in Australia, and whether we will sometime soon have a major election where the polls are generally wrong and cause forecasts to go seriously wrong.  (This is a different scenario to Queensland where the primary voting intention polls were right but the preference-distribution modelling was wrong.)  Firstly there are some things we should more or less rule out as causes of the UK disaster.

1. It Wasn't Random Sample Error

Yes, the amount each individual poll missed by, while large, wasn't enormous.  On average the polls (Opinium, Survation, Ipsos, ICM, Com Res, Populus, YouGov, Panelbase, Ashcroft, BMG, TNS) had the Tories on 33.6% and Labour on 33.5%.  Given that the national result was 36.8-30.4, it might seem like the average miss was only just over a 3-point swing.  However, the correct baseline is the Great Britain result (37.8-31.2) since the pollsters did not include Northern Ireland in their national totals.  Eight of the pollsters missed both major party results in their final polls by more than their margin of error based on sample size alone, and all of them missed at least one of the two majors.

Even if the amount the pollsters missed the major party margins by had been within each individual pollster's margin of error, that wouldn't explain what happened.  This especially means you, New Scientist.  If poll A has a margin of error of three points and poll A makes an error of that size, that is something that was supposed to have happened (or worse) one in 20 times by chance.  Actually the chance is 1/40 of making an error that large in one direction, and 1/40 in the other.  Even so, accidents happen.  But the chance of 11 polls all randomly making the same error in the same direction is 1/40 raised to the 11th power times 2, which is about one in 200,000,000,000,000,000 (roughly one in 200 quadrillion).
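As a rough check on that arithmetic, here is a minimal Python sketch. It assumes, as the paragraph above implicitly does, that the 11 polls err independently of each other; the variable names are illustrative only.

```python
from fractions import Fraction

# Chance that one poll misses by its full margin of error in one given direction.
ONE_SIDED = Fraction(1, 40)
N_POLLS = 11

# All 11 polls miss the same way: either all too high or all too low, hence the factor of 2.
# ASSUMPTION: the polls' sampling errors are independent of each other.
p_all_same_way = 2 * ONE_SIDED ** N_POLLS
odds_against = 1 / p_all_same_way

print(f"about one in {float(odds_against):.2e}")
```

Exact fractions are used so that the very small probability is not affected by floating-point rounding.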

Another way of looking at it is to treat all the final polls together as a single poll.  Then, the margin of error is about 0.5%.  That would mean a 3% error across all the samples combined would be close to twelve standard deviations outside the mean, which again is more or less impossible.
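The pooled figures can be sketched in a few lines of Python. This is only an approximation under stated assumptions: it treats the 34,737 respondents as one simple random sample and uses 33.5% as a round figure for a major party's share (both parties polled about that on average); the names are illustrative.

```python
import math

N_TOTAL = 34737   # combined sample size of the eleven final polls
P = 0.335         # ASSUMPTION: approximate major party share in the final polls
Z95 = 1.96        # multiplier for a 95% margin of error

# Standard error of a proportion, expressed in percentage points.
std_error = 100 * math.sqrt(P * (1 - P) / N_TOTAL)
margin_of_error = Z95 * std_error

# How many standard errors a 3-point miss represents.
sds_for_3_point_miss = 3 / std_error

print(f"margin of error: {margin_of_error:.2f} points")
print(f"3-point miss: {sds_for_3_point_miss:.1f} standard errors")
```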

2. It Probably Wasn't Mostly "Shy Tory Effect"

Shy Tory Effect is the name given to a theory that maintains that supporters of conservative parties are more likely not to tell live interviewers that they intend to vote Conservative, because if you say you intend voting for the Right you might be seen by the interviewer as mean-spirited, while if you say you intend voting for the Left you're at worst going to appear a well-intentioned softie.  It's best seen in Australia in the persistent leftward skew in Morgan's face-to-face interview polling.

Shy Tory Effect is especially strongly linked to the massive poll failure at the 1992 UK election, but reviews of pollster performance concluded it was a minor part of a more sweeping failure and pollsters started imposing solutions to address it.  Especially at the 2010 UK election there was no sign of it and it seemed to have gone away.  It's quite possible that it waxes and wanes in intensity, but the theory doesn't explain why pollsters specifically overestimated Labour, rather than the many other parties "shy Tories" might have said they would vote for instead.  Indeed some pollsters overestimated Labour by more than they underestimated the Tories. 

The other problem with the idea of this as a "shy Tory" election is that there is no reason for Tories to be shy when filling out internet surveys - but over the whole campaign the internet polls were more Labour-leaning (in gap terms) than the phone polls.

3. It Probably Wasn't Mostly Late Swing

When a whole bunch of polls get a result wrong it is often attributed to a late swing - to the several percent of "hard undecideds" who will not tell a pollster who they are voting for even if you flog them with a porcupine and set fire to their grandmothers.  Supposedly every one of these creatures makes up their mind who to vote for on the day, having not even leant in any direction before that.  Campaign svengalis love this narrative because it makes them appear indispensable when they win and allows them to say they couldn't possibly have seen it coming when they lose, but it's frequently not real.  I've seen plenty of real late swings (1993 and 2004 federal, 2006 Tasmanian, 2013 federal election PUP surge) and there is generally some hint of them in polling near the end of the campaign.  Some of those deciding in the last few days will go that way before those who decide in the queue outside the polling booth (if they exist) do so. 

Some polls were polling right up to the day and still not catching any whiff of late swing.  If late swing was the main cause of that big a shift it should have been detected, and it wasn't.

So what was it then?  We don't know yet and there is still a lot to be investigated before the causes of this error are identified, but here are some obvious suspects:

1. The Polls Appear Herded

(Note: this section has been expanded, and is now about Wonk Factor 4.)

Herding is what happens when some pollsters manicure their fine-detail assumptions or choice of which questions to lead with so as to make their results look more like those of other pollsters.  There are so many subjective choices in how to poll that this risk is always present.  It isn't necessary for all the pollsters to be involved in this, perhaps even subconscious, practice for its results to be recognisable.

The herding debate isn't straightforward, because there are two obstacles to the claim that the polls were herded, both covered at Number Cruncher Politics.  The first is that in US elections, polls have been observed converging as the election approaches, but in the UK context, either the polls were herded all along, or the herding only significantly happened as the pollsters put their final polls on the line.  The second is that while the final polls appeared herded as to the question of the Conservative-Labour gap, if their primary votes for each of these parties is examined distinctly, then there is no evidence of herding to be seen.

I don't think either of these is a fatal objection.  In a very data rich environment where great reliance is placed on final polls, most of the polls are well established, and everybody is doing their last poll very close to the election, it's entirely plausible herding would pop up at the death but not before.  It's also plausible that herding would be displayed in terms of the gap between the major parties, since that's where the money is in terms of predicting the outcome.   However, herding can only be a partial explanation.  The theory goes that most of the polls were getting it wrong all along, and that some of those that were closest to getting it right didn't trust their own judgement.

I've been playing around with some Monte Carlo simulations of the final polls, using 50,000 fake elections involving samples of the same sizes as the 11 pollsters.  As their average final Conservative vs Labour differences were practically zero, I just assumed that a respondent was equally likely to choose either party - but I did vary the underlying chance of choosing a major party at all between the pollsters in the simulation.

In my initial run I took each pollster's final-poll Labour + Conservative total, allowed the Conservative share to vary randomly from 50% of that and gave Labour the rest.  This meant that for each individual poll every vote taken from one major party went to the other.

The results were: In 94% of these reruns at least one of my fake pollsters randomly got a more accurate difference between the major parties than any of the 11 pollsters actually did.  In 5% of these reruns at least one of them actually got the result correct or overstated the Tory margin (despite their fake methods being designed to get the margin wrong by seven points.)  In 99.45% of the reruns the variation between samples exceeded that of the actual final polls, meaning that such a herded-looking bunch of differences would only have occurred randomly about 1 in 200 times.  I wrote: "It's not absolutely conclusive, but the suggestion is that just assuming the polls were all making the same mistakes is not enough to explain the similarity in Conservative-Labour gaps in the final polls."
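For readers who want to try something similar, here is a minimal Python sketch of a zero-sum simulation of this general kind. It is not the code used for the figures above and will not reproduce them: the per-pollster sample sizes and the 67% combined major party share are placeholder assumptions (the post gives only the combined sample of 34,737), the run count is cut from 50,000 to keep it quick, and a normal approximation to the binomial is used so that only the standard library is needed.

```python
import random
import statistics

# Final-poll Conservative-minus-Labour gaps quoted earlier in the post:
# four Con+1, five ties, one Labour+1 and one Labour+2.
ACTUAL_GAPS = [1, 1, 1, 1, 0, 0, 0, 0, 0, -1, -2]

# ASSUMPTIONS: placeholder inputs, not the real per-pollster figures.
SAMPLE_SIZES = [34737 // 11] * 11   # equal split of the combined sample
MAJOR_SHARE = 0.67                  # combined Labour + Conservative share


def simulate_gaps(rng):
    """One fake election: a whole-point Con-Lab gap for each fake pollster."""
    gaps = []
    for n in SAMPLE_SIZES:
        majors = round(n * MAJOR_SHARE)
        # Normal approximation to Binomial(majors, 0.5): each major party
        # respondent is equally likely to choose either party.
        con = rng.gauss(majors * 0.5, (majors * 0.25) ** 0.5)
        lab = majors - con          # zero-sum: Labour gets the rest
        gaps.append(round(100 * (con - lab) / n))
    return gaps


def share_more_spread(n_runs=5000, seed=1):
    """Fraction of reruns whose gaps vary more than the actual final polls."""
    rng = random.Random(seed)
    actual_sd = statistics.pstdev(ACTUAL_GAPS)
    wider = sum(
        statistics.pstdev(simulate_gaps(rng)) > actual_sd
        for _ in range(n_runs)
    )
    return wider / n_runs


if __name__ == "__main__":
    print(f"reruns more spread out than the real polls: {share_more_spread():.1%}")
```

With real sample sizes and each pollster's own major party total substituted for the placeholders, the same structure could be used to compare the simulated spread with the actual one.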

(I note here that my simulations don't require the assumption that the polls are a random sample of the general population (which many of the UK online polls violate).  I'm only assuming they're a random sample of some population that votes in the same way.)

When this work started actually receiving slightly more interest than it deserved, I noticed an annoying bug: for the polls that had a combined major party vote of 66, for instance, my simulation could only give 33-33, 34-32, 31-35 (etc) - always an even number difference, which increased the chance of outperforming the actual polls, among other things.  I also thought there were a few things I could do more thoroughly by using more empirical data.
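The parity problem is easy to see in a couple of lines of Python. This is an illustrative sketch only, with the range of Conservative values chosen arbitrarily for the example.

```python
TOTAL = 66  # combined major party vote, as in the example above

# Every whole-number split of a fixed even total gives an even gap.
gaps = [con - (TOTAL - con) for con in range(30, 37)]
print(gaps)  # [-6, -4, -2, 0, 2, 4, 6]
```

Because the gap is always twice the Conservative vote minus the fixed total, an odd gap such as Con+1 can never come up for that pollster.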

So ... the semi-deluxe revised version!

I looked at the actual level of correlation between the Labour and Conservative votes in polls over the whole year up to the election, and was surprised to find it was basically nothing (you'd think they'd take votes off each other now and then, but apparently if they did, it was lost in the noise of variations between polls, random noise and fluctuations in the third-party vote).  Breaking it down by pollster or by time sometimes seemed to help, but if one such breakdown produced a result in one direction, another would produce the opposite and a third would produce nothing.  I decided to make the major party votes not correlate at all in each poll in the final simulation - which may be correct for simulating the final polls even if the party votes tended to correlate against each other through time.  (This had the effect of making the variation in the gap between the parties about 1.49 times the variation for each specific party.  As NCP point out, making the variation double, as in the USA, is too high.)

Secondly, rather than binding each poll to its final combined major-party vote, I decided that it was better to use each pollster's average combined major party vote as a base, adjusted slightly for the major party votes increasing slightly in the final polls.  Varying the votes for both major parties also meant I was no longer stuck with Labour-plus-Conservative being 66 in every fake Populus poll.

This one produced less impressive results.  Now, only 84.3% of reruns had at least one pollster outperforming all the actual pollsters, only 2.8% had one pollster getting the margin right or too high, and a mere 96.9% had the variation between samples exceeding that of the final polls.  That's still suggestive of herding, but it's a long way from conclusive, since if the polls were just all making the same mistakes, they could have ended up with such clustered Conservative-Labour differences a few percent of the time by chance.  There's also the objection that we're looking at one, perhaps the most likely, way in which polls might herd (difference between major parties) but there are other ways they could in theory have herded, but didn't.
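A minimal Python sketch of the uncorrelated approach follows, again as an illustration rather than the code behind the numbers above. The equal sample sizes are a placeholder assumption, the party shares are simply the final-poll averages quoted earlier (not each pollster's own averages), the true gap is rounded to seven points, and a normal approximation is used, so the output will differ from the percentages reported here.

```python
import random

# ASSUMPTIONS: placeholder inputs, not the real per-pollster figures.
SAMPLE_SIZES = [34737 // 11] * 11
CON_SHARE = 0.336      # average final-poll Conservative share quoted in the post
LAB_SHARE = 0.335      # average final-poll Labour share quoted in the post
TRUE_GAP = 7.0         # approximate actual Conservative lead in Great Britain
BEST_ACTUAL_GAP = 1.0  # the closest real final polls had the Conservatives up by one


def fake_poll_gap(n, rng):
    """Con-Lab gap in points, with the two party shares drawn independently."""
    con = rng.gauss(CON_SHARE, (CON_SHARE * (1 - CON_SHARE) / n) ** 0.5)
    lab = rng.gauss(LAB_SHARE, (LAB_SHARE * (1 - LAB_SHARE) / n) ** 0.5)
    return 100 * (con - lab)


def share_beating_actual(n_runs=5000, seed=2):
    """Fraction of reruns where some fake poll beats every real final poll."""
    rng = random.Random(seed)
    best_actual_error = abs(TRUE_GAP - BEST_ACTUAL_GAP)
    hits = 0
    for _ in range(n_runs):
        errors = [abs(TRUE_GAP - fake_poll_gap(n, rng)) for n in SAMPLE_SIZES]
        if min(errors) < best_actual_error:
            hits += 1
    return hits / n_runs


if __name__ == "__main__":
    print(f"reruns with a fake poll closer than any real one: {share_beating_actual():.1%}")
```

Drawing the two shares independently is what removes the even-gap restriction, since the combined major party vote is no longer fixed for each fake poll.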

Simulations are lots of fun but they are never perfect.  Mine ignores the same issues that come up in Australian polling again and again - some polls are too bouncy or not bouncy enough for their sample sizes, polls use scaling that interferes with the margin of error, and so on.  But I do think the case for herding in the final polls isn't completely clear.

(After the election, one shaggy dog tale emerged of how Survation apparently could have got it right in one of their checking polls, but didn't publish because they didn't believe the results. I'm not at all sure that this proves that the methods of the checking poll were the correct ones, but we'll see how they go trying those methods in the future.)

2. Predicting Turnout In Voluntary Elections Is Challenging

It looks like one of the causes of the pollster-fail is that whichever polls were leading the herd expected a higher turnout than actually occurred, because more people said they would vote than actually did, and that this especially applied to Labour voters.  UK pollsters do their best to deal with these problems by grading respondents according to their apparent likelihood of voting.  But it's always possible that in some particular election the supporters of one party are much more likely to say they will vote and then not do so.  This occupational hazard can be specific to the issues, personalities and positions involved with any one campaign, so it may bite without apparent warning.

3. Most UK Polling Is Non-Random

Phone polls of any kind are a minority in the UK, with many pollsters using online panel polling similar to Essential in Australia.  Online panel polling is ultimately non-random, because no matter how the survey panel is recruited, it will not be a completely random sample of the population.  The problem of non-randomness can often be overcome by scaling, but there's still the risk that something in the nature of what attracts people to become online polling subjects is skewing their political responses, and doing so in a way that adjustments based on age, gender, location, income and so on cannot fix.

4. It Wasn't Just A Pollster Problem

A part of the outrage about the wrong polls is not just that the polls were wrong, but that analysts mostly believed them; poll aggregators generally forecast a hung parliament, with in most cases a slight overall projected seat majority for forces opposed to the Tories (Labour, SNP, Plaid Cymru, Greens etc).  Not only that but most of these sites considered the probability of a majority for anyone to be very small indeed.  This was despite the Conservatives having rather often outperformed their polling at elections in the past.  In 2010 they had not significantly done so, and it might have seemed that the old problems (like "shy Tory" effect, but there were others) had gone away.  But one election is not really enough to know and it seems odd (in a messy multi-party contest) that larger error bars weren't being placed on predicted seat outcomes.

There were few exceptions, excluding the usual loyalist sites that will always give reasons why the polls might be wrong in their favour and be right about that by chance once in a blue moon.  One exception was this Number Cruncher Politics article, which looked at party performance in local elections.  Much as I might quibble about whacking a quadratic curve of best fit through seven data points as in their graph, it cannot be denied that local election data had been extremely predictive in the past and was a good candidate for a signal that the polls could well be wrong.

On to Australia now and the prospects for 2016 (assuming we don't get an early election):

Why Australia Is Easier Than The UK

There are several reasons to believe the UK pollster fail will not be repeated in Australia.

Firstly, compulsory voting (or compulsory booth attendance, if you prefer to call it that) makes predictive polling here much easier.  Since virtually everyone will vote, the pollster doesn't have to anticipate whether a person who says they are certain to vote will do so.

Secondly, Australia lacks a tactical voting problem (at least in the Lower House; the Senate is a completely different story). Because Australia has a preferential system, a voter who says they are going to vote for, say, the Greens, will not be facing the dilemma of possibly wasting their vote.  As a result, the pollster doesn't have to deal with the scenario that although the voter likes party A, at the last moment they could switch to party B since party A cannot win in their electorate.

Thirdly, Australia does not yet have a high rate of non-random panel polling.  Although Essential, Morgan and Galaxy all use panel polling, Morgan and Galaxy both supplement it with random-sample polling from other sources.

...And Why Australia Could Strike Trouble

The above said, here are some concerns about the state of Australian polling heading into the next federal election.

Firstly, there has been a major polling turnover.  At the 2013 federal election, the best-established pollsters were the phone pollsters Newspoll, Nielsen and Galaxy.  Each had stably operated using much the same methods for at least a few elections (in Newspoll's and Nielsen's case, longer) and these provided a good core of historically benchmarked data on which to judge the performance of newer entries such as ReachTEL, and of Morgan's especially chaotic pre-election chopping and changing.

Now we have seen rapid changes in the polling marketplace.  Nielsen has been replaced by Ipsos, Newspoll is going to be replaced by a mixed robopoll/online poll run by Galaxy, and Galaxy has already augmented its own phone polling with online polling.  It also remains to be seen whether Galaxy will be continuing to produce live phone polling while doing Newspoll without it.

A little herding can be a good thing sometimes.  If less experienced pollsters find that their polls are out of step with the behaviour of the tried-and-true alternatives, they may look at why this is so and make methods changes that genuinely improve their polling.  But in the run-in to the next election, there will be a very new set of polls.  It will be more difficult to be sure that any one pollster is out of whack with polling reality because it is wrong, rather than because all the others are wrong.

That's the major concern.  The other one is the potential for the next election to simply have a "house effect" of its own (some elections do).  At present we have an Opposition with an uninspiring leader and a far from fleshed-out policy platform facing a Government whose leader is unpopular, but still "only" very strongly disliked by about a third of the population. It's not likely to be another positive enthusiasm election like Kevin Rudd's win in 2007, or even another "meh!" election like 2010.  It might well be another election where the voters dislike both of the choices, as they do in present polling.  If that happens it's entirely possible Labor support will prove soft, even up to the proverbial shadows of the post.

While Labor's recent 52%-ish 2PPs would be expected to narrowly win an election "held now", I think the changes in the polling landscape will make it more uncertain rather than less what such numbers mean if we are still seeing them as polling day approaches.  The next federal election is shaping as one where we will have to treat polling results with more caution and slightly wider error bars than normal.

2 comments:

  1. Good Article and very relevant if an election were held now.
    However if the election is held in late 2016 we may well be facing a very different economic environment which would over rule current assumptions.

    Replies
    1. For sure. It may be that by election time one side is winning so easily that the outcome is obvious well in advance. If the polls are a few points out but pick the right winner, then people do not seem to mind as much as if the pollster error is the same but the result changes.
