Monday, July 23, 2018

Why Is Seat Polling So Inaccurate?

The accuracy of Australian seat polling has been an important topic lately, especially given the coming by-elections.  By-elections are very difficult to forecast.  Even after throwing whatever other data you like at them (national polling, government/opposition in power, personal vote effects, state party of government) they are less predictable than the same seats would be at a normal election.  So it would be nice if seat polling would tell us what is going to happen in them.

Unfortunately single-seat polling is very inaccurate.  I discussed this in a recent piece called Is Seat Polling Utterly Useless?, where I showed that at the 2016 federal election, seat polling was a worse predictor of 2PP outcomes than even a naive model based on national polling and assumed uniform swing.  The excellent article by Jackman and Mansillo showed that seat polling for primary votes was so bad that it was as if the polls had one sixth of their actual sample size.  Seat polls don't seem to be completely useless predictively, but we certainly can't weight them very highly.

What I didn't answer in that piece was why seat polling could be so poor, in a country where national polling has been highly accurate at recent federal elections.   I've had a few requests to comment on the reasons.  From the outset I have to say that:

* a combination of factors is probably at work
* I can't say which of these factors are the most likely suspects, though I can downplay some of them
* the main reason it is difficult to answer the question is the opacity of the Australian polling industry.  If Australian pollsters told us more about what they do, we would know more about why they succeed and fail.

This article is a run-through of some of the possible reasons why seat polls have been performing badly in Australia lately.  Others may be added.  It should be stressed that these are in no particular order of importance. 

Scaling up

While a certain "margin of error" is frequently claimed for seat polling, the published margins of error are what we would get if we could take an exactly random sample of the whole voting population and have everybody answer.  However, polls don't actually work like that.  Some people are much more likely than others to be contacted by pollsters and to agree to be polled, and such people might be atypical of their seat.  Moreover, unless pollsters deliberately target respondents to get a mix of ages, genders and so on (which costs more because of the number of unsuccessful contacts), they will always end up with more of some sorts of people than they should have.  This can be overcome by scaling, but the effect is that some respondents are weighted much more highly than others, which makes the results more volatile than if all respondents were weighted equally.

Considering this, perhaps the question shouldn't be so much "why are seat polls so bad?" as "why are national polls so good?"  We should expect polls generally to have higher margins of error in reality than they claim.  But the problems are especially bad for seat polls, with their (usually) smaller sample sizes, and especially given that robopolls have poor response rates generally.  Some demographics, like young voters, may end up very poorly represented in the sample, resulting in a lot of uncertainty as they are upscaled massively to match their frequency in the population.

(Note: Since I released this article a couple of people active in the industry have responded fingering this as the primary issue, basically saying that the reason seat polls behave as if they had a very small sample size is because, effectively, they do.)
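
To put rough numbers on this scaling effect, here is a minimal sketch using Kish's effective-sample-size approximation.  The nominal sample size and the spread of the weights below are assumptions for illustration only, not figures from any actual pollster.

```python
import numpy as np

# A minimal sketch (not any pollster's actual method) of how unequal
# post-stratification weights shrink a poll's effective sample size,
# using Kish's approximation: n_eff = (sum of weights)^2 / (sum of squared weights).
# Both the sample size and the spread of the weights are hypothetical.

rng = np.random.default_rng(0)
n = 700                                               # nominal seat-poll sample (assumed)
weights = rng.lognormal(mean=0.0, sigma=0.7, size=n)  # skewed weights, e.g. heavily upscaled young respondents

n_eff = weights.sum() ** 2 / (weights ** 2).sum()

def moe_95(sample_size, p=0.5):
    """Classical 95% margin of error, in percentage points, for a proportion."""
    return 1.96 * np.sqrt(p * (1 - p) / sample_size) * 100

print(f"nominal n = {n}, effective n ~ {n_eff:.0f}")
print(f"claimed MOE ~ {moe_95(n):.1f} points; weight-adjusted MOE ~ {moe_95(n_eff):.1f} points")
```

On the assumed numbers, a poll nominally carrying a margin of error under 4 points behaves more like one with a margin of error approaching 5 points, before any of the other problems below are considered.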

Demographic churn

The population of voters in any seat is constantly changing (especially in inner-city seats) but the demographic data available to pollsters don't keep up.  Some seats experience rapid demographic change, and this has been cited as a cause of double-digit errors in inner-city Greens-vs-Labor seats.  It doesn't explain why some of the bigger errors we have seen lately have actually been in demographically sleepy (but politically anxious) regional seats.

Actual voting intention change

This one is sometimes offered by pollsters (especially ReachTEL in the days when they were more Twitter-active) to explain away poor results.  On this theory, while national voting intention usually doesn't move around all that much during a campaign - especially a rather uneventful one like 2016 - the national polls are the sum of sharp rises and falls in individual seats, so the seat polls only reflect the state of play when they were taken.  A party's policies might play well in one seat and badly in another, parties and non-party actors like GetUp! might suddenly throw resources at seats they'd neglected, campaign-trail issues might affect a single seat but not the national picture, and so on.  The suggestion is that individual seats are much more volatile than the national picture, so big errors are more likely.

It all sounds plausible but the 2016 data don't support it all that well.  Jackman and Mansillo found the seat polls did predict the Labor and Nick Xenophon Team primary votes more accurately as polling day approached, but not by all that much, and with no real difference for Coalition and Greens primaries.   On the 2PP front there were still massive errors in some of the polls just over a week out.  It might be that if seat polls were taken in the final days they would suddenly have smelled the coffee in Bass and Macarthur, but since key pollsters are far more likely to be flat out on their national voting intention polls at that time, it's unlikely we will ever see large numbers of seat polls so close to a vote at general elections.

The 2013 data even run contrary to this explanation.  In 2013 nationwide polls showed a continual decline in Labor's vote as the initial bounce from bringing back Kevin Rudd deflated - especially because neither Labor nor Rudd had much idea how to build a campaign around his return.  The final national polls proved extremely accurate in the context of a volatile campaign.  If seat polls were accurate at the time they were taken, those taken 2-3 weeks out from the 2013 election should therefore have been skewed to Labor compared to the actual results.  In fact every pollster's seat polls favoured the Coalition compared to the outcome (some insignificantly and others massively).

The other thing I dislike about this explanation is that it is unfalsifiable.  It completely rejects the use of the final result to consider whether a poll was accurate or inaccurate at the time taken.

Overfishing in a stagnant pool

With landline response rates to robocalling so low, voters who live in key marginal seats and are actually willing to respond to polls at all are bombarded with polling attempts.  In a key marginal seat campaign there may be three or four times as many polls taken as are published (if not more).  Some people with landlines are polled far more frequently than purely random sampling would suggest, while others, for whatever reason, don't seem to get polled (while they are home) at all.

Anecdotally, some respondents react to overpolling by pranking robopollsters with fictitious answers (since a robopoll has no way to tell if a voter is a male pretending to be female).  The proportion doing so is probably small, and they have little impact on results beyond effectively making the sample size a little lower.  But respondent fatigue could also have other impacts, including reducing the willingness of respondents to take surveys at all (and those still doing so might be unrepresentative in ways not captured by demographic scaling).  It might also be that even those willing to be constantly resurveyed respond differently as a result of being repeat-polled.

Demographic errors

Pollsters often have a very quick turnaround on seat polls, especially robopolls which might be commissioned, conducted and published inside 48 hours.  It's possible for this reason that errors might be made by some pollsters in assembling and calculating data for demographic scaling (or representativeness control), especially in the case of state seat polls.  Because so little data is released by pollsters we can't verify they are getting it right, but released polls with evident calculation or tabulation errors aren't unknown.

Herding

One would expect that the first polls taken in a seat would be somewhat less accurate than those taken later in the campaign.  For seats with more than one poll in 2016 this was true, but only by a very small amount.  On average, the first polls in such seats had a 2PP error of 3.31 points (equivalent to a margin of error of almost 6.5 points), while subsequent polls improved this fractionally to 2.97 points (in 15 cases subsequent polls were closer on 2PP than the first poll, in 12 they were worse, and in 6 they were the same).  In seats that were only polled once the average error was 2.98 points.

A possible reason there wasn't more improvement is herding.  Herding normally refers to the alleged tendency of some pollsters to look over each other's shoulders and apply post-hoc corrections to polls (not by fiddling the data but by changing the weighting assumptions) to avoid releasing outlier results that might be embarrassing.  In this context, a pollster might also herd to its own previous poll to avoid releasing all-over-the-place results. For 33 cases of a repeat poll, in six cases there was no 2PP difference to the first poll in the seat, in fourteen cases the difference was one 2PP point, in four cases two, in seven cases three, and single cases of differences of 4, 5, 6 and 8 points.  The average 2PP difference between later polls in a seat and the first poll in that seat was exactly two 2PP points.
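
For a rough sense of how much poll-to-poll movement sampling noise alone would produce, here is a minimal Monte Carlo sketch.  It is not the simulation code actually used for the figures discussed here, and the assumed sample of 650 respondents per seat poll is an illustrative guess.

```python
import numpy as np

# A rough sketch of how much repeat seat polls "should" differ from the first
# poll in a seat under pure sampling noise, with voting intention fixed at
# 50-50 throughout.  The sample size of 650 per poll is an assumption.

rng = np.random.default_rng(1)
n_sample = 650       # assumed respondents per seat poll
n_repeats = 33       # repeat-poll cases, matching the 2016 comparison above
n_elections = 200    # number of simulated elections

def average_gap():
    """Average absolute 2PP gap (points) between first and repeat polls."""
    first = rng.binomial(n_sample, 0.5, n_repeats) / n_sample * 100
    later = rng.binomial(n_sample, 0.5, n_repeats) / n_sample * 100
    return np.abs(first - later).mean()

gaps = np.array([average_gap() for _ in range(n_elections)])
print(f"expected average gap under pure noise: ~{gaps.mean():.2f} 2PP points")
print(f"simulated elections with an average gap above 2.0 points: {(gaps > 2.0).sum()} of {n_elections}")
```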

That average of two points is slightly lower than what would be expected (about 2.2 points) if there was no change in voting intention in any seat through the campaign.  In another of my Fakepoll simulations, I repeated this exercise over 200 fake elections, and 145 of them showed more poll-to-poll variation than this (even though underlying voting intention never changed from 50-50 in any of those seats).  However, that is not even close to conclusive evidence of poll-to-poll herding and, more importantly, there is another possible explanation for the same thing.  Which brings me to ...

Jungle juice

An ongoing mystery in Australian polling has been that some pollsters release national polls that are "underdispersed".  Given the sample sizes of the polls, the average poll-to-poll difference in some polling series is less than it should be even if there is no change in voting intention.  The most likely explanation for this is that some polls may use unpublished aggregation or non-polling inputs to reduce the bounciness of their poll and reduce the risk of embarrassing "rogue" polls.  While no pollster has confirmed that they do this, none have to my knowledge denied doing it following recent public comments suggesting that they probably do.  Also, because of the extreme opacity of the Australian polling industry concerning its methods, observers cannot directly check whether polls are being normalised or not. There is nothing wrong with trying to model voting intention through a hybrid of sampling and some form of normalising assumption (indeed, I do exactly that at election times and have recently posted about why it should work), but any pollster who does actually do this should declare it.
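
As a rough benchmark for how much movement a series "should" show under pure sampling, here is a minimal closed-form sketch; the sample sizes are assumptions for illustration, not figures from any particular pollster's series.

```python
import math

# Expected average absolute poll-to-poll change, in percentage points, between
# two independent polls of the same unchanged 50-50 race.  A series that
# averages noticeably less movement than this is underdispersed.  The sample
# sizes below are assumptions for illustration only.

def expected_abs_change(n, p=0.5):
    sd_one_poll = math.sqrt(p * (1 - p) / n) * 100   # sampling SD of a single poll
    sd_difference = math.sqrt(2) * sd_one_poll       # SD of the difference of two polls
    return sd_difference * math.sqrt(2 / math.pi)    # mean of |Normal(0, sd_difference)|

print(f"national poll, n = 1700: ~{expected_abs_change(1700):.1f} points average movement")
print(f"seat poll, n = 650:      ~{expected_abs_change(650):.1f} points average movement")
```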

In the case of seat polling, a possible way a pollster would do this would be to weight results towards what they expected to be the national swing.  There are several other ways this could be done.  Again, there is no direct evidence that any pollster does this, but at the 2016 federal election, the polled 2PP swings in "public polling" (polls commissioned by media or conducted by pollsters of their own accord) were underdispersed.  This was especially so of the Galaxy/Newspoll stable, which has also been noted for the underdispersed nature of its federal Newspoll series, and which also didn't declare its recent major change in preference distributions until the rumblings of psephologists about the issue reached critical mass five months later.

At the 2016 federal election, seat polls mostly painted a picture of swings that were modest and very close to uniform.  In fact, in the seats that were most heavily seat-polled, swings exceeded the national average in both size and variability.  If any pollsters were using adjustments to try to pacify noisy data, in this instance they probably made things worse.

Data matching

Errors are sometimes reported in data-matching for ReachTEL seat polls - the company sometimes polls people who do not live in the target electorate or even the target state.  The company has advised that the rate of mismatches is below 1% but I am not aware of any external auditing of this figure.  Pollsters that poll by a mix of methods (eg phone/online) may find that one of their methods is impractical for a quick turnaround seat poll, and it's been common for seat polls to be conducted by robodialling landlines only (though this may have changed in the last few years; there is not enough public information to say.)

Added: Regional variation

(With thanks to @Pollytics for mentioning an example of this.)  Within electorates there can be areas that skew one way or another for reasons not captured solely or at all by demographics - for instance the history of an issue affecting a particular town.  In a national poll these local issues cancel out, but in an electorate poll they will at least increase the variability of the results, because an over- or under-sampling of that region won't get corrected by demographic scaling.  Worse still, if it happens that voters in that area tend to get systematically over- or under-sampled (for reasons including contactability and also willingness to take part) then what you could get is a systematic skew.

Added: Forced choice on issues questions

ReachTEL seat polls sometimes contain issues questions on which the pollster forces the respondent to choose an answer to continue a survey, and does not allow an "uncertain" option.  If the respondent hangs up mid-survey, all their data are discarded.  Although the company has told me that the hang-up rate on these is less than one percent, this figure has also not been audited to my knowledge.  There is some potential for voters who find they can't answer such questions (and hence hang up) to be unrepresentative compared to other voters.

Added: Hedging

The recent mass seat poll failures in particular seats have all involved multiple polls saying a seat is going to be close when it isn't, rather than the other way around.  There are two commercial imperatives that might encourage pollsters to somehow weight in favour of close results: firstly, close results make better copy for media sources paying for (or potentially reporting on) polls, and secondly, they avoid the reputational disaster that follows when a pollster says the result will be 55-45 one way and it is actually 55-45 the other.  Again, there is no direct evidence of this, and one thing that can be said at least is that in 2016 pollsters did not release significantly more 50-50 results than 51-49s for each party.

Any more?

Further suggestions may be added and are welcome (unless they are of the type that contains the word "Murdoch" or baseless accusations of fraud.)

PS 29 July: Longman Seat Poll Fail

We have seen another seat poll failure in the Longman by-election.  The 2PPs of the published polls in Longman for Labor in chronological order were 48, 50, 49, 49, 49 and 51.  The first four were by ReachTEL and the last two by YouGov-Galaxy, with the final one of these (Newspoll) the only one to get the winner right.  The 2PP currently sits at 54.9 to Labor, though this will probably come down a bit in late counting.  This is a rinse and repeat of the 2016 mass strandings in Bass, Macarthur and (with one exception) Lindsay, all of which also had the Coalition too high.

The primary reason for this failure was having the LNP primary vote way too high and the vote for Others (minor parties excluding the Greens and One Nation) at only about half the level they actually recorded.  The failure was compounded by the preference flows in the polls being slightly stronger to the LNP than what happened in reality, but that was not the major cause.  Had the pollsters used 2016 preference flows for Longman, their preference flows would have been even less accurate than they were, but that error would have helped cancel out their error on the LNP primary vote and made their 2PP results much closer to reality.
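
To illustrate the mechanics, here is a purely hypothetical sketch of how a 2PP estimate is assembled from primary votes and a preference flow.  The figures are invented round numbers, not the actual Longman polling or counting figures; the point is simply that an error on the LNP primary and the Others vote feeds almost directly into the 2PP even when the assumed preference flow is roughly right.

```python
# A purely illustrative sketch of how a 2PP is assembled from primary votes and
# preference flows.  All figures are invented round numbers, not the actual
# Longman polling or counting figures.

def lnp_two_pp(lnp_primary, alp_primary, flow_to_lnp):
    """LNP 2PP: its own primary plus the share of the combined non-major
    primary vote (in points) that flows to it as preferences."""
    others = 100 - lnp_primary - alp_primary
    return lnp_primary + others * flow_to_lnp

# Hypothetical poll: LNP primary too high, Others too low, flow about right
print(lnp_two_pp(lnp_primary=38, alp_primary=37, flow_to_lnp=0.55))  # 51.75

# Hypothetical result: lower LNP primary, larger Others vote, similar flow
print(lnp_two_pp(lnp_primary=31, alp_primary=39, flow_to_lnp=0.50))  # 46.0
```

On these invented figures, a seven-point overstatement of the LNP primary accounts for most of a near six-point 2PP miss, with the flow assumption contributing comparatively little.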

5 comments:

  1. Well, Kevin, having read that, I think the question is not so much why are seat polls so useless but why are national polls so *relatively* accurate. Finding a truly random sample of Aussie voters, who are ready or not ready to answer in equal proportions across all parties, seems nigh on impossible.

    So bring it on in Braddon and Longman - anything can happen! (As the responses to your "How will Braddon go?" not-a-poll confirm.) And IF the "Liberals" do well in both, Malcolm should remember - even a 100% poll of a couple of not-very-typical seats doesn't necessarily indicate the level of support in all seats.

  2. Why oh why do Journos fall for the 'internal polls' for various seats??

    How come they never ask why the internal (national) polls are so different to the published ones?

    Well done Kev

    1. I call this "the unhealthy synergy". When parties give journalists internal poll data they give the journalist an easy story to write up for free. The you-scratch-my-back-and-I'll-scratch-yours is that in return the journalist usually won't ask the hard questions about the polling. It is rare for stories covering internal polls to seek any independent comment on them, though it does happen sometimes. It is common for journalists to claim that the internal polling was "leaked" when in nearly all cases it was simply given away. It is rare for these stories to contain any comment about the inaccuracy of internal polls (which are worse than published polls).

      I haven't covered internal polls in this article but there is another reason why they are inaccurate. Pollsters hired to conduct internal polls are prone to telling parties what the parties want to hear.

  3. I think there is a huge factor ignored, if all ‘seat polling’ is treated the same.

    Seat polling leading into a byelection is a very different circumstance to seat polling leading into a general election. Take Braddon in 2018. The voters (I am one) know that the Libs remain in government regardless of the result. We can return either a Liberal or a Labor MP – no serious third option. There is, however, a strong incentive for a strategic vote for a Government MP over an Opposition MP where the Government is already determined.

    [Thought experiment: wouldn’t every seat wish to have a Government MP regardless of the brand of Government, assuming the vote in that seat did not determine Government?]

    The Labor member can be ‘punished’ for her failure to be eligible, and a voter can then immediately return ‘home’ in a general election vote, which answers the implied question of “Who do you want to govern the country?”. The rusted-ons will vote for their flag carrier. The genuinely varying voters, a small subset of self-identified ‘swinging’ voters, have a chance to express frustration at the system, frustration at the ex-MP, or just a general conceptual preference for independents and minor parties to be a larger and more meaningful part of the parliament.

    Polling would want to be especially carefully worded in a byelection seat poll to correctly pick up the strategic, or free hit, option that voters have that is not there in a general election.

    How would you ‘see’ this effect? In sterile byelections I would expect to see a higher level of ‘1’s for candidates without a meaningful chance of winning the seat, and a different flow of preferences from those minor candidates than the same seat generates in a general election.

    As an aside.

    The prominence given in Tasmania to directed preferences is comic. The electors are Hare-Clark users. The idea that a party ‘ticket’ is going to drive substantial behaviour in an 8-candidate field (with no above-the-line option) is IMO ludicrous. I haven’t seen any analysis but I would be astonished if ‘how to votes’ had an impact in Tasmania in the HofR. In the Senate, perhaps, because of the workload of forming a view on 80 candidates, a ‘how to vote’ may be helpful to a voter.

    1. When it comes to the Senate, how-to-vote-card follow rates were low everywhere under the new system but lowest of all in Tasmania. I expect they'd also be less effective in the Reps here, but the only direct evidence I have for that is that scrutineers told me that when Labor preferenced Andrew Wilkie below the Liberals in Denison in 2013, more than half the Labor votes they saw bucked the card and preferenced Wilkie.

      For by-election polling it is extremely important to ask voters how they intend to vote in the by-election (the first ReachTEL in Longman did not do that.) I think it is also important to name the candidates.

      One might think voters would see it as being to their advantage to elect a government MP at a by-election if the formation of government is not affected, but the history is that they usually don't - by-elections usually result in 2PP swings against governments. More so in government seats, but that is largely because it is usually a government MP who is taking their personal vote with them.

      As for protest voting for minor candidates, in the last 20 years in most cases where both major parties have contested a by-election, their combined vote has actually increased, and where it has gone down it has only been because of the larger number of minor-party candidates. Naturally minor candidates get much higher votes than normal in by-elections where one or the other major party does not contest.

