Saturday, June 29, 2019

What Might 2PP Voting Intention Have Really Looked Like In The Last Federal Term?

The 2016-2019 parliament saw Australia's worst failure of national opinion polling since the early 1980s, a failure that cannot be explained away as a combination of normal polling errors and a reasonably close election.  Aggregated polling had the Coalition behind for the entire term, at no stage better than 49% two-party preferred (2PP), and yet the Coalition won with 51.53% of the two-party preferred vote.

The view that the polls were in fact right all along but voters changed their minds at the last moment (either on election day, or on whatever day each elector voted) fails every test of evidence it can be put to.  The difference in voting intention between those who voted before election day and those who voted on the day is similar to past elections, and if anything slightly stronger for the Coalition.  Nor was there any evidence in polling of a shift in voting intention through the final weeks, which would have been expected - if the polls were at all times accurately capturing respondents' intentions - as voters who had already voted reported back their behaviour.  Also, if those who had already voted had shifted to the Coalition as they made their final decisions while those yet to vote had not yet done so, polls would have shown gaps of several points between the two groups; this was not the case in the released evidence either.


It is therefore overwhelmingly likely that the polls went off the rails sometime before the last few weeks.  But when, and did this happen suddenly, over time, or some combination of the two?  What might a polling aggregate of the term we have just seen have looked like had the polls been accurate at all times?

To start with, this may well be something we will never really know with that much certainty.  But it's worthwhile trying different models and seeing which ones are open to obvious objections and which ones are possibly OK.  

Existing Retro-Aggregations

The first retro-aggregation I have seen is in a set of conference slides by Simon Jackman and Luke Mansillo.  You'll have to use the right arrow key to scroll along to slide 9, which is the 2PP retro-aggregate.  Their assumptions include an unknown discontinuity for the Turnbull-Morrison leadership switch and also that each pollster has a time-invariant house effect.  In slide 11 they find all the pollsters to have quite large house effects on this basis (from just over 2 to just over 3 points 2PP).

The result of the Jackman and Mansillo retro-aggregation shows the Coalition getting a small bounce up to 51-49 just after the 2016 election, then dropping back to below 50% at the end of 2016, and then remaining narrowly behind for a year and a half.  In the lead-up to Super Saturday, the Coalition briefly grabs a slender 2PP lead, then drops in the leadup to and further as a result of the spill, and falls to a term low in the high 48s as Scott Morrison arrives in office.  The Coalition climbs rapidly from late 2018, takes the 2PP lead in early 2019 and is then never headed.

Overall, a remarkable aspect of this retro-aggregate is the lack of variation in the Coalition's standing.  The government sits in a roughly two-point band for nearly all of the term.

(One minor point I'd make here concerns the treatment of "YouGov".  They record 15 YouGov polls but I am aware of 16, these being 13 in 2017 by YouGov-Fifty Acres and three in 2019 by YouGov-Galaxy.  I would have treated these separately, as they are completely different pollsters that were operating in conjunction with (and, in the latter case, after being acquired by) one global polling firm.  It's also obvious from the assessment of house effects for the YouGov series that Jackman and Mansillo have ignored YouGov-Fifty Acres' idiosyncratic respondent 2PPs and recalculated 2PPs for this series.  I agree with this approach.  While it's true that Fifty Acres often showed Coalition leads, it did so off primaries that would have seen the Coalition belted.)

The second retro-aggregation I've seen was a set of 2PP aggregations posted today by Mark the Ballot.  (Also see another earlier Mark the Ballot post here.)  These Bayesian models are "anchored" to the 2016 result, the 2019 result, or both.  The 2016-anchored (left-anchored) model has the Coalition behind for virtually the whole term and sneaking over the line at the end, though the final estimate for the Coalition is lower than what actually occurred.  The 2019-anchored (right-anchored) model has the Coalition ahead for most of the term, except in the aftermath of the spill, but has them jumping to 52% in the aftermath of their 2016 win, which doesn't make sense to me (see below).  The model anchored at both ends avoids both of these issues and is generally very similar to the Jackman and Mansillo model, except that it finds a larger drop on the switch from Turnbull to Morrison.
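To illustrate what "anchoring" means here, the following is a minimal sketch only, not a reconstruction of either model: it uses a simple Gaussian-kernel smoother rather than a Bayesian state-space model, and all poll figures and week numbers would be hypothetical inputs.  Right-anchoring amounts to shifting the whole estimated track by a constant so that the final estimate matches the known election result:

```python
import math

def kernel_aggregate(polls, week, bandwidth=3.0):
    """Gaussian-kernel weighted average of poll readings near `week`.
    `polls` is a list of (week, coalition_2pp) tuples."""
    num = den = 0.0
    for w, y in polls:
        wt = math.exp(-0.5 * ((week - w) / bandwidth) ** 2)
        num += wt * y
        den += wt
    return num / den

def right_anchored(polls, election_week, election_2pp):
    """'Right-anchor' the track: shift every estimate by one constant so
    the final estimate equals the actual result.  This treats the whole
    polling error as a single time-invariant global house effect - the
    strong assumption discussed in this post."""
    shift = election_2pp - kernel_aggregate(polls, election_week)
    return lambda week: kernel_aggregate(polls, week) + shift
```

Left-anchoring is the mirror image (fix the start of the track to the 2016 result instead), while anchoring at both ends forces the house effect to vary through the term rather than being one constant.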

Retro-Aggregation With Morrison As The Culprit

I am going to present some retro-aggregation graphs that make a different assumption to any of these.  I don't claim this assumption to be an established fact, just a very plausible hypothesis.  The question I'm interested in is: what if the change of Prime Minister was a major cause of the polling failure?  

I think there are some sound arguments for this having been the case.  The previous election was contested between Malcolm Turnbull and Bill Shorten.  Final 2PP polls were very accurate at that election, albeit herded specifically in the final week.  Turnbull and Shorten continued as leaders after the election.  The polls immediately after the election made sense: Turnbull's result was viewed by Coalition supporters as a clear disappointment, and there was no reason for him to get a polling bounce for a disappointing result.  So why should we think a house effect was suddenly present in the early polls of the term, having not been evident in the late polls of the previous one?  The assumption of an invariant house effect through the whole 2016-19 term just doesn't make sense.  Whether one gradually developed through the term is another question.

Secondly, if pollsters either over-captured educated voters or under-captured disengaged voters (two likely causes of the polling failure), there's every reason to think they would have done so under Morrison specifically.  Thirdly, we did have some electoral events during the Turnbull term, and the Super Saturday results taken together at least didn't really suggest Turnbull's government was cruising (though they are messy to interpret because of their unusual circumstances).

The change to Morrison was unusual compared with previous mid-term leadership changes.  Previous mid-term changes of Prime Minister have always brought dividends for the government changing its leader, though the benefits of the controversial change from Kevin Rudd to Julia Gillard in 2010 were small and fleeting.  The leadership change in 2018 was extremely messy and public anger concerning it was not surprising - but on the other hand Malcolm Turnbull was not a very popular PM.

There's nothing implausible about the idea that a leadership change should produce house effects.  Indeed changes in the house effects of polls relative to each other are often seen following leadership changes (Morgan vs everyone following the installation of Turnbull, and Essential vs Newspoll following his removal).

My models below (using the same relatively simple aggregation methods as I used for the term, except for changing the global house effects) are anchored to the 2019 result at the end, but assume that the polls were accurate until Turnbull was removed.  The first one assumes that they then went haywire immediately:

In this case the leadership change (at about week 105) actually has no immediate impact on voting intention (the downswing before it is caused by the bad final Ipsos for Turnbull).  The government is competitive but trailing within weeks of Morrison's arrival and starts 2019 on level 2PP terms, taking a clearly winning lead in the campaign.

This one assumes that a house effect developed gradually once Morrison became PM:


Here we have the same polling blowout caused by the spill as in the original aggregate, but it is followed by a rapid recovery that runs (with the odd stall) all the way to polling day.

This one is a hybrid of the two, with half the polling error happening as soon as Morrison becomes PM, and the other half developing gradually:


And this is one where half happens as soon as Morrison becomes PM, and the other half happens during the campaign (perhaps being missed by the late polls because of herding):


These versions all show the government clearly in the lead only in the last month or so of its term.  They vary as to whether the contest was more or less even before that - with the caveat that the strange distribution of seat swings seen at the election means the government could actually have won in minority with 48.9% 2PP, or perhaps even 48.5%.  On the most optimistic of these assessments, the government was actually on track to a scraped re-election from the moment Morrison took the helm; on the least optimistic, it was losing until early April.
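For concreteness, the four assumptions above can be written as simple error schedules - functions giving the assumed pro-Labor polling error (in 2PP points) at each week of the term.  All numbers here are illustrative placeholders (roughly 3 points of final error, a spill at about week 105, an election at about week 149), not fitted values from my aggregate:

```python
TOTAL_ERROR = 3.0      # assumed final 2PP polling error (illustrative)
SPILL_WEEK = 105       # approximate week of the Turnbull-Morrison switch
FINAL_WEEK = 149       # approximate election week
CAMPAIGN_START = 144   # roughly five weeks out

def immediate(week):
    """Scenario 1: the full error appears as soon as Morrison becomes PM."""
    return TOTAL_ERROR if week >= SPILL_WEEK else 0.0

def gradual(week):
    """Scenario 2: the error grows linearly from the spill to polling day."""
    if week < SPILL_WEEK:
        return 0.0
    return TOTAL_ERROR * (week - SPILL_WEEK) / (FINAL_WEEK - SPILL_WEEK)

def hybrid(week):
    """Scenario 3: half the error at the spill, half developing gradually."""
    return 0.5 * immediate(week) + 0.5 * gradual(week)

def late_surge(week):
    """Scenario 4: half at the spill, half developing during the campaign."""
    error = 0.5 * immediate(week)
    if week >= CAMPAIGN_START:
        error += 0.5 * TOTAL_ERROR * (week - CAMPAIGN_START) / (FINAL_WEEK - CAMPAIGN_START)
    return error

def adjust(polled_coalition_2pp, week, schedule):
    """Add the assumed error back to a poll's published Coalition 2PP."""
    return polled_coalition_2pp + schedule(week)
```

Adding the schedule's value back to each poll's published Coalition 2PP, as adjust() does, is what produces the four corrected tracks in the graphs above.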

I post these because it's tempting to conclude from the polling failure we've seen that Malcolm Turnbull was removed in an election-winning position.  But I don't think that should be accepted as hard fact.  We pretty much know the polls were wrong at the end.  We have no reason to think they were wrong at the start.  They may have been getting worse through the term, but is there a way to say when?

I hope this article will encourage other retro-aggregation attempts and discussion about sources of evidence about the point(s) at which the polling failure may have developed.

7 comments:

  1. At least as plausible as any other explanation, Kevin, but I'm a bit puzzled by your reference to changing the correction for house effects. How can you see evidence of house effects, and correct for them, at least in the last few months when the pollsters were all desperately herding?

  2. I am not convinced any of the pollsters were herding to each other before the last 3-4 weeks. Jackman and Mansillo pick up signs of 2PP "underdispersal" going back to at least March but I suspect this is mainly caused by Newspoll's tendency to not move about much from poll to poll (which seems to be a form of self-herding, perhaps caused by some kind of stability correction).

    I think there is some evidence that there was probably not a big change in voting intention in the last few weeks when the string of very clustered results occurred. Had there been a really big change there would have been something unusual in the relationship between voting before the day and voting on the day, but the gap was similar to previous years. So I suspect most of the error existed before the last few weeks. I don't think that it can be a case of the polls being right until 3-4 weeks out, then suddenly someone puts out a wrong value and everyone else herds to it. (Indeed for that to be the case there would have had to be a very dramatic flip in voting intention partway through the campaign, and that's not something that should happen.)

    However we can't prove that the house effect was, say, 2 points six weeks out. We can only look at a range of plausible values and see what kind of picture they produce.

  3. Hmmm. So, supposing a sizeable bloc of voters who think how they vote is nobody else's business (say, a certain kind of uptight churchgoer) all jumped back to the Libs immediately after ScoMo's ascension, there's no way we'll ever know?

    Replies
    1. It's going to be very difficult to know, at any rate.

      That's probably the worst case - that unreachable voters broke in a way not predictable by their demographic characters. A better case would be that the pollsters weren't scaling for the right things and had the evidence all along if they knew how to use it correctly.

  4. I find myself wondering if there were not one, but many contributors to a cumulative bias which, through herding, then contaminated the thinking of other pollsters. There were any number of landmark events during that term of government, including the one you have used, but also including the S44 sagas and the Barnaby Joyce dramas. If, at each of these crisis moments, conventional wisdom suggested a larger drop in voter support than expected, those conducting the polls may have concluded that their sampling was under-reporting pro-Labor sentiments and incorporated a small (but cumulative) correction. This practice would be confirmed in pollsters' minds by the by-election results, and in particular by the massive swings recorded. If anything, these are likely to have encouraged them to double down on the practice. Then comes the spill and its aftermath, the negative impact of which was then overestimated, and the rest is history.

    An alternative which suggests itself as marginally possible is demographic drift on the part of a number of voters, whose voting intentions subsequently changed accordingly. If those individuals were then erroneously categorized during polls as still being within their former demographic, it could result in incorrect scaling, which then produces an erroneous poll. This would amount to a social trend that was incorrectly captured and processed by the pollsters, just as there is a trend amongst neutrals in the US to lean more liberal because of dislike and disrespect for President Trump, while his base support remains firm. A similar situation could apply in Australian terms.

  5. I remember reading, though I couldn't tell you where, that Crosby Textor actually had the result right (or close to it), and this is why Morrison was confident he could win. If this is so, then someone at CT might well be able to say when the public polls diverged.

    Replies
    1. Mark Textor's is one voice we have not heard so far about the causes of the polling failure and his comments would be interesting, especially as he has been quite scathing about public polling in the past. The problem as usual with internal polling is that it is more or less impossible to validate how much an internal pollster really did have it right - and of course the pollster will want everyone to think that they were accurate.
