1. This article uses historic data to examine how good preference flows from the previous election are at predicting preference flows at any given federal election.
2. Last-election preferences have been an industry standard since various failures of, and problems with, the respondent-preference method, especially its total failure at the 2004 election.
3. Using a model of minor-party breakdowns similar to that used by most pollsters, this article looks at expected versus actual preference flows at all federal elections since 1955.
4. The quality of data available on preference flows is better for elections from 1983 onwards.
5. Since 1983, last-election preferences have predicted preference flows remarkably well at most elections.
6. However, last-election preferences substantially understated the flow to Labor at two elections in this time: 1990 and 2013.
7. The historical record of last-election federal preferences is so strong that claims that last-election preferences in polling are wrong should generally be treated with great caution (including now).
Welcome back to Wonk Central, the irregular series in which I turn a statistical blowtorch on some obscure corner of psephology to the delight of a few aficionados while everybody else goes "what?" But this time we're hunting bigger game than in the previous three instalments, since this piece concerns one of the most debated issues in Australian polling: whether preferences from the previous election are a good way to predict how preferences will flow at the next one. This article is long and has a lot of numbers and a big spreadsheet; it comes out at an author-allocated 4/5 on the Wonk Factor scale.
Preference-prediction is a pretty big deal at the moment, with the gap between respondent preferences and last-election preferences currently running, on average, at over a point. If last-election preferences are correct, then although the Coalition is well behind (currently at 46.6% 2PP according to my aggregate), it's probably only a couple of points off scraping back in at a snap election, off the back of the advantages it has based on personal votes in seats won last time. If, on the other hand, respondent preferences are where it's at, then the Coalition could be slumming it around the low 45s, and in no position to be competitive for quite a while. (NB: This article was written the day before Tony Abbott was removed as PM, and you can see I didn't see that one coming so quickly!)
Why Most Pollsters Don't Use Respondent Preferences
It might be thought that the best way to find out how a minor-party or independent voter will distribute their preferences is to ask them. Unfortunately, for whatever reason, it's not that simple. Some voters say they will preference one way and then end up preferencing another way when they are given their party's how-to-vote card (I think some voters even fear that if they don't follow their party's card their vote might not be counted). It's also possible that polled support for "others" overestimates those who will actually vote, say, independent and preference Labor, and underestimates those who will vote for right-wing micros that preference the Coalition. It may also be that, for whatever reason, the inclination to preference Labor is softer.
The seminal failure of respondent preferences happened at the 2004 election, at which John Howard's Coalition seemed to be in a close battle with Mark Latham's ALP but ended up increasing its majority. Newspoll, Morgan and Nielsen were all using respondent-allocated preferences and were expecting between 69% and 74% of all third-party preferences to flow to Labor, compared with 60.7% at the 2001 election. The actual figure was barely changed at 61.0%. Green preferences had flowed more strongly to Labor in 2004 as a result of Mark Latham's forests policy, but the preferences of other minor parties had been weaker than in 2001.
Newspoll and Morgan, which had the election too close to confidently call with 2PPs for the Coalition of 50 and 49 respectively, came a cropper as the actual 2PP blew out to 52.7, while Nielsen (which had 54 in its final poll) was largely saved by having the Coalition's primary vote much too high to begin with. Galaxy (with a final poll of 52:48 based on last-election preferences) easily took the honours for the best final poll, and respondent preferences were banished to a very dark corner of the Newspoll office, never to be seen in there again.
In 2007 Nielsen again overestimated the preference flow to the ALP (though Morgan didn't). In 2010 the average error of respondent preferencing wasn't such an issue, but a further criticism emerged: that of volatility. If last-election and respondent preferences were on average saying about the same thing anyway, then why let the bounciness of a sample of, say, 300 minor party voters blow your poll's 2PP estimate off course by a point or two by random error?
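The volatility point can be put into rough numbers. The sketch below is a back-of-envelope illustration only: it assumes simple random sampling of minor-party voters, ignores sampling error in the primary votes themselves, and uses a hypothetical function name and illustrative inputs (a 60% flow to Labor and a 20% combined minor-party vote) that are assumptions, not figures from any particular poll.

```python
import math

def two_pp_margin_points(n_minor_respondents, flow_to_labor, minor_vote_share, z=1.96):
    """Approximate 95% margin (in 2PP points) contributed by sampling error
    in respondent preferences alone.

    n_minor_respondents: number of minor-party voters in the sample
    flow_to_labor: assumed true share of their preferences going to Labor (0-1)
    minor_vote_share: combined minor-party primary vote as a fraction (0-1)
    """
    # Standard error of the sampled preference share (simple random sampling).
    se_flow = math.sqrt(flow_to_labor * (1 - flow_to_labor) / n_minor_respondents)
    # Only the minor-party slice of the electorate is affected, so scale by it,
    # then convert from a fraction to percentage points.
    return z * se_flow * minor_vote_share * 100

if __name__ == "__main__":
    print(two_pp_margin_points(300, 0.60, 0.20))
```

With 300 minor-party respondents and these assumed inputs, the margin comes out at roughly 1.1 points of 2PP, which is in the same ballpark as the "point or two" of random bounce described above.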
It's only recently that this debate has been other than one-way traffic. In 2013 preferences did flow more strongly to Labor than at the previous election - not as strongly as respondent preferences were implying, but somewhere in between. The 2014 Victorian state election was more or less a carbon copy of that. This was followed by two 2015 state elections (in Queensland and NSW) in which last-election preferences from Labor's previous drubbings were wildly inaccurate. However, those elections were held under optional-preferential voting and saw massive primary vote swings.
I don't have a very long historical record from which to compare respondent-allocated and last-election preferences, because 2PP polling is a fairly recent thing. However, it's possible to assess the accuracy of the last-election model all the way back to the second election held using preferential voting, back in 1922.
For this article I've decided not to go back quite that far. Instead I've decided to take the modern preferencing problem as starting from 1955. At that election the Australian Labor Party (Anti-Communist) - later the DLP - created mayhem by polling just over 5% at Labor's expense but with most of its preferences going to the Coalition. Although the Coalition obtained only a 0.8% primary vote swing, there was a 4.2% swing to it on a 2PP basis, and it gained eleven seats.
Elections from 1955 on, except for a brief lull in the mid-1970s, have always had at least one substantial third-party presence. Significant third parties, that at their peak tended to poll at least 4% of the vote, have included the DLP/ALPAC from 1955 to 1972, the Democrats from 1977 to 2001, One Nation in 1998 and 2001 and the Greens from 2001 onwards. In 2013 the Palmer United Party became the latest member of the group, although its tenure there looks likely to be brief.
At the moment most pollsters are polling separate voting intention figures for the Greens and Palmer United. Newspoll is not separating Palmer United, and Morgan is polling separate figures for KAP as well. I've taken including the Greens and PUP as a standard approach and asked the following question:
Suppose at each election, a pollster separately polled primaries for Labor, the Coalition, and any minor party that had polled at least 4% at the previous election. Suppose further that the pollster was very accurate in all their polling, and had a way to exactly predict preference losses in contests involving multiple candidates from the same side. If this pollster used last-election preferences at all times, how accurate would their predictions of the national 2PP vote have been?
I've also made one exception to the 4% rule to allow the Democrats to remain in the experiment for 1996 after polling 3.75% in the House in 1993. I've made the odd judgement call about whether some irregular major-party-like entities (such as "Independent Liberals") should be counted as part of the Coalition and Labor votes or not.
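The thought experiment can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the spreadsheet's actual method: the function names, party labels and all numbers are hypothetical, primaries are taken as percentages with every non-Labor, non-Coalition vote treated as a minor-party vote, and the adjustment for preference losses between same-side candidates is ignored.

```python
def group_minor_parties(primaries, previous_primaries, threshold=4.0):
    """Keep a minor party separate only if it polled at least `threshold`
    percent at the previous election; pool everything else as 'Others'."""
    grouped = {"Others": 0.0}
    for party, vote in primaries.items():
        if previous_primaries.get(party, 0.0) >= threshold:
            grouped[party] = vote
        else:
            grouped["Others"] += vote
    return grouped

def predicted_labor_2pp(labor_primary, minor_primaries, last_flows_to_labor):
    """Labor 2PP = Labor primary + each minor group's vote multiplied by the
    share of that group's preferences Labor received last time (0-1)."""
    labor_2pp = labor_primary
    for party, vote in minor_primaries.items():
        labor_2pp += vote * last_flows_to_labor[party]
    return labor_2pp

if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    minors = group_minor_parties(
        {"Greens": 10.0, "Micro A": 3.0, "Independents": 5.0},
        previous_primaries={"Greens": 8.0, "Micro A": 1.0},
    )
    flows = {"Greens": 0.80, "Others": 0.45}
    print(minors, predicted_labor_2pp(38.0, minors, flows))
```

Any exceptions of the kind described above (such as keeping a party that fell just short of 4%) would need to be handled by hand, for instance by adjusting the previous-election figures passed in.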
It's been easy for a while to do this sort of comparison for the last few elections, since the AEC site has complete 2PP preference flows for all parties for the last four elections. These were also computed for 1996-2001, but I don't have those figures. What does exist for 1983-2001 is the output of completed preference throws in every electorate, such that all votes finished up with the final two candidates (normally Coalition vs ALP). A pollster could have used these throws to estimate the flow of preferences for each party, with some noise because of votes that flow through multiple minor parties before reaching a major party.
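One simple way a pollster might have turned completed preference throws into flow estimates is sketched below. It is an assumption about how this could be done, not a description of how the figures in this article were derived: the data layout (one record per exclusion, listing votes transferred to each remaining candidate's party), the party labels and the function name are all hypothetical.

```python
from collections import defaultdict

def estimate_flows_to_labor(throws):
    """Estimate each minor party's preference flow to Labor.

    throws: iterable of (excluded_party, {recipient_party: votes}) records,
            pooled across electorates.
    Only votes passing directly to a major party at the exclusion are counted.
    Ballots the excluded candidate had picked up from earlier exclusions are
    attributed to the excluded party, which is one source of the noise
    mentioned above.
    """
    to_labor = defaultdict(float)
    to_majors = defaultdict(float)
    for excluded, transfers in throws:
        alp = transfers.get("ALP", 0)
        coalition = transfers.get("COALITION", 0)
        to_labor[excluded] += alp
        to_majors[excluded] += alp + coalition
    # Skip parties with no direct transfers to either major party.
    return {p: to_labor[p] / to_majors[p] for p in to_majors if to_majors[p] > 0}

if __name__ == "__main__":
    # Hypothetical throws from three exclusions.
    sample = [
        ("DEM", {"ALP": 600, "COALITION": 400}),
        ("DEM", {"ALP": 300, "COALITION": 200, "GRN": 100}),
        ("GRN", {"ALP": 80, "COALITION": 20}),
    ]
    print(estimate_flows_to_labor(sample))
```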
For elections before 1983, all that exists is preference throws up to the point where somebody won. If a candidate polled over 50% on the first ballot in an electorate, then no preference data from that electorate was retained. This means the estimate a pollster could have made of last-election preferences would have been a lot rougher. It also means that even if the pollster got their estimate dead right, it might still have ended up looking wrong, based on which votes contributed to the overall 2PP estimate for the next election.
Until last week, answering my question above would have been a massively tedious task involving hundreds of hours of calculations. However, thanks to David Barry doing nearly all the work for me with his fantastic Election Statistics resource, I've been able to knock this one over in just a few hours of calculations.
[Image: spreadsheet of expected versus actual preference flows at each federal election since 1955]
The above spreadsheet requires a lot of unpacking! At each election, as well as the generic "Others", there were from zero to three minor parties that would have been on the pollster's radar by the method I've outlined. I've kept each party in the same column at each election it appeared at, hence the first column being blank for elections from 1975 to 2001.
For each party or Other, the vote given is their primary vote share at that election. "Labexp" is the share of preferences (expressed as a decimal, not a percentage) Labor would have expected from that group based on the previous election. "Labact" is the share of preferences Labor actually got.
The column ALL gives the total vote for all minor parties (as defined by me) at each election. The rightmost "Labexp" then is Labor's expected share of all minor party preferences based on the vote levels recorded by the given groups, and "Labact" is Labor's actual share. So, for instance, we can see that in 2013 Labor would have received about 57% of preferences based on previous preference flows and the breakdown of parties, but actually got 61.6%.
The proportional difference caused by changing preferencing patterns is given in the "Diff" column (if it's negative, this means Labor did worse than expected). The "Error" column gives the amount by which the pollster's 2PP would have been wrong. The "Round" column rounds this to one decimal place. Lastly, because the share of minor party votes was at a record high in 2013 and may well be similar next election, the "Norm" column gives the rounded error "normalised" to how high it would have been had minor parties been as popular in past elections as now. Oh, and I've shaded the pre-1983 elections because of their weaker preference data quality.
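For readers who want to see how the derived columns relate to one another, here is a minimal sketch. It assumes that "Diff" is simply Labact minus Labexp, that "Error" is Diff multiplied by the total minor-party vote, and that "Norm" rescales the error to a reference minor-party vote before rounding; the exact spreadsheet formulas may differ, and the function name and the 21.0 vote share used in the example are assumptions rather than values read from the spreadsheet.

```python
def preference_row(all_minor_vote, labexp, labact, reference_minor_vote):
    """Derive the Diff, Error, Round and Norm columns for one election.

    all_minor_vote: combined minor-party primary vote in percent ("ALL")
    labexp, labact: expected and actual Labor share of preferences (0-1)
    reference_minor_vote: minor-party vote (percent) to normalise against
    """
    diff = labact - labexp            # negative means Labor did worse than expected
    error = all_minor_vote * diff     # 2PP points the pollster would have been out by
    # Scale the error to the reference minor-party vote level before rounding.
    norm = error * reference_minor_vote / all_minor_vote
    return {
        "Diff": diff,
        "Error": error,
        "Round": round(error, 1),
        "Norm": round(norm, 1),
    }

if __name__ == "__main__":
    # Labexp and Labact as quoted in the text for 2013; the vote share is assumed.
    print(preference_row(21.0, 0.57, 0.616, reference_minor_vote=21.0))
```

With the 2013 preference shares quoted above and an assumed minor-party vote of 21%, the rounded error is 1.0, in line with the one-point figure discussed for that election.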
If anyone spots any obvious errors, let me know.
Last-Election Preferences Performance
The spreadsheet shows that in the last twelve elections, my imaginary last-election pollster would have done very well. They would have got the 2PP exactly right to one decimal place three times, right to within 0.1 points three times, and right to within half a point another three times. In both 1996 and 2001 modest shifts in Democrat preferencing patterns would have produced a modest but still easily acceptable error. Really significant errors would have happened only in 2013 (Labor overperformed by 1 point) and 1990 (Labor overperformed by 1.1 points).
The average error for all these elections was just 0.3 points (0.4 if normalised to 2013-level minor-party voting). To get such an average error using random-sampling respondent polling, even with every respondent saying they will do what they actually do, would require sampling about 15,000 minor party voters per election. On that basis, over the last 30 years, respondent preferencing has generally had no hope of competing.
1990 is well known as the year in which Labor's preferencing strategy came of age: the Hawke government and especially Graham Richardson strongly courted the green left, resulting in a stronger flow of preferences both from Democrats and Others, and saving Labor from a sickly primary vote. The main cause of the increased preference flow from Others was that the Others category now included a 1.3% vote for the Greens.
In 2013, Greens preferences became stronger for Labor partly as a result of Labor having governed in alliance with the party, but also because of Green dislike of the Coalition's policies under Tony Abbott. However that was only half of the story: Labor's share of Others preferences also increased. This happened partly because those Others that had also contested the 2010 election preferenced Labor at a rate about 6% higher than before, and partly because they were joined by many new parties (primarily Palmer United) that were also fairly neutral in their preferencing habits. There was also a change in the mix of independents, away from rural conservatives.
If we look at elections before 1983, generally the projected errors are larger (an average error of about 0.6 points, or 1.5 points when "normalised" to today's abnormally high levels of minor-party voting). Partly this would be because the data are less reliable, but there are also some obvious realignment elections in which a party with no previous standing and a distinctive preference flow suddenly appeared. This happened in 1955 with ALPAC and again in 1977 with the Democrats. Most likely it would have been completely obvious at these elections that preference flows would be different.
What is interesting is that the last-election preference model used by pollsters not merely survived but thrived at some elections that might have been expected to wreck it. The sudden rise of One Nation in 1998 is a good example of this. While some people believe that One Nation messed up the preference flow at that election, in fact the modelling of One Nation as generic Others happened to work just fine.
Going Into The Next Election (Updated 15 Sep!)
What I've shown above is that in recent elections, last-election preferences have been very reliable. However, they shifted substantially in 2013, and the past record shows that it's possible for preferences to move in the same party's favour two elections in a row. The history of DLP preferences (which at one stage reached 87.7% to Coalition) also suggests that the Greens' current 83% flow to Labor has not necessarily maxed out. Originally, this article suggested that an Abbott prime ministership was likely to strengthen it further, but now that Abbott has been replaced by a relatively green-tinged Liberal, it could even be that the flow goes back the other way.
One reason to think preferences could shift in Labor's favour is that the Nick Xenophon Group is expected to mount a large campaign in South Australia, and is capable of capturing 2% of the national vote in that state alone if all goes well. It's likely Xenophon preferences would flow slightly to Labor. That said, the change away from Abbott may prove to have taken some wind out of Xenophon's sails.
All the same, there have been cases in the past (such as 2004) in which big changes in preference flow were expected, were shown in respondent polling, but didn't materialise. There are no known cases where respondent polling disagreed with last-election polling and was more or less correct, though there might have been had such polling been in use in, say, 1977. It's historically much more likely that the preference flow at the next election will be either consistent with the last election, or somewhere between last-election and respondent preferences, than that respondent preferences will prove entirely accurate.