Friday, November 16, 2018

Wentworthless: Another Epic Seat Poll Fail

The failures of seat polling have been a common subject on this site this year. See "Is Seat Polling Utterly Useless?", "Why Is Seat Polling So Inaccurate" and "How Did The Super Saturday Seat Polls Go?"

The recent Wentworth by-election was difficult to poll because of a late strategic-voting swing of probably a few to several points from Labor to the winner Kerryn Phelps.  All seven polls that polled a Liberal vs Phelps two-candidate preferred vote did actually get the right winner.  But that is all the good news that there is.  In so many other respects, the seat polls for the historic Wentworth by-election, perhaps the most polled seat in Australian history, were way wrong. And like other recent seat poll failures in such seats as Bass, Macarthur, Dobell, Lindsay and Longman, the failures were characterised not just by the polls being very wrong, but also by them tending to be wrong in the same direction.  The problems go beyond small sample size, and beyond even the tendency of seat polls to be less accurate than their sample sizes say they should be.  They point to systematic errors not random ones, and in this case, I suspect, to the oversampling of the politically engaged.
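To put the "systematic, not random" point in rough numbers, here is a minimal Python sketch. It assumes each poll behaves like a simple random sample (real seat polls are weighted robopolls, so their true error is larger). The sample size of 600 and the 43% vote share are hypothetical illustration values, not figures from any Wentworth poll. The second function treats each poll's miss as an independent coin flip, which is what we would expect if random sampling error were the only thing at work.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points, for a
    simple random sample of size n where the true proportion is p (0-1)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


def prob_all_same_side(k: int) -> float:
    """Chance that k independent, unbiased polls all miss on the same side
    (all too high or all too low), treating each miss as a fair coin flip."""
    return 2 * 0.5 ** k


# Hypothetical inputs: a 600-respondent seat poll, candidate on 43%.
print(round(margin_of_error(0.43, 600), 1))  # 4.0
print(prob_all_same_side(8))                 # 0.0078125
```

Under these assumptions, eight independent polls all landing on the same side of the result would happen less than 1% of the time. A genuine late swing is one legitimate way for all polls to move together, which is why the direction of each error matters.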



Eleven poll results were published for the Wentworth by-election, though these included three internal Liberal polls for which only a 2CP was released, and also some early polling taken before the line-up of the field was settled. Unfortunately, no Newspoll/Galaxy polling was seen. Among the eight polls for which more than just the 2CP was seen, there were various issues with incomplete figures and unredistributed "don't know" or "undecided" responses, but I've done the best I can with those in the table that follows. Where the ReachTEL polls provided a breakdown of "undecided" responses, I've redistributed them accordingly; in other cases I've redistributed them in proportion. I've given the 2CP from one of the Liberal internals as 49.7, based on claims that the party was a "fraction of a point" behind (I'm not sure the exact figure was published).
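Redistributing undecided responses "in proportion" can be sketched as follows. This is a minimal illustration assuming the undecided share is simply spread across the decided options pro rata; the function name, candidate labels and numbers are all hypothetical.

```python
def redistribute_undecided(shares: dict[str, float], undecided: float) -> dict[str, float]:
    """Spread the undecided share over the decided options in proportion
    to their existing shares, so the decided options absorb it pro rata."""
    decided_total = sum(shares.values())
    if decided_total <= 0:
        raise ValueError("need some decided responses to scale from")
    scale = (decided_total + undecided) / decided_total
    return {name: share * scale for name, share in shares.items()}


# Hypothetical readout: 90% decided, 10% undecided.
raw = {"LIB": 36.0, "IND": 18.0, "ALP": 27.0, "GRN": 9.0}
adjusted = redistribute_undecided(raw, 10.0)
print({name: round(v, 1) for name, v in adjusted.items()})
# {'LIB': 40.0, 'IND': 20.0, 'ALP': 30.0, 'GRN': 10.0}
```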

In the initial TAI ReachTEL, the IND option refers to Alex Greenwich, who at the time was considered a possible candidate. For the six polls that gave 2PP/2CP figures, I've calculated the proportion of all preferences that would have to flow to the Liberals (against Labor and against Phelps respectively) to produce the numbers published in the poll.
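The back-calculation of preference flows is simple arithmetic. Writing L for the Liberal primary, X for the primary of the other finalist (Labor or Phelps) and T for the published Liberal 2CP/2PP, all in percent and after undecided voters are redistributed, the share f of all other candidates' votes flowing to the Liberals satisfies T = L + f(100 - L - X). A minimal Python sketch, with a hypothetical function name and hypothetical numbers:

```python
def implied_flow_to_lib(lib_primary: float, opp_primary: float, lib_2cp: float) -> float:
    """Share (0-1) of all other candidates' votes that must flow to the
    Liberal to turn lib_primary into lib_2cp against the other finalist.
    Assumes primaries are in percent and already sum to 100."""
    others = 100.0 - lib_primary - opp_primary
    if others <= 0:
        raise ValueError("no other candidates' votes left to distribute")
    return (lib_2cp - lib_primary) / others


# Hypothetical poll: LIB 40, other finalist 25, published LIB 2CP 48.
flow = implied_flow_to_lib(40.0, 25.0, 48.0)
print(round(100 * flow, 1))  # 22.9 (% of other preferences to the Liberal)
```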

Here's the table:


On the right-hand side is the average error on primary votes, in most cases for Liberal, Phelps, Labor, Green, Heath and Others, but in some of the early polls for whatever candidates they included (with the comparison points adjusted accordingly).
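The average error on primaries can be sketched in Python, assuming it is the mean of the absolute differences between each poll figure and the actual result across the categories the poll covered. The sketch also assumes the poll and the result have already been grouped into the same categories (for the early polls, that means lumping unlisted candidates into "Others" on both sides). The function name and all figures are hypothetical, not actual Wentworth polls or results.

```python
def mean_abs_error(poll: dict[str, float], result: dict[str, float]) -> float:
    """Average absolute gap, in points, between poll and result over the
    categories both share (e.g. LIB, IND, ALP, GRN, OTH)."""
    common = poll.keys() & result.keys()
    if not common:
        raise ValueError("poll and result share no categories")
    return sum(abs(poll[c] - result[c]) for c in common) / len(common)


# Hypothetical figures, not actual Wentworth polls or results.
poll = {"LIB": 38.0, "IND": 22.0, "ALP": 22.0, "GRN": 10.0, "OTH": 8.0}
result = {"LIB": 43.0, "IND": 29.0, "ALP": 12.0, "GRN": 9.0, "OTH": 7.0}
print(mean_abs_error(poll, result))  # 4.8
```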

The key points:

* Excepting the Licia Heath uComms-ReachTEL, every poll underestimated the Liberal primary, by between 3 and 9.6 points.  On average, polls were wrong on the Liberal primary by five points.

* Every poll underestimated the Phelps primary, all but one by more than five points. On average, polls underestimated Phelps by nearly 8 points. This was significant because, although she won the seat, even two weeks out most of the polls did not have her second on primaries. She finished second on primaries, 17.7 points clear of third place!

* Every poll overestimated the Labor primary, all by over five points and most by over nine points. On average, polls overestimated Labor by 10.5 points.

* Polls also overestimated independent Licia Heath, because they named her in the readout while the remaining minor candidates were not so named.  Heath turned out to have little actual voter appeal, polling only 2.3%.

* All six polls that polled a Liberal vs Phelps 2CP found the correct winner. However, all bar one overcooked the margin. There was an especially severe miss by the Greenpeace ReachTEL, which was wrong by 11.2 points.

* Two of the five polls that polled a Liberal vs Labor 2PP found the correct winner, but all five were wrong by at least 8 points, with Voter Choice wrong by 16 points.

* Every poll that published respondent preferences underestimated the flow of preferences to the Liberals' Dave Sharma against both Labor and Kerryn Phelps.

* The average error on estimates of the primary vote for any contender was a massive 5.7 points, nearly double the average error for the Longman seat polls, even though the errors should have been smaller because the polled candidates included Heath, who polled very little.
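One quick way to see that these misses are one-sided rather than random is to compare the average signed error with the average absolute error across polls for a single candidate. If the two have the same magnitude, every poll missed in the same direction; with purely random misses, the signed errors would largely cancel. A minimal Python sketch, with a hypothetical function name and hypothetical poll figures:

```python
def error_summary(estimates: list[float], actual: float) -> tuple[float, float]:
    """Return (mean signed error, mean absolute error) in points for a set
    of poll estimates of one candidate's vote against the actual result."""
    if not estimates:
        raise ValueError("need at least one estimate")
    errors = [e - actual for e in estimates]
    mean_signed = sum(errors) / len(errors)
    mean_abs = sum(abs(err) for err in errors) / len(errors)
    return mean_signed, mean_abs


# Hypothetical: four polls all below a candidate's actual 29%.
print(error_summary([20.0, 22.0, 24.0, 21.0], 29.0))  # (-7.25, 7.25)
# Hypothetical: misses scattered either side of 29%.
print(error_summary([25.0, 33.0, 27.0, 31.0], 29.0))  # (0.0, 3.0)
```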

The errors concerning the Phelps and Labor primaries were to be expected.  During the last week of the campaign, strategic voting as a reason to vote 1 for Phelps took off to a degree rarely seen in an Australian contest.  So one would expect Phelps to be too low and Labor too high in earlier polls.  Whether to the degree shown is another question.

However, given how the campaign progressed, the polling errors in the 2CP and 2PP votes, and in the Liberal primary, should if anything have gone the other way. This suggests that, at the time they were taken, the polls were even further out on these measures than the comparison with the final result shows. The reason is that the Liberals endured a horror final week, with issues capable of embarrassing them springing up more than daily. This was reflected in the higher-than-usual difference between pre-election and on-the-day voting, and also in the unprecedented size of the turnaround in Phelps' favour from the early-received postals to the later ones. Probably, at the time the earliest polls had Phelps just beating Dave Sharma, she wasn't, and Labor were probably never even remotely in the hunt on 2PP.

It's plausible that some voters who decided to strategically switch from Labor to Phelps then also decided to follow her how-to-vote card. This would have changed their 2PP vote from Labor to Liberal. However, I don't believe there would have been 10 points' worth of voters switching from Labor to Phelps in total, let alone switching to Phelps and then deciding to follow her card.
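The arithmetic behind that scepticism can be sketched under some loudly stated assumptions: each switcher who follows the card moves one point of 2PP from Labor to Liberal, switchers who ignore the card keep preferencing Labor over the Liberal, and the 8-point minimum Liberal vs Labor 2PP miss from the list above is read as an error in the Liberal 2PP share. The function name and the follow rates are hypothetical.

```python
def required_switchers(lib_2pp_shift: float, follow_rate: float) -> float:
    """Points of Labor-to-Phelps switchers needed to move the Liberal 2PP
    by lib_2pp_shift points, if only follow_rate (0-1) of them follow a
    how-to-vote card that places the Liberal above Labor."""
    if not 0.0 < follow_rate <= 1.0:
        raise ValueError("follow_rate must be in (0, 1]")
    return lib_2pp_shift / follow_rate


for rate in (1.0, 0.5):
    print(rate, required_switchers(8.0, rate))
# 1.0 8.0
# 0.5 16.0
```

Even if every switcher followed the card, about 8 points of switching would be needed under these assumptions; with any follow rate below 80%, the required switch exceeds 10 points.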

Seat Polls Need Health Warnings

This is yet another case of seat polls being grossly inaccurate, yet it didn't stop the usual mass of uncritical reporting of their contents in the media.  If you're a journalist and you want to inform your readers, every time you cover a seat poll, you should mention that seat polls have not performed reliably at recent elections.

3 comments:

  1. Interesting.
    1) I have been door knocking and I can't get over how disengaged people are. "Disengaged voters in a Liberal electorate vote Liberal" is, I would have thought, a good hypothesis.
    2) The polls not seeing the Phelps thing is a bit harsh. I was looking at the Google search data for strategic voting before the vote, and the count was a lot higher than 1 or 2. On the day, the Labor posters were pretty much pushing that option (put Liberals last).
    3) I do wonder if people just get sick of the pollsters, and the hang-ups and bullshit swamp the result.

  2. "If you're a journalist and you want to inform your readers, every time you cover a seat poll, you should mention that seat polls have not performed reliably at recent elections. "

    There's your problem right there. Why would you think journalists are trying to inform their readers?

    1. I talk to a lot of them; some do try hard to get it right.
