(Image adapted from a widespread internet meme of unknown (to me) origin; example here.)
At the last federal election both the well-known BludgerTrack and my aggregate produced a final 2PP result of 53.5, which turned out to be spot-on.
Since then I've been keeping an eye on the differences between the two. Some differences are to be expected from time to time because of differences in method. My aggregate - designed to be easy to update quickly - has generally simpler maths, is slightly slower-changing, usually ignores sample size in favour of overall pollster-reliability ratings, has a different response to the foibles of Essential, doesn't use state breakdowns and sometimes differs in whether or not a given poll is used. Also, mine is publicly updated all the time while BT has weekly updates.
When aggregating, it's possible - especially with so few active pollsters at the moment - to get sucked into a spiral of incorrect assumptions and end up with a badly skewed result if you're not careful. I'd be especially concerned if my aggregate were becoming too Labor-skewed, given Labor's tendency to perform softly at actual elections compared to its polling. Recently there was a run of six weeks in which BT's readings were 0.6 to 0.8 points lower for Labor than mine in five of the six, so I thought I'd check whether there was any lasting divergence between the two. Another issue was a week in which BT moved by 1.2 points while my aggregate moved by only 0.1, both using the same data. None of this necessarily means that one approach is more wrong than the other, or says which one it is if so, but it did trigger some thought about how my aggregate might be improved.
So I took the released BT weekly readings and my recorded end-of-week readings and compared them. Here's what the comparison looks like for the weeks in which both were active.
Series 2 is BludgerTrack and series 1 is mine. Now and then there are big one-week gaps (a ReachTEL may arrive late in the week, or there was the "split Morgan" sample that the two systems treated differently). On the whole, though, the averages of the two across this time differ by virtually nothing. BT has a higher average week-to-week change (about 0.78 points compared to 0.5).
The recent differences may have been partly caused by our treatment of Essential, which has recently been on an ALP-friendly run. William readjusts Essential's house effect on a dynamic basis; I just downweight it (and not by as much as I reasonably could). But another possible factor is the way different aggregates respond when a poll's primaries differ from what would be expected given the pollster's released 2PP.
Extreme Newspoll Rounding
(A new TV series starring Robson and Antony Green)
Most polls (Newspoll, Essential and Galaxy) release both primary votes and 2PPs rounded to the nearest whole number. They do this partly because it looks neater, and partly because a change of a fraction of a point isn't significant and nobody wants to read Dennis Shanahan talking about a 0.2% increase in Tony Abbott's rating, ever.
But for those of us trying to model where the 2PP might be at any given time, the loss of detail can be frustrating. When a band of one 2PP point can span the difference between a narrow defeat and a pretty handy victory, the tenths do matter. And if we could know whether a 51 was really a 50.6 or a 51.3 all the time, we could make our modelling better. Even if it's only by a few tenths of a point, and there are larger sources of possible error, it's better to eliminate as many as we easily can.
The fact is that while there isn't a statistically significant difference between a 50.6 reading and a 51.3 reading, the two readings imply different probability ranges and they are different data - a difference that we lose when polls round to the nearest whole figure. This is why I like it that ReachTEL releases primaries to one decimal place, and I think the objections to doing so are based on a gross misunderstanding of the concept of significant figures.
And this is also why it bugs me that, a little knowledge of "margin of error" being a dangerous thing, some people routinely dismiss poll-to-poll changes that are within the MOE of a given poll as necessarily meaningless. A two-point shift in Newspoll, for instance, may well be meaningless bouncing against a backdrop of no real change, but it is better to get that shift than not. All else being equal a party that receives a two-point increase is more likely to have increased its overall vote share than decreased it, even if the evidence that it has changed at all is inconclusive.
The Newspoll before last (as discussed last fortnight) was a classic example of primary vote vs 2PP difference, because the primaries implied something around 50.3% to Labor but the rounded 2PP result was 51. This week's Newspoll (41 Coalition, 34 Labor, 11 Green, 14 Other) is an even more striking example since the primaries imply about 50.2% to Coalition but the rounded result was 51 to Labor again. How is this one even possible?
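To see where that primaries-implied figure comes from, here's the standard last-election-preferences calculation as a sketch, using the 2013 national flows quoted later in this post; Newspoll's exact preference model may differ:

```python
# Implied Labor 2PP from released primaries, distributing minor-party
# votes by assumed last-election flows (2013 national figures: 83% of
# Greens and ~47% of Others preferencing Labor). A sketch only; not
# Newspoll's actual formula.

def implied_labor_2pp(alp, lnp, grn, oth, grn_to_alp=0.83, oth_to_alp=0.47):
    labor = alp + grn * grn_to_alp + oth * oth_to_alp
    return 100 * labor / (alp + lnp + grn + oth)

# This week's Newspoll: Coalition 41, Labor 34, Green 11, Other 14
print(round(implied_labor_2pp(34, 41, 11, 14), 1))  # → 49.7
```

With these assumed flows the primaries imply roughly 49.7 to Labor (50.3 to the Coalition); the precise figure depends on the flow values and any state weighting used.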
Last week I mentioned how the use of rounding of the primaries can create distortion in the 2PP we would expect from the released primaries. If the ALP and Green primaries were really almost half a point higher, and the Coalition and Others primaries really almost half a point lower, then the actual 2PP from those primaries would be expected to be up to 0.68 points better for Labor than if the primaries were the exact released figure. So if adding that amount to the implied 2PP score for the primaries gets the result for Labor to 50.5, then it's possible the real 2PP just scraped over that score, and was then rounded up. (Another possible scenario is that, to get to a total of 100, one result is rounded by more than half a point with the other three rounded in the other direction, but in this case the maximum difference between the actual 2PP and that implied by the primaries appears to be slightly lower.)
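That worst case can be checked directly. Assuming the 2013 national flows quoted below (83% of Greens and 47% of Others preferencing Labor), shifting the ALP and Green primaries up by half a point each while the Coalition and Others primaries drop half a point each moves the implied 2PP by:

```python
GRN_TO_ALP, OTH_TO_ALP = 0.83, 0.47   # assumed 2013 national flows

# Labor gains the full half-point on its own primary, 83% of the Green
# half-point, and loses 47% of the half-point shaved off Others. The
# Coalition primary contributes nothing to Labor's 2PP directly.
max_shift = 0.5 * 1.0 + 0.5 * GRN_TO_ALP - 0.5 * OTH_TO_ALP
print(round(max_shift, 2))  # → 0.68
```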
But in this case, depending on Newspoll's exact 2PP formula, getting from 49.8-ish to 50.5 by that method is a bit of a stretch, and either very unlikely or perhaps not even quite possible. However, a greater discrepancy between the poll's real 2PP and that implied by the primaries is possible for another reason besides rounding: state differences in preference flow.
Many poll-watchers know that 83% of Greens preferences went (or if distributed, would have gone) to Labor ahead of the Coalition in 2013, but about 53.7% of PUP votes had the Coalition first, as did 53% of votes for "Others" (candidates excluding Coalition, Greens, Labor and PUP). But readers might not be aware that these flows vary quite a bit by state.
The Greens flow to the Coalition last election was 20.6% in WA and 20.1% in SA, but only 13% in Tasmania and 9.2% in Victoria. The PUP flow ranged from 48% to the Coalition in Tasmania to 55% in Queensland and even 58% in the NT. Small numbers of Nationals preferences were distributed in some states but not others; where this happened, the flow ranged from 69% to Liberal in NSW to 78% in SA. (There were also a few Liberal preferences to National in Victoria.)
The big difference is Others, because the major contributors to the Others pile vary greatly from state to state. The Others flow ranged from 39% to the Coalition in the ACT (where the Others voters were voting for Bullet Train) through 43.1% in Tasmania (where the Others voters mostly voted for Andrew Wilkie) to 57.2% in New South Wales (Christian Democrats a big factor there) and 62.2% in Western Australia (where the most common Others vote was Australian Christians). In Victoria, the Others split is only 47.2% to the Coalition, as Cathy McGowan and the Sex Party more than cancel out Family First.
So while we may think of static last-election preference flows of 17% from the Greens and 53% from Others to Coalition, the reality is that the pollsters (depending on how detailed they want to get) have the data to distribute preferences by state. If, for instance, the Greens are polling better relative to their election result in Victoria than they are in the rest of the country, then it's likely the modelled flow of Greens preferences to Labor will be higher than the election value.
I don't have detailed knowledge of which pollsters do and don't use this sort of trick, but I mention it to show that just because we can't get a 2PP from the released primaries using the national preference flows, doesn't mean that 2PP is wrong. That's important not just for shooting down claims that a given poll result is impossible, but for setting the margins of error for the next section.
[NB: Adrian Beaumont has suggested yet another way preference flows can differ from what we might expect: a given pollster might ask respondents who choose Others which Others they have in mind, but then not publish the breakdown. For instance, voters for Independents put Labor ahead of the Coalition 57.1% of the time in 2013, so if an unusually high percentage of the Others tally claim they would vote for an Independent, then that can lead to a stronger preference flow to Labor.]
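The state-based distribution idea above can be sketched as a weighted average: the modelled national Greens flow is just each state's flow weighted by that state's share of the Greens vote. The state flows below are the 2013 figures quoted above (NSW and Queensland set to the ~17% national figure as placeholders), and the vote-share weights are purely illustrative:

```python
# Greens-to-Coalition flows by state (2013 figures quoted above; NSW and
# QLD are placeholders at roughly the national figure).
grn_to_lnp = {"NSW": 0.17, "VIC": 0.092, "QLD": 0.17,
              "WA": 0.206, "SA": 0.201, "TAS": 0.13}

def national_flow(flows, weights):
    """Flow weighted by each state's share of the national Greens vote."""
    total = sum(weights.values())
    return sum(flows[s] * w for s, w in weights.items()) / total

base = {"NSW": 25, "VIC": 30, "QLD": 15, "WA": 12, "SA": 8, "TAS": 10}
vic_up = dict(base, VIC=40)   # Greens polling relatively better in Victoria

print(round(national_flow(grn_to_lnp, base), 3))    # → 0.149
print(round(national_flow(grn_to_lnp, vic_up), 3))  # → 0.144
```

Because Victoria's Greens flow to the Coalition is the weakest, a Victorian-driven Greens surge pulls the modelled national flow to the Coalition down (and the flow to Labor up), exactly as described above.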
Different Ways Of Dealing With Rounding
Prior to now, I've dealt with pollster rounding in a very simple way. My aggregate has simply taken the 2PP released by the pollster and aggregated it. It's very easy to describe the error potential of this method for a given reading: any value from half a point below to half a point above the released figure is equally likely. So the average error for each given poll result (compared to the sample's actual 2PP based on last-election preferences) is 0.25 points and the maximum possible error for each given reading is 0.5. Over the whole set of polls used in the aggregate at a time, the average error in the 2PP readings will usually be very low, because some will be out in one direction and some in another.
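Those error figures are easy to check by simulation, under the assumption that the sample's true 2PP is equally likely anywhere within the rounding band:

```python
import random

random.seed(1)
# Rounding error of a published whole-number 2PP, assuming the true value
# is uniformly distributed within half a point either side.
errors = [random.uniform(-0.5, 0.5) for _ in range(100_000)]

print(round(sum(abs(e) for e in errors) / len(errors), 2))  # → 0.25 per poll
print(round(sum(errors) / len(errors), 3))                  # ≈ 0.0 across many polls
```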
An alternative method - as used by BludgerTrack (see confirmation and clarification here) - is to aggregate off the primaries of various polls (after adjusting them for whatever house effects and bias adjustments one likes), then get an aggregated primary total and convert it to a 2PP (either on a national basis or with some allowance for state issues). In this case, though a 2PP for a specific poll never actually appears in the calculations directly, each poll can be thought of as having an implied 2PP that will contribute to the eventual overall result.
In this case the maximum possible error for a given poll reading is quite a bit higher than 0.5. For Newspoll it's at least the 0.68 implied by rounding, but there's also the possibility it could be more, because of state distribution issues combined with rounding. But the distribution of possible errors will have a degree of central tendency and it could well be that the average error is less than 0.25, despite the risk of larger outliers.
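A quick simulation illustrates this point, assuming each true primary is uniformly distributed within its rounding band and using the 2013 national flows as fixed assumptions (so state effects are ignored here):

```python
import random

random.seed(1)
FLOWS = (1.0, 0.0, 0.83, 0.47)   # assumed share of ALP, LNP, GRN, OTH to Labor

def labor_2pp(primaries):
    # Normalise in case the primaries don't sum to exactly 100
    return 100 * sum(p * f for p, f in zip(primaries, FLOWS)) / sum(primaries)

errors = []
for _ in range(100_000):
    # True primaries anywhere within half a point of this week's Newspoll
    true = [p + random.uniform(-0.5, 0.5) for p in (34, 41, 11, 14)]
    rounded = [round(p) for p in true]
    errors.append(abs(labor_2pp(rounded) - labor_2pp(true)))

print(round(sum(errors) / len(errors), 2))  # average error: well under 0.25
print(round(max(errors), 2))                # but worst cases exceed 0.5
```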
So which method is the best? I've thought about this for a while and decided that the answer is neither. They both throw away information. If we have a set of primary figures that imply a 2PP of 50.2 for a party and the published 2PP is 51 then allowing the primaries to carry an implied 2PP of 50.2 through the system isn't correct, because we already know it can't be that low. On the other hand, it's extremely unlikely the 2PP is really 51, and more or less certain it isn't over 51; it's far more likely that the 2PP for that poll result is somewhere between 50.5 and 51, and much more likely closer to the former than the latter.
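Under simple assumptions, the two pieces of information can just be intersected: the published 2PP confines the truth to half a point either side, while the primaries-implied 2PP confines it to at most the rounding slack (up to 0.68 points for Newspoll) either side. A hypothetical sketch, not anyone's production method:

```python
def plausible_range(published, implied, slack=0.68):
    """Intersect the rounding band of the published 2PP with the band
    allowed by the primaries-implied 2PP."""
    lo = max(published - 0.5, implied - slack)
    hi = min(published + 0.5, implied + slack)
    return lo, hi

# The example above: published 2PP of 51, primaries implying 50.2
lo, hi = plausible_range(51, 50.2)
print(round(lo, 2), round(hi, 2))  # → 50.5 50.88
print(round((lo + hi) / 2, 2))     # crude point estimate → 50.69
```

A fuller treatment would weight values within that range by how likely they are rather than taking a flat midpoint, but even this crude version excludes the impossible values that both simple methods retain.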
So if I can just scribble some charts quickly on the blackboard over here (click for larger version to see how untidy my handwriting really is):
The combined method gives the correct probability distribution for the 2PP of the poll, while both other methods treat impossible values as possible and give a mean that is very unlikely to be the real value in one case and incapable of being it in the other. The mean derived from the combined method is very likely to be a more accurate estimate.
The current Newspoll is an excellent and unusually extreme case in point. Its derived 2PP from primaries, and its published 2PP, are 1.2 points apart. The sample's real 2PP must be somewhere between the two.
It is for this reason that I've now switched to a hybrid approach (as added to the 2PP Methods page) in which the distribution of the primaries is allowed to influence my assumption about where, within the range half a point either side of the released value, a sample's 2PP most likely lies. The maximum allowed tweak to the released 2PP value for pollsters that publish all polls in whole numbers is 0.4; arguably this is conservative and in extreme cases I should go to nearly 0.5. Aside from that, the published 2PP and the primary-derived 2PP are weighted equally for now, though if anything I suspect the former should be weighted slightly higher.
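As a rough sketch of that rule (my reading of it, not necessarily the exact implementation): weight the published 2PP and the primary-derived 2PP equally, then cap the adjustment at 0.4 points from the published value:

```python
def hybrid_2pp(published, derived, cap=0.4):
    """Equal-weight average of the published and primary-derived 2PPs,
    with the tweak to the published figure capped at +/- 0.4 points."""
    tweak = (derived - published) / 2
    tweak = max(-cap, min(cap, tweak))
    return published + tweak

print(round(hybrid_2pp(51.0, 50.2), 1))  # → 50.6
print(round(hybrid_2pp(51.0, 49.8), 1))  # → 50.6 (capped, rather than 50.4)
```

Note that for the current Newspoll (published 51, primaries implying about 49.8), the cap binds: equal weighting alone would give 50.4, below the 50.5 floor the published figure establishes, while the capped value of 50.6 respects it.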
While this will not make any more than 0.4 points difference to my aggregate at any time, and is unlikely to ever even make that, I believe it will lead to slightly more accurate results. I've back-calculated only the last four weeks of results since back-calculation prior to that isn't going to change the general patterns already observed.