Friday, February 15, 2013

Federal 2PP estimate feature added

 (Note: This article documents the experimental phase of the aggregate, which ran from February to September 2013.  For transitional arrangements post the election see here.)

I've just added a subjective two-party-preferred vote (2PP) estimate feature on the sidebar.  The reason I have added this feature is that I commonly get involved in discussions on various sites in which someone is making claims about the likely state of the national 2PP that aren't even remotely credible, or just getting confused about all the different polls and the strange range of values they spit out.  I think a fair few people will from time to time be interested in my view of where things are at, especially when having those kinds of debates while I'm not around.  There are some handy formal aggregators about, but they often take a few days to update and I often find my view a little out from theirs (and usually somewhere in the middle of them all), typically because everyone has slightly different views on the best underlying assumptions.  When I can, I'll be aiming to update this estimate quickly.

The 2PP estimate is not a fully formalised aggregator and is not a scientific test. I'm not at the point of being ready to attempt something like that yet, and while I have a long-running Newspoll rolling-average of sorts for historical comparison purposes, I've only recently developed an interest in trying to gauge the picture across all federal pollsters.  It's just my hopefully informed opinion on the basis of an informal and at this stage loosely defined aggregation process.  The rough assumptions I make in thinking about the national 2PP are as follows.  (Note: Article has been edited to give the current version.  Legacy text appears at the bottom so people can see how this model has developed.)

* Only polls that are either commissioned by media sources, or released by polling companies, are used.  Any poll that is commissioned by a party, an activist or lobby group or a company is an "internal poll" and is completely disregarded.

* For the time being, only national polls are used and seat or marginals polls (for instance) are not taken into account irrespective of funding.  Close to the election, if large polls of particular states or a huge number of seats can be effectively converted to a share of the national 2PP, I'll look at those.

* I assume that the house effects of all the pollsters I observe do not sum to zero.  However, I don't accept that the results of past elections are necessarily a perfect benchmark for assessing the actual sum of all house effects, since I suspect that the Coalition recently gets more than its fair share of on-the-day late swing.

* If a pollster publishes respondent-allocated preferences, I ignore those claimed preferences and use preferences as distributed at the last election instead.

* A global "house effect" is assessed against the overall average to account for the leanings of different pollsters.  At present, with Morgan Multi Mode, Newspoll, Nielsen, Morgan Phone, Galaxy, ReachTEL, AMR and Essential the pollsters included, the current global house effect correction is to increase the Labor 2PP by 0.15 points (formerly 0.3, see Rudd section below for reason for change).

* Lonergan are currently not included, pending further evidence, as they employ a method of scaling by self-reporting of the 2010 election results in their seat polling, but it is not known if they are employing it federally.  This method is potentially unsatisfactory because some respondents will report how they voted at a previous election incorrectly. 

* A specific "house effect" is added on top of the global house effect if a given pollster appears to be producing results that are different to others and is either polling irregularly or did not poll at the last election.  At present, the specific house effects applied are:  

Morgan Multi-Mode and AMR: 1 point added to Coalition 2PP

* Polls are weighted in averaging as follows: this week's polls 5, last week's polls 3, week before last 2, week before that 1. A new week starts at midnight each Friday.  Polls released late in the week or spanning two weeks of data will carry weightings averaged between the weightings of the two weeks.  In the last three weeks of the campaign, polls based entirely on data less than four days old will carry a weighting of 8.  In the last week, polls based entirely on data less than two days old will carry a weighting of 10.  The last two weightings don't apply to pollsters who have issued fewer than five national 2PP readings this year.  Only the most recent poll by any polling source will receive either boosted weighting.

* At present, sample size is usually disregarded since the pollsters with the largest sample sizes are those I am coincidentally most cautious about.  Any poll with a sample size below 950 is weighted at half value.

* No more than two polls by any one data source are included at the same time.  (Morgan multi-mode and Morgan phone are treated as different data sources.) Only the most recent Essential and MMM polls are included on the grounds of reservations about Essential (and their data being older at any time) and the newness and lack of sufficiently clear methods details of MMM. During the last three weeks of the campaign a new Essential is weighted at 4 not 5, as the size of other polls tends to increase and data that are well over a week old become arguably less relevant.

And last, and most contentiously:

* I assume that what I am perhaps imprecisely calling "house effects" can vary in the medium term as a result of (i) unpublished methods changes or parameter updates, (ii) sampling or other method problems, or (iii) differences in method causing the results of different pollsters to sometimes respond differently to different issues.  While I'll try to be cautious about temporary small movements that are very likely to be just random drift, when they really start to look like a pollster's behaviour has seriously altered, I will adjust the global house effect or impose a specific house effect until that pollster's readings stop behaving like that.

(This is, of course, all quite disgracefully unfalsifiable and naughty!  If a pollster continues the observed lean I was right, and if they return to the fold it may well be because they changed their herbs and spices in response to increased public awareness that their readings were out of whack.)
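As a rough illustration only, the weighting rules listed above could be sketched in Python as follows.  This is not the actual procedure behind the sidebar estimate, which is informal and partly subjective.  The names (Poll, week_weight, select_polls, aggregate_labor_2pp), the use of a fractional weeks_old value to mark a poll spanning two weeks, and all of the example readings are assumptions made up for the sketch.  The boosted campaign weightings (8 and 10, and the reduced 4 for Essential) are left out.

```python
from dataclasses import dataclass

# Week-age weights from the list above: this week 5, last week 3, then 2, then 1.
WEEK_WEIGHTS = {0: 5.0, 1: 3.0, 2: 2.0, 3: 1.0}
GLOBAL_LABOR_CORRECTION = 0.15  # points added to the averaged Labor 2PP
# Specific house effects, expressed as points added to the Coalition 2PP.
SPECIFIC_COALITION_ADJUSTMENT = {"Morgan Multi-Mode": 1.0, "AMR": 1.0}
# Sources for which only the most recent poll is kept.
SINGLE_POLL_SOURCES = {"Essential", "Morgan Multi-Mode"}
SMALL_SAMPLE_THRESHOLD = 950


@dataclass
class Poll:
    source: str
    labor_2pp: float   # Labor 2PP by last-election preferences, in percent
    weeks_old: float   # 0 = this week; 0.5 = spans this week and last (assumed encoding)
    sample_size: int


def week_weight(weeks_old: float) -> float:
    """Weight by age; a poll spanning two weeks gets the average of both weights."""
    lower = int(weeks_old)
    w_lo = WEEK_WEIGHTS.get(lower, 0.0)
    if weeks_old == lower:
        return w_lo
    return (w_lo + WEEK_WEIGHTS.get(lower + 1, 0.0)) / 2


def select_polls(polls):
    """Keep the newest polls per source: one for Essential and MMM, two otherwise."""
    kept, counts = [], {}
    for poll in sorted(polls, key=lambda p: p.weeks_old):
        limit = 1 if poll.source in SINGLE_POLL_SOURCES else 2
        if counts.get(poll.source, 0) < limit:
            kept.append(poll)
            counts[poll.source] = counts.get(poll.source, 0) + 1
    return kept


def aggregate_labor_2pp(polls) -> float:
    """Weighted average of adjusted readings, plus the global correction."""
    total = weight_sum = 0.0
    for poll in select_polls(polls):
        weight = week_weight(poll.weeks_old)
        if poll.sample_size < SMALL_SAMPLE_THRESHOLD:
            weight /= 2  # small samples count at half value
        # Adding a point to the Coalition 2PP means taking one off Labor's.
        adjusted = poll.labor_2pp - SPECIFIC_COALITION_ADJUSTMENT.get(poll.source, 0.0)
        total += weight * adjusted
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no polls with non-zero weight")
    return total / weight_sum + GLOBAL_LABOR_CORRECTION


# Invented readings, purely to show the mechanics.
example_polls = [
    Poll("Newspoll", 48.0, 0, 1100),
    Poll("Galaxy", 49.0, 1, 1000),
    Poll("Morgan Multi-Mode", 51.0, 0, 3500),
    Poll("Morgan Phone", 47.0, 2, 600),
]
print(round(aggregate_labor_2pp(example_polls), 2))
```

The specific house effect is applied to each poll before averaging and the global correction to the average afterwards, which is one reading of the rules above rather than a confirmed detail.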

Rudd Replaces Gillard Update (26 June):   This is such a seismic event that I've decided to discard all Gillard-era data from the aggregate and start afresh.  Normal weightings will apply to the new data; I've decided that the Morgan snap SMS poll is an acceptable starting point as it is very similar to the Rudd-hypothetical poll results of previous weeks.  The aggregate will carry a Limited Data warning until there are five polls in the mix.

This also results in the cutting of the global house effect correction from 0.3 points to 0.15.  0.15 points was the value I obtained from past election results but I doubled it because I believed that preference distributions of the Others category were not properly reflecting their nature, since there were a few "Rudd refugees" hiding there. 

Other modellers may well keep Gillard-era data, which will make their models initially less accurate, but may result in greater accuracy should the Rudd bounce wash out quickly.

Morgan Slugged (22 July): After Morgan Multi-Mode recorded its fourth consecutive Labor lead while no other major pollster has had Labor in front, I've decided enough is enough and slugged MMM a point on the assumption that it may well have developed a house effect with Rudd as PM.  There is a strong case for slugging them two points and I may increase it to 1.5 or 2 should the current pattern continue. I expect to slug Essential as well tomorrow depending on its reading.

Essential Slugged (29 July): I held off on slugging Essential its intended point after its move last week, but now that it has exclusively returned results with the Coalition ahead (while Nielsen, Newspoll and Galaxy have all returned at least one 50-50 and Morgan has exclusively had Labor ahead), it is pinged a point.


Original Article House Effects Discussion

I say "contentiously" above because while recently docking my routine point from Essential's 55 to Coalition (note: no longer done directly because of change to global house effect), I was surprised to have a gumnut thrown at me by a marsupial!

He actually didn't agree about the 1-point lean to Coalition, on the grounds that it is not a correct figure going back to 2010 (at some time during 2010 Essential changed its methods with dramatic effects, as noted in Essential: Not Bouncy Enough?).

However, looking at the difference between Newspoll and Essential (which has been exacerbated by Newspoll's recent run of higher than average bounciness) I really don't believe that what we've seen in the last eight months is just, as he had it, "transient wanderings between polls over some arbitrary short term!".  Here's a graph showing Newspoll and Essential differences, for "matched" polls released in the same week (though in each case the Newspoll's polling period is shorter), since the start of 2010:

The graph shows the Essential 2PP reading minus the Newspoll reading - blue means Essential had a higher Coalition reading, red a higher Labor reading. Where there is a gap, the readings for that week were identical.

I don't know exactly when in 2010 Essential shifted from being a very Labor-leaning and very bouncy pollster to a more or less neutral (for a while) and remarkably constant one, but I'm not taking any notice of the first five values on the left in drawing conclusions about what's going on on the right.  There's a period from late March 2010 (result 6) to April 2012 (result 50) in which Essential and Newspoll had very similar readings, with Essential about a third of a point more Labor-friendly.  From May 2012, however, the red disappears and the blue takes over, and the average difference has been 1.53 points (1.75 before last week's Newspoll, which was an outlying value compared to last week's other polls.)

I've done some simulations that give the chance of a run this extreme happening by chance in a sample as small as a few years at around the 5% mark, and will post gory details of my methods and results for those, such as they were, in the comments section on request. That plus the evidence that Essential has a strange lack of dynamism even for its sample size (which no-one has yet explained) makes me suspect that this is some kind of genuine difference caused by one or more of the factors (i), (ii) or (iii) mentioned above, and hence I intend to keep applying the stated correction until the pattern has clearly stopped.  (In accordance with my long experience that the best way to ruin a beautiful pattern is to observe it and write an article about it, this will probably happen immediately.)
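The simulations themselves are not described in this article, so the following Python sketch is only one guess at how a check of this kind might be set up, not a reconstruction of the method actually used.  It assumes two pollsters with no relative lean, independent normal sampling noise, readings rounded to whole points, and it asks how often some run of consecutive matched polls shows an average gap at least as large as a chosen threshold.  Every function name and parameter value here is hypothetical, and the output should not be expected to match the roughly 5% figure above.

```python
import random


def simulate_differences(n_weeks, sd_a, sd_b, rng):
    """Weekly gaps between two unbiased pollsters' rounded readings of the same 2PP."""
    diffs = []
    for _ in range(n_weeks):
        a = round(rng.gauss(0.0, sd_a))  # pollster A's error, in whole points
        b = round(rng.gauss(0.0, sd_b))  # pollster B's error, in whole points
        diffs.append(a - b)
    return diffs


def max_window_mean(diffs, window):
    """Largest absolute mean gap over any `window` consecutive matched polls."""
    if window <= 0 or window > len(diffs):
        raise ValueError("window must be between 1 and len(diffs)")
    running = sum(diffs[:window])
    best = abs(running) / window
    for i in range(window, len(diffs)):
        running += diffs[i] - diffs[i - window]  # slide the window by one week
        best = max(best, abs(running) / window)
    return best


def chance_of_extreme_run(n_weeks, window, threshold, sd_a, sd_b, trials, seed=0):
    """Share of simulated series containing a run with mean gap >= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        diffs = simulate_differences(n_weeks, sd_a, sd_b, rng)
        if max_window_mean(diffs, window) >= threshold:
            hits += 1
    return hits / trials


# Illustrative inputs only: series length, run length and noise levels are assumptions.
print(chance_of_extreme_run(n_weeks=60, window=20, threshold=1.5,
                            sd_a=1.5, sd_b=1.0, trials=2000))
```

The answer is very sensitive to the assumed noise levels, the run length and whether a gap in either direction counts, so a sketch like this is better for testing how robust a conclusion is than for pinning down a single probability.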


Legacy Text From Original Model (No Longer Current):

* I currently adjust the following polls: Morgan Face-to-Face by last election preferences (add 2.5 to Coalition), Essential (deduct 1 from Coalition), Galaxy (deduct 1 from Coalition.)  I may alter these weightings or add weightings to others depending on evidence.  Weighting changes will be edited to the bottom of this article.  (See updates below.)

* When there have been a lot of polls in the last fortnight, I'll take a little bit of notice of readings before then, but not that much.  I'll be more cautious about assuming that the polls from the last fortnight tell something close to the full story when there are fewer of them.

Legacy Text From Updates:

Morgan Multi-Mode update (12 March):  Morgan has released a new form of polling called multi-mode which includes both face-to-face and internet sampling.  The breakdown of the two sample types has not been published.  As Morgan face-to-face is known to have a strong house effect, it's tempting to assume that MMM will have a house effect too, if face-to-face is a major component.  Perhaps F2F is not a major component and its inclusion in MMM is simply for the purposes of saving face-to-face.   ;)

Anyway there have been two polls of this new form with one a bit over 1 point below my aggregate for that time and one a bit over 1 point above.  On that basis I am going to assume MMM has no house effect until evidence emerges to the contrary.  (It is only because of the huge sample sizes that I am even willing to provisionally assume that.)  However, because there is so little evidence, I am only going to weight MMM at the same weighting as any other single poll, and only use the most recent MMM, despite MMM's massive sample size.

As, over time, I develop more confidence about whether or not MMM has a house effect and if so what it is, it is probable I will increase its weighting relative to the other polls.  However, that will also depend on assessments of its other properties, including bounciness levels for its sample size.

Also, now that MMM exists, I have discarded the one remaining MF2F that was in the current aggregate.

House Effect Removed from Galaxy (25 March): For the time being I've removed the 1-point house effect from Galaxy.  The reason is that my perception has been that Galaxy's habit of leaning to the Coalition by about a point compared to Newspoll and Morgan phone goes away at election time.  This is backed by the Bludgertrack weightings which show that Galaxy is almost identical to Newspoll in terms of the average predictive errors of its final polls at elections.  Therefore if one habitually diverges from the others between elections the question is on what basis we are to assume some are right and some wrong.  This does not mean other less established pollsters that show a similar amount of divergence shouldn't be treated with caution, especially where it is based on apparent recent changes in the behaviour of that pollster.

A lot of the heat has gone out of the house effect issue with the demise of Morgan Face 2 Face, meaning that whether some polls are adjusted for or not will now probably make no more than half a point of difference, but there is still some caution needed because the remainder contain a mix of polls that appear to be slightly Coalition-leaning and polls that appear to be more or less neutral.


  1. Drawing out the state results is somewhat trickier now that EMRS has stopped issuing the seat by seat outcomes. There always was a rider that the results were based on an insufficient sample to be determinate anyway, which was reflected in the variability of the data.

    I wouldn't have thought the seat sample would have had to be that much bigger than the ~200 they take to establish a valid measure.

  2. When they used to release seat figures I used to merge seat figures from two consecutive polls if the overall result of the two polls was similar. That gave an effective sample size of about 400 per seat, which was vaguely useful. Hare-Clark projections for individual seats are much more sensitive to sample errors than projections for single federal seats, because multiple members are elected from each seat (often creating close margins) and because of some of the quirks of Hare-Clark.

    I've looked at what seat data there was last year (in one case from a Liberal-commissioned poll, but I think it's OK to use it for that purpose) to try to get a handle on the extent to which the swing by seat might be non-uniform; until we get more seat data that's the best I think I can do. Results of this exercise are here:

    I'm expecting that we will see more seat figures (from whatever source) as the election approaches.