1. Seat betting markets, considered by some to be highly predictive, returned an indifferent final result at the 2013 federal election, overpredicting the number of Labor losses by at least seven and predicting fourteen seats incorrectly.
2. Better results were achieved by local/state projections based on polling data, and could also have been achieved by a simple reading of the national polls.
3. Seat betting markets in the final week most likely misread the election because of an overload of contradictory data. They placed too much emphasis on local-level polling and internal polling rumours and too little on national polling.
4. Prior to the final week of the campaign, however, seat betting markets performed well in projecting an uncertain situation that was difficult to model.
5. Seat betting markets were most accurate immediately following the return of Kevin Rudd. However, this probably reflects the modelling skills of bookmakers rather than those of punters.
6. Modellers wanting to know what seat totals betting markets expect should look at direct seat total markets rather than attempting to derive that information via complex and uncertain processes from seat betting markets.
7. Final direct seat total markets were very accurate.
This is the first of what I expect to be a two-part series reviewing the results of the federal election and what they tell us about debates about predictiveness. This article concerns the predictiveness of the seat betting markets and of the national polling picture. It is something that can be done now with reasonable accuracy with the result of nearly every seat beyond doubt and the 2PP relatively clear. The national 2PP result is currently showing at 53.44, but the Coalition skew of the "non-classic" divisions will add at least 0.2, and I'm expecting late counting to push it to about 53.7.
A future article will examine the performance of the individual pollsters in detail; not just in their final polls but also through the whole period following the brief return to the Prime Ministership of Kevin Rudd. That article will examine both national polls and seat polls, and I think it is worth waiting for the 2PP result to be finalised in each seat (apart from possibly Fairfax!) before attempting it. For the purposes of the current article, however, a tenth or two of a point of 2PP and the ultimate result in Fairfax isn't going to make any serious difference.
During the leadup to this federal election I followed the performance of the seat markets and the national polls in a series of generally weekly articles (click on the "betting" tab to see all instalments.) The opening article was here and the last instalment here.
I measured the national 2PP as recorded by an aggregate of readings from eight different pollsters using the methods of my experimental aggregate discussed here. It is a very basic aggregate, primitive compared to some, that is designed for immediate hand calculation and quick updating. These methods saw some slight changes through the campaign - mostly when various pollsters were moved in or out of the naughty corner - but they never had more than a 0.2% impact on the headline figure. When the final 2PP figure is known I will be reviewing some aspects of the aggregate model for future use, but I believe it has become more stable through experience, that it turned out to be very accurate at this election, and that I will be able to run it through the cycle to the next election with very few changes along the way.
I tracked the expectations of the seat betting markets through the simple if rather granular method of monitoring how many seats each party was favourite in, based on cross-agency data from three bookmakers (at times only two, and in one case only one, were available). This can introduce significant errors in calculating an expected total in cases where one party is narrowly favoured in many more seats than another, so I also tracked the numbers of "close" seats (defined as seats where both contenders are inside $3 on at least one market) and reported these weekly.
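As a rough illustration of the counting method just described, here is a minimal Python sketch. The seat names and prices are invented, the `summarise` helper is hypothetical, and it reads a single market only, whereas the tracking above drew on up to three bookmakers and called a seat close if both contenders were inside $3 on at least one of them.

```python
from typing import Dict, Tuple

# Hypothetical decimal odds per seat: (Labor price, Coalition price).
# Seat names and prices are invented for illustration only.
odds: Dict[str, Tuple[float, float]] = {
    "Seat A": (1.30, 3.40),
    "Seat B": (1.75, 2.00),
    "Seat C": (2.60, 1.45),
    "Seat D": (8.00, 1.05),
}

CLOSE_LIMIT = 3.0  # "close" means both contenders are inside $3


def summarise(market: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """Count favourites and close seats for each side from one market."""
    tally = {"alp_fav": 0, "coal_fav": 0, "alp_close": 0, "coal_close": 0}
    for alp_price, coal_price in market.values():
        # The shorter price is the favourite (exact ties fall to the Coalition here).
        side = "alp" if alp_price < coal_price else "coal"
        tally[f"{side}_fav"] += 1
        if alp_price < CLOSE_LIMIT and coal_price < CLOSE_LIMIT:
            tally[f"{side}_close"] += 1
    return tally


summary = summarise(odds)
print(summary)
```

A real version would need to decide how to treat split markets, which this sketch does not attempt.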
Differences between the number of seats Labor were favourites in, and expected totals based on implied seat probabilities as adjusted for longshot bias (such as those calculated by electionlab), seldom exceeded a couple of seats. (Of course, you get proportionally larger errors if you use artificial examples, but they're, well, artificial.) Slightly larger discrepancies between seat favourite numbers for Labor and Simon Jackman's model at some stages were explained by the latter not adjusting for longshot bias. A very large number of candidates around the country were at final odds between $10 and $25 but I do not think any of those saluted.
There was, however, a pattern for nearly the whole of the Rudd return that there were more "close seats" for Labor on betting markets than for the Coalition. This imbalance became especially pronounced in the 27 August check: at that time there were 17 seats in which Labor were narrowly favourite to only four for the Coalition. It was only significantly the other way around a few weeks after the Rudd return (with momentum apparently building in Coalition-held Queensland seats) and the night before the election (with a number of seats having just crossed from Labor to Coalition, some incorrectly). To take this into account and get a slightly less granular picture of what the markets thought, I have added one-third of a seat to Labor's tally for every close seat in the Coalition's tally at a given point. I have subtracted one-third for every close seat in Labor's own basket.
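The one-third adjustment can be written out as a short sketch. The function name is hypothetical; the only real figures used are the 27 August close-seat counts quoted above (17 narrow Labor favourites to four for the Coalition).

```python
def close_seat_adjustment(labor_close: int, coalition_close: int) -> float:
    """Seats to add to Labor's favourite count: +1/3 per close seat in the
    Coalition's tally, -1/3 per close seat in Labor's own tally."""
    return (coalition_close - labor_close) / 3.0


# 27 August check: 17 close seats in Labor's basket, 4 in the Coalition's.
adjustment = close_seat_adjustment(labor_close=17, coalition_close=4)
print(round(adjustment, 2))  # a downward adjustment of a little over four seats
```

On those figures Labor's raw count of seats in which it was favourite comes down by about 4.3 seats.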
(I didn't have the time and energy to track exact implied probabilities for all seats through the campaign, but I'm not sure how reliable they are anyway. For instance, here's a typical longshot-bias case: a Coalition candidate is at $1.05 and a Labor candidate is at $8. Does the market really think the Labor candidate's chance of winning is 11.6%, or does most of the market think that chance is more like 6-7%, with just a few irrational punters driving the price for the Labor candidate down? Most likely, that candidate's real chance of winning is not as high as the converted probabilities say.)
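For anyone wanting to check the 11.6% figure, the usual conversion is to invert each price and rescale so the two chances sum to one, which strips out the bookmaker's margin. This minimal sketch assumes a two-candidate contest and makes no correction for longshot bias; the function name is my own.

```python
from typing import List


def implied_probabilities(prices: List[float]) -> List[float]:
    """Convert decimal odds to probabilities, removing the bookmaker's margin
    by proportional rescaling (no longshot-bias correction)."""
    raw = [1.0 / price for price in prices]
    total = sum(raw)  # exceeds 1 because of the bookmaker's margin
    return [r / total for r in raw]


coalition_chance, labor_chance = implied_probabilities([1.05, 8.00])
print(round(labor_chance, 3))
```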
As noted there was a small discrepancy between my final aggregated 2PP of 53.5 and a final result that may be around a quarter of a point higher. To what extent is this discrepancy because my aggregating method was prone to under-predict the Coalition's result, and to what extent is it because of late changes in voting intention that came after the final polls were released? I believe this question is unanswerable, but I've roughly split the expected difference by assuming my estimates were 0.1 points too low, just in case. This makes relatively little difference to the pattern anyway.
In terms of translating national polling at any time into ALP seats, I've read implied chances for the given swings off the pendulum and converted them to probabilities using the long-term average standard deviation of seat swings from the national swing (thanks to Mark the Ballot here). I've also taken the view that any vaguely intelligent modeller not engaged in blatant Rudd-boosting would take into account the impact of first-term incumbency on the chances of dislodging various sophomore Coalition MHRs in modelling the seat total to be expected from a national poll. A downward adjustment of one seat is applied for this reason.
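A minimal sketch of this kind of pendulum-probability conversion follows. It is an illustration under stated assumptions, not the exact calculation used for the graph: the spread `SEAT_SWING_SD` is a placeholder value rather than the historical figure, the pendulum entries are invented, and seat swings are assumed to be normally distributed around the national swing.

```python
from statistics import NormalDist
from typing import List

# ASSUMPTION: placeholder spread of seat swings around the national swing,
# in 2PP points. The historical figure should be substituted here.
SEAT_SWING_SD = 3.0


def labor_win_probability(labor_2pp_last_time: float,
                          swing_to_coalition: float,
                          sd: float = SEAT_SWING_SD) -> float:
    """Chance Labor's 2PP in a seat ends above 50 given a uniform national swing."""
    expected_2pp = labor_2pp_last_time - swing_to_coalition
    return 1.0 - NormalDist(mu=expected_2pp, sigma=sd).cdf(50.0)


def expected_labor_seats(pendulum: List[float],
                         swing_to_coalition: float,
                         sophomore_adjustment: float = -1.0) -> float:
    """Sum seat-by-seat chances, then apply the one-seat downward adjustment
    for first-term Coalition incumbents described in the text."""
    total = sum(labor_win_probability(m, swing_to_coalition) for m in pendulum)
    return total + sophomore_adjustment


# Invented Labor 2PP results from the previous election, for illustration only.
hypothetical_pendulum = [46.0, 49.0, 51.0, 53.0, 56.0, 62.0]
estimate = expected_labor_seats(hypothetical_pendulum, swing_to_coalition=3.0)
print(round(estimate, 2))
```

Summing probabilities rather than counting seats above the line is what produces fractional readings such as 72.5 or 57.5.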
The "non-classic" seats Denison, Kennedy, Melbourne, Indi and Fairfax featured contests that could not be modelled using the national 2PP. (Other non-classic seats were won by candidates who would easily have won the 2PP in a classic contest as well, and therefore don't need to be considered.) In my focus on the seat betting markets, I was specifically concerned with the number of Labor seats, so I'm using that as a measure of accuracy (which eliminates the four seats in this list where Labor had little or no chance.) For Melbourne, the seat markets constantly maintained it as a Labor win, so I treat it as such for the seat market tally, while the national 2PP gave no idea whether it would stay or go, so I count it as a half.
This then is the graph showing Labor seat expectancies based on a pendulum-probability model of the national polls (red line) vs Labor seat expectancies based on betting markets (blue line). The dotted line is the actual result.
Polls Pick Result Only At The End
Originally, the national 2PP picture was "nowcasting" a very tight contest (with the red line peaking in the third week at 72.5 ALP seats, ie another hung parliament), but Labor's voting intention went downhill around the start of August and continued declining through the campaign. The final reading is 57.5, which turned out to be very slightly optimistic for Labor. Readings before the last week were much above Labor's final result.
At the end of the campaign, the national polls were very accurate overall, and indeed more predictive than they sometimes have been in the past. The final polls by landline phone pollsters Newspoll and Nielsen, landline/mobile phone pollster Galaxy, landline/mobile robodialler ReachTEL and hybrid internet/SMS/face-to-face pollster Morgan all had 2PPs that were within a point or so of the actual results. Final judgement on who was most accurate will be served later. Online-only pollsters Essential and AMR had less accurate final polls (and in Essential's case an extremely strange trend pattern through the campaign) and the less said of the final Lonergan poll in this article the better.
The national polls converged on the final result, still underestimating the damage for Labor very slightly on the whole at the end, but for most of the campaign the current state of polling was not a reliable pointer. And this was to be expected, because polling captures the state of public opinion at a given time and is not designed to be predictive well out from the day. Especially not when you have tumultuous events just before an election and a polling bounce that may (at some stage) go down. The closer you are to an election the more predictive aggregated polling becomes, but even a week out moves of two points are quite possible. The perceived contest between markets and polls in terms of predictiveness is a sham contest because markets represent a combined result of many people attempting to predict the outcome, while polls do not attempt to predict anything.
Betting Markets Doing Well, Until ...
The blue line shows the number of individual seats that betting agencies favoured Labor to win. When markets were reset following the return of Rudd, they appeared a few days later with Labor winning 56 seats (taking close-seat imbalance into account). This rose as Labor continued to poll well, peaking at 64 in the third week of the Rudd return and staying at around that level for the next few weeks.
Seat markets did not immediately respond much to the first week of bad polling for Labor, possibly because the calling of the election at the same time narrowed the range of possible results and excluded (or so it seemed at the time) some of the most disastrous options. However thereafter seat markets shed expected seat totals at the rate of two or three per week in rough parallel with the polls, and shed another four seats during the last three days to finish at about 48.8. (The number of specific seats in which Labor were favourites crashed even more markedly at the end, with 5.5 seats switching with between 4 and 11 days to go and 6.5 seats switching with between 1 and 3 days to go, but to some degree this was a sorting out of the enormous imbalance in numbers of close seats for the two sides around two weeks out from polling day.)
The seat betting markets were closer to the outcome than the current polls in terms of seat number totals at all the comparison points except one - the very last. In the end, seat betting markets were collectively expecting absolute mayhem. They were collectively expecting either a 56:44 2PP or else a massive mismatch between the national 2PP and individual seat polling. Neither happened.
Here is the casualty list of the election eve seat market failures (noting that Capricornia was a split market):
Predicted losses that did not occur:
Kingsford Smith (NSW)
Melbourne (Vic) - Green to ALP
Predicted holds that were (all very narrowly) lost:
Indi (Vic) - Coalition to independent
Barton (NSW) - ALP to Coalition
Fairfax (Qld) - Coalition to PUP
So that's a total score of 14 misses and a fence-sit out of 150.
It might sound impressive that the final seat markets still correctly predicted 90% of seats. It's not bad, but it's not exactly brilliant either, and it's worse when the errors so heavily fall in the same direction (10-1 in ALP-Coalition contests). By election day, quite without taking betting data into account, at least 75% of seats are usually no-brainers and even guessing would normally pick up half the rest. Also, just by the simple method of taking the final Newspoll and marking off seats on the national pendulum against its swing, not doing anything remotely fancy, one could have made about the same number of errors and got much closer to Labor's seat total.
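The arithmetic behind that comparison is simple enough to set out. This sketch uses only figures from this article (150 seats, 14 misses, one fence-sit, and the rough 75% no-brainer share); treating the fence-sit as half a miss is my own assumption for the second figure.

```python
SEATS = 150
MISSES = 14
FENCE_SITS = 1

# Market accuracy if the fence-sit is ignored, and if it counts as half a miss.
accuracy_ignoring_fence_sit = (SEATS - MISSES) / SEATS
accuracy_half_fence_sit = (SEATS - MISSES - 0.5 * FENCE_SITS) / SEATS

# Naive baseline: 75% of seats are no-brainers, and coin-flipping the
# remaining 25% gets about half of them right.
no_brainer_share = 0.75
naive_baseline = no_brainer_share + 0.5 * (1.0 - no_brainer_share)

print(round(accuracy_ignoring_fence_sit, 3),
      round(accuracy_half_fence_sit, 3),
      naive_baseline)
```

On those rough numbers the final seat markets beat pure guessing on the doubtful seats by only about three percentage points.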
The more impressive aspect of the seat betting markets' performance was their form up to the final week. Prior to that they had been too optimistic for Labor compared to the final result, but had still never been more than about nine seats out. Especially given that in the early stages of the Rudd return it was hard to tell whether Labor were going to end up with 40 seats or 85, I think that for most of this time scale the seat betting markets were good predictors of the outcome.
Another aspect of the seat betting markets' performance worth monitoring is how many seats they were right about at particular times. At all times they were wrong about Melbourne, Fairfax, Barton and Greenway, and they were also wrong about Indi at all the times measured (though they briefly had it tied on about Sep 5). But other seats switched between the correct and incorrect result, including seats where the markets got it wrong until the last week or two then correctly switched, but also seats where the markets were right until they lost their nerve at the last moment.
The lowest number of individual seat-favouritism errors, eleven, was actually when the markets were reset on the Rudd return. The highest numbers (15 and 14.5 wrong) were in the last two checks.
It may seem that the performance of the markets in projecting the result to within about one seat, with only eleven specific seats wrong, only days after Rudd returned to the Prime Ministership, says something miraculous about the predictive power of markets. After all there was so little reliable data to go on concerning where the result might end up at that stage. But really, at that stage the markets were not that much nourished with post-Rudd money. What those results speak of is actually the skill (and/or luck) of those re-framing the markets after they were suspended following the Rudd return.
As soon as punter money started to kick in across a wide range of seats, we saw the markets displaying their classic behaviour of, as Peter Brent puts it, dawdling after the polls. Markets do not just follow polls, but take them into account and allow them to modify their expectations. The underlying expectations remain until very strong evidence wears them down, and that is why the markets never fully bought the 50-50ish poll results in July. This dawdling behaviour (likened by Brent to that of sheep) is not as dumb as it sounds, given the many cases in which striking polling results have not lasted. But at the end of the campaign, the seat markets did something very different. On the whole, the flock panicked.
I've often found that election betting markets perform best when well nourished by polling, and worst when there is no information or the information is misleading. Strangely, in the case of this election, the very reverse was true. The problem for the markets was that they had too much information, some of it contradictory, and in picking and choosing between parts of it, chose the wrong parts.
I believe one cause of the seat-betting stampede was over-reliance on Newspoll's local aggregated polling, especially the aggregated Newspolls of Labor seats in Queensland and New South Wales. Seat polls in general tended to favour the Coalition compared to the actual result, even though the time they were taken suggested they should have favoured Labor (since most were taken one or more weeks out, and Labor's 2PP subsequently dropped nationally). The collective view of seat-betting markets seemed to be that the local data were reliable and the old-fashioned methods of national polling plus pendulum were basically wrong. The local data had some utility (for instance, anyone trying to model Tasmania without even considering local seat polling may well have got both Lyons and Franklin wrong) but on the whole, local data from all companies bar Galaxy was Coalition-skewed while the national polls were very accurate. Persistent rumours that internal polling showed such-and-such seat to be "in trouble" or "gone" also contributed to the mess, and memories of getting burned by the late swing to the Coalition in 2010 may also have been in the mix.
Seat Betting vs Projections
Here I consider some detailed projection attempts that did not use seat betting data in any way. There were not too many of these about.
My own experimental seat projection attempt (based in part off BludgerTrack's state data, though its failures were all my own) easily outperformed the final seat betting markets, beating them in Werriwa, Blair, Melbourne, McEwen (subject to confirmation), Brand and even Barton (which they somehow missed in their mad rush to the bottom) and being beaten by the markets in only Petrie (and one of them in Capricornia.) There was a very simple case for why Barton might fall: it was within striking distance of the state swing and the sitting member was retiring. My projection model threw it to the wolves with a 49.9% Labor 2PP which I declined to override; it is currently running at 49.7.
My projection did, however, overestimate Labor's total loss numbers by three, and the Coalition's seat total by five, and a likely reason for this is that despite a degree of caution in my use of seat polling, the heavy Coalition skew of so much of the seat and local aggregate polling still came through, and not always for the better. All up it got ten individual seats (counting Indi and Fairfax) incorrect, including an "unlucky" but partly avoidable outcome on the ALP-LNP contests in Queensland (where I got the total number of losses right, but the wrong pair of seats). On a state basis, it correctly projected the number of Labor losses in all states except NSW and the NT.
For a polling-based model, some sorts of errors are probably unavoidable; there was no basic public-data evidence I'm aware of that Labor was going to hold Moreton while losing Petrie, for instance. We're not going to be able to reliably project all 150 seats correctly based on the data we have in this country any time soon, but I think there are good prospects for refining this sort of national/state/seatpoll model so that it makes, on average, maybe six or seven seat errors.
The Bludgertrack model was much closer to the mark in totals terms than mine - it got the Coalition's seat tally exactly right, with Labor over by two and Others under by two (that is, it underestimated Labor's losses to the Coalition by two, and this was cancelled out by the two crossbench wins). It was impossible for a fully objective projection model to project Fairfax as a Palmer win as there was no public seat polling. I did not see a Poll Bludger prediction list of seats to fall, but the model was correct in every state and territory regarding Labor's seat loss numbers except for an underestimate of one in each of Queensland and NSW. Had a list of expected seat losses been provided, there may well have been fewer errors in it than in mine.
The point of mentioning such models is that the 90% predictiveness of the seat betting markets is nothing to get carried away about as evidence that seat betting is all that good an indicator when there are models around getting 93% or even potentially 95% right, and any dummy should be able to on average get at least 85%. This was, however, an unusually bad election for final seat betting markets in terms of the final outcomes. They did better at previous federal elections.
And while polling-based final projections beat the seat markets in this instance, the search is still on for any kind of Nate Silver style polling-based projection model that would have outperformed betting markets at, say, seven weeks out while the Rudd bounce was raging. As a very basic idea of the sorts of considerations involved, a simple linear regression I did last August, which uses only two terms (the worst polling position of the party in power during its term, and the party in power) has a long-term average error of nine seats, and at this election was out by six (it predicted 61 ALP seats). During the Gillard era while betting markets were expecting a sub-50-seat result, this facetiously simple projection model outperformed them, and it didn't do that much worse during most of the campaign either. I think the trick for the future may be to use a simple historic model of that sort as a baseline projection, and then to increasingly weight in polling evidence as it becomes more predictive.
Does That Mean Betting Markets Aren't That Predictive Of Seat Totals After All?
No, I don't think it does. I just think that those who look to seat-betting markets and try to extract the most useful predictions of seat totals are looking in the wrong part of the overall betting market. It baffles me that experienced statisticians attempt to determine how many seats betting markets think parties will win by looking at an indirect and problematic measure (aggregation of implied probabilities concerning particular seats) when there are more direct markets available on seat total events and their past track record has been excellent.
The exact seat total market (as an average of its most favoured points - I never got around to calculating an implied probability, but the distribution looked rather symmetrical) expected 92.5 Coalition seats. It was out by 2.5 with the error being caused mainly by a couple of non-classic contenders getting across the line by extremely slim margins. From the time this market started (which was during the campaign) I don't think it ever had the Coalition total out by more than five.
The final implied average of the Correct Election Result market was 54 Labor seats. This was out by one. The implied average did tend to run a bit above the seat betting markets, possibly as a product of longshot bias, but when the seat betting markets melted down at the end, it barely moved.
Perhaps the comparison is a little kind in both these cases as both markets were taken down a little bit earlier than the seat markets, but not by much. This is not the first election at which these total-seat forecasting models have returned a very impressive performance that is very challenging to beat using polling-based methods.
electionlab in their final analysis considered that the most likely culprit in the errors made by seat betting markets was their modelling and not the markets themselves. In my view, there was nothing significantly wrong in their model's read of what the seat betting markets were thinking - rather, what actually happened was that seat betting markets themselves were in fact wrong. Different modelling assumptions regarding covariance and so on greatly affect the spread of modelled expectations, but they have little impact on the mean. The seat betting markets were collectively expecting Labor to win fewer than 50 seats at the end. There is no way to remodel the final odds to find 55 seats for Labor in them because it is just not true that those markets thought Labor would win that many seats. Or at least, if someone "finds" such a way to read that result into the markets, the next time they test it I can pretty much guarantee the post hoc overfitting in their new model will cause it to blow up.
That concludes my analysis of the performance of betting markets relative to polls at this election and I hope it's been of interest. Coming in the next few weeks: pollster ratings!
(NB: Pollster ratings deferred until November because national 2PP has not been finalised and sealed by AEC.)
Update: electionlab responds:
Kaighin McColl from electionlab has responded to my comments (mainly the last paragraph) over here. The response brings up a number of issues that I didn't raise in the main part of this article and that I hope will be of interest to those interested in the predictiveness (or not) of betting markets. Aside from that, this PS is very long and rather wonky, probably around WF3 (wonk factor, on a scale that normally runs from 1 to 5) climbing to 4 in steeper sections.
The first issue raised is the issue of the spread of distributions. We agree that seat probabilities are not completely independent of each other, either in reality or in the view of punters. However, seat probabilities as expected by markets aren't "maximally covariant" either - a polling event or policy announcement might cause a change in fortunes in a state or a given region of a state (such as the infamous "Western Sydney") without affecting the national picture, and there are always all those little random campaign events, strategic decisions and gaffes that might unexpectedly affect perceived chances in any one seat.
So this raises the question: if the assessment of the markets has a mean of, say, 48 Labor seats, what sort of +/- should we consider that to come with? Kaighin points out that "According to the [electionlab] maximum covariance model, there was a 95% chance Labor would obtain somewhere between 32 and 64 seats."
But I don't think we could just give a model a tick if it gave a 95% confidence range that runs from 32 to 64 seats on election eve and the result happened to land somewhere in that range. It would be like saying "the markets think the Coalition's going to win with a 2PP of on average 55.5% and with a 95% chance of it being somewhere between 52% and 59%" and counting anything in that range as within the model's MOE and therefore consistent with the model. It would indeed be consistent with the model, but the correct response would be that the model was too vague to be useful.
I agree that a moderate covariance model based on the seat betting markets would give a tighter range with a similar mean that would still have 55 Labor seats within its MOE, and that would not immediately invite the same reaction. However, as the MOE of such a model dropped, so too would the implied probability of a Labor seat haul of 55 or greater for a mean of 48.
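To put rough numbers on that trade-off, here is a sketch that treats the market's expected seat total as normally distributed. That shape is purely an assumption for illustration (the real distribution is disputed further on), and the tighter spread of 4 seats is an invented example. The wide spread is backed out of the 32 to 64 range quoted above.

```python
from statistics import NormalDist


def prob_at_least(threshold: int, mean: float, sd: float) -> float:
    """P(seat total >= threshold) under a normal approximation,
    with a half-seat continuity correction."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold - 0.5)


# A 95% range of 32 to 64 seats spans about 2 x 1.96 standard deviations.
wide_sd = (64 - 32) / (2 * 1.96)
tight_sd = 4.0  # ASSUMPTION: an arbitrary tighter spread for comparison

p_wide = prob_at_least(55, mean=48, sd=wide_sd)
p_tight = prob_at_least(55, mean=48, sd=tight_sd)
print(round(p_wide, 3), round(p_tight, 3))
```

The wide model leaves 55 or more Labor seats as a roughly one-in-five chance, while halving the spread makes the same outcome a much rarer event for the same mean.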
How much it would drop does depend on the shape of the expected distribution - when a model says that the markets expect (say) 48 Labor seats, does this mean the markets are expecting a range of outcomes fairly close to 48 seats, with 48 the most likely value, or does this mean the markets are expecting that the actual value will average 48 but probably won't be close to 48 seats, and might be more likely to be either, say, 42 or 54 seats? Quoting Kaighin again:
"Assuming independence between seats results in a unimodal, bell-shaped distribution; on the other hand, assuming maximum covariance between seats (as constrained by the betting markets) results in a bimodal distribution, with very little density in the middle. More generally, it makes intuitive sense that the distribution might become multi-modal if you bring in covariance between seats"
In my view the reason electionlab ended up with the rather striking batwing-style distribution that they did (see graph here) is not specifically that they used a maximum covariance model, but that they used an extremely chunky cutoff in correcting for longshot bias, by setting the chances for candidates with an implied <10% probability to zero (hence making many seats no-contests) and not adjusting those with implied probabilities just above the mark at all. But in reality the relationship between odds and longshot bias is probably a smooth one: the longer an underdog's odds, the greater the influence of longshot bias on those odds (as a proportion of the implied probability).
No candidate with an implied chance of 0.05 to 0.1 (or less for that matter) won at this election, but they now and then do, maybe 1-2% of the time (eg Andrew Wilkie in 2010). Candidates with implied chances of 0.1 to 0.2, not adjusted for longshot bias, underperform their odds as well, winning <10% of the time. I think candidates with implied chances of 0.2 to 0.3 would historically be found to have underperformed slightly but significantly too.
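The difference between a chunky cutoff and a gradual correction can be sketched as follows. The hard cutoff mirrors the 10% rule described above, applied to the underdog only. The smooth version is a power transform with an exponent of 1.3, which is an assumption of mine chosen for illustration; it is not fitted to any data and is not anyone's published method.

```python
def chunky_adjust(p: float, cutoff: float = 0.10) -> float:
    """Hard cutoff: implied chances below the cutoff go to zero,
    chances at or above it are left untouched."""
    return 0.0 if p < cutoff else p


def smooth_adjust(p: float, k: float = 1.3) -> float:
    """ASSUMED gradual correction for a two-candidate seat: raise both
    chances to the power k (> 1) and renormalise, which shrinks longshots
    proportionally more the longer their odds."""
    return p ** k / (p ** k + (1.0 - p) ** k)


for implied in (0.05, 0.09, 0.11, 0.20, 0.30, 0.50):
    print(implied, chunky_adjust(implied), round(smooth_adjust(implied), 3))
```

Under the hard cutoff a candidate at 9% and one at 11% are treated completely differently, whereas the smooth version shades both down by similar proportions and leaves an even-money contest untouched.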
I can't see any reason why a distribution including a realistic measure of covariance and a gradual adjustment for longshot bias should be multimodal (more than one peak on the probability curve), at least not at the end of the campaign. For it to be so, it would likely be the case that events that shifted probabilities in a number of seats together would be more likely to have a medium or a large impact than a small one. I can't see any reason why this should be the case at that stage. More likely, I'd expect the distribution to be unimodal, with a peak reasonably close to the mean, but rather than a standard bell curve, I'd expect something with a flatter top. That's based on just thinking about a simple model in which all seats are independent, and modifying it with the possibility of national/state/local level adjustments that might affect assessments in all/many/some seats together.
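The two extremes in this debate can be compared directly with a small sketch. The seat probabilities are invented. "Maximum covariance" is formalised here as every seat responding to one shared random draw, which is my own assumption about what such a model might look like and not a reconstruction of electionlab's method.

```python
from typing import List


def independent_distribution(probs: List[float]) -> List[float]:
    """Exact distribution of seats won when every seat is independent
    (built up one seat at a time by convolution)."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)
            new[k + 1] += mass * p
        dist = new
    return dist


def comonotone_distribution(probs: List[float]) -> List[float]:
    """ASSUMED maximum-covariance case: one shared uniform draw U, and
    seat i is won exactly when U < p_i."""
    bounds = [1.0] + sorted(probs, reverse=True) + [0.0]
    # P(total = k) is the gap between the k-th and (k+1)-th largest chance.
    return [bounds[k] - bounds[k + 1] for k in range(len(probs) + 1)]


def mean(dist: List[float]) -> float:
    return sum(k * mass for k, mass in enumerate(dist))


# Invented example: six fairly safe seats and six longshots.
seat_probs = [0.85] * 6 + [0.15] * 6
indep = independent_distribution(seat_probs)
comon = comonotone_distribution(seat_probs)
print(round(mean(indep), 3), round(mean(comon), 3))
print([round(x, 3) for x in comon])
```

Both versions have the same mean, but the shared-draw version piles all of its probability onto a few totals, which shows how clustered seat probabilities alone can produce lumps and gaps in the distribution.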
Indeed, when I've watched the less granular seat total betting markets (like the exact seat total market in this case) they always seem to end up with a one-humped distribution with a relatively flat top - this case with 91, 92, 93 and 94 Coalition seats all equal favourites at about $12 each was an example of this.
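For what it's worth, the "average of most favoured points" reading of that market is just the following calculation. The prices and totals are the ones quoted in this article; the actual Coalition tally of 90 is inferred from the 2.5-seat miss reported earlier.

```python
# Equal favourites in the exact seat total market, each at about $12.
favoured_totals = [91, 92, 93, 94]
price = 12.0

market_expectation = sum(favoured_totals) / len(favoured_totals)
raw_chance_each = 1.0 / price  # before removing the bookmaker's margin
raw_chance_of_band = raw_chance_each * len(favoured_totals)

actual_coalition_seats = 90
error = market_expectation - actual_coalition_seats
print(market_expectation, round(raw_chance_of_band, 3), error)
```

So even the four equal favourites together were carrying only about a one-in-three raw chance, which fits a flat-topped, single-humped distribution.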
A tricky question here also is what the rate of covariance should be at the end of the campaign. Well out from the election, markets will expect covariance because of the possibility of national or state-level changes in vote share affecting huge numbers of seats at once. By election eve, there is a possibility of late swing, but it is muted compared to the uncertainty in picking each party's 2PP months out. There is also the possibility that the national polls are all wrong and collectively lean in the same direction, but there's usually not that much in that. It's also true that events that drive a degree of seat independence (like campaign incidents in given seats) are much more likely to have already been factored in by election eve. But there is always some level of uncertainty about what any given electorate is going to do on the day, and I suspect the influence of that grows as the chance of a large-scale national shift reduces. So perhaps seats are less covariant on election eve than, say, a few weeks out.
Also in trying to model what a betting market thought, we need to not only consider what it should have thought about covariance on election eve, but also what it did think, and this is a very difficult question. It is especially so when what we are modelling is the aggregated actions of punters, some of whom will be modelling off the national picture and some of whom might be largely ignoring it.