Predicting constituency results - a tough problem

Paddy
May 31, 2017

So, the Times has published a prediction that the Conservatives will LOSE seats on June 8th, creating a hung parliament. Obviously, there has been a lot of movement to Labour recently, but this is an extremely bold claim (especially when other pollsters are predicting a majority of 100+ seats). So what should you believe?

*cracks knuckles*

First, a few things to be clear on: the prediction is that the Conservatives will win 310 +/- 35 seats, meaning even this extraordinary prediction gives them a 1 in 40 chance of getting more than 345 seats (and a similar chance of Labour winning more seats than them). There's also the problem that YouGov don't account for the non-zero chance of a systematic error in their polling, so their margin of error must be way too small.
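To make the "1 in 40" figure concrete, here is a minimal sketch. It assumes (my assumption, not something stated in the prediction itself) that the +/- 35 is a 95% interval on a roughly normal distribution, so that one standard deviation is about 35 / 1.96 seats.

```python
import math

def normal_tail(threshold, mean, sd):
    """P(X > threshold) for X ~ Normal(mean, sd)."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

MEAN_SEATS = 310          # central prediction quoted in the post
HALF_WIDTH = 35           # the quoted +/- figure
SD = HALF_WIDTH / 1.96    # ASSUMPTION: +/- 35 is a 95% interval on a normal

p_above_345 = normal_tail(MEAN_SEATS + HALF_WIDTH, MEAN_SEATS, SD)
print(f"P(more than 345 seats) = {p_above_345:.3f}, "
      f"about 1 in {1 / p_above_345:.0f}")
```

Under that assumption each tail of the interval holds 2.5% of the probability, which is where 1 in 40 comes from. If the true uncertainty is wider because of systematic polling error, both tails get fatter.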

The Conservatives have 331 seats at the moment. 326 are needed for a majority, except not really: 4 Sinn Fein MPs won't sit and 8 DUP MPs could prop up the government in a pinch. The government therefore has a bit more breathing room than it might appear.
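The breathing room can be written out as simple arithmetic. This is a rough sketch only: the 650-seat size of the Commons is not stated above, and the calculation ignores the Speaker and deputies and assumes every DUP MP votes with the government.

```python
TOTAL_SEATS = 650         # size of the House of Commons (not stated in the post)
SINN_FEIN_ABSTAIN = 4     # from the post: these MPs won't sit
DUP_SEATS = 8             # from the post: could prop up the government

def majority_threshold(voting_seats):
    """Smallest number of seats that is more than half of those voting."""
    return voting_seats // 2 + 1

nominal = majority_threshold(TOTAL_SEATS)                        # on paper
effective = majority_threshold(TOTAL_SEATS - SINN_FEIN_ABSTAIN)  # Sinn Fein absent
with_dup = effective - DUP_SEATS     # ASSUMPTION: all DUP MPs back the government

print(f"Nominal majority threshold:  {nominal}")
print(f"With Sinn Fein abstaining:   {effective}")
print(f"Tory seats needed with DUP:  {with_dup}")
```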

So where does the prediction come from? Well, it's a model built by YouGov which aims to predict the results in each constituency based on demographics and polling of 50,000 people. Turning poll share into a prediction about seats is notoriously difficult. The historical 'least-worst' way to do it is to take the results in each constituency last time and use a 'uniform swing' approach. If the Conservatives' national share last time was 37% and they are now polling at 43%, you add 6 points to the Conservative share in each constituency. You then repeat with Labour etc. and get a result for each constituency: job done. For example, if you stick the numbers from that YouGov poll with the 5% lead (the next one showed 7%, by the way) into ukpollingreport.co.uk/swingometer-map, you get either 323 Tory seats or 329 depending on whether you round up or down... which highlights how ludicrously finely balanced such a model ends up being.
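Here is a minimal sketch of uniform swing for a single seat. Only the Conservative 37% and 43% come from the example above; every other number, the constituency name and the party groupings are invented for illustration.

```python
def uniform_swing(last_result, national_last, national_now):
    """Add each party's national change in share to its local share."""
    projected = {}
    for party, share in last_result.items():
        swing = national_now.get(party, 0.0) - national_last.get(party, 0.0)
        # A share cannot be negative, so clamp at zero.
        projected[party] = max(0.0, share + swing)
    return projected

def winner(shares):
    """Party with the largest projected share."""
    return max(shares, key=shares.get)

# National shares in percent. Con 37 -> 43 is from the post; the rest is made up.
national_last = {"Con": 37.0, "Lab": 31.0, "LD": 8.0, "Other": 24.0}
national_now = {"Con": 43.0, "Lab": 38.0, "LD": 9.0, "Other": 10.0}

# A hypothetical constituency result from last time.
exampleton_last = {"Con": 36.0, "Lab": 39.0, "LD": 10.0, "Other": 15.0}

projected = uniform_swing(exampleton_last, national_last, national_now)
print(projected, "->", winner(projected))
```

Run this over every constituency and count the winners and you have a seat projection. Because many seats sit within a point or two of flipping, a small change in the national inputs can move several seats at once.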

Of course, the premise of uniform swing is a bit silly - the only defence of it is that it works okay, on average (it can be completely wrong in tonnes of marginal seats, but no-one cares as long as the number of seats which change is about right, or even as long as it calls the result the right way). Or at least, it does if you deal with Scotland and Northern Ireland separately: clearly in Belfast South the Conservative share of 1.5% can't go below zero and isn't going to jump up to double digits if the Conservatives have a bad/good night on June 8th. But we can see right away that fundamentally - putting the mechanics of their fancy new model aside - the issue here is YouGov reporting a much higher Labour vote share than some other respected pollsters.

Let's explore that point - what it comes down to is the turnout model. YouGov ask people how certain they are to vote and, in their normal polling, include only those who say they are certain. Other pollsters assume turnout for each demographic will be pretty much the same as in 2015. Currently 82% of 18-24-year-olds say they will vote for sure. Two years ago it was 44%. So there's your difference right there: that's why one pollster will tell you the Conservatives have a 14% lead and another will say it's 5%.
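A toy calculation shows how much the turnout assumption alone can move the headline lead. Only the 82% and 44% figures come from the paragraph above; the two age groups, their sizes, their vote intentions and the older group's turnout are all invented for illustration and are not real polling data.

```python
def headline_shares(groups, turnout):
    """Weight each group's vote intention by its size and assumed turnout."""
    totals = {}
    for name, group in groups.items():
        weight = group["pop_share"] * turnout[name]
        for party, share in group["intention"].items():
            totals[party] = totals.get(party, 0.0) + weight * share
    voters = sum(totals.values())
    return {party: 100 * t / voters for party, t in totals.items()}

# HYPOTHETICAL electorate: group sizes and vote intentions are made up.
groups = {
    "18-24": {"pop_share": 0.12,
              "intention": {"Con": 0.20, "Lab": 0.65, "Other": 0.15}},
    "25+":   {"pop_share": 0.88,
              "intention": {"Con": 0.47, "Lab": 0.33, "Other": 0.20}},
}

# Youth turnout figures are from the post; the 25+ figure is an assumption.
turnout_like_2015 = {"18-24": 0.44, "25+": 0.70}
turnout_self_reported = {"18-24": 0.82, "25+": 0.70}

for label, turnout in [("2015-style turnout", turnout_like_2015),
                       ("self-reported turnout", turnout_self_reported)]:
    shares = headline_shares(groups, turnout)
    print(f"{label}: Con lead = {shares['Con'] - shares['Lab']:.1f} points")

lead_2015 = (headline_shares(groups, turnout_like_2015)["Con"]
             - headline_shares(groups, turnout_like_2015)["Lab"])
lead_reported = (headline_shares(groups, turnout_self_reported)["Con"]
                 - headline_shares(groups, turnout_self_reported)["Lab"])
```

The same respondents give a noticeably smaller Conservative lead once the young group is assumed to turn out at its self-reported rate. Real turnout models split the sample into many more groups, so the effect can be larger than in this two-group toy.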

It also confirms the suspicion I mentioned previously: some of the recent big change in the Labour vote share is coming from people changing their mind about whether to vote or not (obviously, this group of maybe-non-voters contains a disproportionate number of young people) or -- even worse! -- about whether to spend time answering an opinion pollster's questions or not. This is why my take is that when the polls are volatile (i.e. changing quickly) you shouldn't trust them. The changes being picked up are not of the same type as the long-term gradual changes that make or break elections (there are even political scientists who claim that the polls 6 months from an election provide the best estimate of the result, not the ones the day before!) and you can't trust them not to disappear again, either in the next week or on the day in the polling booths.

Oh, as for YouGov's fancy new model, they haven't published full details yet, but basically the idea is that we have good estimates of the demographics in each constituency, and they have produced polling of 50,000 people spanning all demographics. The assumption is that e.g. less-well-off, old, white men in one constituency will vote in the same way as those in another constituency in the neighbouring county. I have a healthy degree of skepticism about this; I think it ignores the fact that lots of people cast their vote based on who they believe has a chance of winning locally (which is why "Labour can't win here! Vote Lib Dem!" is such an effective tactic).

A couple of months ago I tried something very basic along these lines by looking at YouGov polling split by Leavers and Remainers (both big subsamples, so the uncertainty isn't too bad) and the Leave/Remain share in each constituency. I then just assumed that the Leavers in every constituency would vote the way Leavers do nationally, and that the Remainers would vote the way Remainers do nationally. At the time this approach predicted a Tory landslide of frankly ludicrous proportions, because the Labour heartlands where Leave won big all went blue. In real life that wouldn't happen for a host of reasons.
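The two-group calculation described above can be sketched in a few lines. The vote intentions and Leave shares below are invented for illustration; they are not the YouGov figures I used at the time.

```python
def constituency_shares(leave_share, leaver_vote, remainer_vote):
    """Mix the national Leaver and Remainer vote intentions by local Leave share."""
    parties = sorted(set(leaver_vote) | set(remainer_vote))
    return {
        party: leave_share * leaver_vote.get(party, 0.0)
        + (1 - leave_share) * remainer_vote.get(party, 0.0)
        for party in parties
    }

def winner(shares):
    """Party with the largest projected share."""
    return max(shares, key=shares.get)

# HYPOTHETICAL national vote intention within each referendum group.
leaver_vote = {"Con": 0.60, "Lab": 0.20, "Other": 0.20}
remainer_vote = {"Con": 0.30, "Lab": 0.40, "Other": 0.30}

# HYPOTHETICAL seats: a strongly Leave "Labour heartland" and a Remain-leaning city seat.
seats = {"Heartland North": 0.68, "City Central": 0.30}

results = {}
for name, leave_share in seats.items():
    shares = constituency_shares(leave_share, leaver_vote, remainer_vote)
    results[name] = winner(shares)
    print(name, {p: round(s, 3) for p, s in shares.items()}, "->", results[name])
```

Notice that the previous result in each seat never enters the calculation. With inputs like these the leading party wins almost everywhere, which is exactly the kind of ludicrous landslide the approach produced.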

So, bottom line - be wary of big claims based on current polling. If Labour want to change the outcome of this election from the one predicted back in April then they *need* an unprecedented turnout from young people. There's some evidence they could get it, but don't believe it until you see it.
