I have now written two blogs on homogenisation of climate data (this one and this one) and really want to get on with blogging about other things – mainly matters less fraught. So let’s finish this off and move on.
I realise that both my previous articles were embarrassingly oversimplified. Matt Menne sent me his paper detailing how he and his colleague Claude Williams homogenised the climate data. On reading the paper I experienced several moments of understanding, several areas of puzzlement, and a familiar feeling which approximates humiliation. Yes, humiliation. Whenever I encounter a topic I feel I should have understood but haven’t, I find myself feeling terrible about my own ignorance.
You can read the paper for yourself, but I thought I would try to precis the paper because it is not simple. It goes like this:
- The aim of the paper is to develop an automatic method (an algorithm) that can consider every climate station temperature record in turn and extract an overall ‘climate’ trend reflected in all series.
- The first step is to average the daily maximum and minimum values over each month, giving monthly mean maximum and minimum values and an overall monthly average. If the day-to-day fluctuations are independent, this averaging reduces the ‘noise’ on the data by a factor of approximately 5.5 (the square root of 30 measurements) for the maximum and minimum data and approximately 7.7 for the average (the square root of 60 measurements).
- Next we compare each station with a network of ‘nearby’ stations by calculating the difference between the target station data and each of its neighbours. In the paper, example data (Figure 1) is given that shows that these difference series are much less ‘noisy’ than the individual series themselves. This is because the individual series are correlated with each other: for example, when the monthly average temperature in Teddington is high, then the monthly average temperature at nearby stations such as Hounslow is also likely to be high. Because the temperatures tend to go up and down together, the differences between them show much less variability than either series by itself.
- The low ‘noise’ levels on the difference series are critically important. They allow the authors to spot, with high sensitivity, when ‘something happens’ – a sudden change in one station or the other (or both). Of course at this point in the analysis they don’t know which data set (e.g. Teddington or Hounslow) contains the sudden change. Typically these changes are caused by a change of sensor, or of the location of a climate station, and over many decades these are actually fairly common occurrences. If they were simply left in the data sets which were averaged to estimate climate changes, then they would be an obvious source of error.
- The authors use a statistical test to detect ‘change points’ in the various difference series. Once all the change points have been identified, they work out which series the change occurred in by looking at the difference series between a station and several of its neighbours (Teddington – Richmond, Teddington – Feltham, Teddington – Kingston etc.). Consider a shift seen in the Teddington – Hounslow difference series. If Teddington is the ‘culprit’ then all the difference series which have Teddington as a partner will show the shift. However if, say, Hounslow has the shift, then we would not expect to see a shift at that time in the Teddington – Richmond difference series.
- They then analyse the ‘culprit’ series to determine the type of shift that has taken place. They have 4 general categories of shift: a step-change; a drift; a step-change imposed on a drift; or a step-change followed by a drift.
- They then adjust the ‘culprit’ series to estimate what it ‘would have shown’ if the shift had not taken place.
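To convince myself of the two noise arguments in the steps above (averaging over a month, then differencing neighbouring stations), here is a minimal Python sketch on synthetic data. It is not the authors' code. The assumptions are all mine: daily errors are independent Gaussian noise, two nearby stations share one 'regional' signal plus a small independent local noise, and the function names and the standard deviations (2.0 and 0.3) are invented purely for illustration.

```python
import math
import random
import statistics


def monthly_mean_noise(n_days=30, n_months=5000, sigma=1.0, seed=0):
    """Scatter (standard deviation) of monthly means built from noisy daily values.

    Assumption: each daily value carries independent Gaussian noise of size sigma.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_months):
        daily = [rng.gauss(0.0, sigma) for _ in range(n_days)]
        means.append(statistics.fmean(daily))
    return statistics.pstdev(means)


def simulate_pair(n_months=5000, regional_sd=2.0, local_sd=0.3, seed=1):
    """Two hypothetical neighbouring stations and their difference series.

    Both stations see the same regional signal; each adds its own local noise.
    """
    rng = random.Random(seed)
    regional = [rng.gauss(0.0, regional_sd) for _ in range(n_months)]
    station_a = [r + rng.gauss(0.0, local_sd) for r in regional]
    station_b = [r + rng.gauss(0.0, local_sd) for r in regional]
    difference = [a - b for a, b in zip(station_a, station_b)]
    return station_a, station_b, difference


if __name__ == "__main__":
    # Noise reduction from averaging 30 or 60 daily values (sigma = 1).
    for n in (30, 60):
        reduction = 1.0 / monthly_mean_noise(n_days=n)
        print(f"{n} values: noise reduced by about {reduction:.2f} "
              f"(sqrt({n}) = {math.sqrt(n):.2f})")

    # The shared regional signal cancels in the difference series.
    a, b, diff = simulate_pair()
    print(f"station scatter:    {statistics.pstdev(a):.2f}")
    print(f"difference scatter: {statistics.pstdev(diff):.2f}")
```

The reduction factors should come out close to the square roots of 30 and 60, and the difference series should be several times quieter than either station. Real daily temperatures are not independent from one day to the next, so the reduction achieved on real data will be smaller than this idealised case suggests.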
So I hope you can see that this is not simple and that is why most of the paper is spent trying to check how well the algorithms they have devised for:
- spotting change points,
- identifying ‘culprit’ series,
- categorising the type of change point,
- and then adjusting the ‘culprit’ series
are working. Their methods are not perfect. But what I like about this paper is that they are very open about the shortcomings of their technique – it can be fooled, for instance, if change points occur in different series at almost the same time. However the tests they have run show that it is capable of extracting trends with a fair degree of accuracy.
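The first two of those tasks, spotting change points and identifying the ‘culprit’, can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the algorithm from the paper: I have swapped in a simple mean-shift score (the jump in the mean across a split, divided by its standard error) for whatever statistical test the authors actually use, the threshold of 6.0 and the 24-month margin are arbitrary, and the station data are synthetic with an artificial 1.5 degree step injected at Hounslow in month 60. All function names are hypothetical.

```python
import itertools
import random
import statistics


def best_split(series, margin=24):
    """Return (index, score) of the split with the largest mean-shift score."""
    best_k, best_score = None, 0.0
    for k in range(margin, len(series) - margin):
        before, after = series[:k], series[k:]
        shift = statistics.fmean(after) - statistics.fmean(before)
        # Standard error of the difference between the two segment means.
        std_err = (statistics.pvariance(before) / len(before)
                   + statistics.pvariance(after) / len(after)) ** 0.5
        score = abs(shift) / std_err if std_err > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


def make_network(n_months=120, step_at=60, step_size=1.5, seed=42):
    """Synthetic stations sharing a regional signal; Hounslow gets a step."""
    rng = random.Random(seed)
    regional = [rng.gauss(0.0, 2.0) for _ in range(n_months)]
    names = ["Teddington", "Hounslow", "Richmond", "Kingston"]
    network = {
        name: [r + rng.gauss(0.0, 0.3) for r in regional] for name in names
    }
    # Pretend the Hounslow sensor was moved at month step_at.
    network["Hounslow"] = [
        value + (step_size if month >= step_at else 0.0)
        for month, value in enumerate(network["Hounslow"])
    ]
    return network


def find_culprit(network, threshold=6.0):
    """Flag shifted difference series, then blame the most-implicated station."""
    flagged = []
    for a, b in itertools.combinations(network, 2):
        diff = [x - y for x, y in zip(network[a], network[b])]
        k, score = best_split(diff)
        if score > threshold:
            flagged.append((a, b, k))
    counts = {name: 0 for name in network}
    for a, b, _ in flagged:
        counts[a] += 1
        counts[b] += 1
    culprit = max(counts, key=counts.get) if flagged else None
    return culprit, flagged


if __name__ == "__main__":
    culprit, flagged = find_culprit(make_network())
    for a, b, k in flagged:
        print(f"shift in {a} - {b} near month {k}")
    print("culprit:", culprit)
```

With a single clean step the only difference series that get flagged are those with Hounslow as a partner, so the vote is unambiguous. Give two stations a step in nearly the same month and the counts tie, which is exactly the kind of situation in which the authors say their method can be fooled.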
It is a sad fact – almost an inconvenient truth – that most climate data is very reproducible, but often has large uncertainty of measurement. The homogenisation approach to extracting trends from this data is a positive response to this fact. Some people take exception to the very concept of ‘homogenising’ climate data. And it is indeed a process in which subtle biases could occur. But having spoken with these authors, and having read this paper, I am sure that the authors would be mortified if there was an unidentified major error in their work. They have not made the analysis choices they have because it ‘causes’ a temperature rise which is in line with their political or personal desires. They have done the best they can to be fair and honest – and it does seem as though the climate trend they uncover just happens to be a warming trend in most – but not all – regions of the world.
You can read more about their work here.