
5 Sigma in CRU

[This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

UPDATE: Ron Broberg has a more definitive explanation of the difference, which indicates that the 5-sigma issue is not the main cause of the difference. See his exposition here.

A short update. I’m in the process of integrating the Land Analysis and the SST analysis into one application. The principal task in front of me is integrating some new capability in the ‘raster’ package. As that effort proceeds, I continue to check against prior work and against the accepted ‘standards’. So I reran the Land Analysis and benchmarked it against CRU, using the same database, the same anomaly period, and the same CAM (Climate Anomaly Method) criteria. That produced the following:

My approach shows a lot more noise, something not seen in the SST analysis, which matched nicely. Wondering whether CRU had done anything else, I reread the paper.

“Each grid-box value is the mean of all available station anomaly values, except that station outliers in excess of five standard deviations are omitted.”
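The rule quoted above can be sketched in R roughly as follows. This is a minimal illustration, not CRU's code. It assumes that each station anomaly comes paired with that station's own standard deviation for the same calendar month (the quote does not say how CRU computes that SD). The function name `gridbox_mean_5sig` and the argument `k` are hypothetical.

```r
# Hypothetical sketch of the quoted CRU rule: average the available station
# anomalies in one grid box, omitting any whose magnitude exceeds k standard
# deviations. ASSUMPTION: `sd` holds each station's own standard deviation
# for the calendar month in question.
gridbox_mean_5sig <- function(anom, sd, k = 5) {
  stopifnot(length(anom) == length(sd))
  # Keep a value only if it is present and within k standard deviations
  keep <- !is.na(anom) & !is.na(sd) & abs(anom) <= k * sd
  if (!any(keep)) return(NA_real_)
  mean(anom[keep])
}

# Toy values (illustrative only, not real station data): the third station
# has a -12 degree anomaly with an SD of 2, i.e. a 6-sigma cold outlier.
anom <- c(-1.5, -2.0, -12.0, NA)
sd   <- c( 2.0,  2.5,   2.0, 1.0)
gridbox_mean_5sig(anom, sd)   # -1.75: the outlier is omitted
mean(anom, na.rm = TRUE)      # -5.166667: a plain mean keeps it
```

Dropping a single cold outlier warms the grid-box value in this toy case. Dropping a warm outlier would cool it, so the rule can push the result in either direction.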

I don't do that! Curious, I looked at the monthly data:

The month where CRU and I differ THE MOST is Feb 1936.

Let's look at the whole year of 1936.

First CRU, then mine:

> had1936
[1] -0.708 -0.303 -0.330 -0.168 -0.082  0.292  0.068 -0.095  0.009  0.032  0.128 -0.296
> anom1936
[1] "-0.328" "-2.575" "0.136"  "-0.55"  "0.612"  "0.306"  "1.088"  "0.74"   "0.291"  "-0.252" "0.091"  "0.667"
So Feb 1936 sticks out as a big issue: -2.575 in my series versus -0.303 for CRU.
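The month-by-month comparison can be checked directly from the two printed series. The following sketch assumes the values are copied in by hand. My series printed as character strings, so it is converted with `as.numeric`.

```r
# Monthly 1936 land anomalies as printed above: CRU first, then my series
# (which printed as character strings, hence as.numeric()).
had1936  <- c(-0.708, -0.303, -0.330, -0.168, -0.082, 0.292,
               0.068, -0.095,  0.009,  0.032,  0.128, -0.296)
anom1936 <- as.numeric(c("-0.328", "-2.575", "0.136", "-0.55", "0.612", "0.306",
                         "1.088", "0.74", "0.291", "-0.252", "0.091", "0.667"))

# Difference (mine minus CRU) for each month, labelled with R's month.abb
d <- setNames(anom1936 - had1936, month.abb)
round(d, 3)

# Month with the largest absolute disagreement
names(which.max(abs(d)))   # "Feb"
```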
Turning to the anomaly data for 1936, here is what we see in the UNWEIGHTED anomalies for the entire year:
> summary(lg)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        NA's
-21.04000  -1.04100   0.22900   0.07023   1.57200  13.75000 31386.00000
When you look at the detailed data, the issue is, for example, some record cold in the US: 5-sigma-type weather.
Looking through the data, you will find that in the US there are February anomalies beyond the 5-sigma mark with some regularity. And if you check Google, of course, it was a bitter winter. Just an example below. Much more digging is required, here and in other places where the method of tossing out 5-sigma events appears to cause differences (apparently in both directions). So, no conclusions yet, just a curious place to look. More later as time permits. If you're interested, double-check these results.
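To repeat the search for station anomalies beyond the 5-sigma mark, something along these lines could be used. It is a sketch under stated assumptions. The long-format layout (columns `station`, `year`, `month`, `anom`), the 1961-1990 base period used for the standard deviation, and the function name `flag_5sig` are all hypothetical. The toy data are made up purely to exercise the code.

```r
# Hypothetical sketch: flag station-months whose anomaly exceeds k standard
# deviations. ASSUMPTIONS: long-format data with columns station, year,
# month, anom; the SD for each station and calendar month is taken over a
# base period (default 1961-1990); anomalies are already relative to that
# base period, so anom / sd acts as a z-score.
flag_5sig <- function(df, k = 5, base = c(1961, 1990)) {
  in_base <- df$year >= base[1] & df$year <= base[2]
  # Per-station, per-calendar-month standard deviation over the base period
  sds <- aggregate(anom ~ station + month, data = df[in_base, ], FUN = sd)
  names(sds)[names(sds) == "anom"] <- "sd"
  out <- merge(df, sds, by = c("station", "month"))
  out$z <- out$anom / out$sd
  # Keep only the rows beyond the threshold, ordered in time
  hits <- out[!is.na(out$z) & abs(out$z) > k, ]
  hits[order(hits$year, hits$month), ]
}

# Toy data for one made-up station, 1931-1990 (not real observations)
toy <- expand.grid(station = "STN_A", year = 1931:1990, month = 1:12,
                   stringsAsFactors = FALSE)
# Deterministic pseudo-noise bounded by +/- 0.8 degrees
toy$anom <- 0.8 * sin(toy$year * 1.7 + toy$month)
toy$anom[toy$year == 1936 & toy$month == 2] <- -20  # planted cold outlier
flag_5sig(toy)
```

Because 1936 lies outside the assumed base period, the planted outlier does not inflate the SD it is tested against. Only that one station-month is flagged.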

To leave a comment for the author, please follow the link and comment on their blog: Steven Mosher's Blog.
