Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Update:
Some readers may not be aware that GISS has switched to using Nightlights for the rest of the world (ROW). Their updates state that they have moved to Nightlights for the ROW.
The station inventories can be found here
The station I examine below is listed like this in the new GISS inventory:
20551495001 TURPAN 42.92 91.00 24 384R -9MVDEno-9x-9HOT DESERT A 0
That final zero is GISS’ field for Nightlights. For this station our figures match. My preliminary analysis shows no station at that location; however, there is a nearby settlement that may be its actual location.
As most who follow the construction of a global temperature index know, the index created by GISS uses Nightlights as a determinant of whether a station is urban or rural. To determine if a site is rural, GISS apparently looks at the Nightlights value, and if the value is less than 10 the site is declared rural. There is an inherent problem with this that results from the location errors in the GHCN inventory. Simply put, the Nightlights image is collected at a 30 arcsecond resolution while the GHCN inventory is far less accurate. A GHCN site that is supposed to be at 42.0, 90 may actually be at 42.3, 90. The exact distribution of errors has not been quantified, but others looking at the issue have reported errors up to 0.5 degrees. At the equator Nightlights data is roughly 1km data, and 0.5 degrees at the equator is roughly 55km. That means when GISS looks at Nightlights they could actually be looking at the lights from a position 55km away from the actual site. They have not taken aliasing into account to ensure that a station is in fact rural.
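The distances quoted above are simple arc-length arithmetic. As a rough check, here is a minimal Python sketch; the constant and function names are my own, and it assumes a spherical approximation using the equatorial circumference.

```python
# Back-of-envelope conversion of angular size to ground distance at the equator.
EQUATOR_KM = 40075.017              # Earth's equatorial circumference in km
KM_PER_DEGREE = EQUATOR_KM / 360.0  # roughly 111 km per degree


def arc_to_km(degrees):
    """Ground distance along the equator for an angle given in degrees."""
    return degrees * KM_PER_DEGREE


pixel_km = arc_to_km(30 / 3600)           # one 30 arcsecond Nightlights cell
error_km = arc_to_km(0.5)                 # a 0.5 degree location error
pixels_off = round(error_km / pixel_km)   # how many cells that error spans

print(f"pixel: {pixel_km:.2f} km, error: {error_km:.1f} km, cells: {pixels_off}")
```

So a half-degree location error can put the lookup about 60 Nightlights cells away from the true site.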
In an effort to come to some understanding of the magnitude of this problem I’ve built several tools. The first was a program to download and analyze Nightlights. Using the calibrated dataset referred to by GISS, I downloaded and worked with the HiRes version. My approach was simple and brute force. For every station in the GHCN inventory I cropped the surrounding area to approximately a 111km square with the GHCN station in the center. It turns out, after I wrote the code, that the raster package has a command to do exactly what I wanted (see xyValues(..buffer=)).
The resulting sub raster was then processed to extract the following: min value, max value, mean value, and value at the location indicated by GHCN. These rasters can then be plotted. In the abstract we are concerned about urban sites being mislocated as rural sites. That is, a site in a city having a recorded location that is outside the city. Such a site would show up dark, but actually be in a lit area. The goal then would be to develop an approach that would allow for the rapid assessment of this possibility. More on that later.
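To make the extraction step concrete, here is a small Python sketch of the idea. It is not my original code: the function name `window_stats`, the list-of-lists grid, and the toy values are all made up for illustration.

```python
def window_stats(grid, row, col, half_width):
    """Summarize a square window of a 2-D grid centered on (row, col).

    The window is clipped at the grid edges. Returns the min, max and mean
    of the window plus the value at the (putative) station cell itself.
    """
    r0 = max(row - half_width, 0)
    r1 = min(row + half_width + 1, len(grid))
    c0 = max(col - half_width, 0)
    c1 = min(col + half_width + 1, len(grid[0]))
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return {
        "min": min(cells),
        "max": max(cells),
        "mean": sum(cells) / len(cells),
        "at_station": grid[row][col],
    }


# Toy 7x7 "image": everything dark except one bright cell off to the side.
grid = [[0] * 7 for _ in range(7)]
grid[1][5] = 160

stats = window_stats(grid, row=3, col=3, half_width=3)
print(stats)
```

In the toy case the station cell reads zero while the window maximum is 160, which is exactly the pattern of interest: a dark reading with a bright source nearby.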
First, we will start with an approach that eliminates that possibility. We do that by constraining the stations to those that have no lights brighter than 10 in the whole sub grid. Starting with 7280 stations, if we select those stations that have a Nightlights value of less than 10, and if we require that every pixel within ~55km of that location is also less than 10, we have good assurance that the site is dark regardless of the location errors, provided the location error is less than ~0.5 degrees. Filtered this way, there are 1099 such stations in GHCN. My data is all open and viewable in Google Fusion Tables. Fusion Tables will allow you to subset the data any way you want and geo view the results.
As you can see, that means the other roughly 6000 sites are all in areas where there is some sign of urban development within ~55km or so of the location. To screen for the phenomenon I’m asserting exists, I filtered the data to look for sites where the area had “high frequency” content, that is, values of 0 and values over 100. The reporting site has a value of zero according to its location in the inventory AND the surrounding area has values over 100. There are 179 entries meeting these conditions. That view is located here
<iframe width="500px" height="300px" scrolling="no" src="http://tables.googlelabs.com/embedviz?viz=MAP&q=select+col0%3E%3E0%2Ccol1%3E%3E0%2Ccol2%3E%3E0%2Ccol3%3E%3E0%2Ccol4%3E%3E0%2Ccol5%3E%3E0%2Ccol6%3E%3E0%2Ccol7%3E%3E0%2Ccol8%3E%3E0%2Ccol9%3E%3E0%2Ccol10%3E%3E0%2Ccol11%3E%3E0%2Ccol12%3E%3E0%2Ccol13%3E%3E0%2Ccol14%3E%3E0%2Ccol15%3E%3E0%2Ccol1%3E%3E1%2Ccol2%3E%3E1%2Ccol3%3E%3E1%2Ccol4%3E%3E1%2Ccol5%3E%3E1%2Ccol6%3E%3E1%2Ccol7%3E%3E1+from+272905+where+col2%3E%3E1+%3D+'0'+and+col4%3E%3E1+%3E%3D+'100'&h=false&lat=-1.0546279422758869&lng=177.1875&z=2&t=2&l=col1%3E%3E0"></iframe>
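The two screens described above can be sketched in a few lines of Python. This is an illustration rather than the Fusion Tables query itself: the function and category names are my own, the thresholds are the ones stated in the text, and only the TURPAN row uses real figures (the other two rows are hypothetical).

```python
RURAL_LIMIT = 10     # GISS treats Nightlights values below this as rural
BRIGHT_LIMIT = 100   # "bright" neighbor threshold used for the second screen


def screen(dsmp, max_lights):
    """Classify a station from its own Nightlights value (dsmp) and the
    brightest value found anywhere in its surrounding sub grid."""
    if max_lights < RURAL_LIMIT:
        # Nothing bright anywhere nearby: dark even if mislocated.
        return "dark_area"
    if dsmp == 0 and max_lights > BRIGHT_LIMIT:
        # Reads as pitch dark, yet a bright source sits inside the window.
        return "dark_pixel_near_bright"
    return "other"


stations = [
    ("TURPAN", 0, 160),        # values from the inventory shown in this post
    ("EXAMPLE_DARK", 0, 4),    # hypothetical station
    ("EXAMPLE_CITY", 45, 200), # hypothetical station
]

for name, dsmp, max_lights in stations:
    print(name, screen(dsmp, max_lights))
```

The second category is the one worth checking by eye, since a modest location error could move those stations from dark to brightly lit.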
Looking at that map I was drawn to the site in China, for a couple of reasons that should be obvious. Pulling up the inventory on that station we see the following. Please note the fields my analysis adds. (Again with the mountain valley sites!)
Lat: 42.92
Lon: 91
Altitude: 2.4
Name: TURPAN
GridEl: 384
Rural: R
Population: NA
Topography: MV
Vegetation: DE
Coastal: no
DistanceToCoast: NA
Airport: FALSE
DistanceToTown: NA
NDVI: HOT DESERT
Light_Code: A
CountryName: CHINA
DSMP: 0
MinLights: 0
MaxLights: 160
MeanLights: 0.908130081
AreaLights: 12392.31018
SumLights: 17872
“AreaLights” is the area in square km of the cropped region, roughly 111km per side. The putative station location is centered and I crop an equal-area square around every station center. The figure of 111km is selected based on reports of errors as large as 0.5 degrees in some station locations. SumLights is the sum of all radiance values in the sub grid. DSMP is the figure at the GHCN location, which matches the inventory value in GISS.
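As a sanity check on those figures, here is a short Python sketch of how an equal-area crop can be sized on a 30 arcsecond grid. The function name and the rounding rule are assumptions made for illustration, not a description of my actual program. The idea is that a fixed 1 degree span in latitude is about 111km everywhere, while the span in longitude has to widen by 1/cos(latitude) to cover the same ground distance.

```python
import math

CELLS_PER_DEGREE = 120  # 30 arcsecond cells: 3600 / 30 per degree


def crop_shape(lat_deg, side_deg=1.0):
    """Rows and columns of a window that is side_deg tall in latitude and
    wide enough in longitude to span the same ground distance."""
    rows = round(side_deg * CELLS_PER_DEGREE)
    cols = round(side_deg * CELLS_PER_DEGREE / math.cos(math.radians(lat_deg)))
    return rows, cols


rows, cols = crop_shape(42.92)       # TURPAN's listed latitude
mean_lights = 17872 / (rows * cols)  # SumLights divided by the cell count

print(rows, cols, rows * cols, round(mean_lights, 9))
```

Under this reading the window is 120 by 164 cells, and dividing SumLights by that cell count reproduces the MeanLights value listed for the station.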
Using the lat/lon from GHCN we can get the Google map, with the green arrow indicating where GHCN (and GISS) thinks the station is:
You can zoom in on the location at the green arrow. See any station?
Here is what this world looks like to Nightlights. The blue circle represents the location given by GHCN. It’s dark. Yet a few tens of km away we find an urban location. At approximately 43.08, 90.45 we see the hot spot.
And here is the GE view of the hotspot
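For a sense of scale, the separation between the GHCN location and the hot spot can be estimated with the haversine formula. This is a minimal Python sketch assuming a spherical Earth with a mean radius of 6371km; the hot spot coordinates are the approximate ones read off the image.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


ghcn_site = (42.92, 91.00)  # station location as listed in the inventory
hot_spot = (43.08, 90.45)   # approximate center of the lit area

distance_km = haversine_km(*ghcn_site, *hot_spot)
print(f"{distance_km:.1f} km")
```

The result is a little under 50km, which is inside the ~55km window implied by a 0.5 degree location error.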
So what do we know? We know that Nightlights is reported at a 1km resolution. We know that GHCN has a coarser resolution. This leads to aliasing. When we read Nightlights given the locations in GHCN we have no assurance that the lights figure we obtain is the correct one. If we try to protect against this by filtering stations more aggressively, we are left with 1099 stations. That is 1099 stations that have no lights brighter than 10 in their area. On the other hand, there are 179 sites where a bright urban source is close by. Perhaps by chance I picked a good site to illustrate the issue, or maybe that station really is somewhere in that pathless desert. My bet is that the location, and the assessment of this site as “dark”, is wrong. Since this is classed as a “rural” station it receives no adjustment, but it would be used to adjust other stations in its sphere of influence. Here is the GISS chart of the station:
The solution is for NOAA to update its inventory with accurate location data. This is tedious work; there is a much better way to fix these simple problems.
If anybody wants a good area to study, I’ll suggest this collection of sites where the max lights are less than 10 for a nice tight collection of stations. But here too, be aware of land use change.
And for a list of US sites, start with this map: MaxLights in the area less than 10. Consider the constellation of three sites in the same vicinity, in particular “Hanksville”, which GISS sees as dark and which my program determined has no lights within a 55km radius. In the Chinese case we have a station mislocation, and below we see that even the requirement of no lights within 55km is not a perfect filter.
42572470001 HANKSVILLE 38.37 -110.72 1313 1358R -9HIDEno-9A-9HIGHLAND SHRUB A1 0