Forking Myself
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve spent some time forking myself. Over the past few days when I could steal away an hour here or there I decided to make a big change to the package. But it’s a good change. First some book-keeping. The Romantest.R file has a minor bug in it. Not really a bug, I just pulled out the wrong statistic from the brick ” cellStats(..,sum), should be cellStats(…mean) unless you want to errr “add” all the cells together. Too eager to post.
On with the forking. Nick Stokes and I exchanged a few emails looking at various ways of speeding up readV3Data(). During the course of that we found a small “bug” to report to NCDC. As I thought more about his data structure, a 3D array of stations, months, and years it became obvious that the whole package could be easily rewritten to support it. And I could bury forever the old 14 column v2 format. The way I’ve implemented that is an option on the readV3data() function. You specify which kind of object you want returned: a zoo object, mts object or an Array type object. Zoo and mts are just variants of each other with station data in columns and time in rows. Then I defined a series of functions asZoo(), asMts(), asArray(), which allow you to switch from one representation to another. Then I rewrote all the functions to accept any type of object. This cut down the number of actual functions I need and moved me closer to OOP. A couple of the functions, still accept only one type of data, for example Romans and Taminos, are still “mts/zoo” centric. Some routines disappeared all together: windowV3() is gone and windowArray() takes it place, although you could do test <- asArray(window(asZoo(Data),start =1920, end = 1940 +11/12).
so you can now write anomalize() and pass it a zoo, mts or Array. And passesCam() replaces the several variants. The demos have been rewritten and I added a couple test programs. Building a new package will take time as the manual will require much rewriting.
Then I had another little task. Rewriting Nick’s weights program. The function is used to calculate a series of weights for the data array. These weights are a function of the monthly count of reports in a grid cell and an area weight for that gridcell. My main goal here was to see if I could rewrite the code using raster. Well, I did, but it wasn’t entirely clear and last bits of it were particularly nasty so I decided instead to stick with Nicks code, add some clearer variable names, and change it so that you can use any size grid cell. Here is what the algorithm does: It takes a all the temperatures series and gives them a “cell” id, exactly like the cell value in a raster. Every cell receives a weight that is proportional to its area, basically sin() of Latitude so that cells at the equatoer which are larger have a weight of 1 and cells toward the poles are weighted less. Then a “count” is established for every valid temperature measure in the cell for every month of every year. So a given cell could have 5 reports one month and no reports the next month or 3 or 2, and so forth. Then the area weight is divided by the counts in that cell. And I refer to this as “inverse density” the more reports, the less each report is weighted. The smaller the area, the smaller the weight. As the function stands now, I’ve added a raster as a calling parameter. I’ll probably change that to simply a cell size, as everything I need can be calculated from a cell size. Here is the code. This function, of course, doesnt take a zoo or mts as a parameter, but that’s relatively easy to handle with the asArray() function, so I’ll probably change that as well before adding it.
inverseDensity <- function(inv, Data, r = GLOBE5){ rClass <- class(r)[1] isRL <- rClass == "RasterLayer" if (isInventory(inv) & isArray(Data) & isRL){ # check that inventory stations and Data stations match if (!identical(getStations(inv), getStations(Data))) stop("stations must match") # dimensions of Array dimensions <- dim(Data) # array of weights for output that looks like temperatures weights <- array(NA,dimensions) lonBin <- ncol(r) latBin <- nrow(r) halfLat <- latBin/2 eq <- halfLat * (lonBin +1) + (halfLat+1) cellSize <- res(r)[1] cellId <- floor(inv$Lat/cellSize) * lonBin + floor(inv$Lon/cellSize) + eq #weights for latitude changes latWeight <- sin((1:latBin-0.5)*pi/latBin) #weights for every lat/lon allWeights <- rep(latWeight, each = lonBin) for(months in 1:dimensions[2]){ for(years in 1:dimensions[3]){ # get T/F if temperature is there reports <- !is.na(Data[ , months, years]) # get the cell numbers of those cells with reports cells <- cellId[reports] # a count of reports per cell per month counts <- tabulate(c(cells, lonBin * latBin)) # Ok is the logical of counts > 0 ok <- counts > 0 # make a copy of allweights to modify density <- allWeights # note only "ok" elements are valid density[ok] <- allWeights[ok] / counts[ok] #print(density[ok]) those True cells get a density area/count weights[reports, months, years] <- density[cells] } } return(weights) } if ( !isInventory(inv) ) stop(" must be a valid inventory") if ( !isArray(Data)) stop("must be a valid data array") if ( !isRL ) stop("must be a valid raster") }
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.