Is rainfall reporting endogenous to conflict?


For my paper on the impact of social insurance on the dynamics of conflict in India, I use some new remotely sensed weather data. The data comes from the Tropical Rainfall Measuring Mission (TRMM) satellite, which carries a set of five instruments and is essentially a rainfall radar located in outer space.

As a robustness check I needed to verify that my main results go through using other rainfall data. In the paper I try to make a humble case in favour of using remotely sensed data where possible. The key reason is that the TRMM data comes from the same set of instruments over time, rather than from input sources that could vary with, for example, economic conditions. This is a problem that climatologists have identified, and they try to correct for the systematic biases that can arise from the fact that weather stations are more likely to be located in places with a lot of economic activity.

At first I was a bit reluctant, as the data is quite heavy and needs a fair amount of processing. Nevertheless, a thorough analysis required me to jump through the hoop and obtain a secondary rainfall data source. I chose the GPCC monthly rainfall data to verify my results, since it has been used by many other authors in similar contexts. The data is based on rain gauge measurements and is available for the past 100 years.

The raw data is quite heavy: the monthly rainfall rate data for the whole world at 0.5 degree resolution amounts to about 150 million rows for the period 1961-2010. If you drop the non-land grid cells, the size shrinks dramatically to only 40 million rows. Below is a bit of code that loads the data once you have downloaded the ASCII source files from the GPCC website. On my personal website, I make a .dta and an .RData file available for the whole world. There are three variables, appearing in this order: (1) the rainfall rate, (2) the rainfall normals and (3) an integer giving the number of reporting rain gauges that fall in a grid cell in a particular month.
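Dropping the non-land cells after loading is a one-liner. The sketch below is only a sketch: it assumes the rainfall rate ends up in the first data column (V1, given how read.table names the columns in the code further down) and that ocean and missing cells carry the GPCC missing-value flag -99999.99; check the header of your own download before relying on either assumption.

## keep only land cells, i.e. rows whose rainfall rate is not the missing-value flag
GPCC.land <- GPCC[V1 > -99999]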

It turns out that all my results are robust to using this data. However, I do find something that is quite neat. If a district experienced some insurgency-related conflict in the previous year, it is less likely to have an active rain gauge reporting data in subsequent years. While it is a no-brainer that places with severe conflict do not have functioning weather reporting, these results suggest that reporting may also be systematically affected in places with a relatively low intensity of conflict – as is the case in India.
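To give a flavour of that check, the sketch below runs a linear probability model of an indicator for having an active reporting gauge on lagged conflict, with district and year fixed effects, using the lfe package. The object and variable names (panel, gauge_active, conflict_lag, district, year) are purely illustrative and not the ones used in the paper.

library(lfe)
## linear probability model: does lagged conflict predict whether a district
## still has an active reporting rain gauge, net of district and year fixed
## effects? Standard errors are clustered at the district level.
summary(felm(gauge_active ~ conflict_lag | district + year | 0 | district,
             data = panel))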

While I do not want to overstate the importance of this, it provides another justification for why it makes sense for economists to use remotely sensed weather data. This is not to say that ground-based data is not useful. Quite the reverse: ground-based data is more accurate in many ways, which makes it very important for climatologists. As economists, we worry about systematic measurement error that correlates with the economic variables we are studying. This is where remotely sensed data has an advantage, as it does not “decide” to become less accurate in places that are, for example, less developed, suffer from conflict or simply have nobody living there.

Here is the function to read in the data and match it to district centroids; you will need the data.table and geosphere packages.

#########LOAD GPCC; NOTE THAT YOU NEED TO SUBSET THE DATA IF YOU DONT WANT TO END UP WITH A HUGE DATA OBJECT
library(data.table)
library(geosphere)

loadGPCC <- function(ff, COORDS) {
  ## extract year and month from the file name
  yr <- as.numeric(gsub("(.*)\\_([0-9]{2})([0-9]{4})", "\\3", ff))
  month <- as.numeric(gsub("(.*)\\_([0-9]{2})([0-9]{4})", "\\2", ff))
  ## read the ASCII grid (14 header lines) and attach the grid coordinates
  temp <- data.table(data.frame(cbind(COORDS, read.table(file = paste("Rainfall/gpcc_full_data_archive_v006_05_degree_2001_2010/", ff, sep = ""), header = FALSE, skip = 14))))

  ###YOU COULD SUBSET THE DATA BY EXTENT HERE IF YOU DONT WANT TO GET IT FOR THE WHOLE WORLD
  ##E.G. SUBSET BY BOUNDING BOX
  ##temp <- temp[x >= 73 & x <= 136 & y >= 16 & y <= 54]

  temp <- cbind("year" = yr, "month" = month, temp)
  gc()
  temp
}

################
#####
ffs <- list.files("Rainfall/gpcc_full_data_archive_v006_05_degree_2001_2010")

###THIS DEFINES THE GRID STRUCTURE OF THE DATA
###YOU MAY NEED TO ADJUST IF YOU WORK WITH A COARSER GRID
xs <- seq(-179.75, 179.75, .5)
ys <- seq(89.75, -89.75, -.5)
COORDS <- do.call("rbind", lapply(ys, function(x) cbind("x" = xs, "y" = x)))

###READ AND STACK ALL MONTHLY FILES
system.time(GPCC <- do.call("rbind", lapply(1:length(ffs), function(x) loadGPCC(ffs[x], COORDS))))
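If you want to hold on to the combined object in the formats mentioned above, a minimal sketch is below; the file names are placeholders, and foreign::write.dta could equally be swapped for haven::write_dta.

## save the stacked data as .RData
save(GPCC, file = "GPCC_monthly_05deg.RData")
## save as a Stata .dta file via the foreign package
library(foreign)
write.dta(data.frame(GPCC), file = "GPCC_monthly_05deg.dta")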

###MATCHING THIS TO A SHAPEFILE?
##YOU COULD MATCH CENTROIDS OF DISTRICTS TO THE NEAREST GRID CELL - THE FOLLOWING LOOP WOULD DO THAT
##IT ASSUMES CENTROIDS IS A data.table OF DISTRICT CENTROIDS (COLUMNS x, y) AND
##GPCC.coords A data.table OF THE UNIQUE GRID-CELL COORDINATES (COLUMNS delx, dely)

###find nearest lat / lon pair
##you may want to vectorise this (see the sketch below)
NEAREST <- NULL
for (k in 1:nrow(CENTROIDS)) {
  cat(k, " ")
  ## great-circle distance from this centroid to every grid cell
  temp <- distHaversine(CENTROIDS[k, c("x", "y"), with = FALSE],
                        GPCC.coords[, c("delx", "dely"), with = FALSE])
  ## keep the centroid together with its nearest grid cell
  NEAREST <- rbind(NEAREST, cbind(CENTROIDS[k], GPCC.coords[which(temp == min(temp))]))
}
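As the comment says, the loop can be vectorised. One way, under the same assumptions about CENTROIDS and GPCC.coords, is to build the full distance matrix with geosphere::distm and take row-wise minima; note that the matrix has one column per grid cell, so restrict GPCC.coords to your study area first if memory is tight.

## distance matrix: one row per district centroid, one column per grid cell
D <- distm(as.matrix(CENTROIDS[, c("x", "y"), with = FALSE]),
           as.matrix(GPCC.coords[, c("delx", "dely"), with = FALSE]),
           fun = distHaversine)
## index of the nearest grid cell for each centroid
nearest.idx <- apply(D, 1, which.min)
NEAREST <- cbind(CENTROIDS, GPCC.coords[nearest.idx])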

 
