Predicting Agriculture, Poorly (Part I)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Crop rotation is an agricultural production practice to increase yield, mitigate disease, and control pests. This production practice involves growing crops in specific sequences to improve the quality of soil for the following crop. A common example in the United States is the practice of growing soybeans before corn. In this example soybeans fixate nitrogen in the soil leading to an increase in yield and reduction in fertilizer consumption.
Provided that this production process is prevalent and we have some data regarding where and when particular crops grow, then a stochastic process such as a Markov chain can be used to predict what will be grown where. Luckily, the United States Department of Agriculture’s (USDA) Cropland Data Layer (CDL) can be used to obtain crop location anywhere in the continental United States from 2008-2015. The literature on this subject for major crops including corn, soybeans, winter wheat, cotton and peanuts points towards a high degree of employment of crop rotation across the continental United States. An analysis of the CDL data by (Sahajpal, 2013) showed that 90% of Iowa’s crop production could be represented with three different crop rotations, and 80% of the U.S. crop production could be described with 180 unique rotations. Crop rotation analysis by (Boryan, 2014) examined corn and soybean rotations in the corn belt from 2004 to 2008. This analysis showed that over 45% of the total cultivated acreage in Iowa, 32% of Illinois, and 20% of Nebraska could be explained by corn-to-soy rotations. Further analysis showed that particular rotation patterns are spatially associated with particular regions. An example of this can be seen along the Platte river in Nebraska where 65%, 49%, and 50% of the cultivated acreage within Hall, Chase, and Dawson Counties respectfully grew corn for at least four out of the five years studied, while the statewide cultivated acreage covered by continuous or near-continuous corn cropping was much lower. Similar activity was seen in Iowa and Illinois.
Model Development
In the United States crop rotation can be observed for most major row crops such as corn, soybeans, and wheat. A naive model will be introduced using a simple Markov chain to assist in prediction. Future models in this blog will build on this model, and use this simple model as a benchmark.
A Naive Approach
A stochastic processes is an ordered sequence of random variables. The relationship between these variables can be described by a stochastic model, such models are a typical way to capture changes in states such as crop rotations. A general form of a categorical stochastic model for land use is
Example
The maximum likelihood estimates are simply the observed transitions for each prior state divieded by the total number of observed transitions from the prior state. To calculate these reatios we need to aggregate the year-to-year transition data from the CDL pixels for Iowa. Note that due to resampling, this may take a considerable amount of time (hours) depending on your system. To speed things up the results csv is provided through the website.
Now that you have the data, hopefully from downloading it, we can start working with it:
library(dplyr) library(magrittr) library(tidyr) library(devtools) install_github('jlisic/cdlTools') library(cdlTools) # read in csv file cropChange <- read.csv("Iowa_2008_2015_change.csv") # create a major crop type identifier # preference is given to corn for corn/soy multiple cropping justSoy <- setdiff(soybeans,corn) # Major crop identifier: # 0 non Cultivated # 1 other cultivated # 2 soybeans (but no dual crop soybeans and corn) # 3 corn cropChange$cropTypeFrom <- cropChange$In %in% cultivated cropChange$cropTypeFrom <- cropChange$In %in% justSoy + cropChange$cropTypeFrom cropChange$cropTypeFrom <- 2*cropChange$In %in% corn + cropChange$cropTypeFrom cropChange$cropTypeTo <- cropChange$Out %in% cultivated cropChange$cropTypeTo <- cropChange$Out %in% justSoy + cropChange$cropTypeTo cropChange$cropTypeTo <- 2*cropChange$Out %in% corn + cropChange$cropTypeTo #create a tidy transition matrix K.tidy <- cropChange %>% # get data group_by(cropTypeTo, cropTypeFrom)%>% # group by unique To and From summarize(totalPixels=sum(Pixels)) # sum pixels in each group # create the transition matrix K K <- spread(K.tidy,cropTypeTo,totalPixels) %>% select(2:5) K <- K/rowSums(K) rownames(K) <- colnames(K) <- c('nonCult','cult','soybeans','corn') print(K) ## nonCult cult soybeans corn ## nonCult 0.81769249 0.019130333 0.06618133 0.09699585 ## cult 0.30518990 0.174452328 0.17803400 0.34232377 ## soybeans 0.05949642 0.008826269 0.37963993 0.55203739 ## corn 0.06057745 0.010194854 0.39156959 0.53765810