Scoping functions in R

[This article was first published on Data, Evidence, and Policy - Jared Knowles, and kindly contributed to R-bloggers.]

I want to test embedding source code in the blog with GitHub's handy Gist tool, and these two R functions are a good opportunity to do so. The functions perform threshold testing on a vector in R; as shown at the end of the code, they can also be applied over the rows or columns of a matrix or data frame. They are not complex, and probably not as efficient as they could be, but they serve as an example of readable, well-documented code, and they may be of use to others. Eventually they will be incorporated into the LDS_TOOLS package, which is currently under development on GitHub.

#######################################
# Example data
######################################
# 100 x 10 matrix: each column holds 100 draws from a Poisson(20)
testdata<-replicate(10, rpois(100, 20))
###########################################
# This function tells us how far we have to
# go before reaching a cutoff in a variable
# by sorting the vector in descending order,
# then counting how many of the largest values
# it takes. Note that the cutoff is expressed
# as a proportion of the total (0 to 1)
############################################
fixcumsum<-function(x,cutoff){
  x<-x[order(-x)] # sort vector descending (NAs go last)
  xb<-cumsum(x) # take cumulative sum
  xc<-xb/sum(x,na.rm=TRUE) # express proportionally
  sum(xc<cutoff,na.rm=TRUE) # count the items whose cumulative
  # share is still below the cutoff
}
##########################################
# This function allows us to see what share
# of the total is accounted for by the
# largest `thresh` observations, expressed
# as a proportion (0 to 1).
############################################
cutoff<-function(x,thresh){
  # x is the column or variable
  # thresh is the number of observations to count to
  x<-x[order(-x)] # sort vector descending
  xb<-cumsum(x) # take cumulative sum (order matters)
  xc<-xb/sum(x,na.rm=TRUE) # express proportionally
  xc[thresh] # report cumulative proportion at the given threshold
}
###############################
# Now we simply apply the
# functions over a data object
# such as a matrix or a dataframe
###############################
# Here we apply the function to each column with
# apply(data, 2, function, function arguments)
# where 2 tells R to go column-wise, which is appropriate for
# this data shape, and we specify function parameters after we
# tell R the function
thresh3<-apply(testdata,2,cutoff,thresh=3)
thresh5<-apply(testdata,2,cutoff,thresh=5)
thresh10<-apply(testdata,2,cutoff,thresh=10)
# we store these as variables because each call produces a vector
# holding one value per column that the function was applied to
# We combine these into a matrix with one column per threshold
cutoffs<-cbind(thresh3,thresh5,thresh10)
# We make them into % for Excel purposes (not necessary)
cutoffs<-cutoffs*100
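
As a quick sanity check, the two functions can be tried on a small hand-made vector where the answer is easy to verify by eye. The sketch below restates both functions so that it runs on its own; the vector `toy`, the matrix `toymat`, and the 0.85 cutoff are made-up values for illustration and are not part of the original Gist.

```r
# Restated from the post so this block runs on its own
fixcumsum <- function(x, cutoff) {
  x <- x[order(-x)]                       # sort descending
  xc <- cumsum(x) / sum(x, na.rm = TRUE)  # cumulative share of the total
  sum(xc < cutoff, na.rm = TRUE)          # items whose cumulative share is below cutoff
}
cutoff <- function(x, thresh) {
  x <- x[order(-x)]                       # sort descending
  xc <- cumsum(x) / sum(x, na.rm = TRUE)  # cumulative share of the total
  xc[thresh]                              # cumulative share of the top `thresh` items
}

# Hypothetical toy vector (not from the original post). It sums to 100,
# so the sorted cumulative shares are 0.50, 0.80, 0.90, 0.95, 1.00
toy <- c(5, 30, 50, 10, 5)

n_below <- fixcumsum(toy, 0.85)  # 2: only 0.50 and 0.80 fall below 0.85
share3  <- cutoff(toy, 3)        # 0.9: the three largest values hold 90% of the total

# fixcumsum can be applied column-wise in the same way as cutoff.
# Column b is five equal values, so its shares are 0.2, 0.4, 0.6, 0.8, 1.0
toymat <- cbind(a = toy, b = c(20, 20, 20, 20, 20))
n_by_col <- apply(toymat, 2, fixcumsum, cutoff = 0.85)  # a = 2, b = 4
```

Because the comparison uses a strict `<`, the count is the number of items taken before the cutoff is reached; the next item in sorted order is the one that crosses it.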
