Scoping functions in R
[This article was first published on Data, Evidence, and Policy - Jared Knowles, and kindly contributed to R-bloggers].
I want to test embedding source code in the blog using the handy Gist tool provided by GitHub, and these two R functions are a good opportunity to do so. The functions perform threshold testing on a vector in R, or over the rows or columns of a data frame, as shown at the end of the code. They are not complex, and probably not as efficient as they could be, but they are an example of readable, well-documented code, and they may be of use to others. Eventually they will be incorporated into the LDS_TOOLS package, which is currently under development on GitHub.
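To make "threshold testing" concrete, here is a small worked sketch of the underlying arithmetic using only base R. The vector `counts` and the variable names are made up for illustration and are not part of the original Gist.

```r
# Hypothetical toy data, not from the original post
counts <- c(5, 50, 10, 20, 15)

sorted <- sort(counts, decreasing = TRUE)  # 50 20 15 10 5
share  <- cumsum(sorted) / sum(sorted)     # 0.50 0.70 0.85 0.95 1.00

# How many of the largest items can we take while staying below 80% of the total?
n_below_80 <- sum(share < 0.8)             # 2

# What share of the total do the 3 largest items account for?
share_top3 <- share[3]                     # 0.85
```

The first question is the one `fixcumsum` answers, and the second is the one `cutoff` answers.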
#######################################
# Example data
#######################################
testdata <- replicate(10, rpois(100, 20))  # 100 x 10 matrix of Poisson(20) draws
############################################
# This function tells us how many items we
# have to go through before reaching a cutoff
# in a variable: it sorts the vector in
# descending order, then counts the items whose
# cumulative share of the total is still below
# the cutoff. Note that the cutoff is expressed
# as a proportion (e.g. 0.8 for 80%)
############################################
fixcumsum <- function(x, cutoff) {
  x <- sort(x, decreasing = TRUE)  # sort vector descending (drops NAs)
  xb <- cumsum(x)                  # take cumulative sum
  xc <- xb / sum(x)                # express proportionally
  sum(xc < cutoff)                 # count number of items until
                                   # threshold is exceeded
}
############################################
# This function shows what share of the total
# is accounted for by the largest `thresh`
# observations, expressed as a proportion.
############################################
cutoff <- function(x, thresh) {
  # x is the column or variable
  # thresh is the number of observations to count to
  x <- sort(x, decreasing = TRUE)  # sort vector descending (drops NAs)
  xb <- cumsum(x)                  # take cumulative sum (order matters)
  xc <- xb / sum(x)                # express proportionally
  xc[thresh]                       # report cumulative proportion at given threshold
}
###############################
# Now we simply have to apply
# the functions over a data
# object such as a matrix or
# a data frame
###############################
# Here we apply the function to each column with
# apply(data, 2, function, function arguments)
# where 2 tells R to go column-wise, which is appropriate for
# this data shape, and we specify function parameters after we
# tell R the function
thresh3 <- apply(testdata, 2, cutoff, thresh = 3)
thresh5 <- apply(testdata, 2, cutoff, thresh = 5)
thresh10 <- apply(testdata, 2, cutoff, thresh = 10)
# We store these as variables because each call produces a vector
# with one value for each column the function was applied to.
# We combine these into a matrix with one column per threshold
cutoffs <- cbind(thresh3, thresh5, thresh10)
# We convert them to percentages for Excel purposes (not necessary)
cutoffs <- cutoffs * 100
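The calls above only use `cutoff` column-wise. As a rough sketch of the other cases mentioned in the introduction, the block below applies the counting function over columns, over rows, and over the columns of a data frame. The helper `n_below_share` is a hypothetical stand-in for `fixcumsum`, written so that the block runs on its own and drops missing values before counting; the seed and the 0.5 cutoff are arbitrary choices for illustration.

```r
# Hypothetical stand-in for fixcumsum so this block runs on its own
n_below_share <- function(x, cutoff) {
  x <- sort(x, decreasing = TRUE)   # sort() drops NAs by default
  sum(cumsum(x) / sum(x) < cutoff)  # count items whose cumulative share is below the cutoff
}

set.seed(42)                               # arbitrary seed, for reproducibility only
testdata <- replicate(10, rpois(100, 20))  # 100 rows x 10 columns, as in the post

by_col <- apply(testdata, 2, n_below_share, cutoff = 0.5)  # one value per column
by_row <- apply(testdata, 1, n_below_share, cutoff = 0.5)  # one value per row

# For a data frame, sapply() walks over the columns directly
df <- as.data.frame(testdata)
by_col_df <- sapply(df, n_below_share, cutoff = 0.5)

length(by_col)  # 10
length(by_row)  # 100
```

Using `1` instead of `2` as the margin is all it takes to switch `apply` from columns to rows. For a data frame, `sapply` avoids the conversion to a matrix that `apply` performs internally.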