Boxplots and Day of Week Effects

[This article was first published on MarginTale, and kindly contributed to R-bloggers.]

THIS BLOG DOES NOT CONSTITUTE INVESTMENT ADVICE. ACTING ON IT WILL MOST LIKELY BE DETRIMENTAL TO YOUR FINANCIAL HEALTH.

After following some R-related quant finance blogs like Timely Portfolio, Systematic Investor or Quantitative thoughts (to name some of my favourites), I decided to start my own. I’ll first focus on R snippets which come in handy, and will potentially expand to quant trading and backtesting as time allows.

I’ll start with a simple graphical boxplot analysis of “day of the week effects”, with two R tidbits regarding:
  1. How do you adapt the ggplot2 plotting of boxplots to a mundane 50%-box 95%-line 5%-dots view?
  2. How do you split your dates by weekday easily and robustly?

Let's jump directly into the code, which can be downloaded at https://gist.github.com/1974563:


library(quantmod)   # getSymbols, Cl, ROC and (via xts) .indexwday
library(ggplot2)    # plotting
library(reshape2)   # melt
# The standard definition of a boxplot is not obvious to interpret for non-statisticians.
# "The box is fifty percent, the line is 95%, and the points are the 5% outliers" is
# typically more easily swallowed by practitioners.
# I therefore define two functions which will change the boxplot appearance below.
myBoxPlotSummary <- function(x) {
  # Whiskers at the 2.5% and 97.5% quantiles, box from the 25% to the 75% quantile
  r <- quantile(x, probs = c(0.025, 0.25, 0.5, 0.75, 0.975), na.rm = TRUE)
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}
myBoxPlotOutliers <- function(x) {
  # Everything outside the central 95% is drawn as a point
  tmp <- quantile(x, probs = c(0.025, 0.975), na.rm = TRUE)
  subset(x, x < tmp[1] | tmp[2] < x)
}
# Download some data, e.g. the CBOE VIX
getSymbols("^VIX", src = "yahoo")
# Make a factor depending on the day of the week. We will use this to segment the data by weekday.
# Note here that I do not use the weekdays function, because this will be locale dependent and lead
# to an unwanted sorting of the days in the boxplot.
# .indexwday returns 0 (Sunday) to 6 (Saturday), so levels 1:5 are Monday to Friday.
wd <- factor(.indexwday(VIX), levels = 1:5, labels = c("Mon", "Tue", "Wed", "Thu", "Fri"), ordered = TRUE)
# Variant including weekends (Sunday is 0, so it goes last):
# wd <- factor(.indexwday(VIX), levels = c(1:6, 0), labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), ordered = TRUE)
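# A data frame with the factor and the daily close-to-close returns
# (ROC defaults to continuous, i.e. log, returns)
mydf <- data.frame(wd = wd, ROC(Cl(VIX)))
tail(mydf)
# Long format: one row per observation, with wd kept as the identifier
mdat <- melt(mydf, id.vars = "wd")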
# Plot the boxplots with our own summary and outlier functions
ggplot(mdat, aes(wd, value)) +
  ggtitle("Daily returns of the VIX") + xlab("") + ylab("Log return per day") +
  stat_summary(fun.data = myBoxPlotSummary, geom = "boxplot") +
  stat_summary(fun = myBoxPlotOutliers, geom = "point")
#kruskal.test(x=mydf[,2],g=mydf[,1])
Running the code, we get the following output:
[Figure: boxplots of daily VIX returns by day of the week. From MarginTale]


These boxplots now show 50% of the observations in the box, the vertical lines cover 95%, and the dots mark the remaining 5% (2.5% in each tail). I find this easier to communicate than the standard definition. This is implemented in the functions myBoxPlotSummary and myBoxPlotOutliers, which are in turn called from stat_summary in ggplot2.
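As a quick sanity check of the 50% / 95% / 5% reading, here is a minimal sketch that applies the same two functions to simulated data instead of VIX returns. The seed, the sample size and the use of rnorm are my own assumptions for illustration; they are not part of the original gist.

```r
# The two summary functions from the post, repeated so the snippet runs on its own.
myBoxPlotSummary <- function(x) {
  r <- quantile(x, probs = c(0.025, 0.25, 0.5, 0.75, 0.975), na.rm = TRUE)
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}
myBoxPlotOutliers <- function(x) {
  tmp <- quantile(x, probs = c(0.025, 0.975), na.rm = TRUE)
  subset(x, x < tmp[1] | tmp[2] < x)
}

set.seed(42)        # assumption: arbitrary seed, only for reproducibility
x <- rnorm(10000)   # assumption: simulated stand-in for daily returns

s   <- myBoxPlotSummary(x)
out <- myBoxPlotOutliers(x)

# Share of observations inside the box, inside the whiskers, and drawn as dots
in_box      <- mean(x >= s["lower"] & x <= s["upper"])
in_whiskers <- mean(x >= s["ymin"]  & x <= s["ymax"])
dot_share   <- length(out) / length(x)

print(round(c(box = in_box, whiskers = in_whiskers, dots = dot_share), 3))
```

The three shares should come out close to 0.50, 0.95 and 0.05, which is exactly the story told to practitioners.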

A second issue I tripped over is the sorting of days in the above boxplot. If one uses the obvious way and just defines a factor as “weekdays(index(…))”, then the plot function will sort the days alphabetically, which is not exactly what you want. If you then try to order the factor levels by name, your solution will depend on how the locale (the language you use) spells the weekdays. A robust solution, shown in the code, is to use the function .indexwday from the package xts, which returns the weekday as a number (0 = Sunday to 6 = Saturday) regardless of the locale.
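The sorting problem can be reproduced without downloading any data. The sketch below is an illustration under my own assumptions: it uses a hypothetical week of dates and base R's as.POSIXlt(...)$wday as a stand-in for .indexwday, so that it runs without xts. Both number the weekdays from 0 (Sunday) to 6 (Saturday).

```r
# Assumption: an arbitrary Monday-to-Friday week, chosen only for illustration
dates <- as.Date("2012-03-05") + 0:4

# Name-based factor: the level order follows the sorted weekday names,
# so it depends on the locale and is not chronological
wd_names <- factor(weekdays(dates))
print(levels(wd_names))

# Number-based factor: the weekday number does not depend on the locale
wday_num <- as.POSIXlt(dates)$wday   # 0 = Sunday, ..., 6 = Saturday
wd <- factor(wday_num, levels = 1:5,
             labels = c("Mon", "Tue", "Wed", "Thu", "Fri"), ordered = TRUE)
print(levels(wd))
```

In an English locale the first factor typically starts with Friday, while the second always runs from Mon to Fri, which is the order the boxplot needs on its x-axis.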
