Boxplots and Day of Week Effects
[This article was first published on MarginTale, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
THIS BLOG DOES NOT CONSTITUTE INVESTMENT ADVICE. ACTING ON IT WILL MOST LIKELY BE DETRIMENTAL TO YOUR FINANCIAL HEALTH.
After following some R-related quant finance blogs such as Timely Portfolio, Systematic Investor and Quantitative thoughts (to name some of my favourites), I decided to start my own. I’ll first focus on R snippets that come in handy, and may expand to quant trading and backtesting as time allows.
I’ll start with a simple boxplot analysis of “day of the week effects”, along with two R tidbits:
- How do you adapt ggplot2 boxplots to a plain view in which the box covers 50% of the observations, the lines cover 95%, and the dots show the remaining 5%?
- How do you split your observations by weekday easily and robustly?
Let’s jump directly into the code, which can be downloaded at https://gist.github.com/1974563:
require(quantmod)
require(ggplot2)
require(reshape2)
# The standard definitions of boxplots are non-obvious to interpret for non-statisticians.
# A "the box is fifty percent, the line 95% and there you have 5% outlier points" is
# typically more easily swallowed by practitioners.
# I therefore define two functions which will change the boxplot appearance below.
myBoxPlotSummary <- function(x) {
  r <- quantile(x, probs = c(0.025, 0.25, 0.5, 0.75, 0.975), na.rm = TRUE)
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}
myBoxPlotOutliers <- function(x) {
  tmp <- quantile(x, probs = c(0.025, 0.975), na.rm = TRUE)
  subset(x, x < tmp[1] | tmp[2] < x)
}
# Download some data, e.g. the CBOE VIX
getSymbols("^VIX", src = "yahoo")
# Make a factor depending on the day of the week. We will use this to segment the data by weekday.
# Note that I do not use the weekdays function, because it is locale dependent and leads
# to an unwanted sorting of the days in the boxplot.
# .indexwday returns 0 (Sunday) to 6 (Saturday), so levels 1:5 are Monday to Friday.
wd <- factor(.indexwday(VIX), levels = 1:5, labels = c("Mon", "Tue", "Wed", "Thu", "Fri"), ordered = TRUE)
# A data frame with the factor and the daily close-to-close returns
mydf <- data.frame(wd = wd, ROC(Cl(VIX)))
tail(mydf)
mdat <- melt(mydf)
# Plot the boxplots with our own summary and outlier functions.
# The `fun` argument needs ggplot2 >= 3.3.0; older versions call it `fun.y`.
ggplot(mdat, aes(wd, value)) +
  ggtitle("Daily returns of the VIX") + xlab("") + ylab("% per day") +
  stat_summary(fun.data = myBoxPlotSummary, geom = "boxplot") +
  stat_summary(fun = myBoxPlotOutliers, geom = "point")
# kruskal.test(x = mydf[, 2], g = mydf[, 1])
[Figure: boxplots of the daily returns of the VIX by day of the week. From MarginTale]
These boxplots now show 50% of the observations in the box, the vertical lines cover 95%, and the dots mark the remaining 5% (2.5% in each tail). I find this easier to communicate than the standard definition. This is implemented in the functions myBoxPlotSummary and myBoxPlotOutliers, which are in turn called from stat_summary in ggplot2.
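To check what the two helpers actually return, here is a minimal sketch in base R. The helper bodies are copied from the script; the simulated sample `x` and the seed are assumptions made purely for illustration and are not VIX data.

```r
# Helpers as defined in the script
myBoxPlotSummary <- function(x) {
  r <- quantile(x, probs = c(0.025, 0.25, 0.5, 0.75, 0.975), na.rm = TRUE)
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}
myBoxPlotOutliers <- function(x) {
  tmp <- quantile(x, probs = c(0.025, 0.975), na.rm = TRUE)
  subset(x, x < tmp[1] | tmp[2] < x)
}

# Simulated "returns" (assumption: illustrative only)
set.seed(1)
x <- rnorm(1000)

s   <- myBoxPlotSummary(x)
out <- myBoxPlotOutliers(x)

# Share of observations inside the box, inside the lines, and drawn as dots
in_box      <- mean(x >= s["lower"] & x <= s["upper"])
in_whiskers <- mean(x >= s["ymin"]  & x <= s["ymax"])
as_dots     <- length(out) / length(x)
print(round(c(box = in_box, whiskers = in_whiskers, dots = as_dots), 3))
```

With 1,000 distinct values this should report shares of 0.5, 0.95 and 0.05, which is the "50% box, 95% line, 5% dots" reading described above.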
A second issue I tripped over is the ordering of the days in the boxplot above. If you take the obvious route and define the factor as “weekdays(index(…))”, the plot function will sort the days alphabetically, which is not exactly what you want. If you then try to order the factor levels by name, your solution depends on how the locale (the language your system uses) spells the weekdays. A robust solution, shown in the code, is to use the function .indexwday from the package xts. It returns the weekday as a number, 0 for Sunday through 6 for Saturday, so levels 1 to 5 always map to Monday through Friday.
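The same idea can be sketched without xts. The example below uses base R's `as.POSIXlt(...)$wday`, which follows the same numbering (0 = Sunday through 6 = Saturday); the sample dates are an assumption chosen for illustration.

```r
# One calendar week starting on Monday, 5 March 2012 (illustrative dates)
dates <- seq(as.Date("2012-03-05"), by = "day", length.out = 7)

# Locale-dependent: the level names follow the system language
# and are sorted alphabetically, not in calendar order
wd_names <- factor(weekdays(dates))
print(levels(wd_names))

# Locale-independent: numeric weekday, 0 = Sunday ... 6 = Saturday
wday_num <- as.POSIXlt(dates)$wday

# Map 1..5 to fixed labels; weekend days (6 and 0) are not among
# the levels and therefore become NA
wd <- factor(wday_num, levels = 1:5,
             labels = c("Mon", "Tue", "Wed", "Thu", "Fri"), ordered = TRUE)
print(data.frame(date = dates, wday = wday_num, wd = wd))
```

Because the labels are attached to numbers rather than to locale-specific names, the factor levels stay in calendar order on any system. For a series that only contains trading days, no NA values arise.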