Q is for qplot
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].
But sometimes you don’t need high-quality, publication-ready graphics. Sometimes you just need a quick look at the data, and you don’t care whether you have axis labels or centered titles. You just need to make certain there isn’t anything wonky about your data as you clean and/or analyze. Fortunately, ggplot2 has a great function for that – qplot (or quick plot).
As with ggplot, qplot has a standard structure and set of arguments, so once you learn to use it for one type of graphic, you can easily expand to others. And qplot has some smart rules built in to default to two of the most frequently used charts (particularly for quick looks at the data): histograms and scatterplots.

Why are these most frequently used, especially in cleaning and early stages of analysis? A histogram lets you see if your variable is approximately normal; this is important because many statistical tests (including most of the ones you would have learned in an Introductory Statistics course) are built on the assumption that data are normally distributed. A scatterplot lets you see if your variables are related to each other, and whether that relationship is linear or not; once again, many statistical tests are built on assumptions about linear relationships between variables. So it makes sense that, if you’re taking a quick look, you’ll probably be using one of these two graphics.
The default graphics are very easy to produce: if you give only an x variable, you’ll get a histogram, and if you give both x and y, you’ll get a scatterplot. I’ll use the Facebook data once again to demonstrate. I also went ahead and scored the RRS and SBI (described below) here – you can find code for scoring all measures here.
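Before turning to the Facebook data, here is a minimal sketch of that rule using R's built-in mtcars data. The dataset and the variable names mpg and wt are stand-ins chosen for illustration and are not part of this post's analysis.

```r
library(ggplot2)

# Note: ggplot2 3.4.0 and later mark qplot() as deprecated, so a
# deprecation warning may appear; the plots are still produced.

# Only x supplied: qplot() defaults to a histogram
p_hist <- qplot(mpg, data = mtcars)

# Both x and y supplied: qplot() defaults to a scatterplot
p_scatter <- qplot(wt, mpg, data = mtcars)

# Printing a plot object draws it
print(p_hist)
print(p_scatter)
```

Storing each plot in an object is optional; calling qplot directly at the console draws the plot right away.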
```r
# Read in the Facebook data
Facebook <- read.delim(file = "small_facebook_set.txt", header = TRUE)

# Score the RRS: sum the items in columns 3 through 24
Facebook$RRS <- rowSums(Facebook[, 3:24])

# Reverse-score an item on a scale running from min to max
reverse <- function(max, min, x) {
  y <- (max + min) - x
  return(y)
}

# Reverse the even-numbered SBI items (7-point scale)
Facebook$Sav2R <- reverse(7, 1, Facebook$Sav2)
Facebook$Sav4R <- reverse(7, 1, Facebook$Sav4)
Facebook$Sav6R <- reverse(7, 1, Facebook$Sav6)
Facebook$Sav8R <- reverse(7, 1, Facebook$Sav8)
Facebook$Sav10R <- reverse(7, 1, Facebook$Sav10)
Facebook$Sav12R <- reverse(7, 1, Facebook$Sav12)
Facebook$Sav14R <- reverse(7, 1, Facebook$Sav14)
Facebook$Sav16R <- reverse(7, 1, Facebook$Sav16)
Facebook$Sav18R <- reverse(7, 1, Facebook$Sav18)
Facebook$Sav20R <- reverse(7, 1, Facebook$Sav20)
Facebook$Sav22R <- reverse(7, 1, Facebook$Sav22)
Facebook$Sav24R <- reverse(7, 1, Facebook$Sav24)

# Score the SBI: reversed even-numbered items plus the odd-numbered items
Facebook$SBI <- Facebook$Sav2R + Facebook$Sav4R + Facebook$Sav6R +
  Facebook$Sav8R + Facebook$Sav10R + Facebook$Sav12R + Facebook$Sav14R +
  Facebook$Sav16R + Facebook$Sav18R + Facebook$Sav20R + Facebook$Sav22R +
  Facebook$Sav24R + Facebook$Sav1 + Facebook$Sav3 + Facebook$Sav5 +
  Facebook$Sav7 + Facebook$Sav9 + Facebook$Sav11 + Facebook$Sav13 + Facebook$Sav15 +
  Facebook$Sav17 + Facebook$Sav19 + Facebook$Sav21 + Facebook$Sav23

library(ggplot2)
```
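To see what the reverse function in the scoring code does, here is a small sketch on made-up values. The responses vector and the toy data frame are hypothetical and only illustrate the arithmetic: on a 1-to-7 scale, a response x becomes (7 + 1) - x, so 1 turns into 7, 4 stays 4, and 7 turns into 1.

```r
# Reverse-score x on a scale running from min to max
reverse <- function(max, min, x) {
  (max + min) - x
}

# Hypothetical responses to one reverse-scored 7-point item
responses <- c(1, 2, 4, 6, 7)
reversed <- reverse(7, 1, responses)
print(reversed)
# values: 7 6 4 2 1

# Hypothetical two-item scale: Sav1 is scored as-is, Sav2 is reversed first
toy <- data.frame(Sav1 = c(7, 5, 2), Sav2 = c(1, 3, 6))
toy$Sav2R <- reverse(7, 1, toy$Sav2)
toy$total <- rowSums(toy[, c("Sav1", "Sav2R")])
print(toy$total)
# values: 14 10 4
```

Because the function works on whole vectors, a single call reverses an entire column at once, which is why the scoring code needs only one line per item.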
I'll use a scale I haven't really used in this series - the Savoring Beliefs Inventory. This measure was created by Fred Bryant, who was my faculty sponsor for this research (since I was still a grad student at the time). Fred also taught me structural equation modeling. The measure assesses a concept Fred calls savoring - fixating on positive events and feelings to retain those feelings of joy and pleasure. I selected this measure to include because, as I mentioned to Fred, I felt savoring was the opposite of rumination. (While he thought I'd made a good point, he told me he thought of savoring as the opposite of coping, which makes sense.)
Using the qplot function, we can quickly generate a histogram of total SBI scores.
```r
qplot(SBI, data=Facebook)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```
This variable shows a negative skew: there is a long tail (fewer cases than we'd expect if this followed the normal distribution) at the low end, the highest part of the distribution is to the right of center, and there is much less of a tail at the high end (more cases than we'd expect if this followed the normal distribution). We're also getting a message about bins. Right now, the histogram is slicing up the values between the minimum and the maximum into 30 bars. We can reduce this number to smooth out the distribution.
```r
qplot(SBI, data=Facebook, bins=15)
```
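The message from stat_bin also mentions binwidth, which sets the width of each bar in score units rather than the number of bars. The sketch below shows both options side by side. Since the Facebook file may not be at hand, it uses simulated totals (24 items scored 1 to 7, matching the scoring code above); the toy data frame and the binwidth of 5 are assumptions for illustration, not values from the real data.

```r
library(ggplot2)

# Hypothetical stand-in for the SBI totals: 24 items scored 1 to 7,
# so totals fall between 24 and 168 (simulated, not the real data)
set.seed(42)
items <- matrix(sample(1:7, 24 * 200, replace = TRUE), ncol = 24)
toy <- data.frame(SBI = rowSums(items))

# Fix the number of bins, as in the post
p_bins <- qplot(SBI, data = toy, bins = 15)

# Or fix the width of each bin in score units, as the message suggests
p_width <- qplot(SBI, data = toy, binwidth = 5)

print(p_bins)
print(p_width)
```

Which argument is more convenient depends on the question: bins is handy for a quick smoothing of the picture, while binwidth keeps the bars tied to meaningful units on the scale.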