Simply creating various scatter plots with ggplot #rstats
Inspired by these two postings, I thought about including a function in my package for simply creating scatter plots.
My package sjPlot contains a function called sjp.scatter for creating scatter plots. To reproduce these examples, first load the package and then attach the sample data set:
library(sjPlot)
data(efc)
The simplest call just provides two variables, one for the x-axis and one for the y-axis:
sjp.scatter(efc$c160age, efc$e17age)
Continuous variables with a wide range of values rarely cause overplotting (dots drawn on top of each other). The problem usually occurs with variables that have only a few categories (factor levels). sjp.scatter estimates the amount of overlapping dots and, if necessary, jitters them automatically, as in the following example, which also adds a marginal rug plot:
sjp.scatter(efc$e16sex, efc$neg_c_7, efc$c172code, showRug = TRUE)
With auto-jittering turned off, the same plot looks like this:
sjp.scatter(efc$e16sex, efc$neg_c_7, efc$c172code, showRug = TRUE, autojitter = FALSE)
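The exact rule sjp.scatter uses to decide when to jitter is not described here. The following is only a minimal base R sketch of the idea, assuming a simple heuristic: measure the share of points that sit on exactly the same (x, y) position as at least one other point, and jitter only if that share exceeds a threshold. The names overlap_share and auto_jitter, the threshold of 0.15 and the jitter amount of 0.2 are hypothetical choices, not part of sjPlot.

```r
# Hypothetical sketch of an auto-jitter heuristic (not sjPlot's actual code).

# Share of points whose exact (x, y) pair occurs more than once.
overlap_share <- function(x, y) {
  keys <- paste(x, y, sep = "|")
  mean(duplicated(keys) | duplicated(keys, fromLast = TRUE))
}

# Drop incomplete pairs, then jitter both axes only if the overlap share
# exceeds `threshold` (both defaults are arbitrary assumptions).
auto_jitter <- function(x, y, threshold = 0.15, amount = 0.2) {
  ok <- complete.cases(x, y)
  x <- as.numeric(x[ok])
  y <- as.numeric(y[ok])
  if (overlap_share(x, y) > threshold) {
    x <- jitter(x, amount = amount)
    y <- jitter(y, amount = amount)
  }
  data.frame(x = x, y = y)
}

# Usage with simulated data: few categories on x, small integer scale on y.
set.seed(1)
sex   <- sample(1:2, 200, replace = TRUE)
score <- sample(7:28, 200, replace = TRUE)
overlap_share(sex, score)          # most points share a position with another
jittered <- auto_jitter(sex, score)
```

Measuring exact duplicates is the crudest possible overlap estimate; a real implementation might instead compare the number of distinct positions with the number of points, or scale the jitter amount to the axis range.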
You can also add a grouping variable. The dots are then split into as many groups as the grouping variable has categories. In the next example, two age variables (carer's and elder's age) are grouped by the elder's level of dependency. Additionally, a fitted line is plotted for each group:
sjp.scatter(efc$c160age, efc$e17age, efc$e42dep,
            title = "Scatter Plot",
            legendTitle = sji.getVariableLabels(efc)['e42dep'],
            legendLabels = sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x = sji.getVariableLabels(efc)['c160age'],
            axisTitle.y = sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine = TRUE)
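A fitted line per group amounts to a separate regression of y on x within each group. The base R sketch below assumes these are ordinary least-squares lines (the fitting method sjp.scatter actually uses is an assumption here) and returns the intercept and slope for each group. The helper name group_fits is hypothetical.

```r
# Hypothetical helper: one OLS fit of y ~ x per level of `group`.
group_fits <- function(x, y, group) {
  d <- data.frame(x = x, y = y, group = group)
  d <- d[complete.cases(d), ]
  # drop = TRUE skips empty factor levels, which lm() could not fit
  fits <- lapply(split(d, d$group, drop = TRUE),
                 function(g) coef(lm(y ~ x, data = g)))
  # one row per group: intercept and slope
  out <- do.call(rbind, fits)
  colnames(out) <- c("intercept", "slope")
  out
}

# Usage with toy data: group "a" follows y = 2x + 1, group "b" follows y = -x + 3.
x   <- rep(1:5, 2)
grp <- rep(c("a", "b"), each = 5)
y   <- ifelse(grp == "a", 2 * x + 1, -x + 3)
fits <- group_fits(x, y, grp)
```

Each row of the result describes one of the lines drawn in a grouped plot, which makes it easy to compare slopes across groups numerically.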
If the groups are hard to tell apart in a single plot area, the graph can be faceted by group. The last example draws a separate scatter plot for each group and also shows a confidence band around each fitted line (showSE=TRUE):
sjp.scatter(efc$c160age, efc$e17age, efc$e42dep,
            title = "Scatter Plot",
            legendTitle = sji.getVariableLabels(efc)['e42dep'],
            legendLabels = sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x = sji.getVariableLabels(efc)['c160age'],
            axisTitle.y = sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine = TRUE,
            useFacetGrid = TRUE,
            showSE = TRUE)
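Since sjp.scatter builds on ggplot2, a roughly similar faceted plot can also be written directly in ggplot2. This is only a sketch under stated assumptions: it uses simulated data instead of efc, the column names carer_age, elder_age and dependency are hypothetical, and it does not reproduce sjp.scatter's exact styling.

```r
library(ggplot2)

# Simulated stand-in for the efc variables (hypothetical column names).
set.seed(42)
dep_levels <- c("independent", "slightly", "moderately", "severely")
d <- data.frame(
  carer_age  = round(runif(300, 20, 80)),
  dependency = factor(sample(dep_levels, 300, replace = TRUE),
                      levels = dep_levels)
)
d$elder_age <- round(60 + 0.3 * d$carer_age + rnorm(300, sd = 6))

p <- ggplot(d, aes(x = carer_age, y = elder_age, colour = dependency)) +
  geom_point(alpha = 0.6) +
  # one linear fit per group, with a confidence band around it
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  # one panel per group, similar in spirit to useFacetGrid = TRUE
  facet_wrap(~ dependency) +
  labs(title = "Scatter Plot", x = "carer's age", y = "elder's age",
       colour = "dependency")
print(p)
```

Mapping the group to both colour and facets is redundant, but it keeps the legend comparable to the non-faceted version.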
A complete overview of all function options is available in the package help (?sjp.scatter) or at inside-r.
Tagged: ggplot, R, rstats