Quick scatterplot with associated histograms

[This article was first published on Gosset's student » R, and kindly contributed to R-bloggers.]

R can produce some beautiful graphics, and there are some excellent packages, such as lattice and ggplot2, for representing data in original ways.  But sometimes all you want to do is explore the relationship between pairs of variables with the minimum of fuss.

In this post we’ll use the data which we imported in the previous post to make a quick graphic.  I’ll assume you have already got as far as importing the data and placing the NO concentration into x and the ozone concentration into y.

We’re going to make a scatterplot with the histogram of x below the x axis, and the histogram of y rotated anti-clockwise through 90 degrees and alongside the y axis (all will become clear).  The first thing is to set up the graphics display:

## start by saving the original graphical parameters
def.par <- par(no.readonly = TRUE)
## then change the margins around each plot to 1
par("mar" = c(1,1,1,1))
## then set the layout of the graphic
layout(matrix(c(2,1,1,2,1,1,4,3,3), 3, 3, byrow = TRUE))

The layout command tells R to split the graphical output into a 3 by 3 array of panels. Each panel is given a number corresponding to the order in which graphics are plotted into it. To see this array, type:

matrix(c(2,1,1,2,1,1,4,3,3), 3, 3, byrow = TRUE)

This output shows that the display is split into 4 zones. The top right is a large area for plot 1, the top left is a narrower panel for plot 2, and the bottom right is for plot 3. The small bottom-left corner is zone 4, which we leave empty.
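
To check the arrangement of the zones before plotting any data, base R's layout.show() draws the outline of each zone and labels it with its plot number. A minimal sketch, reusing the matrix from above (the names panels and nzones are just for illustration):

```r
## the same matrix as above, stored under an illustrative name
panels <- matrix(c(2,1,1,2,1,1,4,3,3), 3, 3, byrow = TRUE)
print(panels)
##      [,1] [,2] [,3]
## [1,]    2    1    1
## [2,]    2    1    1
## [3,]    4    3    3

## layout() returns the number of zones it has set up (4 here)
nzones <- layout(panels)
## draw the outline of each zone, labelled with its plot number
layout.show(nzones)
```

Since this draws on the graphics device, it is probably simplest to run it as a separate check and then repeat the set-up code before plotting the data.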

So then, we need something for the top right: a straightforward scatterplot of x against y. We set the upper limits of the axes with the xlim and ylim parameters of plot, using the maxx and maxy variables, which contain the maximum values held in the vectors x and y:

maxx <- x[which.max(x)]
maxy <- y[which.max(y)]
plot(x, y, xlab = "", ylab = "", pch = 20, bty = "n", 
   xlim = c(0, maxx), ylim = c(0,maxy))
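
The which.max idiom above picks out the largest value by its position. As a small illustration with made-up numbers (not the data from the previous post), it gives the same result as max() with na.rm = TRUE, because which.max() skips missing values:

```r
## made-up vector with a missing value, for illustration only
v <- c(3, NA, 7, 1)
## position of the largest non-missing value, then the value itself
v[which.max(v)]
## [1] 7
## the same value, computed directly
max(v, na.rm = TRUE)
## [1] 7
```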

Then, we need to create a histogram of the y values, and plot it to the left of the scatterplot, appropriately orientated. To do this we first store a histogram in the variable yh, and then plot it with the barplot command. The reason for this is that barplots can be easily rotated:

breaks <- 50
yh <- hist(y, breaks = (maxy/breaks)*(0:breaks), plot = FALSE)
## density holds the bar heights (current versions of R no longer return intensities)
barplot(-(yh$density), space = 0, horiz = TRUE, axes = FALSE)

The breaks variable stores the number of bins into which the histogram is divided, maxy is the maximum value of the vector y, and yh is the histogram. The barplot command then takes the heights of the bars from the histogram object and draws them as a bar chart flipped on its side. The negative sign in front of the heights points the bars to the left rather than the right.
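
To see what the expression passed to hist as its breaks argument produces, here is a small sketch with made-up values (a maximum of 10 and 5 bins rather than 50, under the illustrative names top and nbins). It gives one more edge than there are bins, equally spaced from 0 to the maximum, which is also what seq() returns with length.out:

```r
## made-up values for illustration: 5 bins between 0 and 10
nbins <- 5
top <- 10
edges <- (top / nbins) * (0:nbins)
edges
## [1]  0  2  4  6  8 10

## an equivalent way to write the same bin edges
seq(0, top, length.out = nbins + 1)

## hist() with these edges returns one count per bin
h <- hist(c(1, 3, 3, 9), breaks = edges, plot = FALSE)
h$counts
## [1] 1 2 0 0 1
```

Because the edges start at 0, this assumes the data are non-negative; values outside the range of the edges would make hist() stop with an error.
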
We do the same for the x values, this time with upright bars that point downwards, and then reset the graphics display to its defaults.

xh <- hist(x, breaks = (maxx/breaks)*(0:breaks), plot = FALSE)
## density holds the bar heights (current versions of R no longer return intensities)
barplot(-(xh$density), space = 0, horiz = FALSE, axes = FALSE)
## reset the graphics display to default
par(def.par)

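We get this output: [figure: scatterplot of x against y, with the histogram of y to its left and the histogram of x below it]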

The advantage of this over the straight scatterplot is that you can see the density of overlapping points in the histograms. I’ve set the number of bins in each histogram to 50; it’s worth playing around with this for your data. There are more elegant ways of doing this, but if you have paired variables x and y, and you want to quickly look at their distributions and association, this code works fine.
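
To make it easier to try different numbers of bins, the steps above can be wrapped in a function. This is only a sketch: the name scatter_hist and its bins argument are made up for illustration, it takes the bar heights from the density component of the histogram object, and it assumes x and y are non-negative with no missing values. The example data are simulated, not the data from the previous post.

```r
## hypothetical helper wrapping the steps above; name and arguments are illustrative
scatter_hist <- function(x, y, bins = 50) {
  ## save the graphical parameters and restore them when the function exits
  def.par <- par(no.readonly = TRUE)
  on.exit(par(def.par))
  par(mar = c(1, 1, 1, 1))
  layout(matrix(c(2,1,1,2,1,1,4,3,3), 3, 3, byrow = TRUE))
  maxx <- max(x)
  maxy <- max(y)
  ## plot 1: the scatterplot
  plot(x, y, xlab = "", ylab = "", pch = 20, bty = "n",
       xlim = c(0, maxx), ylim = c(0, maxy))
  ## plot 2: histogram of y, on its side with bars pointing left
  yh <- hist(y, breaks = (maxy / bins) * (0:bins), plot = FALSE)
  barplot(-yh$density, space = 0, horiz = TRUE, axes = FALSE)
  ## plot 3: histogram of x, with bars pointing down
  xh <- hist(x, breaks = (maxx / bins) * (0:bins), plot = FALSE)
  barplot(-xh$density, space = 0, horiz = FALSE, axes = FALSE)
  ## return the two histogram objects without printing them
  invisible(list(x = xh, y = yh))
}

## simulated, non-negative data for illustration only
set.seed(1)
sx <- rexp(500)
sy <- sx + rexp(500)
res <- scatter_hist(sx, sy, bins = 20)
```

Calling scatter_hist(sx, sy, bins = 10) or bins = 100 on the same data shows how much the choice of bin count changes the picture.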


Tagged: R, statistics
