Simple ROC plots with ggplot2 – Part 2

Kate Nambiar

10 years ago

[This article was first published on [R] Randomness » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the first part of this article we built a function (rocdata) to calculate the co-ordinates for the ROC plot and its summary statistics. Now we need to actually produce the plot. I make most of my plots in ggplot2 because of it’s versatility. However there’s no reason why these plots couldn’t be produced using R base graphics. The first example is a function for producing a simple ROC plot just looking at one test on one set of data.

rocplot.single <- function(grp, pred, title = "ROC Plot", p.value = FALSE){
  require(ggplot2)
  plotdata <- rocdata(grp, pred)
 
  if (p.value == TRUE){
    annotation <- with(plotdata$stats, paste("AUC=",signif(auc, 2), " (P=", signif(p.value, 2), ")", sep=""))
  } else {
    annotation <- with(plotdata$stats, paste("AUC=",signif(auc, 2), " (95%CI ", signif(ci.upper, 2), " - ", signif(ci.lower, 2), ")", sep=""))
  }
 
  p <- ggplot(plotdata$roc, aes(x = x, y = y)) +
      geom_line(aes(colour = "")) +
      geom_abline (intercept = 0, slope = 1) +
      theme_bw() +
      scale_x_continuous("False Positive Rate (1-Specificity)") +
      scale_y_continuous("True Positive Rate (Sensitivity)") +
      scale_colour_manual(labels = annotation, values = "#000000") +
      opts(title = title,
           plot.title = theme_text(face="bold", size=14), 
           axis.title.x = theme_text(face="bold", size=12),
           axis.title.y = theme_text(face="bold", size=12, angle=90),
           panel.grid.major = theme_blank(),
           panel.grid.minor = theme_blank(),
           legend.justification=c(1,0), 
           legend.position=c(1,0),
           legend.title=theme_blank(),
           legend.key = theme_blank()
           )
  return(p)
}

Let’s make an example dataset and apply this function to it. The following function generates some random data in two groups.

exampledata <- function(n = 50, sd = 1) {
  # Control group
  ctrl <- data.frame(grp = rep("Control", n),
                    res = -1 + rnorm(n, mean = 0, sd = sd)
                    )
  # Disease group
  dis <- data.frame(grp = rep("Disease", n),
                    res = 1 + rnorm(n, mean = 0, sd = sd)
                    )
  df <- rbind(ctrl,dis)
  return(df)
}

We can then take the data frame that this produces and pass it to the rocplot.simple function. It processes the data through our rocdata function from part 1 and then passes the x and y co-ordinates to ggplot2. I’ve added a line of non-discrimination (geom_abline) and used the legend to display the statistics (AUC and confidence intervals). In order to get the legend to display for a single plot line I’ve added an aes colour mapping to geom_line and then used the labels argument of scale_colour_manual to add the annotation to the plot. The annotation can then be customised to display whatever I need – in this case either the AUC and CI or the AUC and P value.

Test1 <- exampledata()
p <- rocplot.single(Test1$grp, Test1$res, title = "")
print(p)

Let’s try a slightly more complicated situation. How about a hypothetical situation where you apply the same test in a set of different situations. How does the test perform? Well, let’s make up 3 test scenarios and build up 3 sets of test data. We can then put those together into a list and apply our function. Because we’re dealing with a list of data objects our rocplot function is slightly different:

rocplot.multiple <- function(test.data.list, groupName = "grp", predName = "res", title = "ROC Plot", p.value = TRUE){
  require(plyr)
  require(ggplot2)
  plotdata <- llply(test.data.list, function(x) with(x, rocdata(grp = eval(parse(text = groupName)), pred = eval(parse(text = predName)))))
  plotdata <- list(roc = ldply(plotdata, function(x) x$roc),
                   stats = ldply(plotdata, function(x) x$stats)
                   )
 
  if (p.value == TRUE){
    annotation <- with(plotdata$stats, paste("AUC=",signif(auc, 2), " (P=", signif(p.value, 2), ")", sep=""))
  } else {
    annotation <- with(plotdata$stats, paste("AUC=",signif(auc, 2), " (95%CI ", signif(ci.upper, 2), " - ", signif(ci.lower, 2), ")", sep=""))
  }
 
  p <- ggplot(plotdata$roc, aes(x = x, y = y)) +
       geom_line(aes(colour = .id)) +
       geom_abline (intercept = 0, slope = 1) +
       theme_bw() +
       scale_x_continuous("False Positive Rate (1-Specificity)") +
       scale_y_continuous("True Positive Rate (Sensitivity)") +
       scale_colour_brewer(palette="Set1", breaks = names(test.data.list), labels = paste(names(test.data.list), ": ", annotation, sep = "")) +
       opts(title = title,
            plot.title = theme_text(face="bold", size=14), 
            axis.title.x = theme_text(face="bold", size=12),
            axis.title.y = theme_text(face="bold", size=12, angle=90),
            panel.grid.major = theme_blank(),
            panel.grid.minor = theme_blank(),
            legend.justification=c(1,0), 
            legend.position=c(1,0),
            legend.title=theme_blank(),
            legend.key = theme_blank()
            )
  return(p)
}

We can generate some test data like this

TestData1 <- list(TrialA = exampledata(n=50, sd = 1), 
                  TrialB = exampledata(n=50, sd = 1.5),
                  TrialC = exampledata(n=50, sd = 2)
                  )

Each data frame containing our data is then put together into a list object which we pass to our rocplot.multiple function.

p <- rocplot.multiple(TestData1, title = "", p.value = FALSE)
print(p)

In addition to ggplot2 this function makes use of the excelent tools available from the plyr library also written by Hadley Wickham. The first step is to apply our rocdata function to each of the 3 data frames in our list using the llply function (equivalent to lapply in R base). We then put together the roc elements and the stats elements from each rocdata output by using the ldply function on each of them. This gives us 2 dataframes with an .id column corresponding to the name of the original data frame it came from. We can put those together into a list so just like our first example we supply ggplot with a list with all the ROC co-ordinates in one element and all the stats in the second. We can then plot one line for each set of data defined by the .id column. Like the first example I’ve used the the legend to display our summary statistics.

Code tested using R version 2.14.2 with the following packages:
plyr – version 1.7.1
ggplot2 – version 0.9.0

To leave a comment for the author, please follow the link and comment on their blog: [R] Randomness » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.