Gaussian process regression with R


I’m currently working my way through Rasmussen and Williams’s book on Gaussian processes. It’s another one of those topics that seems to crop up a lot these days, particularly around control strategies for energy systems, and I thought I should at least be able to perform basic analyses with the method.

While the book is sensibly laid out and pretty comprehensive in its choice of topics, it is also a very hard read. My linear algebra may be rusty, but I’ve heard some mathematicians describe the conventions used in the book as “an affront to notation”. So be aware that if you try to work through the book, you will need to be patient. It’s not a cookbook that clearly spells out how to do everything step by step.

That said, I have now worked through the basics of Gaussian process regression as described in Chapter 2 and I want to share my code with you here. As always, I’m doing this in R, and if you search CRAN you will find a specific package for Gaussian process regression: gptk. The implementation shown below is much slower than the gptk functions, but by doing things manually I hope you will find it easier to understand what’s actually going on. The full code is given below and is available on GitHub. (PS: does anyone know how to embed only a few lines from a gist?)

Step 1: Generating functions

With a standard univariate statistical distribution, we draw single values. By contrast, a Gaussian process can be thought of as a distribution of functions. So the first thing we need to do is set up some code that enables us to generate these functions. The code at the bottom shows how to do this and hopefully it is pretty self-explanatory. The result is basically the same as Figure 2.2(a) in Rasmussen and Williams, although with a different random seed and plotting settings.
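Stripped of the plotting, the key step is a draw from a multivariate normal distribution whose covariance matrix comes from the squared exponential function. These are the relevant lines from the full listing at the end of the post, with replicate() standing in for the explicit loop used there:

require(MASS)                          # provides mvrnorm()
x.star <- seq(-5, 5, len=50)           # points at which to evaluate the functions
sigma <- calcSigma(x.star, x.star)     # squared exponential covariance (defined in the listing)
# Each draw from the zero-mean multivariate normal is one sample function
values <- replicate(3, mvrnorm(1, rep(0, length(x.star)), sigma))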

Example of functions from a Gaussian process

Step 2: Fitting the process to noise-free data

Now let’s assume that we have a number of fixed data points. In other words, our Gaussian process is again generating lots of different functions but we know that each draw must pass through some given points. For now, we will assume that these points are perfectly known. In the code, I’ve tried to use variable names that match the notation in the book. In the resulting plot, which corresponds to Figure 2.2(b) in Rasmussen and Williams, we can see the explicit samples from the process, along with the mean function in red, and the constraining data points.
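In matrix terms, equation (2.19) gives the posterior mean as K(X*,X) K(X,X)^-1 y and the posterior covariance as K(X*,X*) - K(X*,X) K(X,X)^-1 K(X,X*). These are the corresponding lines from the full listing at the end of the post:

# The observed points, and the covariance matrices between the
# observed inputs x and the test inputs x.star from step 1
f <- data.frame(x=c(-4,-3,-1,0,2), y=c(-2,0,1,2,-1))
x <- f$x
k.xx   <- calcSigma(x, x)
k.xxs  <- calcSigma(x, x.star)
k.xsx  <- calcSigma(x.star, x)
k.xsxs <- calcSigma(x.star, x.star)
# Equation (2.19): posterior mean and covariance given noise-free points
f.star.bar <- k.xsx %*% solve(k.xx) %*% f$y
cov.f.star <- k.xsxs - k.xsx %*% solve(k.xx) %*% k.xxs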

Example of Gaussian process trained on noise-free data

Step 3: Fitting the process to noisy data

The next extension is to assume that the constraining data points are not perfectly known. Instead, we assume that they have some amount of normally-distributed noise associated with them. This case is discussed on page 16 of the book, although an explicit plot isn’t shown there. The code and resulting plot are shown below; again, we include the individual sampled functions, the mean function, and the data points (this time with error bars marking 95% confidence intervals).
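The only change from the noise-free case is that K(X,X) is replaced by K(X,X) + sigma.n^2 * I in both expressions. Again, these are the corresponding lines from the listing below:

sigma.n <- 0.1   # standard deviation of the observation noise
# Same as before, but with sigma.n^2 added down the diagonal of K(X,X)
f.bar.star <- k.xsx %*% solve(k.xx + sigma.n^2*diag(1, ncol(k.xx))) %*% f$y
cov.f.star <- k.xsxs - k.xsx %*% solve(k.xx + sigma.n^2*diag(1, ncol(k.xx))) %*% k.xxs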

Example of Gaussian process trained on noisy data

Next steps

Hopefully that gives you a starting point for implementing Gaussian process regression in R. There are several further steps you could take from here, including:

  • Change the squared exponential covariance function to include the signal and noise variance parameters, in addition to the length scale shown here (see the sketch after this list).
  • Speed up the code by using the Cholesky decomposition, as described in Algorithm 2.1 on page 19 (also sketched below).
  • Try to implement the same regression using the gptk package. Sadly the documentation is quite sparse there too, but if you look at the various demo* files in the package source, you should be able to figure out what’s going on.
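To make the first two of those concrete, here is a rough sketch, assuming the objects (f, sigma.n, and the k.xx, k.xxs, k.xsx, k.xsxs matrices) built in the full listing below. The three-parameter covariance function calcSigmaFull and its sigma.f argument are names I’ve made up for illustration, not anything from the book or the gptk package.

# Sketch only: a squared exponential covariance with a signal variance
# parameter sigma.f, in addition to the length scale l. (Observation
# noise sigma.n is added separately to the diagonal, as in the listing.)
calcSigmaFull <- function(X1, X2, l=1, sigma.f=1) {
  sigma.f^2 * exp(-0.5 * outer(X1, X2, function(a, b) ((a - b)/l)^2))
}

# Sketch of Algorithm 2.1: replace the repeated solve() calls with one
# Cholesky factorisation. R's chol() returns the upper triangular factor,
# so transpose it to get the lower triangular L used in the book.
L <- t(chol(k.xx + sigma.n^2 * diag(nrow(k.xx))))
alpha <- backsolve(t(L), forwardsolve(L, f$y))   # alpha = K^{-1} y
f.bar.star <- k.xsx %*% alpha                    # predictive mean
v <- forwardsolve(L, k.xxs)
cov.f.star <- k.xsxs - t(v) %*% v                # predictive covariance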


The code

# Demo of Gaussian process regression with R
# James Keirstead
# 5 April 2012
# Chapter 2 of Rasmussen and Williams's book `Gaussian Processes
# for Machine Learning' provides a detailed explanation of the
# math for Gaussian process regression. It doesn't provide
# much in the way of code though. This Gist is a brief demo
# of the basic elements of Gaussian process regression, as
# described on pages 13 to 16.
# Load in the required libraries for data manipulation
# and multivariate normal distribution
require(MASS)
require(plyr)
require(reshape2)
require(ggplot2)
# Set a seed for repeatable plots
set.seed(12345)
# Calculates the covariance matrix sigma using a
# simplified version of the squared exponential function.
#
# Although the nested loops are ugly, I've checked and it's about
# 30% faster than a solution using expand.grid() and apply()
#
# Parameters:
# X1, X2 = vectors
# l = the scale length parameter
# Returns:
# a covariance matrix
calcSigma <- function(X1, X2, l=1) {
  Sigma <- matrix(rep(0, length(X1)*length(X2)), nrow=length(X1))
  for (i in 1:nrow(Sigma)) {
    for (j in 1:ncol(Sigma)) {
      Sigma[i,j] <- exp(-0.5*(abs(X1[i]-X2[j])/l)^2)
    }
  }
  return(Sigma)
}
# 1. Plot some sample functions from the Gaussian process
# as shown in Figure 2.2(a)
# Define the points at which we want to define the functions
x.star <- seq(-5,5,len=50)
# Calculate the covariance matrix
sigma <- calcSigma(x.star,x.star)
# Generate a number of functions from the process
n.samples <- 3
values <- matrix(rep(0,length(x.star)*n.samples), ncol=n.samples)
for (i in 1:n.samples) {
  # Each column represents a sample from a multivariate normal distribution
  # with zero mean and covariance sigma
  values[,i] <- mvrnorm(1, rep(0, length(x.star)), sigma)
}
values <- cbind(x=x.star,as.data.frame(values))
values <- melt(values,id="x")
# Plot the result
fig2a <- ggplot(values, aes(x=x, y=value)) +
  geom_rect(xmin=-Inf, xmax=Inf, ymin=-2, ymax=2, fill="grey80") +
  geom_line(aes(group=variable)) +
  theme_bw() +
  scale_y_continuous(lim=c(-2.5, 2.5), name="output, f(x)") +
  xlab("input, x")
# 2. Now let's assume that we have some known data points;
# this is the case of Figure 2.2(b). The book uses the notation 'f'
# for the observed values that I store in f$y below; using a data
# frame makes the ggplot code easier later on.
f <- data.frame(x=c(-4,-3,-1,0,2),
                y=c(-2,0,1,2,-1))
# Calculate the covariance matrices
# using the same x.star values as above
x <- f$x
k.xx <- calcSigma(x,x)
k.xxs <- calcSigma(x,x.star)
k.xsx <- calcSigma(x.star,x)
k.xsxs <- calcSigma(x.star,x.star)
# These matrix calculations correspond to equation (2.19)
# in the book.
f.star.bar <- k.xsx%*%solve(k.xx)%*%f$y
cov.f.star <- k.xsxs - k.xsx%*%solve(k.xx)%*%k.xxs
# This time we'll plot more samples. We could of course
# simply plot a +/- 2 standard deviation confidence interval
# as in the book but I want to show the samples explicitly here.
n.samples <- 50
values <- matrix(rep(0,length(x.star)*n.samples), ncol=n.samples)
for (i in 1:n.samples) {
  values[,i] <- mvrnorm(1, f.star.bar, cov.f.star)
}
values <- cbind(x=x.star,as.data.frame(values))
values <- melt(values,id="x")
# Plot the results including the mean function
# and constraining data points
fig2b <- ggplot(values, aes(x=x, y=value)) +
  geom_line(aes(group=variable), colour="grey80") +
  geom_line(data=NULL, aes(x=x.star, y=f.star.bar), colour="red", size=1) +
  geom_point(data=f, aes(x=x, y=y)) +
  theme_bw() +
  scale_y_continuous(lim=c(-3, 3), name="output, f(x)") +
  xlab("input, x")
# 3. Now assume that each of the observed data points has some
# normally-distributed noise.
# The standard deviation of the noise
sigma.n <- 0.1
# Recalculate the mean and covariance functions
f.bar.star <- k.xsx%*%solve(k.xx + sigma.n^2*diag(1, ncol(k.xx)))%*%f$y
cov.f.star <- k.xsxs - k.xsx%*%solve(k.xx + sigma.n^2*diag(1, ncol(k.xx)))%*%k.xxs
# Recalculate the sample functions
values <- matrix(rep(0, length(x.star)*n.samples), ncol=n.samples)
for (i in 1:n.samples) {
  values[,i] <- mvrnorm(1, f.bar.star, cov.f.star)
}
values <- cbind(x=x.star,as.data.frame(values))
values <- melt(values,id="x")
# Plot the result, including error bars on the observed points
gg <- ggplot(values, aes(x=x, y=value)) +
  geom_line(aes(group=variable), colour="grey80") +
  geom_line(data=NULL, aes(x=x.star, y=f.bar.star), colour="red", size=1) +
  geom_errorbar(data=f, aes(x=x, y=NULL, ymin=y-2*sigma.n, ymax=y+2*sigma.n), width=0.2) +
  geom_point(data=f, aes(x=x, y=y)) +
  theme_bw() +
  scale_y_continuous(lim=c(-3, 3), name="output, f(x)") +
  xlab("input, x")
