RObservations #3- Finding the Expected value of the maximum of two Bivariate Normal variables with simulation

[This article was first published on r – bensstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Finding the expected value of order statistics can be a challenge if one is not privy to certain mathematical “tricks” and techniques. Thankfully, by use of simulation we can have some guidance that can help us realize when we are wrong with our mathematical conclusions and when we are correct.

This post was inspired by an assignment question I had in my mathematical statistics class and how we were able to catch our miscalculations by simulating the distribution. I am very grateful to my colleagues who showed me this technique – I see that this will be useful in future applications!

The Question

(X_1,X_2)^\tau has a bivariate normal distribution, with E(X_1)=E(X_2)=0, and Var(X_1)=Var(X_2)=1, and Cov(X_1,X_2)=\rho. Find E(max\{X_1,X_2\}).

Looking back at this question now, it looks quite trivial. But when we were trying to solve it- we were consistently (and wrongly) concluding that E(max\{X_1,X_2\})=0 until we turned to a tried and true tool for studying distributions- simulation

Using the {mvtnorm} package

Thankfully, the {mvtnorm} package has been around for awhile and specializes in simulating multivariate normal distributions. With this package, we can get away with simulating the bivariate normal distribution and examine its relationship with the correlation coefficient, \rho, as it increases.

# Simulating the Expected value of E(max{X,Y}) where (X,Y)~ Bivariate Normal Density 
# as the Correlation Coefficent increases
library(mvtnorm)
set.seed(1234) 

# #The Correlation Coefficient test values

rho <- seq(-1, 1, by=0.1)
# Sample size

n <-100000
resMax <- sapply(rho, function(t) {
  x <-
    rmvnorm(n, sigma = matrix(
      data = c(1, t, t, 1),
      nrow = 2,
      ncol = 2,
      byrow = TRUE
    ))
  z <- mapply(function(u, v) {
    max(u, v)
  }, u = x[, 1], v = x[, 2])
  mean(z)
})

Plotting the data reveals that our expected value takes a range of values depending on \rho!

With this in mind, we were able to determine by revaluating the problem that E(max\{X_1,X_2\}) = \sqrt{\frac{1-\rho}{\pi}}

We are able to confirm this by plotting the true expected value against our simulated data.

trueVals <- sapply(rho, function(t) {
  x <- sqrt((1 - t) / pi)
})

And there you have it!

Conclusion

When working through symbolic calculations one can be prone to running into errors. Simulation is a valuable tool in this setting as it helps provide insight to where the true answer is- especially when the underlying distribution is known.

To leave a comment for the author, please follow the link and comment on their blog: r – bensstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)