Site icon R-bloggers

arbitrary distributions with set correlation

[This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A question recently posted on X Validated by Antoni Parrelada: given two arbitrary cdfs F and G, how can we simulate a pair (X,Y) with marginals  F and G, and with set correlation ρ? The answer posted by Antoni Parrelada was to reproduce the Gaussian copula solution: produce (X’,Y’) as a Gaussian bivariate vector with correlation ρ and then turn it into (X,Y)=(F⁻¹(Φ(X’)),G⁻¹(Φ(Y’))). Unfortunately, this does not work, because the correlation does not keep under the double transform. The graph above is part of my answer for a χ² and a log-Normal cdf for F amd G: while corr(X’,Y’)=ρ, corr(X,Y) drifts quite a  lot from the diagonal! Actually, by playing long enough with my function

tacor=function(rho=0,nsim=1e4,fx=qnorm,fy=qnorm)
{
  x1=rnorm(nsim);x2=rnorm(nsim)
  coeur=rho
  rho2=sqrt(1-rho^2)
  for (t in 1:length(rho)){
     y=pnorm(cbind(x1,rho[t]*x1+rho2[t]*x2))
     coeur[t]=cor(fx(y[,1]),fy(y[,2]))}
  return(coeur)
}

Playing further, I managed to get an almost flat correlation graph for the admittedly convoluted call

tacor(seq(-1,1,.01),
      fx=function(x) qchisq(x^59,df=.01),
      fy=function(x) qlogis(x^59))

Now, the most interesting question is how to produce correlated simulations. A pedestrian way is to start with a copula, e.g. the above Gaussian copula, and to twist the correlation coefficient ρ of the copula until the desired correlation is attained for the transformed pair. That is, to draw the above curve and invert it. (Note that, as clearly exhibited by the graph just above, all desired correlations cannot be achieved for arbitrary cdfs F and G.) This is however very pedestrian and I wonder whether or not there is a generic and somewhat automated solution…


Filed under: Books, Kids, pictures, R, Statistics, University life Tagged: chi-square density, copula, correlation, cross validated, inverse cdf, logistic distribution, Monte Carlo Statistical Methods, quantile function, R, random number generation, simulation

To leave a comment for the author, please follow the link and comment on their blog: Xi'an's Og » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.