Every day, a poor soul tries to understand copulas by reading the corresponding Wikipedia page, and gives up in despair. The incomprehensible mess that one finds there gives the impression that copulas are about as accessible as tensor theory, which is a shame, because they are actually a very nice tool. The only prerequisite is knowing the inverse cumulative function trick.
That trick runs as follows: suppose you want to generate samples from some distribution with probability density f, and let F be the corresponding cumulative distribution function. Take U uniform on [0, 1]; then X = F⁻¹(U) has the right probability density. This is easy to prove using the classical transformation formula for random variables.
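A quick illustration of the trick (my example, not part of the original post): the Exponential(1) distribution has CDF F(x) = 1 − exp(−x), which inverts to F⁻¹(u) = −log(1 − u). Feeding uniform draws through that inverse gives exponential samples.

```r
set.seed(1)
u <- runif(1e5)       # uniform samples on [0, 1]
x <- -log(1 - u)      # F^{-1}(u) for the Exponential(1) CDF F(x) = 1 - exp(-x)

mean(x)               # should be close to the true mean of Exponential(1), i.e. 1
all.equal(x, qexp(u)) # matches R's built-in quantile (inverse CDF) function
```

The last line checks our hand-rolled inverse against `qexp`, R's quantile function for the exponential distribution; they agree up to floating-point error.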
The trick also works in the other direction: if you take a random variable X with continuous cumulative distribution function F, then F(X) is uniform on [0, 1].
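To see this direction in action (again my example, assuming standard normal samples): push normal draws through the normal CDF `pnorm` and the result is uniform.

```r
set.seed(2)
z <- rnorm(1e5)   # samples from a standard normal
u <- pnorm(z)     # apply the normal CDF: u should be uniform on [0, 1]

hist(u)           # the histogram looks flat
```

All values land in [0, 1] with mean close to 1/2, as a uniform sample should.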
Now let’s say that what you want is to generate two correlated random variables x and y, with prescribed marginal distributions, say a Gamma marginal for x and a Beta marginal for y.
Here’s a possible recipe: generate two correlated Gaussian variables A and B, then apply the Gaussian cumulative distribution function Φ to each. This yields U₁ = Φ(A) and U₂ = Φ(B), which are uniform but still correlated.
Finally transform again, using the inverse-CDF trick, to get x = F_x⁻¹(U₁) and y = F_y⁻¹(U₂), where F_x and F_y are the target Gamma and Beta cumulative distribution functions.
Here’s some R code that illustrates this:
library(mvtnorm)
S <- matrix(c(1, .8, .8, 1), 2, 2)            # Correlation matrix
AB <- rmvnorm(n = 1000, mean = c(0, 0), sigma = S)  # Our Gaussian variables
U <- pnorm(AB)             # Now U is uniform - check using hist(U[,1]) or hist(U[,2])
x <- qgamma(U[,1], shape = 2)  # x is Gamma-distributed
y <- qbeta(U[,2], 1, 2)        # y is Beta-distributed
plot(x, y)                 # They correlate!
That sort of stuff is tremendously useful when you want a statistical model for joint outcomes (for example, when you want to describe how the dependency between wealth and cigar consumption changes depending on whether the country is the US or Cuba).
Another interesting, more theoretical aspect of copulas is that they give you a way of studying dependency independently of what the marginals look like…
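One way to see this separation concretely (my example, not from the original post): rank correlations such as Kendall’s tau depend only on the ranks of the data, and the inverse-CDF transforms above are strictly increasing, so they leave the ranks untouched. The Gaussian pair and the Gamma/Beta pair therefore have exactly the same tau, whatever the marginals are. Here correlated Gaussians are built by hand so the sketch needs only base R.

```r
set.seed(3)
n <- 1000
rho <- 0.8
A <- rnorm(n)
B <- rho * A + sqrt(1 - rho^2) * rnorm(n)   # correlated Gaussian pair
x <- qgamma(pnorm(A), shape = 2)            # Gamma marginal, same copula
y <- qbeta(pnorm(B), 1, 2)                  # Beta marginal, same copula

# Kendall's tau depends only on the ranks, hence only on the copula:
cor(A, B, method = "kendall")
cor(x, y, method = "kendall")   # exactly the same value
```

Swapping in any other strictly increasing quantile functions would leave both values unchanged, which is precisely the sense in which the copula captures the dependency and nothing else.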