Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This week, I spend some time at the Workshop on Nonparametric Curve Smoothing conference at Concordia. Yesterday afternoon, Noël Veraverbeke show an interesting graph, to illustrate conditional copulas (and the derivation of conditional dependence measures, such as Kendall’s tau, or Spearman’s rho). A long time ago, in my PhD thesis (mainly on conditional copulas) I did try to derive conditional dependence measures (in a dedicated chapter). In my PhD, I was interested to describe the dependence of a pair
In the paper Noël mentioned, they want to describe the dependence of a pair
b1=read.table("sp.dyn.le00.fe.in_Indicator_en_csv_v2.csv",header=TRUE,sep=",",skip=2) b2=read.table("sp.dyn.le00.ma.in_Indicator_en_csv_v2.csv",header=TRUE,sep=",",skip=2) b3=read.table("ny.gdp.pcap.cd_Indicator_en_csv_v2.csv",header=TRUE,sep=",",skip=2) b1b=b1[,c(1,2,55)] b2b=b2[,c(1,2,55)] b3b=b3[,c(1,2,55)] names(b1b)[3]="LEF" names(b2b)[3]="LEM" names(b3b)[3]="GPD" b=merge(b1b,b2b) b=merge(b,b3b) plot(b$LEM,b$LEF,xlab="Life Expectancy (male vs. female)")
With this graph, we cannot visualize the link with the covariate,
b$cgpd=cut(b$GPD,quantile(b$GPD,seq(0,1,by=1/6),na.rm=TRUE)) levels(b$cgpd)=as.character(1:6) library(RColorBrewer) CL=brewer.pal(6, "RdBu") plot(b$LEM,b$LEF,xlab="Life Expectancy (male vs. female)",pch=19,col=CL[as.numeric(b$cgpd)])
Here, poor countries are in red, and rich countries in blue,
Clearly, life expectancy is connected to the wealth of the country,
plot(b$GPD,b$LEF,xlab="(Female) Life Expectancy vs. GPD (log scale)",pch=19,col=CL[as.numeric(b$cgpd)],log="x") plot(b$GPD,b$LEM,xlab="(Male) Life Expectancy vs. GPD (log scale)",pch=19,col=CL[as.numeric(b$cgpd)],log="x")
The idea here is to consider the conditional dependence structure, given the wealth. If we want something smooth (this is actually the goal of the workshop, but I’d like to make that quickly) consider some weighted version of Kendall’s tau, based on the idea mentioned in a post on http://stackoverflow.com/
The idea is to use concordance and discordance counts, with replications of the data, based on the weights
P = function(t) { r_ndx = row(t) c_ndx = col(t) sum(t * mapply(function(r, c){sum(t[(r_ndx > r) & (c_ndx > c)])}, r = r_ndx, c = c_ndx))} Q = function(t) { r_ndx = row(t) c_ndx = col(t) sum(t * mapply( function(r, c){ sum(t[(r_ndx > r) & (c_ndx < c)]) }, r = r_ndx, c = c_ndx) ) } kendall_tau_c = function(t){ t = as.matrix(t) m = min(dim(t)) n = sum(t) ks_tauc = (m*2*(P(t)-Q(t)))/((n*n)*(m-1)) } I=is.na(b$GPD) bw=density(log(b$GPD[!I]))$bw kendall.weight=function(x){ df=data.frame(Y1=b$LEF, Y2=b$LEM, freq=trunc(dnorm(log(b$GPD)-log(x),sd=bw)*100)) df=df[!is.na(df$freq),] dfrep=data.frame( lapply(df, function(x){rep(x, df$freq)})) t=xtabs(~ Y1+Y2, dfrep) return(kendall_tau_c(t))}
Here, I use weights using some Gaussian kernel on the logarithm of the GPD per capita (my standard deviation for the Gaussian weight being equal to the bandwidth of the Gaussian kernel of the density of the log of the GPD per capita), then, we can compute various conditional Kendall’s tau,
T=exp(seq(6,11.5,length=50)) K=Vectorize(kendall.weight)(T)
and plot them,
plot(T,K,type="l",xlab="Conditional Kendall's tau vs. GPD (log scale)")
There is more “correlation” between lifetimes of men and women in poor countries than rich country (which is also what Noël observed). Now, we can also play with time, because we have those statistics for several years.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.