Classification from scratch, linear discrimination 8/8


Eighth post of our series on classification from scratch. The latest one was on the SVM, and today I want to get back to very old stuff, with another linear separation of the space, this time using Fisher’s linear discriminant analysis.

Bayes (naive) classifier

Consider the following naive classification rule \[m^\star(\mathbf{x})=\underset{y}{\text{argmax}}\left\{\mathbb{P}[Y=y|\mathbf{X}=\mathbf{x}]\right\}\] or \[m^\star(\mathbf{x})=\underset{y}{\text{argmax}}\left\{\frac{\mathbb{P}[\mathbf{X}=\mathbf{x}|Y=y]\,\mathbb{P}[Y=y]}{\mathbb{P}[\mathbf{X}=\mathbf{x}]}\right\}\] (where \(\mathbb{P}[\mathbf{X}=\mathbf{x}]\) is the density in the continuous case).

In the case where \(y\) takes two values, the standard \(\{0,1\}\) here, one can rewrite the latter as \[m^\star(\mathbf{x})=\begin{cases}1 & \text{if }\mathbb{E}(Y|\mathbf{X}=\mathbf{x})>\frac{1}{2}\\ 0 & \text{otherwise}\end{cases}\] and the set \[\mathcal{D}_S=\left\{\mathbf{x},\ \mathbb{E}(Y|\mathbf{X}=\mathbf{x})=\frac{1}{2}\right\}\] is called the decision boundary.
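As a sanity check, here is a minimal sketch of that rule for two univariate Gaussian classes (the priors, means and variances below are toy values of my own, not from the post):

p0 = .5; p1 = .5                          # prior probabilities P(Y=0), P(Y=1)
f0 = function(x) dnorm(x, mean=0, sd=1)   # density of X given Y=0
f1 = function(x) dnorm(x, mean=2, sd=1)   # density of X given Y=1
m_star = function(x) ifelse(p1*f1(x) > p0*f0(x), 1, 0)  # argmax of the posterior
m_star(c(0, 1.5, 2))                      # 0 1 1 : with equal priors, the boundary is the midpoint x=1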

Assume that \(\mathbf{X}|Y=0\sim\mathcal{N}(\boldsymbol{\mu}_0,\boldsymbol{\Sigma}_0)\) and \(\mathbf{X}|Y=1\sim\mathcal{N}(\boldsymbol{\mu}_1,\boldsymbol{\Sigma}_1)\); then explicit expressions can be derived: \[m^\star(\mathbf{x})=\begin{cases}1 & \text{if }r_1^2<r_0^2+2\log\frac{\mathbb{P}(Y=1)}{\mathbb{P}(Y=0)}+\log\frac{|\boldsymbol{\Sigma}_0|}{|\boldsymbol{\Sigma}_1|}\\ 0 & \text{otherwise}\end{cases}\] where \(r_y^2\) is the Mahalanobis distance, \[r_y^2=[\mathbf{x}-\boldsymbol{\mu}_y]^{\mathsf{T}}\boldsymbol{\Sigma}_y^{-1}[\mathbf{x}-\boldsymbol{\mu}_y].\]

Let \(\delta_y\) be defined as \[\delta_y(\mathbf{x})=-\frac{1}{2}\log|\boldsymbol{\Sigma}_y|-\frac{1}{2}[\mathbf{x}-\boldsymbol{\mu}_y]^{\mathsf{T}}\boldsymbol{\Sigma}_y^{-1}[\mathbf{x}-\boldsymbol{\mu}_y]+\log\mathbb{P}(Y=y);\] the decision boundary of this classifier is \(\{\mathbf{x}\text{ such that }\delta_0(\mathbf{x})=\delta_1(\mathbf{x})\}\), which is quadratic in \(\mathbf{x}\). This is the quadratic discriminant analysis. This can be visualized below.

[Figure: the two samples, with the quadratic decision boundary.]
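A minimal way to draw that kind of picture, with two simulated Gaussian samples and toy parameter values of my own choosing:

library(MASS)
set.seed(1)
mu0 = c(0,0); mu1 = c(2,1)                     # group means (toy values)
S0 = matrix(c(1,.5,.5,1),2)                    # variance matrix, group 0
S1 = matrix(c(1,-.4,-.4,.8),2)                 # variance matrix, group 1
X0 = mvrnorm(50, mu0, S0); X1 = mvrnorm(50, mu1, S1)
delta = function(x, mu, S, p)                  # discriminant score delta_y(x)
  -log(det(S))/2 - t(x-mu)%*%solve(S)%*%(x-mu)/2 + log(p)
vx = seq(-3, 5, length=101)
vz = outer(vx, vx, Vectorize(function(a,b)
  delta(c(a,b), mu0, S0, .5) - delta(c(a,b), mu1, S1, .5)))
plot(rbind(X0,X1), pch=c(rep(1,50), rep(19,50)), xlab="x1", ylab="x2")
contour(vx, vx, vz, levels=0, add=TRUE, lwd=2) # {x : delta_0(x) = delta_1(x)}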

But that can’t be the linear discriminant analysis, right? I mean, the frontier is not linear… Actually, in Fisher’s seminal paper, it was assumed that \(\boldsymbol{\Sigma}_0=\boldsymbol{\Sigma}_1\).

In that case, actually, \[\delta_y(\mathbf{x})=\mathbf{x}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_y-\frac{1}{2}\boldsymbol{\mu}_y^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_y+\log\mathbb{P}(Y=y)\] and the decision frontier is now linear in \(\mathbf{x}\). This is the linear discriminant analysis. This can be visualized below.

Here the two samples have the same variance matrix, and, setting \(\delta_0(\mathbf{x})=\delta_1(\mathbf{x})\), the frontier is \[\mathbf{x}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}[\boldsymbol{\mu}_1-\boldsymbol{\mu}_0]=\frac{1}{2}[\boldsymbol{\mu}_1+\boldsymbol{\mu}_0]^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}[\boldsymbol{\mu}_1-\boldsymbol{\mu}_0]+\log\frac{\mathbb{P}(Y=0)}{\mathbb{P}(Y=1)}.\]
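With equiprobable classes the log-prior term vanishes, and the frontier can be computed directly from the means and the common variance matrix; a small sketch, with toy values of my own:

mu0 = c(0,0); mu1 = c(2,1)
Sigma = matrix(c(1,.5,.5,1),2)                 # common variance matrix
omega = solve(Sigma)%*%(mu1-mu0)               # normal vector of the frontier
b = (t(mu1)%*%solve(Sigma)%*%mu1 - t(mu0)%*%solve(Sigma)%*%mu0)/2
# with equiprobable classes, the frontier is the straight line {x : omega'x = b}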

Link with the logistic regression

Assume as previously that \(\mathbf{X}|Y=0\sim\mathcal{N}(\boldsymbol{\mu}_0,\boldsymbol{\Sigma})\) and \(\mathbf{X}|Y=1\sim\mathcal{N}(\boldsymbol{\mu}_1,\boldsymbol{\Sigma})\); then \[\log\frac{\mathbb{P}(Y=1|\mathbf{X}=\mathbf{x})}{\mathbb{P}(Y=0|\mathbf{X}=\mathbf{x})}\] is equal to \[\mathbf{x}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}[\boldsymbol{\mu}_1-\boldsymbol{\mu}_0]-\frac{1}{2}[\boldsymbol{\mu}_1+\boldsymbol{\mu}_0]^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}[\boldsymbol{\mu}_1-\boldsymbol{\mu}_0]+\log\frac{\mathbb{P}(Y=1)}{\mathbb{P}(Y=0)},\] which is linear in \(\mathbf{x}\), \[\log\frac{\mathbb{P}(Y=1|\mathbf{X}=\mathbf{x})}{\mathbb{P}(Y=0|\mathbf{X}=\mathbf{x})}=\mathbf{x}^{\mathsf{T}}\boldsymbol{\beta}+\beta_0.\] Hence, when each group has a Gaussian distribution with the same variance matrix, LDA and the logistic regression lead to the same classification rule.
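A quick numerical illustration of that equivalence, on simulated Gaussian samples with toy parameters of my own: the slope fitted by glm should be close, up to sampling noise, to \(\boldsymbol{\Sigma}^{-1}[\boldsymbol{\mu}_1-\boldsymbol{\mu}_0]\).

library(MASS)
set.seed(1)
Sigma = matrix(c(1,.5,.5,1),2)
X = rbind(mvrnorm(200, c(0,0), Sigma),
          mvrnorm(200, c(2,1), Sigma))         # two Gaussian groups, same Sigma
y = c(rep(0,200), rep(1,200))
coef(glm(y ~ X, family=binomial))[2:3]         # fitted logistic slope (noisy)
solve(Sigma) %*% (c(2,1) - c(0,0))             # theoretical slope direction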

Observe furthermore that the slope is proportional to \(\boldsymbol{\Sigma}^{-1}[\boldsymbol{\mu}_1-\boldsymbol{\mu}_0]\), as stated in Fisher’s article. But to obtain such a relationship, he observed that the ratio of between-group and within-group variances was \[\frac{\text{variance between}}{\text{variance within}}=\frac{[\boldsymbol{\omega}^{\mathsf{T}}\boldsymbol{\mu}_1-\boldsymbol{\omega}^{\mathsf{T}}\boldsymbol{\mu}_0]^2}{\boldsymbol{\omega}^{\mathsf{T}}\boldsymbol{\Sigma}_1\boldsymbol{\omega}+\boldsymbol{\omega}^{\mathsf{T}}\boldsymbol{\Sigma}_0\boldsymbol{\omega}},\] which is maximal when \(\boldsymbol{\omega}\) is proportional to \(\boldsymbol{\Sigma}^{-1}[\boldsymbol{\mu}_1-\boldsymbol{\mu}_0]\), when \(\boldsymbol{\Sigma}_0=\boldsymbol{\Sigma}_1\).
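As a quick numerical check of that maximality claim (again with toy values of my own for the means and the common variance matrix), one can maximize the ratio with optim and compare directions:

mu0 = c(0,0); mu1 = c(2,2)
S = matrix(c(1,.5,.5,2),2)
ratio = function(w)                            # Fisher's between/within ratio
  (sum(w*(mu1-mu0)))^2 / (2 * t(w)%*%S%*%w)
opt = optim(c(1,0), function(w) -ratio(w))$par
opt / sqrt(sum(opt^2))                         # direction found numerically
w = solve(S)%*%(mu1-mu0); w / sqrt(sum(w^2))   # direction Sigma^{-1}(mu1-mu0)
# the two (normalized) directions should agree, up to the sign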

Homebrew linear discriminant analysis

To compute the vector \(\boldsymbol{\omega}\), use

m0 = apply(myocarde[myocarde$PRONO=="0",1:7],2,mean)  # barycenter of group 0
m1 = apply(myocarde[myocarde$PRONO=="1",1:7],2,mean)  # barycenter of group 1
Sigma = var(myocarde[,1:7])                           # (common) variance matrix
omega = solve(Sigma)%*%(m1-m0)                        # omega = Sigma^{-1}(m1-m0)
omega
                 [,1]
FRCAR -0.012909708542
INCAR  1.088582058796
INSYS -0.019390084344
PRDIA -0.025817110020
PAPUL  0.020441287970
PVENT -0.038298291091
REPUL -0.001371677757

For the constant \(b\) – in the boundary equation \(\boldsymbol{\omega}^{\mathsf{T}}\mathbf{x}=b\) – if the two classes are equiprobable, use

b = (t(m1)%*%solve(Sigma)%*%m1-t(m0)%*%solve(Sigma)%*%m0)/2  # threshold when P(Y=0)=P(Y=1)
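With \(\boldsymbol{\omega}\) and \(b\) in hand, the classification rule itself is a one-liner; a minimal sketch, under the same equiprobable assumption, applied in-sample to the myocarde data:

scores = as.matrix(myocarde[,1:7]) %*% omega   # omega'x for each observation
pred = ifelse(scores > as.numeric(b), "1", "0")
table(pred, myocarde$PRONO)                    # in-sample confusion table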

Application (on the small dataset)

In order to visualize what’s going on, consider the small dataset, with only two covariates,

x = c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
y = c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
z = c(1,1,1,1,1,0,0,1,0,0)
df = data.frame(x1=x,x2=y,y=as.factor(z))
m0 = apply(df[df$y=="0",1:2],2,mean)
m1 = apply(df[df$y=="1",1:2],2,mean)
Sigma = var(df[,1:2])
omega = solve(Sigma)%*%(m1-m0)
omega
         [,1]
x1 -2.640613174
x2  4.858705676


Using the regular R function (from the MASS package), we get

library(MASS)
fit_lda = lda(y ~x1+x2 , data=df)
fit_lda

Coefficients of linear discriminants:
            LD1
x1 -2.588389554
x2  4.762614663

which is proportional to the vector of coefficients we got with our own code – lda simply normalizes the discriminant differently (see the check below). For the constant, use

b = (t(m1)%*%solve(Sigma)%*%m1-t(m0)%*%solve(Sigma)%*%m0)/2
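To check the proportionality claim, compare the two vectors coordinate-wise (fit_lda$scaling holds the coefficients printed above):

fit_lda$scaling / omega    # both entries should be (roughly) the same constant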

If we plot it, we get the red straight line

plot(df$x1,df$x2,pch=c(1,19)[1+(df$y=="1")])
abline(a=b/omega[2],b=-omega[1]/omega[2],col="red")   # the line {x : omega'x = b}


As we can see (with the blue point), our red line passes through the midpoint of the segment joining the two barycenters

points(m0["x1"],m0["x2"],pch=4)
points(m1["x1"],m1["x2"],pch=4)
segments(m0["x1"],m0["x2"],m1["x1"],m1["x2"],col="blue")
points(.5*m0["x1"]+.5*m1["x1"],.5*m0["x2"]+.5*m1["x2"],col="blue",pch=19)
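This is not a coincidence: since \(b=\frac{1}{2}[\mathbf{m}_1+\mathbf{m}_0]^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}[\mathbf{m}_1-\mathbf{m}_0]=\boldsymbol{\omega}^{\mathsf{T}}\frac{\mathbf{m}_0+\mathbf{m}_1}{2}\), the midpoint lies exactly on the line, which a one-line check confirms:

t(omega) %*% ((m0+m1)/2) - b   # (numerically) zero: the midpoint is on the line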

Of course, we can also use R’s predict function to draw the decision boundary

vu = seq(-.25,1.25,length=101)  # grid for the contour plot (range assumed)
predlda = function(x,y) predict(fit_lda, data.frame(x1=x,x2=y))$class==1
vv = outer(vu,vu,predlda)
contour(vu,vu,vv,add=TRUE,lwd=2,levels=.5)


One can also consider the quadratic discriminant analysis, since it might be difficult to argue that \(\boldsymbol{\Sigma}_0=\boldsymbol{\Sigma}_1\).

fit_qda = qda(y ~x1+x2 , data=df)

The separation curve is here

plot(df$x1,df$x2,pch=19,
col=c("blue","red")[1+(df$y=="1")])
predqda = function(x,y) predict(fit_qda, data.frame(x1=x,x2=y))$class==1
vv = outer(vu,vu,predqda)   # note: predqda here, not predlda
contour(vu,vu,vv,add=TRUE,lwd=2,levels=.5)
