Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The efficient way to get the job done is by applying linear programming (LP). That means representing the question “Is it possible to fit a hyper-plane between two sets of points?” with a number of inequalities (that make up a convex area). I’m going to give a quick walk through for the math to make the idea plausible – but this text is more describing an introductory example and not an introduction to LP itself. For solving the linear program I will use Rglpk which provides a high level interface to the GNU Linear Programming Kit (GLPK) – and of course has been co-crafted by the man himself – Kurt Hornik – who is also involved with kernlab and party – thank you, Prof. Hornik and keep up the good work!
The Gory Details
Let’s say we have two sets
And we want to know if there is a hyper-plane in
A
This is because a hyper-plane in
N <- 100 g <- expand.grid(x=0:N/N,y=0:N/N) # definition of the hyper plane h1 <- 3 h2 <- -4 beta <- -1.3 # points on either side g$col <- ifelse(h1 * g$x + h2 * g$y > beta, "cornflowerblue","darkolivegreen3") # roughly on the hyper plane g$col <- ifelse(abs(h1 * g$x + h2 * g$y - beta) < 2/N, "red", g$col) plot(g$x, g$y, col=g$col, pch=16, cex=.5, xlab="x", ylab="y", main="h(x,y) = 3 * x + (-4) * y + 1.3 = 0")
The conditions
(3)
(4)
Okay great – we’re almost there – now let’s get all the variables on the left hand side:
(5)
(6)
These hyper-plane conditions have to be true for all points. All points in
So now we formulate (5) and (6) in matrix notation because this is how LP solvers expect the program description to be fed to them – we get:
Example with Rglpk in two dimensions
library(Rglpk) N1 <- 3 N2 <- 3 # the points of sets A and B P <- matrix( runif(2*N1 + 2*N2,0,1), ncol=2,byrow=T ) # the matrix A defining the lhs of the conditions A <- cbind(P * c(rep(-1,N1),rep(1,N2)), c(rep(1,N1),rep(-1,N2))) # the objective function - no optimization necessary obj <- c(0,0,0) # the vector b defining the rhs of the conditions b <- rep(-1, N1+N2) # by default GLPK assums positive boundaries for the # variables. but we need the full set of real numbers. bounds <- list( lower = list(ind = c(1,2,3), val = c(-Inf,-Inf,-Inf)), upper = list(ind = c(1,2,3), val = c(Inf,Inf,Inf)) ) # solving the linear program s <- Rglpk_solve_LP(obj, A, rep("<=", N1+N2), b, bounds=bounds) plot(P,col=c(rep("red",N1),rep("blue",N2)), xlab="x", ylab="y", cex=1, pch=16, xlim=c(0,1), ylim=c(0,1)) # status 0 means that a solution was found if(s$status == 0) { h1 = s$solution[1] h2 = s$solution[2] beta = s$solution[3] # drawing the separating line if(h2 != 0) { abline(beta/h2,-h1/h2) } else { abline(v=-beta/h1) } } else { cat("Not linearly separable.") }
In case you are wondering how I managed to include all those pretty pretty math formulas in this post – I am using the QuickLaTeX WordPress plug-in and I must say I really like the result. In previous posts I used a LaTeX web editor and then included the rendered formulas as an image.
The post Testing for Linear Separability with Linear Programming in R appeared first on joy of data.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.