How to Calculate Jaccard Similarity in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The Jaccard similarity index compares two sets of data to see how similar they are. It ranges from 0 to 1: the greater the number, the more similar the two sets.
The Jaccard Index is a statistical measure that is frequently used to compare the similarity of binary variable sets. It is the size of the intersection of the two sets divided by the size of their union.
The following formula is used to calculate the Jaccard similarity index:
Jaccard Similarity = (number of observations in both sets) / (number in either set)
Or, written in notation form:
J(A, B) = |A∩B| / |A∪B|
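As a quick check of the formula, the following sketch applies it by hand to two small sets using base R's intersect() and union(). The sets A and B here are made-up illustrative values, not data from this article.

```r
# Two small illustrative sets (hypothetical values)
A <- c(1, 2, 3)
B <- c(2, 3, 4)

# |A ∩ B|: values present in both sets (2 and 3)
n_both <- length(intersect(A, B))

# |A ∪ B|: distinct values present in either set (1, 2, 3, 4)
n_either <- length(union(A, B))

n_both / n_either
# [1] 0.5
```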
This article will show you how to use R to calculate Jaccard Similarity between two sets of data.
Jaccard similarity in R
Assume that we have the following two sets of data.
a <- c(1,5,8,10,22,14,15,16,2,7)
b <- c(10,12,13,2,7,9,2,7,23,15)
To determine the Jaccard Similarity between the two sets, we can use the following function.
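Define Jaccard Similarity function
jaccard <- function(a, b) {
  # intersect() and union() treat their inputs as sets,
  # so a value that is repeated in a or b is counted only once
  n_intersection <- length(intersect(a, b))
  n_union <- length(union(a, b))
  n_intersection / n_union
}
Let’s find the Jaccard Similarity between the two sets
jaccard(a, b)
[1] 0.2857143
The Jaccard Similarity between the two sets is about 0.29. The vector b lists 2 and 7 twice, so it holds 8 distinct values; the two sets share 4 values (2, 7, 10 and 15) out of 14 distinct values overall, and 4/14 ≈ 0.29. As mentioned above, the greater the number, the more similar the two sets are.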
If the two sets share no values, the function will return 0. If the two sets are identical, the function will return 1.
Let’s look at two examples.
a <- c(1,5,8,10)
b <- c(11,6,12,13)
jaccard(a, b)
[1] 0
a <- c(1,5,8,10)
b <- c(1,5,8,10)
jaccard(a, b)
[1] 1
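One case these two examples do not cover is a pair of empty sets, where the ratio becomes 0/0 and R returns NaN. A minimal sketch of a guarded variant follows. The name jaccard_safe, the empty_value parameter, and the choice to return NA by default are assumptions made for illustration, not part of the function defined above.

```r
# Hypothetical guarded variant of the Jaccard function.
# empty_value is returned when both inputs are empty (an assumed convention).
jaccard_safe <- function(a, b, empty_value = NA_real_) {
  n_union <- length(union(a, b))
  if (n_union == 0) {
    return(empty_value)  # avoids 0/0, which would give NaN
  }
  length(intersect(a, b)) / n_union
}

jaccard_safe(c(1, 5, 8, 10), c(11, 6, 12, 13))  # no shared values
# [1] 0
jaccard_safe(c(1, 5, 8, 10), c(1, 5, 8, 10))    # identical sets
# [1] 1
jaccard_safe(numeric(0), numeric(0))            # both sets empty
# [1] NA
```

Whether two empty sets should count as identical (1), as undefined (NA), or as something else depends on the application, which is why the value is left as a parameter here.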
The function is also applicable to sets containing strings.
a <- c('potato', 'tomato', 'chips', 'balloon')
b <- c('car', 'chips', 'bird', 'salt')
jaccard(a, b)
[1] 0.1428571
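One common use of string sets is comparing two short texts by the words they contain. The sketch below is illustrative only: the two sentences, the helper name word_set, and the simple lowercase and whitespace tokenization are assumptions, not part of the original example.

```r
# Hypothetical helper: split a sentence into a set of lowercase words
word_set <- function(text) {
  unique(strsplit(tolower(text), "\\s+")[[1]])
}

s1 <- word_set("The quick brown fox")
s2 <- word_set("the lazy brown dog")

# Shared words: "the" and "brown"; distinct words overall: 6
word_similarity <- length(intersect(s1, s2)) / length(union(s1, s2))
word_similarity
# [1] 0.3333333
```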
You can also use this method to discover the Jaccard distance between two sets, which is calculated as 1 – Jaccard Similarity and represents the dissimilarity between two sets.
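a <- c(1,5,8,10,22,14,15,16,2,7)
b <- c(10,12,13,2,7,9,2,7,23,15)
# unique() drops the repeated 2 and 7 in b so that each value is counted once
1 - jaccard(unique(a), unique(b))
[1] 0.7142857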
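If the distance is needed often, it can be wrapped in its own function. The following is a minimal, self-contained sketch; the helper name jaccard_distance and the example vectors x and y are hypothetical.

```r
# Hypothetical helper: Jaccard distance = 1 - Jaccard similarity
jaccard_distance <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}

x <- c("car", "chips", "bird", "salt")
y <- c("potato", "chips")

jaccard_distance(x, y)  # 1 shared value out of 5 distinct values
# [1] 0.8
jaccard_distance(x, x)  # identical sets are at distance 0
# [1] 0
```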
If you need a matrix of pairwise Jaccard dissimilarities, the vegan package is a good place to start. Its vegdist() function computes dissimilarities between the rows of a data frame or matrix, and it supports many other dissimilarity measures besides Jaccard.
install.packages("vegan")
library(vegan)

a <- c(1,5,8,10,22,14,15,16,2,7)
b <- c(10,12,13,2,7,9,2,7,23,15)
df <- data.frame(a, b)

# vegdist() compares the rows of df, so the result is the pairwise
# Jaccard dissimilarity between the 10 rows, not between a and b as sets
vegdist(df, method = "jaccard")
           1         2         3         4         5         6         7         8         9
2  0.3529412
3  0.4761905 0.1904762
4  0.8500000 0.6818182 0.5652174
5  0.7500000 0.6470588 0.5714286 0.5862069
6  0.5833333 0.4615385 0.3703704 0.4782609 0.3225806
7  0.8800000 0.7407407 0.6428571 0.2941176 0.4137931 0.3333333
8  0.6923077 0.5714286 0.4827586 0.4782609 0.2068966 0.1600000 0.2608696
9  0.5600000 0.5000000 0.5161290 0.8787879 0.8000000 0.7027027 0.8947368 0.7692308
10 0.5000000 0.2272727 0.1304348 0.6400000 0.6216216 0.4482759 0.7000000 0.5483871 0.4333333
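To compare several sets with each other using the set-based definition from earlier in this article, a pairwise matrix can also be built in base R without any package. The following is a minimal sketch under that assumption; the helper name jaccard_matrix, the list sets, and the third vector c are hypothetical and added only for illustration.

```r
# Hypothetical helper: pairwise Jaccard similarity for a named list of sets
jaccard_matrix <- function(sets) {
  n <- length(sets)
  m <- matrix(1, nrow = n, ncol = n,
              dimnames = list(names(sets), names(sets)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- length(intersect(sets[[i]], sets[[j]])) /
        length(union(sets[[i]], sets[[j]]))
    }
  }
  m
}

sets <- list(
  a = c(1, 5, 8, 10, 22, 14, 15, 16, 2, 7),
  b = c(10, 12, 13, 2, 7, 9, 2, 7, 23, 15),
  c = c(1, 5, 8, 10)  # hypothetical third set
)

sim <- jaccard_matrix(sets)
round(sim, 3)

# Convert to a "dist" object of Jaccard distances, e.g. as input to hclust()
jaccard_dist <- as.dist(1 - sim)
jaccard_dist
```

The double loop is quadratic in the number of sets, which is fine for a handful of sets; for large collections a dedicated package is likely a better fit.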
The post How to Calculate Jaccard Similarity in R appeared first on finnstats.