How to Find Unmatched Records in R
The post How to Find Unmatched Records in R appeared first on Data Science Tutorials
To retrieve all rows in one data frame that have no matching values in another data frame, use the anti_join() function from the dplyr package.
The basic syntax used by this function is as follows.
anti_join(df1, df2, by='col_name')
The usage of this syntax is demonstrated in the examples that follow.
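When the key column has a different name in each data frame, the by argument also accepts a named character vector. The following is a minimal sketch of that form; the data frames orders and shipped and their column names are hypothetical and are not part of the examples in this post.

```r
library(dplyr)

# hypothetical data: the key is 'order_id' in one frame and 'id' in the other
orders  <- data.frame(order_id = c(1, 2, 3, 4),
                      item = c('pen', 'ink', 'pad', 'cap'))
shipped <- data.frame(id = c(1, 3))

# rows of 'orders' whose order_id has no matching id in 'shipped'
not_shipped <- anti_join(orders, shipped, by = c('order_id' = 'id'))
not_shipped
```

Here the name on the left of the = refers to the column in the first data frame and the value on the right to the column in the second.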
Example 1: Use anti_join() with One Column
Suppose we have the two R data frames shown below:
# build the data frames
df1 <- data.frame(Q1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                  Q2 = c(152, 514, 114, 218, 322, 323))
df2 <- data.frame(Q1 = c('a', 'a', 'a', 'b', 'b', 'b'),
                  Q3 = c(523, 324, 233, 134, 237, 141))
To return all rows in the first data frame that don’t have a matching Q1 in the second data frame, we can use the anti_join() function.
library(dplyr)
# use the 'Q1' column to perform the anti join
anti_join(df1, df2, by='Q1')

  Q1  Q2
1  c 114
2  d 218
3  e 322
4  f 323
We can see that exactly four rows in the first data frame have a Q1 value with no match in the second data frame.
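As a cross-check, the same rows can be recovered without dplyr. The sketch below uses base R's %in% operator on the Example 1 data; it is an alternative shown for comparison, not the method this post is about, and the variable name unmatched is only illustrative.

```r
# same data as Example 1
df1 <- data.frame(Q1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                  Q2 = c(152, 514, 114, 218, 322, 323))
df2 <- data.frame(Q1 = c('a', 'a', 'a', 'b', 'b', 'b'),
                  Q3 = c(523, 324, 233, 134, 237, 141))

# keep the rows of df1 whose Q1 value never appears in df2$Q1
unmatched <- df1[!(df1$Q1 %in% df2$Q1), ]
unmatched
```

The rows match the anti_join() result, although base R subsetting keeps the original row names (3 to 6) instead of renumbering them from 1.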
Example 2: Use anti_join() with Multiple Columns
Suppose we have the two R data frames shown below.
# create the data frames
df1 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'G', 'F', 'C'),
                  points=c(152, 114, 219, 254, 356, 441))
df2 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'C', 'G', 'F', 'F'),
                  points=c(142, 214, 319, 133, 517, 422))
All rows in the first data frame that lack a matching team and position in the second data frame can be returned using the anti_join() function:
library(dplyr)
# perform the anti join using the 'team' and 'position' columns
anti_join(df1, df2, by=c('team', 'position'))

  team position points
1    A        F    219
2    B        C    441
We can see that there are exactly two records from the first data frame that do not have a matching team name and position in the second data frame.
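Note that an anti join is not symmetric: swapping the two arguments asks a different question. The sketch below reuses the Example 2 data and returns the rows of the second data frame that have no matching team and position in the first; the variable name only_in_df2 is only illustrative.

```r
library(dplyr)

# same data as Example 2
df1 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'F', 'G', 'F', 'C'),
                  points = c(152, 114, 219, 254, 356, 441))
df2 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'C', 'G', 'F', 'F'),
                  points = c(142, 214, 319, 133, 517, 422))

# rows of df2 whose team/position pair does not occur in df1
only_in_df2 <- anti_join(df2, df1, by = c('team', 'position'))
only_in_df2
```

With this data only one row comes back (team A, position C, 319 points), because every other team and position pair in df2 also occurs in df1.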