Set Operations in R and Python. Useful!

Set operations are very useful for data cleaning and for testing scripts. They are a must-have in any analyst’s (data scientist’s/statistician’s/data wizard’s) toolbox. Here is a quick rundown in both R and Python.

Say we have two vectors x and y…

# vector x
x = c(1,2,3,4,5,6)

# vector y
y = c(4,5,6,7,8,9)

What if we ‘combined’ x and y, ignoring any duplicate elements? (x ∪ y)

# x UNION y
union(x, y)

[1] 1 2 3 4 5 6 7 8 9

What are the common elements in x and y? (x ∩ y)

# x INTERSECTION y
intersect(x, y)

[1] 4 5 6

What elements feature in x but not in y?

# x members not in y
setdiff(x,y)

[1] 1 2 3

What elements feature in y but not in x?

# y members not in x
setdiff(y,x)

[1] 7 8 9

How might we visualise all this?

# required package
library(VennDiagram)
# plot venn
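# areas are computed from x and y (6, 6 and 3 here) rather than hard-coded
draw.pairwise.venn(area1 = length(x), area2 = length(y),
                   cross.area = length(intersect(x, y)),
                   category = c('x', 'y'),
                   fill = c('darkred', 'darkgreen'),
                   alpha = rep(0.3, 2),
                   scaled = FALSE)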

[Figure: Venn diagram of x and y]

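What about Python? Python has a built-in ‘set’ type that can be created from a list. (The older standalone ‘sets’ module was deprecated in Python 2.6 and removed in Python 3.) A set object has methods that provide the same functionality as the R functions above. Note that Python sets are unordered, so the order in which elements are printed is not guaranteed.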

# creating set x
x = set([1,2,3,4,5,6])
# creating set y
y = set([4,5,6,7,8,9])
# x UNION y
x.union(y)
{1, 2, 3, 4, 5, 6, 7, 8, 9}
# x INTERSECTION y
x.intersection(y)
{4, 5, 6}
# x members not in y
x.difference(y)
{1, 2, 3}
# y members not in x
y.difference(x)
{7, 8, 9}
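
The method calls above also have operator equivalents, and the same idea carries over to the data cleaning checks mentioned at the start. Below is a minimal sketch, not a definitive recipe: the operators are part of Python's built-in set type, while `compare_columns` and the column names are hypothetical, made up purely for illustration.

```python
# Same sets as in the walkthrough above
x = {1, 2, 3, 4, 5, 6}
y = {4, 5, 6, 7, 8, 9}

# Operator shorthand for the method calls
union_xy = x | y    # same as x.union(y)
common_xy = x & y   # same as x.intersection(y)
only_x = x - y      # same as x.difference(y)
only_y = y - x      # same as y.difference(x)

# Elements that appear in exactly one of the two sets
either_not_both = x ^ y  # same as x.symmetric_difference(y)


def compare_columns(expected, actual):
    """Hypothetical helper: report missing and unexpected column names."""
    expected, actual = set(expected), set(actual)
    return {
        "missing": sorted(expected - actual),     # expected but absent
        "unexpected": sorted(actual - expected),  # present but not expected
    }


# Illustrative column names only
report = compare_columns(["id", "date", "value"], ["id", "value", "notes"])
print(sorted(either_not_both))  # [1, 2, 3, 7, 8, 9]
print(report)  # {'missing': ['date'], 'unexpected': ['notes']}
```

Sorting the results before printing keeps the output stable, since sets themselves carry no ordering.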

References:
http://rstudio-pubs-static.s3.amazonaws.com/13301_6641d73cfac741a59c0a851feb99e98b.html
https://docs.python.org/2/library/sets.html

