Hack: The “count(case when … else … end)” in dplyr
[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When I run quires in SQL (or even HiveQL, Spark SQL and so on), it is quite common to use the syntax of count(case when.. else ... end)
. Today, I will provide you an example of how you run this type of commands in dplyr
.
Let’s start:
library(sqldf) library(dplyr) df<-data.frame(id = 1:10, gender = c("m","m","m","f","f","f","m","f","f","f"), amt= c(5,20,30,10,20,50,5,20,10,30)) df
Let’s get the count
and the sum
per gender in different columns in SQL.
sqldf("select count(case when gender='m' then id else null end) as male_cnt, count(case when gender='f' then id else null end) as female_cnt, sum(case when gender='m' then amt else 0 end) as male_amt, sum(case when gender='f' then amt else 0 end) as female_amt from df")
Output:
male_cnt female_cnt male_amt female_amt 1 4 6 60 140
Let’s get the same output in dplyr. We will need to subset the data frame based on one column.
df%>%summarise(male_cnt=length(id[gender=="m"]), female_cnt=length(id[gender=="f"]), male_amt=sum(amt[gender=="m"]), female_amt=sum(amt[gender=="f"]) )
Output:
male_cnt female_cnt male_amt female_amt 1 4 6 60 140
To leave a comment for the author, please follow the link and comment on their blog: R – Predictive Hacks.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.