Site icon R-bloggers

Vectorized Block ifelse in R

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Win-Vector LLC has been working on porting some significant large scale production systems from SAS to R.

From this experience we want to share how to simulate, in R with Apache Spark (via Sparklyr), a nifty SAS feature: the vectorized “block if(){}else{}” structure.

When porting code from one language to another you hope the expressive power and style of the languages are similar.

SAS has some strong and powerful operators. One such is what I am calling “the vectorized block if(){}else{}“. From SAS documentation:

The subsetting IF statement causes the DATA step to continue processing only those raw data records or those observations from a SAS data set that meet the condition of the expression that is specified in the IF statement.

That is a really wonderful operator!

R has some available related operators: base::ifelse(), dplyr::if_else(), and dplyr::mutate_if(). However, none of these has the full expressive power of the SAS operator, which can per data row:

To help achieve such expressive power in R Win-Vector is introducing seplyr::if_else_device(). When combined with seplyr::partition_mutate_se() you get a good high performance simulation of the SAS power in R. These are now available in the open source R package seplyr.

For more information please reach out to us here at Win-Vector or try help(if_else_device).

Also, we will publicize more documentation and examples shortly (especially showing big data scale use with Apache Spark via Sparklyr).

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.