
poorman: First Release of a base R dplyr Clone

[This article was first published on Random R Ramblings, and kindly contributed to R-bloggers.]

Introduction

The first official release of poorman (v 0.1.9) is now on CRAN! You can now install poorman directly from CRAN with the following code:

install.packages("poorman")

In this blog post I want to address some common questions that I have received since I started writing the package.

What is poorman?

poorman is a package that unapologetically attempts to recreate the dplyr API in a dependency-free way using only base R. poorman is still under development and doesn’t have all of dplyr’s functionality, but what I would consider the “core” functionality is included. The idea behind poorman is that a user should be able to take their dplyr-based script and run it using poorman without any hiccups.

So what does poorman include?

In this first official release, poorman includes copies of the key dplyr functions.

select(), rename(), pull(), relocate(), mutate(), transmute(), arrange()
filter(), slice()
summarise() / summarize()
group_by(), ungroup()
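
To give a feel for what these verbs do, the sketch below pairs a short dplyr-style pipeline (shown as a comment) with one possible base R translation, using the built-in mtcars data. The base R code is only an illustration of the idea; it is not taken from poorman's source and may differ from how the package implements these functions.

```r
# dplyr-style pipeline that this sketch mirrors:
#   mtcars %>%
#     filter(mpg > 25) %>%
#     select(mpg, cyl, wt) %>%
#     mutate(wt_kg = wt * 453.592) %>%
#     arrange(desc(mpg))

# filter() and select(): a logical row index plus a character column index
res <- mtcars[mtcars$mpg > 25, c("mpg", "cyl", "wt")]

# mutate(): add a derived column (wt is recorded in units of 1000 lbs)
res$wt_kg <- res$wt * 453.592

# arrange(desc(mpg)): reorder rows by decreasing mpg
res <- res[order(res$mpg, decreasing = TRUE), ]

# group_by() + summarise(): mean mpg for each number of cylinders
avg_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
```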

poorman also includes the join functionality.

inner_join(), left_join(), right_join(), full_join()
anti_join(), semi_join()
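
As a rough guide to what each join returns, here is a sketch of base R equivalents built on merge() and %in%. The two data frames are made-up examples, and this is an illustration rather than poorman's actual implementation. Note that merge() sorts the result by the key column by default, so row order can differ from what the dplyr-style functions return.

```r
# Two small example tables sharing an "id" key (made-up data)
x <- data.frame(id = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(id = c(2, 3, 4), b = c("y2", "y3", "y4"))

# Mutating joins via merge()
inner <- merge(x, y, by = "id")                # like inner_join(x, y, by = "id")
left  <- merge(x, y, by = "id", all.x = TRUE)  # like left_join(): keep all rows of x
right <- merge(x, y, by = "id", all.y = TRUE)  # like right_join(): keep all rows of y
full  <- merge(x, y, by = "id", all = TRUE)    # like full_join(): keep all rows of both

# Filtering joins via %in%: only columns of x are returned
semi <- x[x$id %in% y$id, , drop = FALSE]      # like semi_join(): rows of x with a match
anti <- x[!x$id %in% y$id, , drop = FALSE]     # like anti_join(): rows of x without a match
```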

Finally, poorman also includes its own version of the pipe, so you do not need to load or install magrittr.

%>%
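
For readers curious how a pipe can be written without magrittr, below is a minimal sketch in base R. The operator name %then% is hypothetical, chosen to avoid clashing with %>%, and the code is a simplified illustration, not poorman's implementation. It assumes the right-hand side is always written as a call with parentheses, such as head(2), and always inserts the left-hand side as the first argument.

```r
# Hypothetical operator name; poorman itself exports %>%
`%then%` <- function(lhs, rhs) {
  # Capture the right-hand side as an unevaluated call, e.g. head(2)
  rhs_call <- substitute(rhs)
  # Rebuild the call with a placeholder symbol as the first argument,
  # e.g. head(.lhs, 2); named arguments keep their names
  args <- as.list(rhs_call)[-1L]
  new_call <- as.call(c(rhs_call[[1L]], quote(.lhs), args))
  # Evaluate with .lhs bound to the left-hand value; everything else is
  # looked up in the caller's environment
  eval(new_call, envir = list(.lhs = lhs), enclos = parent.frame())
}

# Chained usage: keep efficient cars, then take the first two rows
top2 <- mtcars %then% subset(mpg > 25) %then% head(2)
```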

More functionality is being developed and will be added in time.

Why develop poorman?

This is probably the most common question: why bother developing poorman when dplyr already exists? Well, there are actually several reasons why I decided to develop it. The most important reason to me, though, is quite simply that I can. poorman started out as a personal challenge and a bit of fun. Also, as a freelance R developer, it is good to build up a portfolio of open source code that I can show to potential clients.

Another reason for developing poorman is that I wanted to challenge a common misconception that base R is not as powerful, or as good, or as useful as dplyr. Too often I see and hear comments belittling base R, and as a user of the language for over 10 years now (well before the inception of dplyr), I find this idea very worrying. poorman’s package start-up message is quite poignant in this regard.

I’d seen my father. He was a poor man, and I watched him do astonishing things. – Sidney Poitier

Finally, I have a natural love of teaching. Writing poorman gives me a platform to hopefully show useRs two key aspects of programming in base R: common data manipulation tasks and non-standard evaluation.
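
As a small taste of non-standard evaluation, here is a sketch of a filter-like function written with substitute() and eval(). The function name keep_rows is hypothetical and the code is an illustration only, not poorman's implementation. The point is that the condition is captured unevaluated and then evaluated with the data frame's columns in scope.

```r
# Hypothetical helper illustrating non-standard evaluation
keep_rows <- function(data, condition) {
  # Capture the condition as an unevaluated expression, e.g. mpg > threshold
  expr <- substitute(condition)
  # Evaluate it with the columns of `data` in scope; names that are not
  # columns (such as `threshold` below) are looked up in the caller's environment
  keep <- eval(expr, envir = data, enclos = parent.frame())
  # Treat NA results as FALSE so they do not produce NA rows
  data[!is.na(keep) & keep, , drop = FALSE]
}

threshold <- 30
efficient <- keep_rows(mtcars, mpg > threshold)
```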

But why not just use dplyr?

Let’s be honest, the tidyverse is a fantastic set of packages which have transformed the face of data analysis in R, and dplyr is arguably one of the most important packages within the tidyverse. The API is in my opinion very easy to learn and use.

Being a part of the tidyverse, however, means that dplyr comes with a large number of dependencies that users must also install, which is often seen as a disadvantage of using the tidyverse. Disadvantages of dependencies have been written about before, so I won’t go into detail here. However, what I will say is that the user may not have a need for additional parts of the tidyverse and so may not wish to install multiple packages just to use one or two functions.

Some of these dependencies are very useful of course, allowing expansion into other areas such as accessing Spark instances and databases using the same API the user already knows. This is great, and if you are using these additional tools then I absolutely recommend that you choose dplyr over poorman. However, if you don’t need the extra dependencies and functionality that come with the wider tidyverse, then maybe consider giving the lightweight poorman a go.

Finally, a point on installation times: poorman takes roughly six seconds to install. If you’ve ever had to install or upgrade dplyr or the tidyverse, you’ll recognise that this is very fast.

Why the name poorman?

As I have already mentioned, I have seen comments in the past pertaining to R’s worthlessness without the tidyverse, and so the name poorman is a subtle play on the idea that you must be a “poor man” if you use base. The irony of course is that I have managed to recreate, quite easily, the key parts of the dplyr API using only base R.

Why not use data.table for the backend?

Because I wanted to build something that was completely dependency free and adding data.table as an Import adds a dependency.

But doesn’t poorman have dependencies?

To answer this, we need to define what we mean by “dependency free”. poorman does have some dependencies, but they are for development purposes only and are therefore listed in the Suggests part of the DESCRIPTION file. I use these packages to help me develop more easily, and they are only ever installed if they are explicitly requested. poorman doesn’t have any dependencies that users of the package need to install in order to use its functionality. Therefore poorman isn’t truly “dependency free” like data.table is, but it is dependency free for its intended users.

Conclusion

So, if you find yourself needing a dependency-free data manipulation package that follows the dplyr API and installs quickly, give poorman a try. Equally, if you find any problems, please submit an issue on GitHub.
