poorman: First Release of a base R dplyr Clone
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
The first official release of poorman (v0.1.9) is now on CRAN! You can now install poorman directly from CRAN with the following code:
install.packages("poorman")
In this blog post I want to address some common questions that I have received since I started writing the package.
What is poorman?
poorman is a package that unapologetically attempts to recreate the dplyr API in a dependency-free way using only base R. poorman is still under development and doesn’t have all of dplyr’s functionality, but what I would consider the “core” functionality is included. The idea behind poorman is that a user should be able to take their dplyr-based script and run it using poorman without any hiccups.
So what does poorman include?
In this first official release, poorman includes copies of the key dplyr functions:
- select(), rename(), pull(), relocate(), mutate(), transmute(), arrange()
- filter(), slice()
- summarise() / summarize()
- group_by(), ungroup()
poorman also includes the join functionality:
- inner_join(), left_join(), right_join(), full_join()
- anti_join(), semi_join()
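For a flavour of what these joins compute, and of the base R that can sit behind them, here is a sketch using base R's merge() and %in% on two small made-up data frames. It is an illustration only, not necessarily how poorman implements its joins; note that merge() sorts by the key column by default, so row order can differ from dplyr's.

```r
# Two small, made-up data frames sharing the key column "id"
x <- data.frame(id = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- data.frame(id = c(2, 3, 4), y_val = c("B", "C", "D"))

# inner_join(): keep only ids present in both tables
inner <- merge(x, y, by = "id")
# left_join(): keep every row of x; unmatched y columns become NA
left <- merge(x, y, by = "id", all.x = TRUE)
# right_join(): keep every row of y
right <- merge(x, y, by = "id", all.y = TRUE)
# full_join(): keep every id found in either table
full <- merge(x, y, by = "id", all = TRUE)
# semi_join(): rows of x that have a match in y (no y columns added)
semi <- x[x$id %in% y$id, , drop = FALSE]
# anti_join(): rows of x that have no match in y
anti <- x[!x$id %in% y$id, , drop = FALSE]
```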
Finally, poorman also includes its own version of the pipe, so you do not need to load or install magrittr:
- %>%
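The pipe is itself a nice example of what base R alone can do. Below is a minimal sketch of a magrittr-style pipe built from substitute() and eval(). It is an assumption made for illustration, not poorman's actual source, and it does not support magrittr's dot (.) placeholder.

```r
# A minimal, illustrative pipe: `lhs %>% f(args)` becomes `f(lhs, args)`.
# Teaching sketch only; not poorman's or magrittr's implementation.
`%>%` <- function(lhs, rhs) {
  rhs <- substitute(rhs)            # capture the right-hand side unevaluated
  if (is.name(rhs)) {
    rhs <- as.call(list(rhs))       # allow `x %>% f` as shorthand for `x %>% f()`
  }
  # Rebuild the call with the left-hand value spliced in as the first argument
  call <- as.call(c(list(rhs[[1L]], quote(.lhs)), as.list(rhs)[-1L]))
  # Evaluate with `.lhs` bound to the value, looking up everything else
  # in the environment the pipe was called from
  eval(call, envir = list(.lhs = lhs), enclos = parent.frame())
}

c(1, 4, 9) %>% sqrt()              # [1] 1 2 3
mtcars %>% head(3) %>% nrow()      # [1] 3
```

Because R's %op% operators are left-associative, a chain such as mtcars %>% head(3) %>% nrow() is evaluated as (mtcars %>% head(3)) %>% nrow(), which is exactly the left-to-right reading a pipe needs.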
More functionality is being developed and will be added in time.
Why develop poorman?
This is probably the most common question: why bother developing poorman when dplyr already exists? Well, there are actually several reasons why I decided to develop it. The most important one to me, though, is quite simply: because I can. poorman started out as a personal challenge and a bit of fun. Also, as a freelance R developer, it is good to build up a portfolio of open source code that I can show to potential clients.
Another reason for developing poorman is that I wanted to challenge a common misconception that base R is not as powerful, or as good, or as useful as dplyr. Too often I see and hear comments belittling base R, and as a user of the language for over 10 years now (well before the inception of dplyr), I find this idea very worrying. poorman’s package start-up message is quite poignant in this regard:
I’d seen my father. He was a poor man, and I watched him do astonishing things. – Sidney Poitier
Finally, I have a natural joy for teaching. Writing poorman gives me a platform to hopefully show useRs two key aspects of R programming in base R: common data manipulation tasks and non-standard evaluation.
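As a small taste of the second point, here is a sketch of how simplified versions of select() and filter() can use non-standard evaluation in base R. The functions are hypothetical and deliberately named select_sketch() and filter_sketch() so they are not mistaken for the real ones; this is not poorman's code, and it skips features such as selection helpers and grouped data.

```r
# Hypothetical, simplified select() and filter() written in base R.
# They illustrate non-standard evaluation; they are not poorman's code.
select_sketch <- function(.data, ...) {
  # Capture the bare column names passed in `...` without evaluating them
  cols <- vapply(as.list(substitute(list(...)))[-1L], deparse, character(1))
  .data[, cols, drop = FALSE]
}

filter_sketch <- function(.data, condition) {
  # Evaluate the condition with the data frame's columns in scope,
  # falling back to the caller's environment for other variables
  keep <- eval(substitute(condition), envir = .data, enclos = parent.frame())
  # Like dplyr's filter(), drop rows where the condition is NA
  .data[!is.na(keep) & keep, , drop = FALSE]
}

min_mpg <- 30   # an ordinary variable, found via the caller's environment
res <- filter_sketch(select_sketch(mtcars, mpg, cyl), mpg > min_mpg)
res
```

Note how mpg and cyl are written as bare names, just as in dplyr: substitute() captures them as unevaluated expressions, and eval() then resolves them against the columns of the data frame.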
But why not just use dplyr?
Let’s be honest: the tidyverse is a fantastic set of packages which have transformed the face of data analysis in R, and dplyr is arguably one of the most important packages within the tidyverse. The API is, in my opinion, very easy to learn and use.
Being a part of the tidyverse, however, means that dplyr comes with a large number of dependencies that users must also install, which is often seen as a disadvantage of using the tidyverse. The disadvantages of dependencies have been written about before, so I won’t go into detail here. What I will say, however, is that the user may have no need for the additional parts of the tidyverse and so may not wish to install multiple packages just to use one or two functions.
Some of these dependencies are very useful, of course, allowing expansion into other areas such as accessing Spark instances and databases using the same API the user already knows. This is great, and if you are using these additional tools then I absolutely recommend that you choose dplyr over poorman. However, if you don’t need the extra dependencies and functionality that come with the wider tidyverse, then maybe consider giving the lightweight poorman a go.
Finally, a point on installation times: poorman takes roughly six seconds to install. If you’ve ever had to install or upgrade dplyr or the tidyverse, you’ll recognise that this is very fast.
Why the name poorman?
As I have already mentioned, I have seen comments in the past pertaining to R’s worthlessness without the tidyverse, and so the name poorman is a subtle play on the idea that you must be a “poor man” if you use base R. The irony, of course, is that I have managed to recreate, quite easily, the key parts of the dplyr API using only base R.
Why not use data.table for the backend?
Because I wanted to build something that was completely dependency free, and adding data.table to the Imports field would add a dependency.
But doesn’t poorman have dependencies?
To answer this, we need to define what we mean by “dependency free”. poorman does have some dependencies, but they are for development purposes only and are therefore listed in the Suggests field of the DESCRIPTION file. Thus, when a user installs the package, these dependencies are only ever installed if they are explicitly requested. poorman doesn’t have any dependencies that users of the package need to install in order to use its functionality; I use these packages only to make development easier. So poorman isn’t truly “dependency free” in the way that data.table is, but it is dependency free for its intended users.
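To make the hard versus soft dependency distinction concrete, here is a sketch that reads a made-up DESCRIPTION file with base R's read.dcf() and splits its dependencies into hard ones (Depends, Imports and LinkingTo, which install.packages() installs by default) and soft ones (Suggests, which are only installed on request, for example with dependencies = TRUE). The package name, the field contents and the helper list_deps() are all hypothetical; this is not poorman's actual DESCRIPTION.

```r
# A made-up DESCRIPTION file; package name and fields are hypothetical.
desc_lines <- c(
  "Package: mypkg",
  "Version: 0.1.0",
  "Depends: R (>= 3.0.0)",
  "Suggests: knitr, rmarkdown, tinytest"
)
con <- textConnection(desc_lines)
desc <- read.dcf(con)   # a one-row character matrix, one column per field
close(con)

# Hypothetical helper: collect package names from the given fields,
# dropping version constraints and the "R" pseudo-dependency
list_deps <- function(desc, fields) {
  fields <- intersect(fields, colnames(desc))
  if (length(fields) == 0L) return(character(0))
  deps <- trimws(unlist(strsplit(unname(desc[1L, fields]), ",")))
  deps <- trimws(sub("\\(.*\\)", "", deps))
  setdiff(deps[nzchar(deps)], "R")
}

# Hard dependencies: installed by default alongside the package
hard <- list_deps(desc, c("Depends", "Imports", "LinkingTo"))
# Soft dependencies: only installed when explicitly requested
soft <- list_deps(desc, "Suggests")
```

In this made-up example, hard comes back empty while soft lists the development-only packages, which is the sense in which a package can be dependency free for its users.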
Conclusion
So if you find yourself needing a dependency-free data manipulation package that follows the dplyr API and installs quickly, then give poorman a try. Equally, if you find any issues, please submit an issue on GitHub.