New cheat-sheet for the dplyrXdf package

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

Hadley Wickham's dplyr package is an amazing tool for restructuring, filtering, and aggregating data sets using its elegant grammar of data manipulation. By default, it works on in-memory data frames, which means you're limited to the amount of data you can fit into R's memory. Hadley also provided an extension mechanism to make dplyr work with external data sources, and so Hong Ooi created the dplyrXdf package to work with Xdf data files. With dplyrXdf you can manipulate data files of virtually unlimited size using R, and even use the pipe operator %>% from the magrittr package.
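
To make the "grammar of data manipulation" concrete, here is a minimal sketch of a dplyr pipeline on an ordinary in-memory data frame. It uses the built-in mtcars data set purely for illustration (the data set, the threshold, and the column names are not from the cheat sheet). dplyrXdf is designed so that the same style of pipeline can be applied to an Xdf data source instead of a data frame; that part needs Microsoft R and is not shown here.

```r
library(dplyr)  # also makes the %>% pipe operator available

# mtcars ships with R; it stands in for a larger data set in this sketch
result <- mtcars %>%
  filter(hp > 100) %>%                  # keep cars with more than 100 horsepower
  group_by(cyl) %>%                     # one group per cylinder count
  summarise(mean_mpg = mean(mpg),       # average fuel economy per group
            n = n()) %>%                # number of cars per group
  arrange(cyl)                          # sort the summary by cylinder count

print(result)
```

Each verb takes a data source as its first argument and returns a new one, which is why the steps chain together with the pipe.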

To use the dplyrXdf package, you will need Microsoft R Client (free download for Windows) or Microsoft R Server (on Windows, Linux, Hadoop or HDInsight with Spark). The Xdf files you create can then be used with the big-data functions of the included ScaleR package, enabling you to use R to perform statistical analysis of files hundreds of gigabytes in size.

To help you get started with the dplyrXdf package, Hong has created a new dplyrXdf cheat sheet (pdf). This handy, printable 2-page document explains how dplyrXdf:

  • Extends dplyr framework to large, on-disk data sets
  • Simplifies current interface to xdf functionality
  • Handles the task of file management for the user
  • Is transparent to other xdf-aware functions

It also includes some extended examples of working with big data in dplyrXdf and analyzing it with the ScaleR package. To download the cheat sheet, click on the link below.

Microsoft Advanced Analytics: dplyrXdf cheat sheet
