Seal of Approval: dtplyr

[This article was first published on Blog, and kindly contributed to R-bloggers.]

dtplyr

Author(s): Hadley Wickham, Maximilian Girlich, Mark Fairbanks, Ryan Dickerson, Posit Software PBC

Maintainer: Hadley Wickham (hadley@posit.co)

Provides a data.table backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.

Relationship with data.table

dtplyr is a bridge for users who are more comfortable with dplyr syntax but want the speed and efficiency of data.table. The package provides the core dplyr verbs for lazy data.table objects and translates them to data.table operations, rather than running them through dplyr's own data frame implementation.
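
As a rough illustration of the two syntaxes being bridged, the sketch below writes one grouped summary twice: once with dplyr verbs on an ordinary data frame and once directly in data.table syntax. The object names (`by_cyl_dplyr`, `mtcars_dt`, `by_cyl_dt`) are made up for this example, and dtplyr itself is not involved yet; the point is only to show the kind of code it is meant to translate between.

```r
library(data.table)
library(dplyr, warn.conflicts = FALSE)

# dplyr syntax on an ordinary data frame
by_cyl_dplyr <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# The same summary written directly in data.table syntax
mtcars_dt <- as.data.table(mtcars)
by_cyl_dt <- mtcars_dt[, .(mean_mpg = mean(mpg)), keyby = .(cyl)]

# Both results are ordered by cyl, so the columns can be compared directly
all.equal(by_cyl_dplyr$mean_mpg, by_cyl_dt$mean_mpg)
```
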

Overview

Excerpted from the dtplyr vignette

To use dtplyr, you must at least load dtplyr and dplyr. You may also want to load data.table so you can access the other goodies that it provides:

library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

Then use lazy_dt() to create a “lazy” data.table object that tracks the operations performed on it.

mtcars2 <- lazy_dt(mtcars)
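
To make the "lazy" part more concrete, here is a minimal sketch of a pipeline that is recorded but not yet run. The name `heavy_cars` and the particular filter are chosen only for illustration. It relies on `show_query()`, the dplyr generic that dtplyr implements for its lazy objects, to display the data.table call without computing the result.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

mtcars2 <- lazy_dt(mtcars)

# Build a pipeline; the steps are only recorded at this point
heavy_cars <- mtcars2 %>%
  filter(wt > 3) %>%
  select(mpg, cyl, wt)

# The result is a recorded step, not a data frame or data.table
inherits(heavy_cars, "dtplyr_step")
is.data.frame(heavy_cars)

# Display the data.table call that would be evaluated
show_query(heavy_cars)
```
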

You can preview the transformation (including the generated data.table code) by printing the result:

mtcars2 %>% 
  filter(wt < 5) %>% 
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>% 
  summarise(l100k = mean(l100k))

Source: local data table [3 x 2]
Call:   `_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)), 
    keyby = .(cyl)]

    cyl l100k
  <dbl> <dbl>
1     4  9.05
2     6 12.0 
3     8 14.9 

# Use as.data.table()/as.data.frame()/as_tibble() to access results

Generally, though, you should reserve printing for debugging, and use as.data.table(), as.data.frame(), or as_tibble() to indicate that you’re done with the transformation and want to access the results:

mtcars2 %>% 
  filter(wt < 5) %>% 
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>% 
  summarise(l100k = mean(l100k)) %>% 
  as_tibble()

# A tibble: 3 × 2
    cyl l100k
  <dbl> <dbl>
1     4  9.05
2     6 12.0 
3     8 14.9 
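
The three conversion functions differ only in the class of object they return. The sketch below, with illustrative names such as `summary_step` and `res_dt`, stores the lazy pipeline once, materializes it in each form, and then cross-checks the values against the same pipeline run with plain dplyr on the original data frame. It is meant as a sanity check you could adapt, not as a benchmark.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

mtcars2 <- lazy_dt(mtcars)

# Record the pipeline once; nothing is evaluated yet
summary_step <- mtcars2 %>%
  filter(wt < 5) %>%
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>%
  summarise(l100k = mean(l100k))

# Materialize the same pipeline as three different classes
res_dt  <- as.data.table(summary_step)
res_df  <- as.data.frame(summary_step)
res_tbl <- as_tibble(summary_step)

class(res_dt)
class(res_df)
class(res_tbl)

# Cross-check against plain dplyr on the original data frame
res_dplyr <- mtcars %>%
  filter(wt < 5) %>%
  mutate(l100k = 235.21 / mpg) %>%
  group_by(cyl) %>%
  summarise(l100k = mean(l100k))

all.equal(res_tbl$l100k, res_dplyr$l100k)
```

Choosing `as.data.table()` keeps the result available for further data.table operations, while `as_tibble()` is convenient if the rest of your code expects tidyverse objects.
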
