Minimal, Explicit, Python Style Package Loading for R

[This article was first published on R – TRinker's R Blog, and kindly contributed to R-bloggers.]

About a year and a half back I was working in Python a bit and became accustomed to the explicit importing of modules (akin to R packages) and functions. Python imports packages like this:

import tidyr
import dplyr as dp
from plyr import l_ply, rbind.fill

If these R packages were Python modules, the first line would import tidyr and the second would import dplyr under the alias dp. These two forms are the usual way Python imports modules, and the user still has to prefix objects from them explicitly when calling them. In R notation, one might call tidyr::separate or dp::select. The third form is more akin to how R loads packages, in that the imported functions can be called without a prefix. Still, Python only gives access to the functions that are explicitly named, whereas attaching a package in R exposes every exported function. The way Python imports modules and functions is explicit and may seem annoying to an R user, but it avoids namespace clashes.
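For R users, the three import forms map roughly onto idioms that already exist in base R. The following is a minimal sketch using only packages that ship with R (stats, utils, tools); the alias `u` and the variable names are made up for illustration, and this is not how pysty itself works.

```r
# 1. "import stats": load the namespace without attaching it, then prefix calls.
invisible(loadNamespace("stats"))
mid <- stats::median(c(1, 5, 9))

# 2. "import utils as u": bind the namespace environment to a short alias.
#    `u` is a hypothetical alias; note that `$` also reaches unexported objects.
u <- asNamespace("utils")
first_three <- u$head(letters, 3)

# 3. "from tools import file_ext, toTitleCase": copy only the named exports
#    into the current environment, so they can be called without a prefix.
for (fn in c("file_ext", "toTitleCase")) {
  assign(fn, getExportedValue("tools", fn))
}
ext <- file_ext("report.pdf")

print(list(mid, first_three, ext))
```

In forms 1 and 2 nothing is added to the search path, so a name like `head` can only be reached through its prefix or alias. Form 3 is the only one that creates unprefixed names, and only for the functions listed.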

Back then I made a proof-of-concept package called pysty (Python style) to make R behave a bit more like Python when importing packages and functions. I had forgotten about this package… Fast forward to this week.

Within the last 2 days I have been bitten by a bug caused by the MASS/dplyr select clash, and a teammate by the plyr/dplyr summarise clash. I now tend to prefix a lot of my functions with their package name to avoid such headaches. This habit reminded me of pysty, and I thought others might find it interesting.

Aliasing makes it convenient to reference a package explicitly without typing out its entire name on every call. pysty uses the double colon operator (::) to accomplish this explicit calling of functions. Functions can also be imported and added to the global namespace in one fell swoop using the from call. All of these Python-style calls are made through the add_imports function.
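One way such aliasing could be made to work with `::` is to mask the operator with a version that consults an alias table first. The sketch below is an assumption about the general technique, not pysty's actual implementation; the `aliases` table and the short names `st` and `u` are hypothetical, and only base packages are used so it runs without extra installs.

```r
# Hypothetical alias table: short name -> real package name.
aliases <- c(st = "stats", u = "utils")

# Mask `::` in the global environment. The arguments arrive unevaluated,
# so substitute() is used to recover them as text.
`::` <- function(pkg, name) {
  pkg  <- as.character(substitute(pkg))
  name <- as.character(substitute(name))
  # Translate an alias to the real package name; leave other names untouched.
  if (pkg %in% names(aliases)) pkg <- aliases[[pkg]]
  getExportedValue(pkg, name)
}

via_alias_1 <- st::median(c(1, 5, 9))   # resolves to stats::median
via_alias_2 <- u::head(letters, 3)      # resolves to utils::head
via_full    <- stats::sd(c(2, 4, 6))    # full package names still work

print(list(via_alias_1, via_alias_2, via_full))
```

Because the alias lives only in the lookup table, no object named `median` or `head` is created in the global environment; the function is fetched from the package's exports each time it is called.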

Installing pysty

Let’s start by installing and loading dependencies.

if (!require("pacman")) install.packages("pacman")
pacman::p_load_current_gh('trinker/pysty')

library(pysty)

Calling Dependencies

library(pysty)

add_imports('

import dplyr as dp
import MASS as m
import ggplot2 as gg
import tidyr
import plyr as p
from plyr import l_ply, rbind.fill

')

assign('%>%', dp::`%>%`) ## arrow assignment wasn't rendering in blog

The user can now access the functions in the five imported packages, through the alias if one was supplied. For example, select can be called explicitly without typing the entire package name (using the alias), while the object select itself doesn’t exist in the global environment.

dp::select

## function (.data, ...) 
## {
##     UseMethod("select")
## }


m::select

## function (obj) 
## UseMethod("select")
## 
## 

select
## Error: object 'select' not found

dp::summarize

## function (.data, ...) 
## {
##     UseMethod("summarise")
## }
## 

p::summarize

## function (.data, ...) 
## {
##     stopifnot(is.data.frame(.data) || is.list(.data) || is.environment(.data))
##     cols <- as.list(substitute(list(...))[-1])
##     if (is.null(names(cols))) {
##         missing_names <- rep(TRUE, length(cols))
##     }
##     else {
##         missing_names <- names(cols) == ""
##     }
##     if (any(missing_names)) {
##         names <- unname(unlist(lapply(match.call(expand.dots = FALSE)$..., 
##             deparse)))
##         names(cols)[missing_names] <- names[missing_names]
##     }
##     .data <- as.list(.data)
##     for (col in names(cols)) {
##         .data[[col]] <- eval(cols[[col]], .data, parent.frame())
##     }
##     quickdf(.data[names(cols)])
## }
## 

Use Cases

The following snippets demonstrate the use of such a package. Notice how functions with the same name can be used within the same chain even though they come from different packages. Because you have to be explicit in using dependencies, the likelihood of a namespace conflict is slim.

longley %>%
    dp::select(-Employed) %>%
    {m::lm.ridge(GNP.deflator ~ ., ., lambda = seq(0,0.1,0.0001))} %>%
    m::select()

## modified HKB estimator is 0.004974797 
## modified L-W estimator is 0.03567913 
## smallest value of GCV  at 0.003 


p::baseball %>%
    dp::group_by(team) %>%
    dp::summarise(
        min_year = min(year),
        max_year = min(year)
    ) %>%
    p::summarise(
        duration = max(max_year) - min(min_year),
        nteams = length(unique(team))
    )

##   duration nteams
## 1      134    132
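For contrast, here is a small sketch of the masking that explicit prefixes sidestep. It does not use MASS, dplyr or plyr; instead, two hypothetical stand-in "packages" (`pkg_a` and `pkg_b`, plain lists that each define `select`) are attached to the search path, which mimics what happens when two attached packages export the same name.

```r
# Two hypothetical "packages", modelled as lists that each define select().
pkg_a <- list(select = function() "select from pkg_a")
pkg_b <- list(select = function() "select from pkg_b")

# Attaching both puts them on the search path; the most recently attached
# entry sits in front, so its select() masks the other one.
attach(pkg_a, name = "pkg_a", warn.conflicts = FALSE)
attach(pkg_b, name = "pkg_b", warn.conflicts = FALSE)

unqualified <- select()                           # depends on attach order
explicit_a  <- as.environment("pkg_a")$select()   # explicit, order-independent
explicit_b  <- as.environment("pkg_b")$select()

print(c(unqualified, explicit_a, explicit_b))

# Clean up the search path.
detach("pkg_b", character.only = TRUE)
detach("pkg_a", character.only = TRUE)
```

The unqualified call silently picks whichever `select` was attached last, which is the kind of surprise described earlier. The explicit lookups return the same result regardless of attach order.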

I doubt I’d use this myself in my workflow, but the idea was interesting to me and I wanted to share.
