Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In R I’m usually using a combination of base functions, functions from loaded packages, and functions I’ve defined in a script in my workflow or sourced from a helperfunctions.r type file. If you type the name of any R function into the console, you can see where it comes from: a package function prints its namespace, and if no package is shown you can assume it is ‘local’. But it isn’t always obvious when scrolling through code. For these reasons, I think it’d be useful to delineate. Some best practices (many of which are already in common use) might be:
- use functions from base (or stats, utils, …) without any indication, for example, with(), lm()
- always use :: if you are calling a function from a package, for example lme4::glmer()
- always load all libraries required in a script somewhere near the top of the script with a comment detailing which functions are used
- if it is a tiny helper function or a locally sourced function then indicate some way… but how?
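Here is a rough sketch of what the first two conventions look like side by side. The where_from() helper is hypothetical, just a quick way to ask a function which namespace it lives in, and tools::file_ext() stands in for "a function from a package" only because tools ships with R.

```r
# Report the package (namespace) a function was defined in.
# where_from() is a hypothetical helper for illustration only.
where_from <- function(f) environmentName(environment(f))

# Base/stats/utils functions: call them bare.
fit <- lm(mpg ~ wt, data = mtcars)

# Anything else: spell out the package with ::
# (tools ships with R, so this runs without installing anything)
ext <- tools::file_ext("helperfunctions.r")

where_from(lm)               # "stats"
where_from(tools::file_ext)  # "tools"
```
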
I think the first three points are obvious, but what about the fourth point? I think of these ‘local’ functions as sort of similar to private functions or methods in an OO context. And actually, a lot of ink has been spilled on SO/SE type forums about what naming conventions or indications you should use for private functions.
In C# I guess the convention (?) is to use leading underscores for private fields, so a lot of people suggest that for private functions too. R doesn’t allow names to start with an underscore, though (unless you quote them with backticks). Some folks use thisCase versus ThatCase which I think is just hideous. The tidyverse seems to like underscore separated words instead of this or that case ::shruggyemoji::.
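To make the underscore point concrete, here is a small check of which naming styles R will even parse. The parses() helper and the names inside the strings are made up for this example.

```r
# Check whether a piece of code is syntactically valid without running it.
# parses() is a hypothetical helper for illustration only.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

underscore_ok <- parses("_helper <- function(x) x")    # leading underscore
backtick_ok   <- parses("`_helper` <- function(x) x")  # same name, backtick-quoted
dot_ok        <- parses(".helper <- function(x) x")    # leading dot
```

Only the bare leading underscore fails to parse. The backtick version works but you would have to quote the name every time you call it, which is hardly an improvement.
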
I don’t really like any of the above options? In particular, I don’t think capitalization is clear / obvious enough. One exception is that I like constants to be capitalized and underscore separated. Prepending local_ or private_ to everything doesn’t seem specific or clear enough either, but it could work?
One idea to mark these functions a little more dramatically might be to use environments for ‘lexical scoping’ (is that the right term?). So, something like this:
    localutils <- new.env(parent = emptyenv())
    localutils$f1 <- function(x) 1
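A slightly longer sketch of the same idea, to show how it reads at the call site. The second function, f2, is a made-up addition for illustration. Note that parent = emptyenv() only affects lookups made through the environment itself; the functions stored in it still see base R and each other because they are defined at the top level of the script.

```r
# An environment used purely as a labelled container for helpers.
localutils <- new.env(parent = emptyenv())
localutils$f1 <- function(x) 1

# f2 is hypothetical: a helper that calls base R and another local helper.
localutils$f2 <- function(x) sqrt(x) + localutils$f1(x)

ls(localutils)             # lists the helpers: "f1" "f2"
result <- localutils$f2(4) # sqrt(4) + 1 = 3
```
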
Is this too clunky? Should the localutils environment be capitalized because it is kind of like a constant? You can go one step further and save an environment containing the functions you want to have available as an RDS file, then load that RDS file near where you make your library calls:
    # load libraries and functions
    library(mylibrary)
    LOCALUTILS <- readRDS("01_helperfunctions/localutils.rds")
    # ...
    x <- 1
    y <- LOCALUTILS$f1(x)
    z <- mylibrary::f2(y)
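For completeness, a minimal sketch of how that RDS file might get written in the first place, using saveRDS(). The path under tempdir() and the utils_env name are assumptions for the example; in a real project the file would live somewhere like the helper folder above.

```r
# Build the environment of helpers once...
utils_env <- new.env(parent = emptyenv())
utils_env$f1 <- function(x) 1

# ...and save it. The tempdir() path is just for this example.
rds_path <- file.path(tempdir(), "localutils.rds")
saveRDS(utils_env, rds_path)

# Later, in the analysis script, load it next to the library() calls.
LOCALUTILS <- readRDS(rds_path)
y <- LOCALUTILS$f1(10)
```

One thing to keep in mind is that the RDS file is a snapshot: if you edit a helper you have to re-save the file before other scripts see the change.
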
This is OK. One downside is this kind of hides the fact that it is an environment, but if you have a moderate number of functions to define, or want to reuse them across multiple scripts in a workflow, this would be more convenient. Also, maybe sourcing an R file with the local functions is better if you want people to be able to more easily browse the functions in a text editor, but the syntax of readRDS is kind of nice because you can see the assignment to the name ‘LOCALUTILS’. source() actually returns (invisibly) the value of the last expression in whatever you source, so you could do something pretty similar to readRDS. Imagine a file localutils.r:
    LOCALUTILS <- new.env(parent = emptyenv())
    LOCALUTILS$f1 <- function(x) 1
    LOCALUTILS
and then just as above you can load that file when you load libraries:
    # load libraries and functions
    library(mylibrary)
    LOCALUTILS <- source("localutils.r")$value
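This works, but note that source() evaluates the file in your global environment by default, so the name used inside localutils.r gets defined there too. If your environment has a different name in localutils.r, you’ll end up with two names for it when you source, one being LOCALUTILS and the other whatever it was called in localutils.r. (Both names point at the same environment, since environments aren’t copied on assignment.)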
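One possible way around the stray name is the local argument of source(), which accepts an environment to evaluate the file in. This is only a sketch: the file is written to tempdir() so the example is self-contained, and the name helpers inside the file is made up.

```r
# Write a stand-in for localutils.r (tempdir() path is just for this example).
script_path <- file.path(tempdir(), "localutils.r")
writeLines(c(
  "helpers <- new.env(parent = emptyenv())",
  "helpers$f1 <- function(x) 1",
  "helpers"
), script_path)

# Evaluate the file in a throwaway environment, keeping only its value.
LOCALUTILS <- source(script_path, local = new.env())$value

y <- LOCALUTILS$f1(2)
stray_name <- exists("helpers")  # was a second name left behind in our workspace?
```

Because the file runs in a scratch environment, the only name that shows up in the calling script is the one on the left of the assignment.
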
An obvious alternative solution is just to always put all your helper functions in a package, but that’s a pain for one-off code, especially if you want to hand code to someone without worrying about a lot of dependencies. Also, sometimes it is just one or two helper functions you want to split out, and that’s usually not worth making a package for.
Finally, should we add comments with sort of function declarations for these local functions somewhere near where they are sourced to help the reader since there isn’t going to be a help file for them? Something like this maybe:
    # load libraries and functions
    library(mylibrary) # for f1, f2
    LOCALUTILS <- source("localutils.r")$value
    # LOCALUTILS$f3( n ) returns a vector of n frog names
    # LOCALUTILS$f4( frog_names ) returns a matrix of frog name similarity
    # ...
Do these things make code more readable and sharable and maintainable or is it just confusing?