To attach() or not attach(): that is the question

[This article was first published on SAS and R, and kindly contributed to R-bloggers.]

R objects that reside in other R objects can require a lot of typing to access. For example, to refer to a variable x in a dataframe df, one could type df$x. This is no problem when the dataframe and variable names are short, but can become burdensome when longer names or repeated references are required, or objects in complicated structures must be accessed.

The attach() function in R can be used to make objects within dataframes accessible in R with fewer keystrokes. As an example:
> ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
> names(ds)
> attach(ds)
> mean(cesd)
[1] 32.84768

The search() function can be used to list attached objects and packages. Let’s see what is there, then detach() the dataset to clean up after ourselves.
> search()
 [1] ".GlobalEnv"        "ds"                "tools:RGUI"        "package:stats"
 [5] "package:graphics"  "package:grDevices" "package:utils"     "package:datasets"
 [9] "package:methods"   "Autoloads"         "package:base"
> detach(ds)

As noted in section B.4.5, users are cautioned that if a variable called cesd already exists in the workspace (the global environment), then after attach(ds) the name cesd will still refer to that existing variable rather than to ds$cesd, because ".GlobalEnv" is searched first (see the search() output above). Name conflicts of this type are a common problem with attach(), and care should be taken to avoid them.
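To make the masking problem concrete, here is a minimal sketch. It uses a small made-up dataframe, ds_toy, in place of the HELP data; the column values are hypothetical and chosen only for illustration. Because ".GlobalEnv" sits first on the search path, a global cesd is found before the attached column:

```r
# ds_toy is a hypothetical stand-in for the HELP data; values are made up
ds_toy <- data.frame(cesd = c(10, 20, 30), female = c(0, 1, 1))

# A stray variable with the same name already exists in the workspace
cesd <- 999

attach(ds_toy)               # R reports that cesd is masked by .GlobalEnv
masked_mean <- mean(cesd)    # finds the global cesd (999), not ds_toy$cesd
detach(ds_toy)

rm(cesd)                     # remove the conflicting global variable
attach(ds_toy)
attached_mean <- mean(cesd)  # now finds the attached column, whose mean is 20
detach(ds_toy)

print(c(masked = masked_mean, attached = attached_mean))
```

The only hint of trouble is the message printed by attach(); the call to mean() itself raises no error and returns a plausible-looking number.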

The help page for attach() notes that attaching can lead to confusion. The Google R Style Guide is blunt on this point, offering the following advice about attach():
The possibilities for creating errors when using attach are numerous. Avoid it.


After being burned by this one too many times, we concur.
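One further source of confusion, described on the attach() help page, is that attach() copies the columns into a new environment on the search path rather than linking to the dataframe itself. A minimal sketch with a hypothetical dataframe, scores, shows that later changes to the dataframe are not seen through the attached names:

```r
# scores is a hypothetical dataframe used only for illustration
scores <- data.frame(score = c(1, 2, 3))

attach(scores)                     # copies the columns onto the search path

# Modify the dataframe itself after attaching it
scores$score <- scores$score * 10

stale_mean <- mean(score)          # reads the copy made by attach(): 2
fresh_mean <- mean(scores$score)   # reads the updated dataframe: 20

detach(scores)
print(c(stale = stale_mean, fresh = fresh_mean))
```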

So what options exist for those who decide to go cold turkey?

  1. Reference variables directly (e.g., lm(ds$y ~ ds$x))

  2. Specify the dataframe for functions that support this (e.g., lm(y ~ x, data=ds))

  3. Use the with() function, which evaluates an expression using the dataframe's columns as variables and returns its value (e.g., with(ds, lm(y ~ x)))

  4. (Also note the within() function, which is similar to with() but returns a modified copy of the dataframe.)
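The options above can be compared side by side. This sketch uses a small made-up dataframe, toy, with hypothetical columns x and y rather than the HELP data, and checks that options 1 to 3 give the same estimates while option 4 leaves the original untouched:

```r
# toy is a hypothetical dataframe; x and y values are made up for illustration
toy <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# 1. Reference the columns directly with $
fit1 <- lm(toy$y ~ toy$x)

# 2. Supply the dataframe through the data= argument
fit2 <- lm(y ~ x, data = toy)

# 3. Evaluate the whole expression inside the dataframe with with()
fit3 <- with(toy, lm(y ~ x))

# The estimates agree; only fit1's coefficient names differ ("toy$x" vs "x")
print(rbind(direct   = unname(coef(fit1)),
            data_arg = unname(coef(fit2)),
            with_fn  = unname(coef(fit3))))

# 4. within() returns a modified copy; toy itself keeps only x and y
toy2 <- within(toy, y_centered <- y - mean(y))
print(names(toy2))
print(names(toy))
```

One practical reason to favor option 2 for modeling functions is visible here: the coefficient names stay clean (x rather than toy$x).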


Some examples may be helpful.
> # fit a linear model
> lm1 = lm(cesd ~ pcs, data=ds)

> mean(ds$cesd[ds$female==1])  # these next three are equivalent
[1] 36.88785
> with(ds, mean(cesd[female==1]))
[1] 36.88785
> with(subset(ds, female==1), mean(cesd))
[1] 36.88785

In short, there is never an actual need to use attach(): using it can lead to confusion or errors, and alternatives exist that avoid these problems. We recommend against it.

In SAS, every procedure either uses the most recently created data set or references a data set explicitly. Very roughly speaking, using attach() in R is like relying on the implicit use of the most recent data set. Our recommendation against attach() thus mirrors our use of the data= option throughout our books.
