Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
R is changing the way it deals with converting strings to factors in functions like data.frame(). There is a detailed post about the plan, but that post was created before version 4.0.0, so I’m not sure if anything has changed.
I’m running R 4.0.5 right now. I know I’m behind, but I’m in the middle of a project and I don’t want to update until I finish it. Anyway, default.stringsAsFactors() is now FALSE. And this is nice. I can also see:
args(data.frame)
# function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, fix.empty.names = TRUE, stringsAsFactors = default.stringsAsFactors())
That’s nice. But look at:
args(expand.grid)
# function (..., KEEP.OUT.ATTRS = TRUE, stringsAsFactors = TRUE)
That’s less nice and generated a rather confusing bug for me recently. Also look at this:
letterframe1 <- data.frame(cbind(LETTERS, 1))
class(letterframe1[, 1]) # character
letterframe2 <- data.frame(table(LETTERS))
class(letterframe2[, 1]) # factor
letterframe3 <- data.frame(table(LETTERS), stringsAsFactors = FALSE)
class(letterframe3[, 1]) # factor
letterframe4 <- as.data.frame(table(LETTERS), stringsAsFactors = FALSE)
class(letterframe4[, 1]) # character
And for the record:
letterframe5 <- as.data.frame(table(LETTERS))
class(letterframe5[, 1]) # factor
By the way, I ran all the above examples in R 3.6.1 and everything returned factor except for class(letterframe4[, 1]).
This was unexpected. I’m sure there’s a sensible reason for all that, but I don’t know enough to guess exactly what it could be.
The character input must be getting converted to a factor somewhere inside table(), but I’m not sure why there is a difference between as.data.frame() and data.frame(). If it is truly already a factor, then both functions should return a factor regardless of stringsAsFactors. To me, though, even that wouldn’t be an optimal solution, because it isn’t obvious from the table() output or from the documentation that my initial character input gets converted at all.
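One way to poke at this is sketched below. It is a rough exploration rather than a definitive explanation, and the variable names (tab, via_data_frame, via_as_data_frame) are made up for illustration. It checks what the table itself stores as labels, looks at the default that the table method of as.data.frame() carries, and compares the two calling routes. A plausible reading, which is an assumption worth checking against the data.frame() source, is that data.frame() does not forward stringsAsFactors for table inputs, so the method's own default applies.

```r
tab <- table(LETTERS)

# The labels stored on the table object itself are plain character dimnames
class(dimnames(tab)[[1]])

# The table method of as.data.frame() has its own stringsAsFactors default,
# independent of default.stringsAsFactors()
formals(as.data.frame.table)$stringsAsFactors

# Route 1: through data.frame(); the argument does not appear to reach the method
via_data_frame <- data.frame(tab, stringsAsFactors = FALSE)
class(via_data_frame[, 1])

# Route 2: calling as.data.frame() directly; the argument reaches the method
via_as_data_frame <- as.data.frame(tab, stringsAsFactors = FALSE)
class(via_as_data_frame[, 1])
```

If the two routes disagree on your R version as they do in the examples above, the direct as.data.frame() call is the one that honours the argument.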
If I correctly understood the dev blog linked above, changing default.stringsAsFactors() might be a transition phase to a new system that works in a different way, so maybe this new system will encompass some of these other scenarios when implemented.
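In the meantime, one defensive habit for functions such as expand.grid() that keep their own default is to pass stringsAsFactors explicitly. The following is a minimal sketch. The column names letter and n are invented for the example.

```r
# Relying on expand.grid()'s own default: character input becomes a factor
grid_default <- expand.grid(letter = c("a", "b"), n = 1:2)
class(grid_default$letter)   # factor

# Being explicit keeps the column as character
grid_explicit <- expand.grid(letter = c("a", "b"), n = 1:2,
                             stringsAsFactors = FALSE)
class(grid_explicit$letter)  # character
```

Spelling the argument out also makes the code behave the same way regardless of which R version, and which default, it runs under.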