Populating data frame cells with more than one value
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data frames are lists
Most R users will know that data frames are lists. You can easily verify that a data frame is a list by typing
d <- data.frame(id=1:2, name=c("Jon", "Mark")) d id name 1 1 Jon 2 2 Mark is.list(d) [1] TRUE
However, data frames are lists with some special properties. For example, all entries in the list must have the same length (here 2), etc. You can find a nice description of the differences between lists and data frames here. To access the first column of d, we find that it contains a vector (and a factor in case of column name). Note, that [[ ]] is an operator to select a list element. As data frames are lists, they will work here as well.
is.vector(d[[1]]) [1] TRUE
Data frame columns can contain lists
A long time, I was unaware of the fact, that data frames may also contain lists as columns instead of vectors. For example, let’s assume Jon’s children are Mary and James, and Mark’s children are called Greta and Sally. Their names are stored in a list with two elements. We can add them to the data frame like this:
d$children <- list(c("Mary", "James"), c("Greta", "Sally")) d id name children 1 1 Jon Mary, James 2 2 Mark Greta, Sally
A single data frame entry in column children now contains more than one value. Given that the column is a list, not a vector, we cannot go as usual when modifying an entry of the column. For example, to change Jon’s children, we cannot do
> d[1 , "children"] <- c("Mary", "James", "Thomas") Error in `[<-.data.frame`(`*tmp*`, 1, "children", value = c("Mary", "James", : replacement has 3 rows, data has 1
Taking into account the list structure of the column, we can type the following to change the values in a single cell.
d[1 , "children"][[1]] <- list(c("Mary", "James", "Thomas")) # or also d$children[1] <- list(c("Mary", "James", "Thomas")) d id name children 1 1 Jon Mary, James, Thomas 2 2 Mark Greta, Sally
You can also create a data frame having a list as a column using the data.frame function, but with a little tweak. The list column has to be wrapped inside the function I. This will protect it from several conversions taking place in data.frame (see ?I documentation).
d <- data.frame(id = 1:2, name = c("Jon", "Mark"), children = I(list(c("Mary", "James"), c("Greta", "Sally"))) )
This is an interesting feature, which gives me a deeper understanding of what a data frame is. But when exactly would I want to use it? I have not encountered the need to use it very often yet (though of course there may be plenty of situations where it makes sense). But today I had a case where this feature seemed particularly useful.
Converting lists and data frames to JSON
I had two separate types of information. One stored in a data frame and the other one in a list Referring to the example above, I had
d <- data.frame(id=1:2, name=c("Jon", "Mark")) d id name 1 1 Jon 2 2 Mark
and
ch <- list(c("Mary", "James"), c("Greta", "Sally")) ch [[1]] [1] "Mary" "James" [[2]] [1] "Greta" "Sally"
I needed to return an array of JSON objects which look like this.
[ { "id": 1, "name": "Jon", "children": ["Mary", "James"] }, { "id": 2, "name": "Mark", "children": ["Greta", "Sally"] } ]
Working with the superb jsonlite package to convert R to JSON, I could do the following to get the result above.
library(jsonlite) l <- split(d, seq(nrow(d))) # convert data frame rows to list l <- unname(l) # remove list names for (i in seq_along(l)) # add element from ch to list l[[i]] <- c(l[[i]], children=ch[i]) toJSON(l, pretty=T, auto_unbox = T) # convert to JSON
The results are correct, but getting there involved quite a number of tedious steps. These can be avoided by directly placing the list into a column of the data frame. Then jsonlite::toJSON takes care of the rest.
d$children <- list(c("Mary", "James"), c("Greta", "Sally")) toJSON(d, pretty=T, auto_unbox = T) [ { "id": 1, "name": "Jon", "children": ["Mary", "James"] }, { "id": 2, "name": "Mark", "children": ["Greta", "Sally"] } ]
Nice 🙂 What we do here, is basically creating the same nested list structure as above, only now it is disguised as a data frame. However, this approach is much more convenient.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.