[This article was first published on StaTEAstics., and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Handling metadata is not natural in R, or in any traditional rectangular data storage system.
There are several tricks and packages that attempt to solve this problem: Hmisc uses R's attribute feature, and the IRanges package provides its own DataFrame class.
The Hmisc package allows one to store metadata such as units, labels and comments:
library(Hmisc)
## Create a test data frame
test.df <- data.frame(x = ts(1:12, start = c(2000, 1), frequency = 12),
y = ts(1:12, start = c(2001, 1), frequency = 12))
## Assign the units and comment
units(test.df$x) <- "cm"
units(test.df$y) <- "m"
comment(test.df) <- "this is a test data set"
## Summary of the data
describe(test.df)
contents(test.df)
The disadvantage of this approach is that the metadata is lost when functions such as subset are used.
str(subset(test.df, select = x, drop = FALSE))
This restricts the approach to storing metadata; the metadata does not survive data manipulation.
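The same loss can be reproduced without Hmisc. The following is a minimal sketch in base R only, assuming the units are kept as a plain "units" attribute on each column; it sets that attribute by hand with attr() as a stand-in, so it illustrates the mechanism rather than Hmisc's own implementation. The name demo.df is made up for this example.

```r
## Base R only: attach a "units" attribute by hand (a stand-in for Hmisc)
demo.df <- data.frame(x = 1:12, y = 1:12)
attr(demo.df$x, "units") <- "cm"
attr(demo.df$y, "units") <- "m"

## The attribute is present on the original column
attr(demo.df$x, "units")

## subset() indexes the rows of each column, and row indexing
## drops plain attributes such as "units"
demo.sub <- subset(demo.df, x > 6, select = x)
attr(demo.sub$x, "units")
```

The second call to attr() should return NULL: the values survive the subset, but the units do not.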
The second approach, from the IRanges package, creates a whole new S4 class for handling data with metadata; because the metadata is reached through dedicated accessor functions, it can be retained.
library(IRanges)
test2.df <- DataFrame(x = 1:10, y = letters[1:10])
metadata(test2.df) <- list(units = list(x = "cm", y = "m"))
str(subset(test2.df, select = x))
In this case the units are still preserved; nevertheless, the subset function does not subset the metadata along with the columns, which can cause problems.
In short, there is definitely room for improvement. Writing a new class is arguably more natural and gives the developer and user more control.
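As a rough illustration of what such a class might look like, here is a minimal S3 sketch in which the units travel with the columns. Everything in it is hypothetical: the class name units_df and the functions new_units_df() and select_cols() are invented for this example and do not come from Hmisc, IRanges or any other package.

```r
## Hypothetical constructor: a data frame plus one unit per column
new_units_df <- function(data, units) {
  stopifnot(is.data.frame(data),
            is.character(units),
            setequal(names(units), names(data)))
  ## Store the units in column order so the two stay aligned
  structure(list(data = data, units = units[names(data)]),
            class = "units_df")
}

## Hypothetical column selector: subsets the data and the units together
select_cols <- function(x, cols) {
  stopifnot(inherits(x, "units_df"), all(cols %in% names(x$data)))
  new_units_df(x$data[, cols, drop = FALSE], x$units[cols])
}

test3 <- new_units_df(data.frame(x = 1:10, y = letters[1:10]),
                      units = c(x = "cm", y = "m"))
test3.sub <- select_cols(test3, "x")
str(test3.sub)
```

Because the units are stored next to the data rather than as attributes of it, the selector can keep the two in step: after selecting x, only the unit for x remains. A fuller design would also need methods for row subsetting, printing and merging.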
To leave a comment for the author, please follow the link and comment on their blog: StaTEAstics..