Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The structure() function is a simple yet powerful function that returns a given object with further attributes set. It is part of base R, so there is no need to load any additional package. Because the function was already part of the S language, it has been in base R since its earliest versions, so code that uses it should work across old and new R versions alike.
Example:
dd <- structure(list(
     year = c(2001, 2002, 2004, 2006)
    ,length_days = c(366.3240, 365.4124, 366.5323423, 364.9573234))
    ,.Names = c("year", "length of days")
    ,row.names = c(NA, -4L)
    ,class = "data.frame")
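To see what this call actually produced, a short check with base R functions can help. This is only a sketch; note that the .Names argument replaces the names given inside list(), so the second column ends up named "length of days" rather than length_days.

```r
# Rebuild the data frame from the example above
dd <- structure(list(
     year = c(2001, 2002, 2004, 2006)
    ,length_days = c(366.3240, 365.4124, 366.5323423, 364.9573234))
    ,.Names = c("year", "length of days")
    ,row.names = c(NA, -4L)
    ,class = "data.frame")

class(dd)   # the class set through the class argument
names(dd)   # .Names replaces the names given inside list()
dim(dd)     # number of rows and columns
str(dd)     # compact overview of the columns
```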
Any object created using structure(), whether homogeneous (matrix, vector) or heterogeneous (data.frame, list), can carry additional metadata stored as attributes. For example, here is a simple vector with a comment attached:
just_vector <- structure(1:10, comment = "This is my simple vector with info")
And by using the attributes() function:
attributes(just_vector)
We get the information back:
$comment
[1] "This is my simple vector with info"
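For comparison, the same attribute can be attached in two steps with the replacement form of attr(); structure() simply does both in one expression. A small sketch (other_vector is a name introduced here for illustration):

```r
# One step: value and attribute together
just_vector <- structure(1:10, comment = "This is my simple vector with info")

# Two steps: create the vector, then attach the attribute
other_vector <- 1:10
attr(other_vector, "comment") <- "This is my simple vector with info"

# Both objects carry the same data and the same attribute
identical(just_vector, other_vector)

# The "comment" attribute is not shown when the vector is printed
print(just_vector)
```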
In one go
Suppose you want to create an S3 object in one step. The usual way of creating a data.frame (heterogeneous) takes several steps:
year = c(1999, 2002, 2005, 2008)
pollution = c(346.82, 134.308821199349, 130.430379885892, 88.275457392443)
dd2 <- data.frame(year, pollution)
dd2$year <- as.factor(dd2$year)
Using structure(), we can build a data frame, factor column included, in a single call:
dd <- structure(list(
     year = as.factor(c(2001, 2002, 2004, 2006))
    ,length_days = c(366.3240, 365.4124, 366.5323423, 364.9573234))
    ,.Names = c("year", "length of days")
    ,row.names = c(NA, -4L)
    ,class = "data.frame")
Cases where the structure() function is useful:
- when creating a smaller data set within your Jupyter notebook (using Markdown)
- when creating data sets within your R code demo/example (and not using external CSV / TXT / JSON files)
- when describing a given object with mixed data types (e.g. a data frame) and preparing it for data import
- when creating many R environments, each with its own independent data set
- for persisting data
- and many more…
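One practical route to the embedded data sets mentioned in the list above is dput(), which prints an object as R code; for a data frame that code is a structure() call. The sketch below uses deparse(), which returns the same kind of code as text, to show the round trip. The small data frame is an illustrative example, not data from a real source.

```r
# A small illustrative data frame
df <- data.frame(year = c(1999, 2002), pollution = c(346.82, 134.31))

# dput(df) would print a structure(list(...), ...) call to the console;
# deparse() returns that code as text so it can be re-evaluated here
code <- paste(deparse(df), collapse = "\n")
cat(code, "\n")

# Evaluating the code recreates the data frame without any external file
df_copy <- eval(parse(text = code))

# Returns TRUE when the copy matches the original
all.equal(df, df_copy)
```

The printed code can be pasted directly into a demo script or notebook cell, which is what makes structure() handy for self-contained examples.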
Constructing a data frame with additional attributes and comments:
dd3 <- structure(list(
     v1 = as.factor(c(2001, 2002, 2004, 2006))
    ,v2 = I(c(2001, 2002, 2004, 2006))
    ,v3 = ordered(c(2001, 2002, 2004, 2006))
    ,v4 = as.double(c(366.3240, 365.4124, 366.5323423, 364.9573234)))
    ,.Names = c("year", "AsIs Year", "yearO", "length of days")
    ,.typeOf = c("factor", "numeric", "ordered", "numeric")
    ,row.names = c(NA, -4L)
    ,class = "data.frame"
    ,comment = "Ordered YearO for categorical analysis and other variables")
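Unlike .Names, .typeOf is not one of the argument names that structure() treats specially, so it is simply stored as a custom attribute under that name and does not change the column types. A quick sketch to compare the stored description with the actual column classes:

```r
# Rebuild dd3 from the example above
dd3 <- structure(list(
     v1 = as.factor(c(2001, 2002, 2004, 2006))
    ,v2 = I(c(2001, 2002, 2004, 2006))
    ,v3 = ordered(c(2001, 2002, 2004, 2006))
    ,v4 = as.double(c(366.3240, 365.4124, 366.5323423, 364.9573234)))
    ,.Names = c("year", "AsIs Year", "yearO", "length of days")
    ,.typeOf = c("factor", "numeric", "ordered", "numeric")
    ,row.names = c(NA, -4L)
    ,class = "data.frame"
    ,comment = "Ordered YearO for categorical analysis and other variables")

names(attributes(dd3))   # all attribute names, including ".typeOf" and "comment"
attr(dd3, ".typeOf")     # the descriptive vector, stored as given
lapply(dd3, class)       # the classes the columns really have
```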
Lists can also be nested within lists, and it is even possible to preserve the original data set as a sub-list, hidden from the data frame.
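One possible reading of this idea is to keep the untouched source data as an attribute of the processed data frame. The sketch below does that; the attribute name original and the objects raw and dd4 are hypothetical names chosen for illustration.

```r
# The unprocessed source data
raw <- data.frame(
  year = c(2001, 2002, 2004, 2006),
  length_days = c(366.3240, 365.4124, 366.5323423, 364.9573234)
)

# Processed data frame that carries the source data along as an attribute
dd4 <- structure(
  list(
    year = as.factor(raw$year),
    length_days = round(raw$length_days, 1)
  ),
  row.names = c(NA, -4L),
  class = "data.frame",
  original = raw          # hypothetical attribute name
)

names(dd4)                # only the two columns are listed
attr(dd4, "original")     # the source data, retrieved on request
```

Custom attributes are not guaranteed to survive later transformations, so it may be worth checking that the attribute is still present after subsetting or merging.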
The comment can be checked in either of two ways:
attributes(dd3)$comment
attr(dd3, which="comment")
Both yield the same result:
> attributes(dd3)$comment
[1] "Ordered YearO for categorical analysis and other variables"
> attr(dd3, which="comment")
[1] "Ordered YearO for categorical analysis and other variables"
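Base R also provides a dedicated accessor, comment(), for this particular attribute, with a replacement form for updating it. A minimal sketch on a small vector (the object x and its values are illustrative):

```r
# A small object with a comment attached through structure()
x <- structure(c(1.5, 2.5), comment = "Two example values")

comment(x)                     # reads the same attribute as attr(x, "comment")

comment(x) <- "Updated note"   # replacement form
attr(x, "comment")             # reflects the updated text
```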
As always, this simple yet useful code example is available on GitHub.
Happy Rrrring!