Melt
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
There are many situations where data is presented in a format that is not ready to dive straight to exploratory data analysis or to use a desired statistical method. The reshape2 package for R provides useful functionality to avoid having to hack data around in a spreadsheet prior to import into R.
The melt function takes data in wide format and stacks a set of columns into a single column of data. To make use of the function we need to specify a data frame, the id variables (which will be left at their settings) and the measured variables (columns of data) to be stacked. The default assumption on measured variables is that it is all columns that are not specified as id variables.
Consider the following set of data:
> dat FactorA FactorB Group1 Group2 Group3 Group4 1 Low Low -1.1616334 -0.5228371 -0.6587093 0.45064563 2 Medium Low -0.5991478 -1.0461138 -0.1942979 2.47985577 3 High Low 0.8420797 -1.5413266 0.6318852 -0.98948125 4 Low Medium 1.6225569 -1.2706469 -0.8026467 -0.32332181 5 Medium Medium -0.3450745 -1.3377985 1.4988363 0.36541918 6 High Medium 1.6025044 0.7631882 -0.5375833 0.85028148 7 Low High -1.2991011 -0.2223622 -0.6321478 -1.57284216 8 Medium High -0.4906400 -1.1802192 0.1235253 0.09891793 9 High High 0.3897769 -0.3832142 0.6671101 0.23407257 |
There four groups are to used as part of a statistical analysis so we want to stack them into a single column and create an factor variable to indicate which group the measurement corresponds to and the melt function does the trick:
> melt(dat) Using FactorA, FactorB as id variables FactorA FactorB variable value 1 Low Low Group1 -1.16163338 2 Medium Low Group1 -0.59914783 3 High Low Group1 0.84207974 4 Low Medium Group1 1.62255690 5 Medium Medium Group1 -0.34507455 6 High Medium Group1 1.60250438 ... 36 High High Group4 0.23407257 |
Consider a second set of data where there are two groups but we only want to retain the FactorB variable in the molten data set:
FactorA FactorB Group1 Group2 1 Low Very Low 6.851828 3.061329 2 Medium Very Low 7.352169 1.303077 3 High Very Low 6.918091 2.477875 4 Low Low 7.402351 2.450527 5 Medium Low 6.928385 4.334323 6 High Low 7.400626 3.074158 7 Low Medium 8.312145 5.725185 8 Medium Medium 8.251806 4.384492 9 High Medium 8.339398 3.443789 10 Low High 5.127386 2.868952 11 Medium High 8.561181 3.616898 12 High High 6.993838 3.450634 13 Low Very High 7.880877 2.950622 14 Medium Very High 9.439892 3.220295 15 High Very High 8.799447 3.106060 |
We now need to specify both the id.vars and measure.vars arguments in the melt function to get the desired output:
> melt(dat, id.vars = "FactorB", measure.vars = c("Group1", "Group2")) FactorB variable value 1 Very Low Group1 6.851828 2 Very Low Group1 7.352169 3 Very Low Group1 6.918091 4 Low Group1 7.402351 5 Low Group1 6.928385 6 Low Group1 7.400626 ... 30 Very High Group2 3.106060 |
Other useful resources are provided on the Supplementary Material page.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.