[This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers].
I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject.
R expects inputs to data analysis procedures to be in a tidy format, but the model output objects you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are meant to take data frames, munge them around, and return a data frame. David Robinson’s broom package bridges this gap: it takes the un-tidy output of model objects, which are not data frames, and returns it as a tidy data frame.
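To make that gap concrete, here is a minimal base-R sketch of the kind of conversion broom automates. It is not broom's implementation, only an illustration of the idea: the coefficient table from `summary()` is a matrix with the term names stored as row names, and turning it into a tidy data frame means moving those names into a real column and giving the statistics syntactic column names. The column names chosen here mirror the ones recent broom releases use, but that choice is an assumption of this sketch.

```r
# Fit the same model used throughout this post
fit <- lm(mpg ~ wt, data = mtcars)

# The coefficient table is a numeric matrix, not a data frame;
# the terms live in the row names rather than in a column
coef_mat <- coef(summary(fit))
is.matrix(coef_mat)   # TRUE

# Move the row names into a "term" column; row.names = NULL drops them
tidy_coefs <- data.frame(term = rownames(coef_mat), coef_mat,
                         row.names = NULL, stringsAsFactors = FALSE)

# Replace "Std. Error", "Pr(>|t|)", etc. with syntactic names
names(tidy_coefs) <- c("term", "estimate", "std.error", "statistic", "p.value")
tidy_coefs
```

Every model class stores its results differently, so writing this by hand for each one gets tedious, which is exactly the work broom takes over.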
From the documentation: if you fit a linear model on the built-in `mtcars` dataset and view the object directly, this is what you’d see:

```r
lmfit <- lm(mpg ~ wt, mtcars)
lmfit
```

```
Call:
lm(formula = mpg ~ wt, data = mtcars)

Coefficients:
(Intercept)           wt
     37.285       -5.344
```

```r
summary(lmfit)
```

```
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max
-4.543 -2.365 -0.125  1.410  6.873

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   37.285      1.878   19.86  < 2e-16 ***
wt            -5.344      0.559   -9.56  1.3e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.05 on 30 degrees of freedom
Multiple R-squared:  0.753,  Adjusted R-squared:  0.745
F-statistic: 91.4 on 1 and 30 DF,  p-value: 1.29e-10
```
If you’re just trying to read it, this is good enough. But if you’re doing follow-up analysis or visualization, you end up hacking around with `str()` and pulling out coefficients by index, and everything gets ugly quickly. The `tidy()` function in the broom package, run on the fit object, probably gives you what you were looking for, in a tidy data frame:

```r
library(broom)
tidy(lmfit)
```

```
         term estimate stderror statistic   p.value
1 (Intercept)   37.285   1.8776    19.858 8.242e-19
2          wt   -5.344   0.5591    -9.559 1.294e-10
```
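As a hedged illustration of the follow-up analysis this enables, the sketch below fits the same model separately for each number of cylinders and stacks the `tidy()` results into one data frame, so the slopes can be selected by name instead of by position. The per-cylinder split is a hypothetical example, not from the original post. Note that recent broom releases return a tibble and name the standard-error column `std.error`; the sketch converts to a plain data frame so base `rbind()` and `subset()` behave predictably.

```r
library(broom)

# Hypothetical follow-up analysis: fit mpg ~ wt within each cylinder group
fits <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# tidy() each fit (one row per term) and record which group it came from
coef_tables <- lapply(names(fits), function(cyl) {
  td <- as.data.frame(tidy(fits[[cyl]]))
  td$cyl <- cyl
  td
})

# Stack all coefficient tables into a single data frame
all_coefs <- do.call(rbind, coef_tables)

# Select the slope rows by name, instead of something like
# summary(fit)$coefficients[2, 1] repeated for every fit
subset(all_coefs, term == "wt", select = c(cyl, estimate, p.value))
```

Because every fit yields the same columns, the stacked result can go straight into dplyr or ggplot2 without any further reshaping.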
The `tidy()` function also works on other types of model objects, like those produced by `glm()` and `nls()`, as well as popular built-in hypothesis-testing tools like `t.test()`, `cor.test()`, or `wilcox.test()`. View the README on the GitHub page, or install the package and run the vignette to see more examples and conventions.
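A short sketch of that broader coverage, assuming a current broom release. The logistic regression of transmission type (`am`) on weight is only an illustrative choice, not an example from the original post. A `glm()` fit tidies to one row per coefficient, just like `lm()`, while each hypothesis test tidies to a single-row data frame.

```r
library(broom)

# Logistic regression: am (0 = automatic, 1 = manual) on weight;
# tidy() gives one row per coefficient, as it does for lm()
gfit <- glm(am ~ wt, data = mtcars, family = binomial)
glm_coefs <- tidy(gfit)

# Built-in tests return "htest" objects; tidy() turns each one into a
# one-row data frame (estimate, statistic, p.value, conf.low, ...)
tt <- tidy(t.test(mpg ~ am, data = mtcars))
ct <- tidy(cor.test(mtcars$mpg, mtcars$wt))

glm_coefs
tt
ct
```

The one-row shape is the useful design choice here: results from many tests can be combined with `rbind()` or `dplyr::bind_rows()` and then filtered or plotted like any other data.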
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.