While preparing the slides for a lecture on using Sweave to combine R and LaTeX, I was unpleasantly surprised at how tedious it can be to extract statistical values and print them as proper LaTeX code.
For example, consider a small toy dataset of lengths in which 100 females have normally distributed lengths with a mean of 170 cm and a standard deviation of 10 cm, and 100 males have a mean length of 180 cm and a standard deviation of 10 cm:
R> foo <- data.frame( length = c(rnorm(100,170,10), rnorm(100,180,10)), sex = rep(c("female","male"),each=100))
A t-test shows that the means are different:
R> t.test(length~sex,data=foo,var.equal=TRUE)

        Two Sample t-test

data:  length by sex
t = -6.8396, df = 198, p-value = 9.653e-11
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.715980  -7.024375
sample estimates:
mean in group female   mean in group male
            170.2455             180.1157
The t.test() function returns an object of class "htest". This class is widely used in R and lets us easily extract the statistic, degrees of freedom, and p-value:
R> res <- t.test(length~sex,data=foo,var.equal=TRUE)
R> res[['statistic']]
        t
-6.839605
R> res[['parameter']]
 df
198
R> res[['p.value']]
[1] 9.653065e-11
Great, now we can reference the statistic in our Sweave document:
Men were significantly taller than women ($t(\Sexpr{res[['parameter']]})=\Sexpr{res[['statistic']]}$, $p=\Sexpr{res[['p.value']]}$)
This returns: "Men were significantly taller than women (t(198) = −6.83960491494726, p = 9.65306549553569e−11)". Obviously we need to round the values:
Men were significantly taller than women ($t(\Sexpr{res[['parameter']]})=\Sexpr{round(res[['statistic']],3)}$, $p=\Sexpr{round(res[['p.value']],3)}$)
Which returns "Men were significantly taller than women (t(198) = −6.84, p = 0)". Better, but the p-value should not be rounded to zero; when it is very small, it should be reported as being smaller than 0.001 or something similar. To do this while keeping the value dynamic, an ifelse statement is needed:
Men were significantly taller than women ($t(\Sexpr{res[['parameter']]})=\Sexpr{round(res[['statistic']],3)}$, $p \Sexpr{ifelse(res[['p.value']]
Which returns "Men were significantly taller than women (t(198) = −6.84, p < 0.001)".
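The same thresholding logic can also be wrapped in a small helper so that the \Sexpr call stays short. What follows is only a minimal sketch: the function name format_p, the 0.001 threshold, and the three-digit rounding are assumptions made for illustration and are not taken from any package.

```r
# Hypothetical helper (illustration only): format a p-value for LaTeX math mode.
# Assumption: values below 10^-digits are reported as "< threshold".
format_p <- function(p, digits = 3) {
  threshold <- 10^(-digits)
  if (p < threshold) {
    # Very small p-value: report an inequality instead of a rounded zero
    paste0("< ", format(threshold, scientific = FALSE))
  } else {
    # Otherwise report the rounded value, keeping trailing zeros
    paste0("= ", format(round(p, digits), nsmall = digits))
  }
}

format_p(9.653065e-11)
format_p(0.0421)
```

In a Sweave document it could then be used as $p \Sexpr{format_p(res[['p.value']])}$.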
Good, but this sure was a lot of code for such a simple reference, and for some other classes extracting the statistics from the output object is much harder than it is for the "htest" class. For this reason I wrote the 'swst' package, which stands for SWeave STatistics.
'swst' has two main functions. Given the name of a statistic, its value, optional degrees of freedom, and the p-value, the 'swp()' function generates proper LaTeX code with rounded numbers and, where needed, inequality signs. The 'swst()' function is an S3 generic with methods for a few commonly used object classes; each method extracts the statistic, df, and p-value and passes them to 'swp()'.
This reduces the code to:
Men were significantly taller than women \Sexpr{swst(res)}
Which returns "Men were significantly taller than women (t(198) = −6.84, p < 0.001)". That's much less code!
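For readers unfamiliar with S3, the generic-plus-formatter design described above can be sketched roughly as follows. This is not the source code of 'swst': the names fmt_stat and report, their arguments, and the exact output format are all hypothetical and chosen only to illustrate the pattern.

```r
# Hypothetical names throughout; this is NOT the swst implementation.

# Low-level formatter: takes the name of the statistic, its value,
# optional degrees of freedom, and the p-value, and returns LaTeX code.
fmt_stat <- function(name, value, df = NULL, p, digits = 3) {
  df_part <- if (is.null(df)) {
    ""
  } else {
    paste0("(", paste(round(df, digits), collapse = ","), ")")
  }
  threshold <- 10^(-digits)
  p_part <- if (p < threshold) {
    paste0("p<", format(threshold, scientific = FALSE))
  } else {
    paste0("p=", round(p, digits))
  }
  paste0("($", name, df_part, "=", round(value, digits), "$, $", p_part, "$)")
}

# S3 generic: dispatches on the class of the object it receives.
report <- function(x, ...) UseMethod("report")

# Method for "htest" objects, such as those returned by t.test().
report.htest <- function(x, ...) {
  fmt_stat(
    name  = names(x[["statistic"]]),
    value = unname(x[["statistic"]]),
    df    = unname(x[["parameter"]]),
    p     = x[["p.value"]],
    ...
  )
}
```

Supporting another class then only requires writing one more method (for example a hypothetical report.lm) that extracts the relevant values and hands them to the formatter.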
I wrote this package in a fairly short time, so it is still very small and supports only a few object classes. Help is greatly appreciated! If you know of an object class that should be supported, let me know or contribute your own method on GitHub:
http://github.com/SachaEpskamp/swst
The CRAN link is:
http://cran.r-project.org/web/packages/swst/index.html