[This article was first published on R – Bearded Analytics, and kindly contributed to R-bloggers].
Often when writing a manuscript using knitr and xtable, I am flustered by my p-values. In simple summary tables, R conveniently rounds my p-values to 0: a mathematically inappropriate result. A colleague recently commented on the poor reporting in my table (shown below using print.xtable with the type="html" argument), inspiring a much-needed change.
| | Estimate | Std.err | Wald | Pr(>\|W\|) |
|---|---|---|---|---|
| (Intercept) | 0.001704 | 0.000005 | 100409.770956 | 0.000000 |
| sizemedium | 0.000046 | 0.000005 | 90.534705 | 0.000000 |
| sizesmall | 0.000003 | 0.000005 | 0.294331 | 0.587458 |
| time | -0.000004 | 0.000001 | 11.614917 | 0.000654 |
```r
fixp <- function(x, dig=3){
  # coerce the summary table (matrix or data frame) to a data frame
  x <- as.data.frame(x)

  # the p-values are assumed to be in the last column
  if(substr(names(x)[ncol(x)],1,2) != "Pr")
    warning("The name of the last column didn't start with Pr. This may indicate that p-values weren't in the last column, and thus, that this function is inappropriate.")

  # round the p-values to the requested number of digits
  x[,ncol(x)] <- round(x[,ncol(x)], dig)

  # replace anything that rounded to 0 with "< .0...01"
  for(i in seq_len(nrow(x))){
    if(x[i,ncol(x)] == 0)
      x[i,ncol(x)] <- paste0("< .", paste0(rep(0,dig-1), collapse=""), "1")
  }

  x
}
```

All that's going on: the function takes in the summary table (usually through a $coef), converts it to a data frame (some already are, though others are numeric matrices, e.g. from lm), throws a warning if the name of the last column doesn't begin with "Pr" (as it may not be the column that contains the p-values), and replaces any value that was rounded to 0 (at the user-specified number of digits) with "<" followed by the smallest number that can be shown at that precision (e.g. < .01). Then it returns the edited table, all ready for reporting! To mimic the table above, we set dig equal to 6 (so the p-values go out 6 decimal places) and re-run:
| | Estimate | Std.err | Wald | Pr(>\|W\|) |
|---|---|---|---|---|
| (Intercept) | 0.001704 | 0.000005 | 100409.770956 | < .000001 |
| sizemedium | 0.000046 | 0.000005 | 90.534705 | < .000001 |
| sizesmall | 0.000003 | 0.000005 | 0.294331 | 0.587458 |
| time | -0.000004 | 0.000001 | 11.614917 | 0.000654 |
Rounding the p-values to 2 digits instead (dig=2) gives:

| | Estimate | Std.err | Wald | Pr(>\|W\|) |
|---|---|---|---|---|
| (Intercept) | 0.001704 | 0.000005 | 100409.770956 | < .01 |
| sizemedium | 0.000046 | 0.000005 | 90.534705 | < .01 |
| sizesmall | 0.000003 | 0.000005 | 0.294331 | 0.59 |
| time | -0.000004 | 0.000001 | 11.614917 | < .01 |
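To make the replacement step concrete, here is a minimal sketch of how the "< .0…1" label depends on the number of digits. The helper name `zero_label` and the example p-value are assumptions made for illustration; the expression inside the helper is the one used in fixp.

```r
# Hypothetical helper (not part of fixp): builds the label that replaces
# a p-value which rounds to zero at `dig` decimal places.
zero_label <- function(dig) {
  paste0("< .", paste0(rep(0, dig - 1), collapse = ""), "1")
}

p <- 3.2e-10                       # made-up p-value for illustration
rounds_to_zero <- round(p, 6) == 0 # TRUE: at 6 decimals it is reported as 0

labels <- vapply(c(2, 3, 6), zero_label, character(1))
print(labels)                      # "< .01" "< .001" "< .000001"
```

The label is simply the smallest value that can still be displayed at the chosen precision, which is why dig=6 yields "< .000001" and dig=2 yields "< .01" in the tables above.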
```r
# this gives a summary table with small p-values; trying to report it with xtable would result in an R rounding issue!
(mod <- coef(summary(lm(uptake ~ conc + Treatment + Type + Plant, data=CO2))))

# this rounds the p-values to 2 digits, correctly reporting those that would have been rounded to 0
fixp(mod,dig=2)
```

Here's the final output via print.xtable (dig=2 for fixp and xtable):
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 37.42 | 4.67 | 8.00 | < .01 |
| conc | 0.02 | 0.00 | 7.96 | < .01 |
| Treatmentchilled | -12.50 | 5.10 | -2.45 | 0.02 |
| TypeMississippi | -23.33 | 6.01 | -3.88 | < .01 |
| Plant.L | 21.58 | 11.14 | 1.94 | 0.06 |
| Plant.Q | -4.62 | 2.27 | -2.03 | 0.05 |
| Plant.C | 1.46 | 5.10 | 0.29 | 0.78 |
| Plant^4 | 2.34 | 2.27 | 1.03 | 0.31 |
| Plant^5 | -0.48 | 5.77 | -0.08 | 0.93 |
| Plant^6 | -0.04 | 2.27 | -0.02 | 0.99 |
| Plant^7 | -1.91 | 3.64 | -0.53 | 0.6 |
| Plant^8 | -3.28 | 2.27 | -1.44 | 0.15 |
| Plant^10 | 0.55 | 2.27 | 0.24 | 0.81 |
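The conversion and the column-name check that fixp performs can be inspected directly. This is a small sketch using a simpler model than the one above (uptake ~ conc is an assumption chosen only to keep the example short); the variable names are illustrative.

```r
# coef(summary(lm(...))) returns a numeric matrix, not a data frame
mod <- coef(summary(lm(uptake ~ conc, data = CO2)))
is_mat <- is.matrix(mod) && is.numeric(mod)

# as.data.frame() keeps the column names, so the "Pr" check still applies
df <- as.data.frame(mod)
last_name <- names(df)[ncol(df)]
print(last_name)                   # "Pr(>|t|)"

# the same test fixp uses before deciding whether to warn
starts_with_pr <- substr(last_name, 1, 2) == "Pr"
```

If the last column of a table fails this check, fixp still rounds and edits it, so the warning is worth reading.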
- Again, this assumes that the last column is the one to be transformed. This is by design, though it may be inconvenient in some situations. If needed, the change is easily made in the definition of the function.
- When any entry in the last column is replaced, the whole column becomes a character column in the data frame. When the column is rounded but no entry rounds to 0, it stays numeric.
- This assumes a data-frame-style format for your table. Thus, this method will NOT correct the reported p-value of an individual test: say a t-test, where only the statistic is reported (and not a table). Personally, this is not a concern, as I deal with these situations in other ways, but for users seeking an overall "p-value fixing" method this may not be the answer.
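The three caveats above can be sketched in code. Everything here is an assumption for illustration: `fixp_col` is a hypothetical variant (not the author's function) with a `col` argument for choosing the p-value column, the small tables and the sample vector are made up, and the use of base R's format.pval() for a single test is one possible option, not necessarily how the author handles that case.

```r
# Hypothetical variant of fixp(): `col` picks the p-value column by name or
# position (default: the last column).
fixp_col <- function(x, dig = 3, col = ncol(x)) {
  x <- as.data.frame(x)
  p <- round(x[[col]], dig)
  zero <- !is.na(p) & p == 0
  if (any(zero)) {
    label <- paste0("< .", paste0(rep(0, dig - 1), collapse = ""), "1")
    # mixing a text label with numbers turns the whole column into character
    p <- ifelse(zero, label, as.character(p))
  }
  x[[col]] <- p
  x
}

# Made-up table with the p-values in the FIRST column
tab <- data.frame(p = c(1e-8, 0.2), est = c(1.5, 0.3))
fixed <- fixp_col(tab, dig = 2, col = "p")
class(fixed$p)                     # "character"

# No entry rounds to 0 here, so the column stays numeric
unchanged <- fixp_col(data.frame(est = 0.3, p = 0.2), dig = 2)
class(unchanged$p)                 # "numeric"

# A single test has no table; base R's format.pval() can label its p-value
x <- c(5.1, 4.9, 5.3, 5.0, 5.2, 4.8)   # made-up sample
single_p <- t.test(x, mu = 0)$p.value
single_label <- format.pval(single_p, digits = 2, eps = 0.01)
print(single_label)
```

Because the edited column may be character or numeric depending on the data, any downstream code that does arithmetic on the p-values should run before the table is passed through a function like this.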
devtools::install_github("flor3652/myStuff")