[This article was first published on Minding the Brain, and kindly contributed to R-bloggers.]
I’m often in the position of needing to compare groups of either items or participants on some set of variables. For example, I might want to compare recognition of words that differ on some measure of lexical neighborhood density but are matched on word length, frequency, etc. Similarly, I might want to compare individuals with aphasia who have anterior vs. posterior lesions but are matched on lesion size, aphasia severity, age, etc. I’ll also need to report these comparisons in a neat table if/when I write up the results of the study. This means computing and collating a bunch of means, standard deviations, and t-tests. This is not particularly difficult, but it is somewhat laborious (and boring), so I decided to write a function that would do it for me. Details after the jump.
The function (compareGroups) takes a data frame and the name of the grouping variable and returns a data frame with one row for each numeric variable in the original data frame and columns for the group means, standard deviations, and the t- and p-values from a t-test comparing the two groups. There is also a row for the number of observations in each group. It should be easy to tweak the function to handle more than two groups, but that would require a different statistical test (e.g., a one-way ANOVA), and the two-group case is the one I need most often.
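To make the description concrete, here is a minimal base-R sketch of a function with that behavior. It is not the author’s compareGroups code: the name `compare_two_groups` is hypothetical, it assumes R’s default `t.test()` (Welch’s unequal-variance test), which may differ from the original, and it leaves p-values unformatted. The toy data are made up purely for illustration.

```r
# Hypothetical sketch; compare_two_groups() is NOT the author's compareGroups().
compare_two_groups <- function(df, group_var) {
  g <- factor(df[[group_var]])
  lev <- levels(g)
  if (length(lev) != 2) stop("the grouping variable must have exactly 2 levels")
  in1 <- which(g == lev[1])  # row indices of group 1 (rows with NA group dropped)
  in2 <- which(g == lev[2])  # row indices of group 2

  # numeric columns only, excluding the grouping variable itself
  is_num <- vapply(df, is.numeric, logical(1))
  num_vars <- setdiff(names(df)[is_num], group_var)

  # one row per numeric variable: group means, SDs, and a two-sample t-test
  rows <- lapply(num_vars, function(v) {
    a <- df[[v]][in1]
    b <- df[[v]][in2]
    tt <- t.test(a, b)  # R's default is Welch's (unequal-variance) t-test
    data.frame(variable = v,
               M1 = mean(a, na.rm = TRUE), M2 = mean(b, na.rm = TRUE),
               SD1 = sd(a, na.rm = TRUE), SD2 = sd(b, na.rm = TRUE),
               t = unname(tt$statistic), p = tt$p.value,
               stringsAsFactors = FALSE)
  })

  # leading row with the number of observations in each group (no test)
  n_row <- data.frame(variable = "N",
                      M1 = as.numeric(length(in1)), M2 = as.numeric(length(in2)),
                      SD1 = NA_real_, SD2 = NA_real_, t = NA_real_, p = NA_real_,
                      stringsAsFactors = FALSE)
  res <- do.call(rbind, c(list(n_row), rows))

  # label the summary columns with the group names, e.g. few.M, many.SD
  names(res)[2:5] <- c(paste0(lev, ".M"), paste0(lev, ".SD"))
  res
}

# Usage on made-up toy data (not the SND data from the post)
set.seed(1)
toy <- data.frame(cond = rep(c("few", "many"), each = 10),
                  len  = rnorm(20, mean = 5),
                  freq = rnorm(20, mean = 8))
print(compare_two_groups(toy, "cond"), digits = 4)
```

Building one small data frame per variable and binding them at the end keeps the code short; the column names are derived from the factor levels, so the layout mirrors the few.M / many.M style of the output shown in the post.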
Here’s an example of the function in action, generating the results for Table 1 from our recent paper investigating the neural basis of semantic and phonological neighborhood effects in picture naming (Mirman & Graziano, in press):
```r
> summary(SND)
##       word    SemNear_Cond    numNear           NOF         lnFreqHAL
##  anchor : 1   few :36      Min.   : 0.00   Min.   : 9.0   Min.   : 6.16
##  apple  : 1   many:36      1st Qu.: 0.00   1st Qu.:13.0   1st Qu.: 7.90
##  ball   : 1                Median : 0.50   Median :14.0   Median : 8.75
##  balloon: 1                Mean   : 1.65   Mean   :14.7   Mean   : 8.74
##  banana : 1                3rd Qu.: 2.00   3rd Qu.:16.0   3rd Qu.: 9.60
##  bed    : 1                Max.   :19.00   Max.   :22.0   Max.   :12.16
##  (Other):66
##     logfreq          NPhon            nd           cohdens
##  Min.   :0.363   Min.   :2.00   Min.   : 0.51   Min.   :  0.59
##  1st Qu.:0.586   1st Qu.:3.00   1st Qu.: 2.45   1st Qu.: 13.58
##  Median :0.952   Median :4.00   Median : 7.34   Median : 34.61
##  Mean   :1.057   Mean   :4.17   Mean   :13.00   Mean   : 46.73
##  3rd Qu.:1.389   3rd Qu.:5.00   3rd Qu.:22.90   3rd Qu.: 64.52
##  Max.   :2.347   Max.   :7.00   Max.   :49.29   Max.   :157.30
> source("compareGroups.R")
> compareGroups(SND, "SemNear_Cond")
##    variable  few.M many.M  few.SD many.SD       t      p
## 1         N 36.000 36.000      NA      NA      NA     NA
## 2   numNear  0.000  3.306  0.0000  3.8606 -5.1374 <1e-04
## 3       NOF 14.389 14.917  2.2963  2.3949 -0.9544  0.343
## 4 lnFreqHAL  8.708  8.779  1.2564  1.4250 -0.2255  0.822
## 5   logfreq  1.005  1.108  0.4982  0.5217 -0.8602  0.393
## 6     NPhon  4.167  4.167  1.1832  1.4442  0.0000      1
## 7        nd 13.297 12.711 13.1659 13.7660  0.1845  0.854
## 8   cohdens 50.346 43.106 42.6230 40.7654  0.7365  0.464
```
As reported in the paper, we had two groups of 36 words that differed in terms of number of near semantic neighbors (numNear) and were matched on number of features (NOF), HAL word frequency (lnFreqHAL), ANC word frequency (logfreq), number of phonemes (NPhon), phonological neighborhood density (nd), and cohort density (cohdens).
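Each row of the table is a two-sample t-test on one variable. As a quick sanity check, the t-values can be recovered (up to rounding of the printed means) from the reported means, SDs, and group sizes. The helper `t_from_summary` below is hypothetical and uses the unequal-variance (Welch) standard error; this sketch does not assume which t-test variant compareGroups uses, because with equal group sizes (36 vs. 36 here) Welch’s and the pooled-variance t statistics are identical, and only the degrees of freedom, and hence the p-values, differ. The last part uses made-up random data to show that.

```r
# Hypothetical helper: two-sample t statistic from group means, SDs, and sizes,
# using the unequal-variance (Welch) standard error.
t_from_summary <- function(m1, m2, sd1, sd2, n1, n2) {
  (m1 - m2) / sqrt(sd1^2 / n1 + sd2^2 / n2)
}

# Two rows of the table above (few vs. many, n = 36 per group)
t_from_summary(0, 3.306, 0, 3.8606, 36, 36)             # numNear: approx. -5.14
t_from_summary(14.389, 14.917, 2.2963, 2.3949, 36, 36)  # NOF: approx. -0.95

# With equal n, Welch and pooled-variance t statistics coincide;
# only the degrees of freedom differ (made-up data):
set.seed(2)
x <- rnorm(36)
y <- rnorm(36, mean = 1)
welch  <- t.test(x, y)
pooled <- t.test(x, y, var.equal = TRUE)
stopifnot(isTRUE(all.equal(unname(welch$statistic), unname(pooled$statistic))))
stopifnot(isTRUE(all.equal(t_from_summary(mean(x), mean(y), sd(x), sd(y), 36, 36),
                           unname(welch$statistic))))
c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))
```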
You (and I) will still have to pull together the data for these comparisons, but at least the comparison step will be easy. In my first foray into GitHub, I’ve posted the code for compareGroups as a gist.