[This article was first published on Statistic on aiR, and kindly contributed to R-bloggers].
t-Test to compare the means of two groups under the assumption that both samples are random, independent, and come from normally distributed populations with unknown but equal variances
Here I will use the same data just seen in a previous post. The data are given below:
A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 185, 169, 173, 173, 188, 186, 175, 174, 179, 180
To solve this problem we use a two-sample Student’s t-test, assuming that both samples are drawn from populations that follow a Gaussian distribution (if we cannot assume that, we must instead use the non-parametric Wilcoxon-Mann-Whitney test, which we will see in a future post). Before proceeding with the t-test, we need to compare the sample variances of the two groups with Fisher’s F-test, to check for homoskedasticity (homogeneity of variances). In R you can do it this way:
a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
var.test(a,b)

F test to compare two variances

data: a and b
F = 2.1028, num df = 9, denom df = 9, p-value = 0.2834
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.5223017 8.4657950
sample estimates:
ratio of variances
2.102784
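As a rough check on the output above, the F statistic and its p-value can be rebuilt from the sample variances. This is only a sketch: the variable names (f_stat, df_num, df_den, p_value) are my own, and the two-sided p-value is taken as twice the smaller tail probability, which is consistent with the "not equal to 1" alternative in the output.

```r
# Data from the post
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

# F statistic: ratio of the two sample variances
f_stat <- var(a) / var(b)

# Degrees of freedom: n - 1 for each sample
df_num <- length(a) - 1
df_den <- length(b) - 1

# Two-sided p-value: twice the smaller of the two tail probabilities
p_upper <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
p_value <- 2 * min(p_upper, 1 - p_upper)

f_stat   # compare with F in the var.test output
p_value  # compare with the p-value in the var.test output
```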
The p-value is greater than 0.05, so we do not reject the hypothesis that the two variances are homogeneous. We can also compare the computed value of F with the tabulated value of F for alpha = 0.05, numerator degrees of freedom = 9, and denominator degrees of freedom = 9, using the function qf(p, df.num, df.den):
qf(0.95, 9, 9)
[1] 3.178893
The computed value of F is less than the tabulated value, which again leads us not to reject the null hypothesis of homogeneity of variances.
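The comparison with the tabulated value can be written out as a short script. This is a sketch under my own naming (f_crit_upper, f_crit_two). The second critical value, qf(0.975, 9, 9), is not in the original post; I add it as an assumption about what a reader might want for a two-sided test at the 5% level, since the p-value from var.test is two-sided.

```r
# Data from the post
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

f_stat <- var(a) / var(b)
df_num <- length(a) - 1
df_den <- length(b) - 1

# Upper-tail critical value used in the post (5% in the upper tail)
f_crit_upper <- qf(0.95, df_num, df_den)

# Assumption, not in the post: upper critical value when the 5% is
# split between both tails (2.5% in the upper tail)
f_crit_two <- qf(0.975, df_num, df_den)

# TRUE means the computed F does not exceed the critical value,
# so homogeneity of variances is not rejected
f_stat < f_crit_upper
f_stat < f_crit_two
```

Because the two-sided critical value is larger than the one-sided one, the conclusion for these data is the same under either choice.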
NOTE: Here the F statistic is compared with the upper-tail critical value only, so for a 5% significance level we use p = 0.95 in qf (the p-value reported by var.test, by contrast, refers to the two-sided alternative). The t-distribution is symmetric, and for a two-tailed alternative hypothesis the 5% is split between the two tails, so in R’s function qt(p, df) we use p = 0.975.
Then call the function t.test for homogeneous variances (var.equal = TRUE) and independent samples (paired = FALSE; you can omit this, because the function treats the samples as independent by default) in this way:
t.test(a,b, var.equal=TRUE, paired=FALSE)

Two Sample t-test

data: a and b
t = -0.9474, df = 18, p-value = 0.356
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.93994 4.13994
sample estimates:
mean of x mean of y
174.8 178.2
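To see where the numbers in this output come from, the pooled-variance t statistic can be computed by hand. This is a minimal sketch using the standard equal-variance formulas; the names sp2, se, t_stat and p_value are my own and do not come from the post.

```r
# Data from the post
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

n_a <- length(a)
n_b <- length(b)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n_a - 1) * var(a) + (n_b - 1) * var(b)) / (n_a + n_b - 2)

# Standard error of the difference between the two means
se <- sqrt(sp2 * (1 / n_a + 1 / n_b))

# t statistic and degrees of freedom
t_stat <- (mean(a) - mean(b)) / se
df <- n_a + n_b - 2

# Two-tailed p-value
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

t_stat   # compare with t in the t.test output
df       # compare with df in the t.test output
p_value  # compare with the p-value in the t.test output
```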
The p-value is greater than 0.05, so we cannot reject the null hypothesis: the difference between the means of the two groups is not statistically significant. Indeed, the absolute value of the computed t (0.9474) is less than the tabulated t-value for 18 degrees of freedom, which in R we can calculate with:
qt(0.975, 18)
[1] 2.100922
This confirms that we do not reject the null hypothesis H0 of equality of the means.
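The same critical value also gives the 95 percent confidence interval reported by t.test, which may make the link between the two clearer. The sketch below uses names of my own choosing (t_crit, reject_h0, ci) and the standard pooled-variance formulas.

```r
# Data from the post
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

n_a <- length(a)
n_b <- length(b)
df <- n_a + n_b - 2

# Pooled standard error of the difference in means
sp2 <- ((n_a - 1) * var(a) + (n_b - 1) * var(b)) / df
se <- sqrt(sp2 * (1 / n_a + 1 / n_b))

diff_means <- mean(a) - mean(b)
t_stat <- diff_means / se

# Two-tailed critical value at the 5% level
t_crit <- qt(0.975, df)

# Decision rule: reject H0 only if |t| exceeds the critical value
reject_h0 <- abs(t_stat) > t_crit

# 95% confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * t_crit * se

reject_h0
ci  # compare with the confidence interval in the t.test output
```

The interval contains 0, which is another way of stating that H0 is not rejected at the 5% level.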