Paired Student’s t-test
[This article was first published on Statistic on aiR, and kindly contributed to R-bloggers.]
Comparison of the means of two sets of paired samples, taken from two populations with unknown variance.
A school athletics team has taken on a new instructor and wants to test the effectiveness of the proposed new type of training by comparing the average times of 10 runners in the 100 meters. Below are the times in seconds before and after training for each athlete.
Before training: 12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3
After training: 12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1
In this case we have two sets of paired samples, since the measurements were made on the same athletes before and after the training. To see whether there was an improvement, a deterioration, or whether the mean times have remained substantially the same (hypothesis H0), we perform a Student’s t-test for paired samples, proceeding in this way:
    a = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
    b = c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)
    t.test(a, b, paired=TRUE)

            Paired t-test

    data:  a and b
    t = -0.2133, df = 9, p-value = 0.8358
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.5802549  0.4802549
    sample estimates:
    mean of the differences
                      -0.05
The p-value is greater than 0.05, so we cannot reject the hypothesis H0 of equality of the means. In conclusion, the new training has not produced any significant improvement (or deterioration) in the team of athletes.
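To make the mechanics of the test above more concrete, here is a minimal sketch that recomputes the same statistic by hand. It relies on the standard fact that a paired t-test is a one-sample t-test on the per-athlete differences. The names d, n, t_manual and p_manual are introduced here for illustration only and do not appear in the original example.

```r
# Times before (a) and after (b) the first training, copied from the example
a <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
b <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

d <- a - b                  # per-athlete differences (before minus after)
n <- length(d)              # number of pairs

# t statistic: mean difference divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Compare with the values computed by t.test()
res <- t.test(a, b, paired = TRUE)
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, res$p.value)
```

Both comparisons should agree with the t and p-value shown in the output above, up to rounding.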
Similarly, we calculate the t-tabulated value:

    qt(0.975, 9)
    [1] 2.262157

The absolute value of t-computed is smaller than t-tabulated (0.2133 < 2.262), so we cannot reject the null hypothesis H0.
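The comparison with the tabulated value can also be done directly in R. The following is only a sketch: t_computed is typed in from the output shown earlier, and the variable names are illustrative assumptions rather than part of the original example.

```r
# t statistic reported by t.test() for the first training (from the output above)
t_computed <- -0.2133

# Two-sided critical value at the 5% level with 9 degrees of freedom
t_tabulated <- qt(0.975, df = 9)

# For a two-sided test, compare the absolute value of t with the critical value
cannot_reject_H0 <- abs(t_computed) < t_tabulated
cannot_reject_H0   # TRUE
```

Taking the absolute value matters here because the computed t is negative, while the tabulated value bounds the rejection region on both sides.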
Suppose now that the manager of the team (given the results obtained) fires the coach, who has not produced any improvement, and hires another, more promising one. Below are the athletes' times after the second training:
Before training: 12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3
After the second training: 12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0
Now we check whether there was actually an improvement, i.e. we perform a t-test for paired data with a one-sided alternative hypothesis H1. The direction is chosen with the alt argument (short for alternative) when calling t.test. Since R tests the difference a - b, we must be careful about which direction corresponds to an improvement. We start with alt = "less":

    a = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
    b = c(12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0)
    t.test(a, b, paired=TRUE, alt="less")

            Paired t-test

    data:  a and b
    t = 5.2671, df = 9, p-value = 0.9997
    alternative hypothesis: true difference in means is less than 0
    95 percent confidence interval:
         -Inf 2.170325
    sample estimates:
    mean of the differences
                       1.61

With this syntax we asked R to check whether the mean of the values contained in the vector a is less than the mean of the values contained in the vector b, that is, whether the times before training were lower than the times after it. In response, we obtained a p-value well above 0.05, so we cannot reject the null hypothesis H0 in favor of this alternative: there is no evidence that the times got worse.

If we write t.test(a, b, paired = TRUE, alt = "greater"), we ask R to check whether the mean of the values contained in the vector a is greater than the mean of the values contained in the vector b, that is, whether the times decreased after training. This is the alternative hypothesis H1 of improvement. In light of the previous result, we can suspect that the p-value will be much smaller than 0.05, and in fact:

    a = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
    b = c(12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0)
    t.test(a, b, paired=TRUE, alt="greater")

            Paired t-test

    data:  a and b
    t = 5.2671, df = 9, p-value = 0.0002579
    alternative hypothesis: true difference in means is greater than 0
    95 percent confidence interval:
     1.049675      Inf
    sample estimates:
    mean of the differences
                       1.61

The p-value is far below 0.05, so we reject the null hypothesis H0 in favor of the alternative hypothesis H1: the new training has produced a significant improvement in the team's times.
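The two one-sided p-values reported above are the lower and upper tails of the same t distribution, evaluated at the same statistic. The sketch below recomputes them by hand with pt(). The names d, n, t_stat, p_less and p_greater are illustrative and not part of the original example.

```r
# Times before (a) and after (b) the second training, copied from the example
a <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
b <- c(12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0)

d <- a - b                               # per-athlete differences (before minus after)
n <- length(d)
t_stat <- mean(d) / (sd(d) / sqrt(n))    # same t statistic for both alternatives

# alt = "less": probability of a t value at or below the observed one
p_less <- pt(t_stat, df = n - 1)

# alt = "greater": probability of a t value at or above the observed one
p_greater <- pt(t_stat, df = n - 1, lower.tail = FALSE)

c(less = p_less, greater = p_greater)
```

Because the two tails together cover the whole distribution, p_less and p_greater add up to 1. A very large p-value for one direction therefore goes together with a very small one for the opposite direction.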