[This article was first published on Statistic on aiR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The Wilcoxon rank sum test, also known as the Mann-Whitney U-test, compares two independent groups of samples for which we cannot assume a Gaussian distribution.
Suppose you want to check whether the goals conceded by two football teams over the years are, on average, the same. Below is the number of goals conceded by each team in 6 games.
Team A: 6, 8, 2, 4, 4, 5
Team B: 7, 10, 4, 3, 5, 6
The Wilcoxon-Mann-Whitney test (or Wilcoxon rank sum test, or Mann-Whitney U-test) is used to compare two groups that do not follow a normal distribution: it is a non-parametric test. It is the non-parametric counterpart of the t test for independent samples. Strictly speaking, it tests for a shift in location between the two distributions rather than a difference in means.
Let’s see how to solve the problem with R:
a = c(6, 8, 2, 4, 4, 5)
b = c(7, 10, 4, 3, 5, 6)
wilcox.test(a, b, correct=FALSE)

Wilcoxon rank sum test
data: a and b
W = 14, p-value = 0.5174
alternative hypothesis: true location shift is not equal to 0
The p-value is greater than 0.05, so we cannot reject the null hypothesis H0 that the two groups are statistically equal (no location shift between them).
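To see where this p-value comes from, here is a minimal sketch that reproduces it by hand. Because the data contain ties, wilcox.test uses a normal approximation rather than the exact distribution, and with correct=FALSE no continuity correction is applied. The variance below uses the usual tie-corrected formula. The names n1, n2, ties, sigma, z and p_manual are illustrative and do not appear in the original example.

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)
n1 <- length(a)
n2 <- length(b)
N  <- n1 + n2

r <- rank(c(a, b))                      # pooled ranks; tied values get mid-ranks
W <- sum(r[1:n1]) - n1 * (n1 + 1) / 2   # statistic for group a

ties  <- table(r)                       # size of each group of equal ranks
sigma <- sqrt(n1 * n2 / 12 *
              ((N + 1) - sum(ties^3 - ties) / (N * (N - 1))))
z     <- (W - n1 * n2 / 2) / sigma      # no continuity correction
p_manual <- 2 * pnorm(-abs(z))          # two-sided p-value

p_manual                                # about 0.5174
```

The result should agree with the p-value printed by wilcox.test(a, b, correct=FALSE).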
If you run wilcox.test(b, a, correct=FALSE) instead, the p-value is the same, as expected:
a = c(6, 8, 2, 4, 4, 5)
b = c(7, 10, 4, 3, 5, 6)
wilcox.test(b, a, correct=FALSE)

Wilcoxon rank sum test
data: b and a
W = 22, p-value = 0.5174
alternative hypothesis: true location shift is not equal to 0
The value of W is computed as follows:
sum.rank.a = sum(rank(c(a,b))[1:6]) #sum of ranks assigned to the group a
W = sum.rank.a - (length(a)*(length(a)+1)) / 2
W
[1] 14
sum.rank.b = sum(rank(c(a,b))[7:12]) #sum of ranks assigned to the group b
W = sum.rank.b - (length(b)*(length(b)+1)) / 2
W
[1] 22
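The two values of W are linked: they always add up to the product of the two sample sizes (here 6 * 6 = 36), which is why swapping the arguments leaves the two-sided p-value unchanged. A short sketch of that check follows; w_stat is a hypothetical helper written for this illustration, not a function from R or from the original post.

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)

# Hypothetical helper: W for the first sample, from the pooled mid-ranks
w_stat <- function(x, y) {
  r <- rank(c(x, y))
  sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2
}

W_a <- w_stat(a, b)   # 14
W_b <- w_stat(b, a)   # 22

W_a + W_b == length(a) * length(b)   # TRUE: 14 + 22 = 36
```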
Finally, we can compare the rank sums of our samples with the critical values tabulated for the Wilcoxon test for independent samples. The tabulated interval for two groups of 6 samples each is (26, 52), while the rank sums of our samples are:
sum(rank(c(a,b))[1:6]) #sum of ranks assigned to the group a
[1] 35
sum(rank(c(a,b))[7:12]) #sum of ranks assigned to the group b
[1] 43
Since the computed interval (35, 43) is contained within the tabulated interval (26, 52), we conclude that we cannot reject the null hypothesis H0 of equality of the two groups.
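If no printed table is at hand, the tabulated interval can be cross-checked in R. The sketch below assumes the table refers to a two-sided test at the 5% level, which the text above does not state explicitly. It relies on qwilcox, which works with the exact distribution of W under the assumption of no ties, so with tied data like ours the comparison is only approximate. The names alpha, w_low, offset, lower and upper are illustrative.

```r
n1 <- 6
n2 <- 6
alpha <- 0.05   # assumed two-sided level; not stated in the text above

# qwilcox() is expressed in terms of W = rank sum - n1 * (n1 + 1) / 2.
# It returns the smallest w with P(W <= w) >= alpha / 2, so the largest
# value still inside the lower rejection region is one less than that.
w_low  <- qwilcox(alpha / 2, n1, n2) - 1
offset <- n1 * (n1 + 1) / 2

lower <- w_low + offset               # lower rank-sum bound
upper <- (n1 * n2 - w_low) + offset   # upper bound, by symmetry

c(lower, upper)                       # 26 52

rank_sums <- c(35, 43)                # rank sums computed for groups a and b
all(rank_sums > lower & rank_sums < upper)   # TRUE
```

Under these assumptions the bounds match the tabulated interval (26, 52), and both rank sums fall strictly inside it.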