[This article was first published on Wiekvoet, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
It is well known that the binomial test essentially never has an error of exactly 5%. You aim for at most 5%, calculate the number correct needed to get there and end up with an error of, e.g., 2%. This is a shame, and because the number correct is discrete there is no way around it. However, it is also an opportunity; the ‘unused’ error may be employed for additional testing. For instance, in a triangle test, why not aim for, say, 30 persons and do a pre-test at 17 persons, where H0 is rejected at an error level of less than 1%? When H0 is not rejected there, continue to 30 persons, reject at the original 5% level and still have an overall error level of less than 5%.
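To make the ‘unused’ error concrete, here is a minimal sketch that tabulates the attained error level for a range of panel sizes. It assumes a one-sided test at a nominal 5% with a chance level of 1/3; the helper name attainedAlpha is only illustrative.

```r
# Attained one-sided error level of a binomial test at nominal level alpha.
# attainedAlpha is an illustrative helper name, chosen for this sketch.
attainedAlpha <- function(n, p0 = 1/3, alpha = 0.05) {
  # smallest count whose upper tail P(X > critVal) does not exceed alpha
  critVal <- qbinom(1 - alpha, n, p0)
  pbinom(critVal, n, p0, lower.tail = FALSE)
}

n <- 20:40
att <- sapply(n, attainedAlpha)
# attained level and the part of the nominal 5% that is left unused
print(data.frame(n, attained = round(att, 4), unused = round(0.05 - att, 4)))
```

The attained level jumps up and down with n rather than settling at 5%, which is the room the rest of this post tries to use.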
Example
As described before (blog entry), a triangle test is a sensory test in which there is a chance of 1/3 to select the correct product. When the proportion correct is significantly larger than 1/3, the products are deemed different. This latter part is performed with a binomial test.
With 30 trials the number correct needs to exceed 14 (that is, at least 15 correct), with resulting alpha 0.043:
n_2 <- 30
(critVal_2 <- qbinom(.95,n_2,1/3))
[1] 14
pbinom(critVal_2,n_2,1/3,lower.tail=FALSE)
[1] 0.04348228
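As a cross-check, the same threshold can be read from base R's binom.test. This is only a sketch of the single-stage test at 30 trials; the object names bt14 and bt15 are illustrative.

```r
# Exact one-sided binomial test of H0: p = 1/3 against p > 1/3.
# 15 correct out of 30 is the smallest count exceeding the critical value 14.
bt15 <- binom.test(15, 30, p = 1/3, alternative = "greater")
bt14 <- binom.test(14, 30, p = 1/3, alternative = "greater")
bt15$p.value  # equals pbinom(14, 30, 1/3, lower.tail = FALSE)
bt14$p.value  # above 0.05, so 14 correct is not enough to reject
```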
With 17 trials:
n_1 <- 17
(critVal_1 <- qbinom(.98,n_1,1/3))
[1] 10
pbinom(critVal_1,n_1,1/3,lower.tail=FALSE)
[1] 0.00800819
As can be seen, the two errors add up to more than 0.05 (0.0435 + 0.0080 = 0.0515). However, simply adding them overstates the overall error, because the two rejection events overlap. The only way to end up in the 30-trial situation is to have at most 10 correct in the first phase. The chance of still rejecting in that situation can be calculated easily with a few functions. For each number correct in the first phase that does not lead to rejection, the chance is computed of getting enough additional correct answers in the second phase to exceed the critical value. These chances are multiplied by the probability of that number correct in the first phase and summed. This is shown in the next two functions.
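# chance, under H0, of still exceeding critVal_2 at n_2 trials,
# given nFound_1 correct among the first n_1
condPval_S1 <- function(nFound_1,n_1,n_2,critVal_2) {
  nAdditional <- n_2-n_1
  # not enough trials left to exceed the critical value
  if (nAdditional < critVal_2-nFound_1+1) 0
  else {
    pbinom(critVal_2-nFound_1,
           nAdditional,1/3,lower.tail=FALSE)
  }
}
# chance, under H0, of no rejection in phase 1 followed by rejection at n_2
condPval <- function(n_1,critVal_1,n_2,alpha2=0.05) {
  critVal_2 <- qbinom(1-alpha2,n_2,1/3)
  # phase 1 counts that do not lead to rejection
  nFound <- 0:critVal_1
  sa <- sapply(nFound,function(nFound_1)
    dbinom(nFound_1,n_1,1/3)*
      condPval_S1(nFound_1,n_1,n_2,critVal_2))
  sum(sa)
}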
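# error spent in phase 1: chance of more than critVal_1 correct under H0
p_H0_1 <- pbinom(critVal_1,n_1,1/3,lower.tail=FALSE)
condPval(n_1,critVal_1,n_2)+p_H0_1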
[1] 0.04568412
The total error level is slightly less than 5%. Hence the interim test can be added while still keeping to the promised 5% level.
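The same total can be checked by brute force. The sketch below enumerates every combination of correct answers in the two phases and sums the probability of all combinations that lead to rejection. It assumes the settings used above (17 then 30 trials, chance level 1/3); the names p0, joint, reject and totalAlpha are illustrative.

```r
n_1 <- 17; n_2 <- 30; p0 <- 1/3
critVal_1 <- qbinom(.98, n_1, p0)   # phase 1 critical value
critVal_2 <- qbinom(.95, n_2, p0)   # final critical value

x1 <- 0:n_1            # possible numbers correct in phase 1
x2 <- 0:(n_2 - n_1)    # possible numbers correct among the remaining tasters
# joint probability of every (x1, x2) pair; the two phases are independent
joint <- outer(dbinom(x1, n_1, p0), dbinom(x2, n_2 - n_1, p0))
# reject at the interim look, or at the end on the pooled count
reject <- outer(x1, x2, function(a, b) a > critVal_1 | a + b > critVal_2)
totalAlpha <- sum(joint[reject])
totalAlpha
```

This should agree with the sum of the phase 1 error and the conditional error computed above, since both count the same rejection region.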
A bit more extensive
In practice, not everybody asked to come and do the triangle test will be there to taste. What if there are a few trials short or extra? Obviously this can be calculated as well. The sapply function helps greatly. The overall level for all these cases is under 5%.
range(sapply(25:35,function(n_2)
condPval(n_1,critVal_1,n_2)+p_H0_1 ))
[1] 0.02484166 0.04711329
This can also be put into a function with a bit more details:
McondPval <- function(n_1,
                      n_2_min = round(n_1*1.5),
                      n_2_max = 3*n_1) {
  critVal_1 <- qbinom(.98,n_1,1/3)
  p_H0_1 <- pbinom(critVal_1,n_1,1/3,lower.tail=FALSE)
  n_2 <- n_2_min:n_2_max
  alpha <- sapply(n_2,function(n_2)
    condPval(n_1,critVal_1,n_2)+p_H0_1 )
  critVal_2 <- qbinom(.95,n_2,1/3)
  alpha_orig <- pbinom(critVal_2,n_2,1/3,lower.tail=FALSE)
  return(data.frame(n_1,n_2,alpha,alpha_orig))
}
McondPval(17,25,35)
   n_1 n_2      alpha alpha_orig
1   17  25 0.04277008 0.04151368
2   17  26 0.02729441 0.02475400
3   17  27 0.03792652 0.03592712
4   17  28 0.02484166 0.02156168
5   17  29 0.03384347 0.03113864
6   17  30 0.04568412 0.04348228
7   17  31 0.03037324 0.02702409
8   17  32 0.04047668 0.03765334
9   17  33 0.02740706 0.02348101
10  17  34 0.03605287 0.03265134
11  17  35 0.04711329 0.04419916
Unfortunately, it is not always this nice. With these settings and 16 trials in the first phase it can go wrong. Look at 30 and 35 trials in total: 30 trials is just over 5%, while 35 is clearly over it. Either the test in phase 1 should be more stringent, or it should be ensured that testing does not end with such a total. It does not matter which of these is chosen, but a choice has to be made. Ideally, the level of testing at phase 1 is fixed before knowing how many correct there are.
McondPval(16,25,35)
   n_1 n_2      alpha alpha_orig
1   16  25 0.04648649 0.04151368
2   16  26 0.03245610 0.02475400
3   16  27 0.04236557 0.03592712
4   16  28 0.03048510 0.02156168
5   16  29 0.03885603 0.03113864
6   16  30 0.05006582 0.04348228
7   16  31 0.03584845 0.02702409
8   16  32 0.04538488 0.03765334
9   16  33 0.03326027 0.02348101
10  16  34 0.04140208 0.03265134
11  16  35 0.05194322 0.04419916
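One way to make phase 1 more stringent is to raise its critical value until the overall level stays within 5% for every planned total. The following is a self-contained sketch of that search, not a procedure from the original analysis; overallAlpha and stricterCritVal are hypothetical helper names, and the overall level is obtained by enumerating both phases rather than with the functions above.

```r
# Overall error of the two-stage test, by enumerating both phases.
# overallAlpha and stricterCritVal are illustrative names for this sketch.
overallAlpha <- function(n_1, critVal_1, n_2, p0 = 1/3, alpha2 = 0.05) {
  critVal_2 <- qbinom(1 - alpha2, n_2, p0)
  x1 <- 0:n_1
  x2 <- 0:(n_2 - n_1)
  joint <- outer(dbinom(x1, n_1, p0), dbinom(x2, n_2 - n_1, p0))
  # reject at the interim look, or at the end on the pooled count
  reject <- outer(x1, x2, function(a, b) a > critVal_1 | a + b > critVal_2)
  sum(joint[reject])
}

# Smallest phase 1 critical value (from 'start' upwards) that keeps the
# overall level within alpha2 for every total in n_2_range.
stricterCritVal <- function(n_1, n_2_range, start, alpha2 = 0.05) {
  for (critVal_1 in start:n_1) {
    worst <- max(sapply(n_2_range, function(n_2)
      overallAlpha(n_1, critVal_1, n_2, alpha2 = alpha2)))
    if (worst <= alpha2) return(c(critVal_1 = critVal_1, worst = worst))
  }
  NULL
}

start <- qbinom(.98, 16, 1/3)   # the phase 1 critical value used above
res <- stricterCritVal(16, 25:35, start)
res
```

The search is done at the design stage, so the phase 1 threshold is fixed before any data are seen. A critical value equal to n_1 means phase 1 can never reject, so the loop always ends with a valid answer.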
Conclusion
With a few simple functions and a bit of care, an extra hypothesis test can be added during a triangle test. This gives the opportunity to declare differences at an intermediate step while retaining the original error level.