Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Recently I needed to test a mean vector. I wrote a few lines of code in R and it worked perfectly. The Hotelling test is one of the least interesting tests to me; I never really figured out why…
Since I had some spare time, I looked into it further. One of the first things I look for with any test is a robust version of it (at least that’s what I search for!). A little digging, down to the third page of Google results, turned up the following two:
One-sample and two-sample robust Hotelling tests with fast and robust bootstrap
The classical Hotelling test for testing if the mean equals a certain value or if two means are equal is modified into a robust one through substitution of the empirical estimates by the MM-estimates of location and scatter. The MM-estimator, using Tukey’s biweight function, is tuned by default to have a breakdown point of 50% and 95% location efficiency. This could be changed through the control argument if desired.
Robust Hotelling T2 test
Performs one and two sample Hotelling T2 tests as well as robust one-sample Hotelling T2 test.
The first (FRBhotellingMM in the FRB package) uses MM- and S-estimators, while the latter (T2.test in the rrcov package) uses a Minimum Covariance Determinant (MCD) estimator. You can get info on those in the references at the end of the post. What might be crucial to you is that the MM/S estimators are more time-consuming than MCD. A little demonstration follows.
library(rrcov)
data(delivery)
delivery.x <- delivery[,1:2]

T2.test(delivery.x)
#
#         One-sample Hotelling test
#
# data:  delivery.x
# T^2 = 21.0494, df1 = 2, df2 = 23, p-value = 6.365e-06
# alternative hypothesis: true mean vector is not equal to (0, 0)'
#
# sample estimates:
#               n.prod distance
# mean x-vector   8.76   409.28

t0 <- Sys.time()
T2.test(delivery.x, method="mcd")
#
#         One-sample Hotelling test (Reweighted MCD Location)
#
# data:  delivery.x
# T^2 = 37.701, df1 = 2.000, df2 = 9.146, p-value = 3.829e-05
# alternative hypothesis: true mean vector is not equal to (0, 0)'
#
# sample estimates:
#                n.prod distance
# MCD x-vector 6.190476 309.7143
Sys.time() - t0
# Time difference of 0.04200006 secs

library(FRB)
t0 <- Sys.time()
FRBhotellingMM(delivery.x)
# One sample Hotelling test based on multivariate MM-estimates
# (bdp = 0.5, eff = 0.95)
# data:  delivery.x
# T^2_R = 84.59
# p-value = 0.0022
# Alternative hypothesis : true mean vector is not equal to ( 0 0 )
Sys.time() - t0
# Time difference of 4.859 secs
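For reference, the classical statistic that both robust versions start from can be written in a few lines of base R. This is only a minimal sketch on simulated data: hotelling_one_sample is a hypothetical helper name (not a function from rrcov or FRB), and I have not checked that it reproduces the packages' printed numbers digit for digit.

```r
# Classical one-sample Hotelling T^2 test of H0: mean vector == mu0.
# `hotelling_one_sample` is a hypothetical helper, not a package function.
hotelling_one_sample <- function(x, mu0 = rep(0, ncol(x))) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  # T^2 = n * (xbar - mu0)' S^{-1} (xbar - mu0); mahalanobis() already
  # returns the squared distance, so no extra squaring is needed.
  T2 <- n * mahalanobis(colMeans(x), center = mu0, cov = cov(x))
  # Under H0 and multivariate normality:
  # (n - p) / (p * (n - 1)) * T^2 follows an F(p, n - p) distribution.
  F_stat <- (n - p) / (p * (n - 1)) * T2
  p_value <- pf(F_stat, df1 = p, df2 = n - p, lower.tail = FALSE)
  list(T2 = T2, F = F_stat, df1 = p, df2 = n - p, p.value = p_value)
}

set.seed(42)
x <- matrix(rnorm(50, mean = 0.5), ncol = 2)  # simulated data: 25 rows, 2 columns
res <- hotelling_one_sample(x)
str(res)
```

With 25 rows and 2 columns the degrees of freedom come out as 2 and 23, the same as in the classical output above. The robust tests keep this quadratic form but replace the sample mean and covariance with robust estimates of location and scatter, which is also why their null distributions need separate treatment (an approximation for MCD, a bootstrap for MM).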
Time-consuming as it may be, I would stick with the bootstrap method. What would you do?
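If you want to weigh that trade-off on your own data, system.time() is a slightly tidier way to time each test than subtracting Sys.time() values. Below is a minimal sketch under a few assumptions: time_each is a hypothetical helper name, the data are simulated rather than the delivery set, and the two robust tests are only added when rrcov and FRB happen to be installed.

```r
# Hypothetical helper: run each zero-argument function once and return
# the elapsed wall-clock seconds, named like the input list.
time_each <- function(tests) {
  vapply(tests, function(f) system.time(f())[["elapsed"]], numeric(1))
}

set.seed(1)
# Simulated stand-in data: 25 observations on 2 variables.
x <- as.data.frame(matrix(rnorm(50, mean = 1), ncol = 2))

tests <- list(
  # Classical T^2 in base R, used here only as a cheap baseline.
  classical = function() {
    nrow(x) * mahalanobis(colMeans(x), center = c(0, 0), cov = cov(x))
  }
)

# Add the robust tests only when the packages are available.
if (requireNamespace("rrcov", quietly = TRUE)) {
  tests$mcd <- function() rrcov::T2.test(x, method = "mcd")
}
if (requireNamespace("FRB", quietly = TRUE)) {
  tests$mm_bootstrap <- function() FRB::FRBhotellingMM(x)
}

timings <- time_each(tests)
print(timings)
```

A single run is a rough measurement, so it is worth repeating the call a few times before reading much into small differences.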
Read more
Roelant, E., Van Aelst, S., and Willems, G. (2008), “Fast Bootstrap for Robust Hotelling Tests,” COMPSTAT 2008: Proceedings in Computational Statistics (P. Brito, Ed.), Heidelberg: Physica-Verlag, to appear.
Willems, G., Pison, G., Rousseeuw, P., and Van Aelst, S. (2002), “A Robust Hotelling Test,” Metrika, 55, 125–138.