F-Test: Compare Two Variances in R
When to use the F-test?
Comparing two variances is useful in several cases, including:
- When you want to perform a two-sample t-test and first need to check whether the variances of the two samples are equal
- When you want to compare the variability of a new measurement method to that of an old one. Does the new method reduce the variability of the measurement?
Research questions and statistical hypotheses
Typical research questions are:
- whether the variance of group A (\(\sigma^2_A\)) is equal to the variance of group B (\(\sigma^2_B\))?
- whether the variance of group A (\(\sigma^2_A\)) is less than the variance of group B (\(\sigma^2_B\))?
- whether the variance of group A (\(\sigma^2_A\)) is greater than the variance of group B (\(\sigma^2_B\))?
In statistics, we can define the corresponding null hypotheses (\(H_0\)) as follows:
- \(H_0: \sigma^2_A = \sigma^2_B\)
- \(H_0: \sigma^2_A \leq \sigma^2_B\)
- \(H_0: \sigma^2_A \geq \sigma^2_B\)
The corresponding alternative hypotheses (\(H_a\)) are as follows:
- \(H_a: \sigma^2_A \ne \sigma^2_B\) (different)
- \(H_a: \sigma^2_A > \sigma^2_B\) (greater)
- \(H_a: \sigma^2_A < \sigma^2_B\) (less)
Note that:
- The first pair of hypotheses (\(\sigma^2_A = \sigma^2_B\) versus \(\sigma^2_A \ne \sigma^2_B\)) defines a two-tailed test
- The second and third pairs of hypotheses define one-tailed tests
Formula of F-test
The test statistic can be obtained by computing the ratio of the two variances \(S_A^2\) and \(S_B^2\).
\[F = \frac{S_A^2}{S_B^2}\]
The degrees of freedom are \(n_A - 1\) (for the numerator) and \(n_B - 1\) (for the denominator).
Note that the more this ratio deviates from 1, the stronger the evidence for unequal population variances.
Note also that the F-test requires both samples to come from normally distributed populations.
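To make the formula concrete, the statistic and its two-sided p-value can be reproduced by hand with var() and pf(). The following is only a minimal sketch: it assumes the built-in ToothGrowth data set, treats the OJ group as A and the VC group as B, and uses illustrative variable names (len_a, len_b, f_stat) that are not part of any API.

```r
# Assumption: ToothGrowth serves as example data; OJ = group A, VC = group B
len_a <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
len_b <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# F statistic: ratio of the two sample variances
f_stat <- var(len_a) / var(len_b)

# Degrees of freedom for the numerator and the denominator
df_a <- length(len_a) - 1
df_b <- length(len_b) - 1

# Two-sided p-value: twice the smaller tail probability of the F distribution
p_lower <- pf(f_stat, df_a, df_b)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

# Compare the hand computation with the built-in test
check <- var.test(len_a, len_b)
all.equal(unname(check$statistic), f_stat)
all.equal(check$p.value, p_two_sided)
```

Both all.equal() calls should agree with var.test(), which suggests the built-in function is doing nothing more exotic than the ratio above.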
Compute F-test in R
R function
The R function var.test() can be used to compare two variances as follows:
# Method 1
var.test(values ~ groups, data, alternative = "two.sided")
# or Method 2
var.test(x, y, alternative = "two.sided")
- x,y: numeric vectors
- alternative: the alternative hypothesis. Allowed value is one of “two.sided” (default), “greater” or “less”.
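As a hedged illustration of the alternative argument, the sketch below runs the same comparison three times. It assumes the built-in ToothGrowth data set as example data. With the formula interface, the ratio is the variance of the first factor level (OJ) over that of the second (VC), so "less" tests whether that ratio is below 1.

```r
# Assumption: ToothGrowth is used only as example data
two_sided <- var.test(len ~ supp, data = ToothGrowth, alternative = "two.sided")
less      <- var.test(len ~ supp, data = ToothGrowth, alternative = "less")
greater   <- var.test(len ~ supp, data = ToothGrowth, alternative = "greater")

# The F statistic is identical in all three calls;
# only the p-value and the confidence interval change
p_values <- c(two.sided = two_sided$p.value,
              less      = less$p.value,
              greater   = greater$p.value)
p_values
```

The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them.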
Import your data into R and check it
To import your data, use the following R code:
# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())
Here, we’ll use the built-in R data set named ToothGrowth:
# Store the data in the variable my_data
my_data <- ToothGrowth
To get an idea of what the data look like, we start by displaying a random sample of 10 rows using the function sample_n() [in the dplyr package]:
library("dplyr")
sample_n(my_data, 10)
    len supp dose
43 23.6   OJ  1.0
28 21.5   VC  2.0
25 26.4   VC  2.0
56 30.9   OJ  2.0
46 25.2   OJ  1.0
7  11.2   VC  0.5
16 17.3   VC  1.0
4   5.8   VC  0.5
48 21.2   OJ  1.0
37  8.2   OJ  0.5
We want to test the equality of variances between the two groups OJ and VC in the column “supp”.
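Before running the test, it can help to look at the size and sample variance of each group. The sketch below uses base R only; the names group_n, group_var and group_summary are illustrative and not part of the original analysis.

```r
my_data <- ToothGrowth

# Number of observations and sample variance of len in each supp group
group_n   <- tapply(my_data$len, my_data$supp, length)
group_var <- tapply(my_data$len, my_data$supp, var)

# Collect the results in a small summary table
group_summary <- data.frame(supp     = names(group_n),
                            n        = as.vector(group_n),
                            variance = as.vector(group_var))
group_summary
```

The ratio of the two variances in this table (OJ over VC) is the quantity that the F-test evaluates.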
Preliminary test to check F-test assumptions
The F-test is very sensitive to departures from normality. You need to check whether the data in each group are normally distributed before using the F-test.
The Shapiro-Wilk test can be used to test whether the normality assumption holds. It's also possible to use a Q-Q plot (quantile-quantile plot) to evaluate the normality of a variable graphically. A Q-Q plot draws the quantiles of a given sample against the theoretical quantiles of the normal distribution; points lying close to a straight line suggest normality.
If there is doubt about normality, the better choice is to use Levene’s test or Fligner-Killeen test, which are less sensitive to departure from normal assumption.
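These checks might look as follows in base R. This is a sketch under the assumption that normality is assessed separately within each supp group of ToothGrowth. Levene's test is left out because it is not in base R (it is available in add-on packages such as car, via leveneTest()).

```r
my_data <- ToothGrowth

# Shapiro-Wilk normality test, run separately within each supp group
shapiro_p <- tapply(my_data$len, my_data$supp,
                    function(v) shapiro.test(v)$p.value)
shapiro_p

# Q-Q plot of one group against the normal distribution
oj_len <- my_data$len[my_data$supp == "OJ"]
qqnorm(oj_len, main = "Normal Q-Q plot: OJ")
qqline(oj_len)

# Fligner-Killeen test of homogeneity of variances,
# which is less sensitive to departures from normality
res_fligner <- fligner.test(len ~ supp, data = my_data)
res_fligner$p.value
```

A small Shapiro-Wilk p-value in either group would be a reason to rely on the Fligner-Killeen result rather than on the F-test.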
Compute F-test
# F-test
res.ftest <- var.test(len ~ supp, data = my_data)
res.ftest

	F test to compare two variances

data:  len by supp
F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3039488 1.3416857
sample estimates:
ratio of variances 
         0.6385951
Interpretation of the result
The p-value of the F-test is p = 0.2331, which is greater than the significance level of 0.05, and the 95 percent confidence interval for the ratio of variances contains 1. We therefore do not reject the null hypothesis: there is no significant difference between the variances of the two groups.
Access the values returned by the var.test() function
The function var.test() returns a list containing the following components:
- statistic: the value of the F test statistic.
- parameter: the degrees of freedom of the F distribution of the test statistic.
- p.value: the p-value of the test.
- conf.int: a confidence interval for the ratio of the population variances.
- estimate: the ratio of the sample variances
The format of the R code to use for getting these values is as follows:
# ratio of variances
res.ftest$estimate
ratio of variances 
         0.6385951
# p-value of the test
res.ftest$p.value
[1] 0.2331433
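The remaining components can be read in the same way. The sketch below also turns the p-value into a simple decision; the significance level alpha = 0.05 and the name reject_h0 are assumptions made for illustration only.

```r
res.ftest <- var.test(len ~ supp, data = ToothGrowth)

# F statistic and its degrees of freedom
res.ftest$statistic
res.ftest$parameter

# Confidence interval for the ratio of the population variances
res.ftest$conf.int

# Illustrative decision rule; alpha = 0.05 is an assumed significance level
alpha <- 0.05
reject_h0 <- res.ftest$p.value < alpha
reject_h0
```

Here reject_h0 is FALSE, in line with the p-value of 0.2331 reported above.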
Infos
This analysis has been performed using R software (ver. 3.3.2).