Some times we would like to test the equality of population variances of two normal distributions from which random samples are drawn. For example, before applying the stuents t-test on two sets of samples, we need to test whether their variances are equal. Given that the two population means are equal, we may like to test whether any significance difference exists in the spread around their mean values.
Let two independent random variables X and Y be drawn from the normal distributions \(\small{N(\mu_X, \sigma_X^2)}\) and \(\small{N(\mu_Y, \sigma_Y^2) }\).
Let n and m denote the sample sizes of X and Y respectivly.
In order to test whether their populaion variances are equal, we randomly draw n samples from X and m samples from Y distributions and compute their respective sample variance \(\small{S_X^2 }\) and \(\small{S_Y^2 }\).
From the properties of the Chi-square distribution, we know that the quantities \(\small{\dfrac{(n-1)S_X^2}{\sigma_X^2} }\) and \(\small{\dfrac{(m-1)S_Y^2}{\sigma_Y^2} }\)follow independent chi-square distributions \(\small{\chi^2(n-1) }\) and \(\small{\chi^2(m-1) }\) respectively.
Also, the ratio of variances of two normal variables gives rise to an F distribution given by,
\(~~~~~\small{F = \dfrac{ \left(\dfrac{S_X^2}{\sigma_X^2}\right) } { \left(\dfrac{S_Y^2}{\sigma_Y^2}\right) } = \dfrac { \dfrac{\left[ \dfrac{(n-1)S_X^2 } {\sigma_X^2 } \right]}{(n-1)} } { \dfrac{\left[ \dfrac{(m-1)S_Y^2 } {\sigma_Y^2 } \right]}{(m-1)} } }~~~~~\) is an F distribution with (n-1) and (m-1) degrees of freedom.
Under the null hypothesis \(\small{H_0~:~\sigma_X^2 = \sigma_Y^2}\), we get the F statistic for our test :
\(\small{~~~~~F~=~\dfrac{S_X^2}{S_Y^2}~}\) is a F distribution F(n-1, m-1) with two degrees of freedom n-1 and m-1
We also make use of an important property of F distribution: If a avariable F follows \(\small{F(n-1,m-1)}\), then its
reciprocal \(\small{\dfrac{1}{F} }\) follows \(\small{F(m-1, n-1)}\).
Therefore, \(\small{~~F~=~\dfrac{S_Y^2}{S_X^2}~}\) is a F distribution F(m-1, n-1) with degrees of freedom m-1 and n-1
The three alternate hypothesis can be tested as follows:
1. Two sided alternate hypothesis
\(~~~~~~\small{H_0:~~\sigma_X^2~=~\sigma_Y^2 }\)
\(~~~~~~\small{H_A:~~\sigma_X^2~\ne~\sigma_Y^2 }\)
Reject the null hypothesis if \(~~\small{\dfrac{S_X^2}{S_Y^2} \ge F_{1 - \alpha/2}(n-1,m-1 ) }~~\) or
\(~~~~~~\small{\dfrac{S_Y^2}{S_X^2} \ge F_{1-\alpha/2}(m-1,n-1 ) }\)
2. One sided alternate hypothesis
\(~~~~~~\small{H_0:~~\sigma_X^2~\le~\sigma_Y^2 }\)
\(~~~~~~\small{H_A:~~\sigma_X^2~\gt~\sigma_Y^2 }\)
Reject the null hypothesis if \(~~\small{\dfrac{S_X^2}{S_Y^2} \ge F_{1-\alpha}(n-1,m-1 ) }~~\)
3. One sided alternate hypothesis
\(~~~~~~\small{H_0:~~\sigma_X^2~\ge~\sigma_Y^2 }\)
\(~~~~~~\small{H_A:~~\sigma_X^2~\lt~\sigma_Y^2 }\)
Reject the null hypothesis if \(~~\small{\dfrac{S_Y^2}{S_X^2} \ge F_{1 - \alpha}(m-1,n-1 ) }~~\)
For all the three cases, we can also compute the p-value for the statistic \( F = \dfrac{S_X^2}{S_Y^2} \) and compare it with the significance level \(\alpha\) to reject or accept the null hypothesis.
var.test(x, y, ratio=1, alternative, conf.level) wherex, y = data vectors of two observationsratio = the hypothesized ratio of the population variances of ‘x’ and 'y'. The default ration=1 tests the equality of variances.alternative = A character string specifying alternate hypothesis. alternate can take three values :"two.sided", "less", "greater" It can also be a vector with all the above three strings, in which case all the three hypothesis will be tested. "less" means the population mean of x is less than that of y. "greater" mens the population mean of x is greater thn that of y. default value is"two.sided" conf.level = confidence level of the interval. For example,conf.level=0.95 sets a 95% confidence interval.
Running the above code lines prints the following lines of results on the screen:########################## We call the R function var.test() as demonstrated below: x = c(1067.7, 984.3,998.8,1025.9,1060.9,959.1,1013.8,1047.0,987.8,1051.0,885.2, 1049.5,1098.2,1001.5,1011.1,991.6) y = c(957.6, 981.8, 1096.5, 984.4, 1074.3, 929.4, 1056.0, 1012.3, 1040.7, 1099.5, 1006.1, 1064.3, 865.6, 944.4, 1091.8, 952.1) res = var.test(x,y, ratio=1, alternative = "two.sided", conf.level = 0.95) print(res)
The R script below performs the same operation as the function var.test() for all the three possible null hypothesis:F test to compare two variances data: x and y F = 0.53249, num df = 15, denom df = 15, p-value = 0.2338 alternative hypothesis: true ratio of variances is not equal to 1 95 percent confidence interval: 0.1860482 1.5240278 sample estimates: ratio of variances 0.5324872
################################################## ## Script for testing the ratio of two population variances. ## x and y are the data vectors ## alpha = significance level for the test two_sample_variance_test = function(x, y, alpha){ ## compute the standard deviation and sample size from the data sx = sd(x) sy = sd(y) n = length(x) m = length(y) ## Two sided test : Alternative hypothesis : population variances are not equal. print("Two sided alternate hypothesis : population variances are not equal :") print(" ") variance_ratio = sx^2/sy^2 F_critical = qf(1-alpha/2, n-1, m-1) if( variance_ratio >= F_critical){ print("Null hypothesis rejected, alternative accepted") print(paste("var_x/var_y = ", variance_ratio, " is greater than or equal to F_critical value ", F_critical)) } else { print("Null hypothesis accepted, alternative rejected") print(paste("var_x/var_y = ", variance_ratio, " is less than F_critical value ", F_critical)) } print("-----------------------------------------------------") ## One sided test : #Alternate hypothesis : population variance of x > population variance of y print("One sided alternate hypothesis : population variance of x > population variance of y :") print(" ") variance_ratio = sx^2/sy^2 F_critical = qf(1-alpha, n-1, m-1) variance_ratio = round(variance_ratio,2) F_critical = round(F_critical,2) if( variance_ratio >= F_critical){ print("Null hypothesis rejected, alternative accepted") print(paste("var_x/var_y = ", variance_ratio, " is greater than or equal to F_critical value ", F_critical)) } else { print("Null hypothesis accepted, alternative rejected") print(paste("var_x/var_y = ", variance_ratio, " is less than F_critical value ", F_critical)) } print("-----------------------------------------------------") ## One sided test : Alternate hypothesis : population variance of x < population variance of y print("One sided alternate hypothesis : population variance of x < population variance of y :") print(" ") variance_ratio = sy^2/sx^2 F_critical = qf(1-alpha, m-1, n-1) ## note that m-1 and n-1 have been swapped in position. variance_ratio = round(variance_ratio,2) F_critical = round(F_critical,2) if( variance_ratio >= F_critical){ print("Null hypothesis rejected, alternative accepted") print(paste("var_y/var_x = ", variance_ratio, " is greater than or equal to F_critical value ", F_critical)) } else { print("Null hypothesis accepted, alternative rejected") print(paste("var_y/var_x = ", variance_ratio, " is less than F_critical value ", F_critical)) } print("-----------------------------------------") return(1) } ### Testing x = c(1067.7, 984.3,998.8,1025.9,1060.9,959.1,1013.8,1047.0,987.8,1051.0,885.2, 1049.5,1098.2,1001.5,1011.1,991.6) y = c(957.6, 981.8, 1096.5, 984.4, 1074.3, 929.4, 1056.0, 1012.3, 1040.7, 1099.5, 1006.1, 1064.3, 865.6, 944.4, 1091.8, 952.1) two_sample_variance_test(x, y, 0.05) ###############------------------------------------------------
Executing the above script in R prints the following results and figures of probability distribution on the screen:
[1] "Two sided alternate hypothesis : population variances are not equal :" [1] " " [1] "Null hypothesis accepted, alternative rejected" [1] "var_x/var_y = 0.532487247172825 is less than F_critical value 2.86209253046351" [1] "-----------------------------------------------------" [1] "One sided alternate hypothesis : population variance of x > population variance of y :" [1] " " [1] "Null hypothesis accepted, alternative rejected" [1] "var_x/var_y = 0.53 is less than F_critical value 2.4" [1] "-----------------------------------------------------" [1] "One sided alternate hypothesis : population variance of x < population variance of y :" [1] " " [1] "Null hypothesis accepted, alternative rejected" [1] "var_y/var_x = 1.88 is less than F_critical value 2.4" [1] "-----------------------------------------"