Basic Statistics with R

Hypothesis testing the ratio of two population variances

Sometimes we would like to test the equality of the population variances of two normal distributions from which random samples are drawn. For example, before applying the Student's t-test to two sets of samples, we need to test whether their variances are equal. Given that the two population means are equal, we may want to test whether any significant difference exists in the spread around their mean values.

Let two independent random variables X and Y be drawn from the normal distributions \(\small{N(\mu_X, \sigma_X^2)}\) and \(\small{N(\mu_Y, \sigma_Y^2) }\). Let n and m denote the sample sizes of X and Y respectively.

In order to test whether their population variances are equal, we randomly draw n samples from the X distribution and m samples from the Y distribution and compute their respective sample variances \(\small{S_X^2 }\) and \(\small{S_Y^2 }\).

From the properties of the Chi-square distribution, we know that the quantities \(\small{\dfrac{(n-1)S_X^2}{\sigma_X^2} }\) and \(\small{\dfrac{(m-1)S_Y^2}{\sigma_Y^2} }\) follow independent chi-square distributions \(\small{\chi^2(n-1) }\) and \(\small{\chi^2(m-1) }\) respectively.

Also, the ratio of variances of two normal variables gives rise to an F distribution given by,

\(~~~~~\small{F = \dfrac{ \left(\dfrac{S_X^2}{\sigma_X^2}\right) } { \left(\dfrac{S_Y^2}{\sigma_Y^2}\right) } = \dfrac { \dfrac{\left[ \dfrac{(n-1)S_X^2 } {\sigma_X^2 } \right]}{(n-1)} } { \dfrac{\left[ \dfrac{(m-1)S_Y^2 } {\sigma_Y^2 } \right]}{(m-1)} } }~~~~~\) follows an F distribution with (n-1) and (m-1) degrees of freedom.

Under the null hypothesis \(\small{H_0~:~\sigma_X^2 = \sigma_Y^2}\), we get the F statistic for our test :

\(\small{~~~~~F~=~\dfrac{S_X^2}{S_Y^2}~}\) follows an F distribution F(n-1, m-1) with degrees of freedom n-1 and m-1.
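This distributional fact can be checked by simulation: if two samples are repeatedly drawn from normal distributions with the same variance, the ratio of their sample variances should match F(n-1, m-1). The sketch below uses arbitrary, hypothetical sample sizes and a hypothetical common standard deviation, not values from this article.

```r
# Simulation sketch: under H0 (equal variances) the ratio S_X^2/S_Y^2
# behaves like F(n-1, m-1). Sample sizes and sigma are arbitrary choices.
set.seed(1)
n <- 8
m <- 12
sigma <- 2

ratios <- replicate(20000, {
  sx2 <- var(rnorm(n, mean = 0, sd = sigma))  # sample variance of X
  sy2 <- var(rnorm(m, mean = 0, sd = sigma))  # sample variance of Y
  sx2 / sy2
})

# The empirical 95th percentile of the simulated ratios should be close
# to the theoretical quantile qf(0.95, n-1, m-1).
print(c(empirical   = unname(quantile(ratios, 0.95)),
        theoretical = qf(0.95, n - 1, m - 1)))
```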

We also make use of an important property of the F distribution: if a variable F follows \(\small{F(n-1,m-1)}\), then its reciprocal \(\small{\dfrac{1}{F} }\) follows \(\small{F(m-1, n-1)}\).

Therefore, \(\small{~~F~=~\dfrac{S_Y^2}{S_X^2}~}\) follows an F distribution F(m-1, n-1) with degrees of freedom m-1 and n-1.
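The reciprocal property can be verified numerically with the F quantile function qf(). The degrees of freedom below are arbitrary illustrative values.

```r
# Sketch: verify that if F ~ F(df1, df2) then 1/F ~ F(df2, df1),
# using hypothetical degrees of freedom df1 = 9 and df2 = 14.
alpha <- 0.05
df1 <- 9
df2 <- 14

# Lower-tail alpha quantile of F(df1, df2)
q_lower <- qf(alpha, df1, df2)

# Reciprocal of the upper-tail (1 - alpha) quantile of F(df2, df1)
q_recip <- 1 / qf(1 - alpha, df2, df1)

# The two quantities agree, confirming the reciprocal property
print(c(q_lower = q_lower, q_recip = q_recip))
```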

The three alternative hypotheses can be tested as follows:


1. Two-sided alternative hypothesis

\(~~~~~~\small{H_0:~~\sigma_X^2~=~\sigma_Y^2 }\)
\(~~~~~~\small{H_A:~~\sigma_X^2~\ne~\sigma_Y^2 }\)
Reject the null hypothesis if \(~~\small{\dfrac{S_X^2}{S_Y^2} \ge F_{1 - \alpha/2}(n-1,m-1 ) }~~\) or \(~~~~~~\small{\dfrac{S_Y^2}{S_X^2} \ge F_{1-\alpha/2}(m-1,n-1 ) }\)


2. One-sided alternative hypothesis

\(~~~~~~\small{H_0:~~\sigma_X^2~\le~\sigma_Y^2 }\)
\(~~~~~~\small{H_A:~~\sigma_X^2~\gt~\sigma_Y^2 }\)
Reject the null hypothesis if \(~~\small{\dfrac{S_X^2}{S_Y^2} \ge F_{1-\alpha}(n-1,m-1 ) }~~\)


3. One-sided alternative hypothesis

\(~~~~~~\small{H_0:~~\sigma_X^2~\ge~\sigma_Y^2 }\)
\(~~~~~~\small{H_A:~~\sigma_X^2~\lt~\sigma_Y^2 }\)
Reject the null hypothesis if \(~~\small{\dfrac{S_Y^2}{S_X^2} \ge F_{1 - \alpha}(m-1,n-1 ) }~~\)



For all three cases, we can also compute the p-value for the statistic \( F = \dfrac{S_X^2}{S_Y^2} \) and compare it with the significance level \(\alpha\) to decide whether to reject the null hypothesis.
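The p-value computations for the three alternatives can be sketched with the F cumulative distribution function pf(). The helper function name and the sample variances passed to it below are hypothetical illustrations, not part of the worked example that follows.

```r
# Sketch: p-values for the three alternatives, given sample variances
# var_x, var_y and sample sizes n, m. Under H0, var_x/var_y ~ F(n-1, m-1).
f_test_pvalues <- function(var_x, var_y, n, m) {
  f <- var_x / var_y
  p_greater <- 1 - pf(f, n - 1, m - 1)     # H_A: sigma_x^2 > sigma_y^2
  p_less    <- pf(f, n - 1, m - 1)         # H_A: sigma_x^2 < sigma_y^2
  p_two     <- 2 * min(p_greater, p_less)  # H_A: sigma_x^2 != sigma_y^2
  c(two.sided = p_two, greater = p_greater, less = p_less)
}

# Hypothetical values for illustration
print(f_test_pvalues(var_x = 25, var_y = 16, n = 12, m = 10))
```

The two-sided p-value is twice the smaller tail probability, which matches the convention used by R's var.test().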

Example-1 :

The manufacturers of two brands of electric bulbs claim that their bulbs have a mean lifetime of 1000 hours. Sixteen bulbs of each brand were randomly drawn and their lifetimes in hours were determined by experiment. The results are presented below:

Bulb-A : \(\small{1067.7, ~984.3,~ 998.8,~ 1025.9,~ 1060.9,~ 959.1,~ 1013.8,~ 1047.0,~ 987.8,~ 1051.0,}\)

\( \small{~~~~~~~~~~~~~~~~~885.2,~1049.5,~ 1098.2,~ 1001.5,~ 1011.1,~ 991.6}\)


Bulb-B : \(\small{ 957.6, 981.8, 1096.5, 984.4, 1074.3, 929.4, 1056.0, 1012.3, 1040.7, 1099.5,}\)

\( \small{~~~~~~~~~~~~~~~~~1006.1, 1064.3, 865.6, 944.4, 1091.8, 952.1}\)


Assuming that the lifetimes of Bulb-A and Bulb-B follow Gaussian distributions \(\small{N(\mu_x, \sigma_x )}\) and \(\small{N(\mu_y, \sigma_y )}\) respectively, test the null hypothesis \(\small{\sigma_x^2 = \sigma_y^2 }\) against the alternative hypothesis \(\small{\sigma_x^2 \lt \sigma_y^2 }\) at a significance level of \(\small{\alpha = 0.05}\).


In the given data, let X and Y denote data on Bulb-A and Bulb-B respectively.

From the data sets, we compute the standard deviations \(\small{S_X = 50.4 }\) and \(\small{S_Y = 69.1}\).

The number of data points in each set is n = m = 16.

From standard deviations, \(\small{~~\dfrac{S_Y^2}{S_X^2}~=~\dfrac{69.1^2}{50.4^2}~=~1.879 }\)

\(\small{F_{1-\alpha}(m-1,n-1)~=~F_{0.95}(15,15)~=~2.4~~~~~~~~ }\)(through R function call "qf(0.95,15,15)" ).

While testing \(\small{\sigma_X^2 = \sigma_Y^2 }\) against an alternate hypothesis \(\small{\sigma_X^2 \lt \sigma_Y^2 }\), we reject the null if \(~~\small{\dfrac{S_Y^2}{S_X^2} \ge F_{1 - \alpha}(m-1,n-1 ) }~~\).

In our case, since \(~~\small{\dfrac{S_Y^2}{S_X^2} \lt F_{1-\alpha}(m-1,n-1 ) }~~\), we fail to reject the null hypothesis and state that \(\small{\sigma_X^2 }\) is not significantly different from \(\small{\sigma_Y^2 }\). The data do not support the alternative hypothesis that \(\small{\sigma_x^2 \lt \sigma_y^2 }\) at a significance level of 0.05.


Alternatively, we can compute the p-value for the one-sided test and compare it with \(\alpha=0.05\).

Compute the p-value for \(\small{~~F = \dfrac{S_Y^2}{S_X^2}~=~\dfrac{69.1^2}{50.4^2}~=~1.879 }\) under the distribution \(\small{ F(m-1,n-1) }\).

Using the R function call 1 - pf(1.879, 15, 15), we get a p-value of 0.1167 for the test. Since this is larger than 0.05, we again fail to reject the null hypothesis.
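The computations of Example-1 can be reproduced directly in R from the data vectors given above:

```r
# Example-1 data from the text: lifetimes of Bulb-A (x) and Bulb-B (y)
x <- c(1067.7, 984.3, 998.8, 1025.9, 1060.9, 959.1, 1013.8, 1047.0,
       987.8, 1051.0, 885.2, 1049.5, 1098.2, 1001.5, 1011.1, 991.6)
y <- c(957.6, 981.8, 1096.5, 984.4, 1074.3, 929.4, 1056.0, 1012.3,
       1040.7, 1099.5, 1006.1, 1064.3, 865.6, 944.4, 1091.8, 952.1)

f_stat <- var(y) / var(x)         # S_Y^2 / S_X^2, about 1.879
f_crit <- qf(0.95, 15, 15)        # critical value, about 2.40
p_val  <- 1 - pf(f_stat, 15, 15)  # one-sided p-value, about 0.117

print(c(F = f_stat, F_critical = f_crit, p.value = p_val))
```

Since the F statistic falls below the critical value (equivalently, the p-value exceeds 0.05), the null hypothesis is not rejected.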

R-scripts

In R, the function var.test() performs the test on the equality of two variances. This function computes the p-value for the F-test.

The important parameters of the function are as follows:


       var.test(x, y, ratio=1, alternative, conf.level) 

where

       x, y  = data vectors of two observations

       ratio  = the hypothesized ratio of the population variances of 'x' and 'y'.
                                     The default ratio=1 tests the equality of variances.

       alternative  = a character string specifying the alternative hypothesis.

			    alternative can take one of three values :  "two.sided", "less", "greater"

                                "less" means the population variance of x is less than that of y.
				"greater" means the population variance of x is greater than that of y.

			        default value is  "two.sided"
                     

     conf.level = confidence level of the interval. 
                   For example,  conf.level=0.95  sets a 95% confidence interval.



We call the R function var.test() as demonstrated below:

       ##########################
       x = c(1067.7, 984.3, 998.8, 1025.9, 1060.9, 959.1, 1013.8, 1047.0, 987.8, 1051.0,
             885.2, 1049.5, 1098.2, 1001.5, 1011.1, 991.6)

       y = c(957.6, 981.8, 1096.5, 984.4, 1074.3, 929.4, 1056.0, 1012.3, 1040.7, 1099.5,
             1006.1, 1064.3, 865.6, 944.4, 1091.8, 952.1)

       res = var.test(x, y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
       print(res)


Running the above code prints the following results on the screen:

               F test to compare two variances

       data:  x and y
       F = 0.53249, num df = 15, denom df = 15, p-value = 0.2338
       alternative hypothesis: true ratio of variances is not equal to 1
       95 percent confidence interval:
        0.1860482 1.5240278
       sample estimates:
       ratio of variances
                0.5324872
The R script below performs the same operation as the function var.test() for all three possible alternative hypotheses:

##################################################
## Script for testing the ratio of two population variances.
## x and y are the data vectors
## alpha = significance level for the test

two_sample_variance_test = function(x, y, alpha){

    ## compute the standard deviation and sample size from the data
    sx = sd(x)
    sy = sd(y)
    n = length(x)
    m = length(y)

    ## Two sided test : Alternative hypothesis : population variances are not equal.
    print("Two sided alternate hypothesis : population variances are not equal :")
    print(" ")
    variance_ratio = sx^2/sy^2
    F_critical = qf(1-alpha/2, n-1, m-1)
    if( variance_ratio >= F_critical){
        print("Null hypothesis rejected, alternative accepted")
        print(paste("var_x/var_y = ", variance_ratio, " is greater than or equal to F_critical value ", F_critical))
    } else {
        print("Null hypothesis accepted, alternative rejected")
        print(paste("var_x/var_y = ", variance_ratio, " is less than F_critical value ", F_critical))
    }
    print("-----------------------------------------------------")

    ## One sided test :
    ## Alternate hypothesis : population variance of x > population variance of y
    print("One sided alternate hypothesis : population variance of x > population variance of y :")
    print(" ")
    variance_ratio = sx^2/sy^2
    F_critical = qf(1-alpha, n-1, m-1)
    variance_ratio = round(variance_ratio, 2)
    F_critical = round(F_critical, 2)
    if( variance_ratio >= F_critical){
        print("Null hypothesis rejected, alternative accepted")
        print(paste("var_x/var_y = ", variance_ratio, " is greater than or equal to F_critical value ", F_critical))
    } else {
        print("Null hypothesis accepted, alternative rejected")
        print(paste("var_x/var_y = ", variance_ratio, " is less than F_critical value ", F_critical))
    }
    print("-----------------------------------------------------")

    ## One sided test : Alternate hypothesis : population variance of x < population variance of y
    print("One sided alternate hypothesis : population variance of x < population variance of y :")
    print(" ")
    variance_ratio = sy^2/sx^2
    F_critical = qf(1-alpha, m-1, n-1)   ## note that m-1 and n-1 have been swapped in position.
    variance_ratio = round(variance_ratio, 2)
    F_critical = round(F_critical, 2)
    if( variance_ratio >= F_critical){
        print("Null hypothesis rejected, alternative accepted")
        print(paste("var_y/var_x = ", variance_ratio, " is greater than or equal to F_critical value ", F_critical))
    } else {
        print("Null hypothesis accepted, alternative rejected")
        print(paste("var_y/var_x = ", variance_ratio, " is less than F_critical value ", F_critical))
    }
    print("-----------------------------------------")

    return(1)
}

### Testing
x = c(1067.7, 984.3, 998.8, 1025.9, 1060.9, 959.1, 1013.8, 1047.0, 987.8, 1051.0,
      885.2, 1049.5, 1098.2, 1001.5, 1011.1, 991.6)
y = c(957.6, 981.8, 1096.5, 984.4, 1074.3, 929.4, 1056.0, 1012.3, 1040.7, 1099.5,
      1006.1, 1064.3, 865.6, 944.4, 1091.8, 952.1)

two_sample_variance_test(x, y, 0.05)
###############------------------------------------------------


Executing the above script in R prints the following results on the screen:

[1] "Two sided alternate hypothesis : population variances are not equal :"
[1] " "
[1] "Null hypothesis accepted, alternative rejected"
[1] "var_x/var_y =  0.532487247172825  is less than F_critical value  2.86209253046351"
[1] "-----------------------------------------------------"
[1] "One sided alternate hypothesis : population variance of x > population variance of y :"
[1] " "
[1] "Null hypothesis accepted, alternative rejected"
[1] "var_x/var_y =  0.53  is less than F_critical value  2.4"
[1] "-----------------------------------------------------"
[1] "One sided alternate hypothesis : population variance of x < population variance of y :"
[1] " "
[1] "Null hypothesis accepted, alternative rejected"
[1] "var_y/var_x =  1.88  is less than F_critical value  2.4"
[1] "-----------------------------------------"