Biostatistics with R

Hypothesis testing a population variance

This method can be used to test if the varince of a population is equal to a particular value. It is assumed that the data points are drawn randomly from a normal distribution. The sample variance is computed from the n data points using,
$$ \small{ s^2 = \dfrac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X})^2 } $$ From the properties of the Chi-square distribution, we know that the quantity \(\small{\dfrac{(n-1)s^2}{\sigma^2} }\) follows \(\small{\chi^2(n-1) }\), a Chi-square with (n-1) degrees of freedom.

While studying about the distribution of a single population variance, we learnt that a \(\small{100(1-\alpha)\% }\) confidence interval for the population variance \(\small{\sigma^2 }\) is given by,

\(~~~~~~~~~~~~\small{\left[\dfrac{n-1}{b}s^2,~\dfrac{(n-1)}{a}s^2 \right]}\)

where the two points \(\small{a}\) and \(\small{b }\) are chosen on a Chi-square distribution with (n-1) degrees of freedom such that the area under the curve(probability) between these two points is \(\small{1-\alpha}\).

This means,\(~~\small{b = \chi^2_{1-\alpha/2}(n-1) }~~\) and \(~~\small{a = \chi^2_{\alpha/2}(n-1)~~ }\) (ie., the area from 0 to \(\small{b}\) is \(\small{1-\alpha/2 }\) and the area from 0 to \(\small{a}\) is \(\small{\alpha/2 }\). The area in the two tail regions are equal to \(\small{\alpha/2 }\). Thus the area under \(\small{\chi^2(n-1)}\) curve between points \(\small{a}\) and \(\small{b}\) is \(\small{1-\alpha }~~\). See the figure below

With the above results, we can device a hypothesis test for a single population varince:

Using the sample variance \(\small{s^2 }\) estimated from the data, compute the chi-square test statistics for a test value \(\small{\sigma=\sigma_0 }\) to be, \(\small{~~\chi^2~=~\dfrac{(n-1)s^2}{\sigma_0^2} }\)

The two sided and one sided tests for their corresponding null ($H_0$) and alternate ($H_A$) hypothesis are devised as follows:

1. Two sided hypothesis

\(\small{~~~~~~H_0~:~\sigma^2 = \sigma_0^2 }\)
\(\small{~~~~~~H_A~:~\sigma^2 \neq \sigma_0^2 }\)

Reject the null hypothesis $H_0$ if the computed statistic \(\small{\chi^2 \lt \chi^2_{\alpha/2}(n-1) }\) or \(\small{\chi^2 \gt \chi^2_{1-\alpha/2}(n-1) }\)

For a two sided test, we can also reject null hypothesis if \(\small{\sigma_0^2 }\) is outside the \(\small{100(1-\alpha)\% }\) confidence interval \(~~\small{\left[\dfrac{n-1}{b}s^2,~\dfrac{(n-1)}{a}s^2 \right]}\) for the population standard deviation.

2. One sided hypothesis

\(\small{~~~~~~H_0~:~\sigma^2 \leq \sigma_0^2 }\)
\(\small{~~~~~~H_A~:~\sigma^2 \gt \sigma_0^2 }\)

Reject the null hypothesis $H_0$ if the computed statistic \(\small{\chi^2 \gt \chi^2_{1-\alpha}(n-1) }\)

2. One sided hypothesis

\(\small{~~~~~~H_0~:~\sigma^2 \geq \sigma_0^2 }\)
\(\small{~~~~~~H_A~:~\sigma^2 \lt \sigma_0^2 }\)

Reject the null hypothesis $H_0$ if the computed statistic \(\small{\chi^2 \lt \chi^2_{\alpha}(n-1) }\)

Example-1 :

In order to study the variation in the amount of a natural product X extracted from 1 Kg of leaves from a particular plant, extraction was done with 25 random samples of leaves weighing one Kg each. The amount of product X (in mg) extracted from the leaf samples are listed below:

425, 710, 661, 664, 732, 714, 934, 761, 744, 653, 725, 657, 421, 573, 535, 602, 537, 405, 874, 791, 721, 849, 567, 468, 975

Assuming that X follows a Gaussian \(\small{N(\mu, \sigma^2) }\), test the null hypothesis \(\small{\sigma = 140}\)
(a) against an alternative hypothesis \(\small{\sigma \ne 140}\)
(b) aginst an alternative hypothesis \(\small{\sigma \gt 140}\)

From the given data set, we compute

number of data points = n = 25,

standard deviation = \(\small{s = 154.36}\)

We compute the test statistic \(~~~\small{\chi^2 = (n-1)\dfrac{s^2}{\sigma_0^2} = (25-1) \times \dfrac{154.36^2}{140^2} = 29.17}\)

(a) For the two sided test, the critical values are calculated, using R function call qchisq(alpha/2, n-1) to be

\(\small{b = \chi^2_{1-\alpha/2}(n-1) = \chi^2_{1-0.05/2}(25-1) = 39.36}\)

\(\small{a = \chi^2_{\alpha/2}(n-1) = \chi^2_{0.05/2}(25-1) = 12.40}\)

Since the test statistic value of 29.17 is withinthe range of [a,b], null hypothesis is not rejected. We therefore conclude that the population standard deviation \(\small{\sigma = 140 }\).


(b) For the one sided test with \(\small{H_A : \sigma > 140 }\), we compute the critical value to be

\(\small{\chi^2_c = \chi^2_{1-\alpha}(n-1) = \chi^2_{1-0.05}(25-1) = 36.41 }\)

Since the test statistic value of 29.17 is not greater than the critical value 36.41, we fail to reject the null hypothesis \(\small{H_A : \sigma = 140 }\) against an alternative hypothesis \(\small{H_A : \sigma \gt 140 }\) to conclude that \(\small{\sigma \le 140 }\)

R-scripts

The R script below performs the one variable test on population variance.

################################################## ## Script for testing a single population variance. ## x is the data vector ## test_sigma = value of population vaiance for testing ## alpha = significance level for the test one_sample_variance_test = function(x, test_sigma, alpha){ ## compute the standard deviation and sample size from the data s = sd(x) n = length(x) ## compute chi-square test statistic test_statistic = (n-1)*s^2/test_sigma^2 ## Two sided test a = qchisq(alpha/2, n-1) b = qchisq(1-alpha/2, n-1) test_statistic = round(test_statistic, 2) a = round(a, 2) b = round(b,2) print("Two sided alternate hypothesis : population variance not equal to test value :") if( (test_statistic <= a) | (test_statistic >= b) ){ print("Null hypothesis rejected, alternative accepted") print(paste("chi-square test statistic = ", test_statistic, " is outside the acceptance region [ ",a, ",", b," ]")) } else { print("Null hypothesis accepted, alternative rejected") print(paste("chi-square test statistic = ", test_statistic, " is inside the acceptance region [ ",a, ",", b," ]")) } print("-----------------------------------------------------") ## One sided test : Alternate hypothesis : population variance > test_variance print("One sided alternate hypothesis : population variance > test value :") chisq_critical = qchisq(1-alpha,n-1) chisq_critical = round(chisq_critical, 2) if(test_statistic > chisq_critical){ print("Null hypothesis rejected, alternative accepted") print(paste("chi-square test statistic = ", test_statistic, " is greater than the upper critical value ", chisq_critical)) } else { print("Null hypothesis accepted, alternative rejected ") print(paste("chi-square test statistic = ", test_statistic, " is less than the upper critical value ",chisq_critical )) } print("-----------------------------------------") ## One sided test : Alternate hypothesis : population variance < test_variance print("One sided alternate hypothesis : population variance < test value :") chisq_critical = qchisq(alpha,n-1) chisq_critical = round(chisq_critical, 2) if(test_statistic < chisq_critical){ print("Null hypothesis rejected, alternative accepted") print(paste("chi-square test statistic = ", test_statistic, " is less than the lower critical value ", chisq_critical)) } else { print("Null hypothesis accepted, alternative rejected") print(paste("chi-square test statistic = ", test_statistic, " is greater than the lowercritical value ",chisq_critical)) } } ### Testing the function "one_sample_variance_test" ## this data set is randomly drawn from a Gaussian of mean=140, sd=20 x = c(142.8, 135.0, 157.5, 148.4, 135.9, 153.4, 149.0, 130.2, 156.0, 189.7, 151.6, 156.5, 123.8, 152.9, 118.4, 145.8) one_sample_variance_test(x, 29, 0.05) ###############------------------------------------------------


Executing the above script in R prints the following results and figures of probability distribution on the screen:

[1] "Two sided alternate hypothesis : population variance not equal to test value :" [1] "Null hypothesis rejected, alternative accepted" [1] "chi-square test statistic = 4.91 is outside the acceptance region [ 6.26 , 27.49 ]" [1] "-----------------------------------------------------" [1] "One sided alternate hypothesis : population variance > test value :" [1] "Null hypothesis accepted, alternative rejected " [1] "chi-square test statistic = 4.91 is less than the upper critical value 25" [1] "-----------------------------------------" [1] "One sided alternate hypothesis : population variance < test value :" [1] "Null hypothesis rejected, alternative accepted" [1] "chi-square test statistic = 4.91 is less than the lower critical value 7.26"