Biostatistics with R

Analysis of Variance (ANOVA)

In this section, we test several (more than two) populations for the equality of their means. For example, we may compare the mean yields of different hybrid varieties of tomato, or study the effect of four different drugs on patients with a certain disease.

The observed differences between various treatment groups are caused by one or more factors. In the above two examples, the hybrid variety and the drug are the factors that can cause the mean to differ between treatment groups.

The methodology used for this comparison is called the Analysis Of Variance (ANOVA).

In the Analysis Of Variance, the total variance in the data is resolved into components. Each one of these components of variance is contributed by a factor in the study, with the remainder attributed to random variation. From this we can estimate the fractional contribution of a factor to the total variance. Under the null hypothesis that the population means being compared are equal, a test statistic formed as a ratio of these variance components is known to follow an F distribution whose two degrees of freedom are determined by the number of groups and the sample sizes.
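As a sketch of this decomposition for the simplest (one factor) case, assume $m$ treatment groups, with $n_i$ observations $X_{ij}$ in group $i$ and $n = n_1 + \dots + n_m$ observations in total. This notation is chosen here for illustration and may differ from the notation used in the detailed sections. The total sum of squares then splits into a within-group part and a between-group part:

$$
\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_{\cdot\cdot}\right)^2
=
\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_{i\cdot}\right)^2
+
\sum_{i=1}^{m} n_i\left(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot}\right)^2
$$

Here $\bar{X}_{i\cdot}$ is the mean of group $i$ and $\bar{X}_{\cdot\cdot}$ is the mean of all $n$ observations. Writing the left side as $SS(TO)$, the first term on the right as $SS(E)$ and the second as $SS(T)$, and assuming normally distributed populations with a common variance, the test statistic under the null hypothesis of equal means is

$$
F = \frac{SS(T)/(m-1)}{SS(E)/(n-m)} \sim F(m-1,\; n-m)
$$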

We consider the following three categories of the Analysis of Variance:


Acknowledgement : The derivations presented in the following three sections closely follow the ones from chapter 8 of the book "Probability and Statistical Inference" by Hogg and Tanis. After reading this lucidly written chapter on ANOVA, I find it very difficult to follow any other style of presenting it.


  • One factor ANOVA

  • The effect of a single factor on an outcome is considered, typically testing the equality of population means among multiple samples under the effect of a single factor. For example, comparing the effectiveness of different brands of a pesticide in eliminating a particular species of pest. Here pesticide brand is the factor.

  • Two factor ANOVA with one observation per cell

  • In this method, the effect of two factors on an outcome can be studied. As an example, consider a clinical trial in which the effects of 5 brands of medicine are tested on patients. Thus medicine is one factor. Suppose the effects of these medicines are suspected to be different for different age groups of patients. Then age becomes the second factor. Suppose there are $a$ categories of medicine and $b$ categories of age group, giving $ab$ combinations to be tested. Each one of these $ab$ combinations is called a "cell". During the trial, one patient from each age category will be given one type of medicine, and the effect (in terms of the number of days it takes to cure the disease) will be noted down. This is one observation per cell. The analysis will reveal whether the age group of the patients, as well as the medicine, affects the outcome. If there is no age effect, a medicine can be used to treat all age groups.

  • Two factor ANOVA with multiple observations per cell

  • This analysis is very similar to the previous one, except that multiple observations are made per cell. Taking the above example, multiple patients in an age group will take a medicine, instead of one patient as in the previous case. Thus a given (age, medicine) cell will have multiple patients. This is called "more than one observation per cell". The multiple observations per cell enable the study of "interaction", if any, between the two factors. This is explained in detail in the corresponding section.

    Note : One factor ANOVA is also called "One way ANOVA". Similarly, Two factor ANOVA is also known as "Two way ANOVA".
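
The three categories above map onto three model formulas in R. The following is a minimal sketch using base R's `aov()`, not the worked examples of the detailed sections. The data are simulated with `rnorm()` purely for illustration, and the names `medicine`, `age`, `days` and `d` are assumptions made here, following the clinical trial example with 5 medicines and 4 age groups.

```r
# Simulated, purely illustrative data: 5 medicines x 4 age groups,
# one patient per (medicine, age) cell. Values are made up.
set.seed(1)
medicine <- factor(rep(paste0("M", 1:5), times = 4))
age      <- factor(rep(paste0("A", 1:4), each = 5))
days     <- rnorm(20, mean = 10, sd = 2)   # days taken to cure

# One factor ANOVA: medicine is the only factor considered.
fit1 <- aov(days ~ medicine)
print(summary(fit1))

# Two factor ANOVA with one observation per cell:
# additive model, no interaction term can be estimated.
fit2 <- aov(days ~ medicine + age)
print(summary(fit2))

# Two factor ANOVA with multiple observations per cell:
# 3 simulated patients in each of the 20 cells.
d <- expand.grid(medicine = paste0("M", 1:5),
                 age      = paste0("A", 1:4),
                 patient  = 1:3)
d$days <- rnorm(nrow(d), mean = 10, sd = 2)

# medicine * age expands to both main effects plus their interaction.
fit3 <- aov(days ~ medicine * age, data = d)
print(summary(fit3))
```

With one observation per cell, adding the interaction term would leave no residual degrees of freedom, which is why the second model is additive. With three observations per cell, the third model keeps 40 residual degrees of freedom after fitting the interaction.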