# ANOVA

ANOVA stands for Analysis of Variance. It is a procedure that compares the variability between groups of data with the variability within those groups. Its main purpose is to check whether the means of the groups differ significantly.

## Introduction

The term **treatment** refers to an individual group or data set.

The **response** for each treatment is the random variable being assessed (X), that is, the data points of that group.

For a given set of k treatments, with n_{i} responses in the i^{th} treatment, the total number of observations is n = n_{1} + n_{2} + ... + n_{k}.

The ANOVA model expresses each response X_{ij} (the j^{th} observation in the i^{th} treatment) as the sum of the mean response for that treatment, μ_{i}, and an individual error component, ϵ_{ij}.

X_{ij}= μ_{i} + ϵ_{ij}

Here μ_{i} is a fixed but unknown constant, and the errors ϵ_{ij} are independent and normally distributed with mean 0 and standard deviation σ. Consequently, X_{ij} is normally distributed with mean μ_{i} and standard deviation σ.
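To make the model concrete, the sketch below draws responses of the form X_{ij} = μ_{i} + ϵ_{ij} using only the Python standard library. The function name `simulate_responses`, the treatment means, the group sizes and the value of σ are all illustrative assumptions, not values from this article.

```python
import random


def simulate_responses(means, sizes, sigma, seed=0):
    """Draw X_ij = mu_i + eps_ij, with eps_ij ~ N(0, sigma^2), for each treatment i."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    data = []
    for mu_i, n_i in zip(means, sizes):
        # Each response is the fixed treatment mean plus an independent normal error.
        data.append([mu_i + rng.gauss(0.0, sigma) for _ in range(n_i)])
    return data


# Hypothetical example: k = 3 treatments with unequal group sizes.
groups = simulate_responses(means=[10.0, 12.0, 15.0], sizes=[4, 5, 6], sigma=2.0)

k = len(groups)                  # number of treatments
n = sum(len(g) for g in groups)  # total observations, n = n_1 + ... + n_k
print(k, n)
```

Each inner list holds the responses for one treatment, so the total number of observations is simply the sum of the list lengths.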

## Testing a Hypothesis with ANOVA

The hypothesis test in ANOVA asks whether at least two of the treatment means differ. Before carrying it out, we fill in the ANOVA table below.

### ANOVA Table

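The ANOVA table organizes the quantities needed to find the F-statistic, which simplifies the data gathering and calculation. In the standard layout, the lettered entries sit as follows:

| Source | Degrees of freedom | Sum of squares | Mean square | F |
|---|---|---|---|---|
| Treatment | A | C | E | G |
| Error | B | D | F | |
| Total | H | I | | |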

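A = k – 1, the treatment degrees of freedom (where k is the number of groups/treatments)

B = n – k, the error degrees of freedom (where n is the total number of observations)

H = A + B = n – 1, the total degrees of freedom

E = C / A, the treatment mean square MS_{Tr}

F = D / B, the error mean square MS_{Er}

G = E / F, the F-statistic (note that table entry F is the error mean square, not the F-statistic)

C, D and I are the treatment, error and total sums of squares, whose formulas are discussed below.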

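In the formulas below, X̄_{i} is the sample mean of the i^{th} treatment and X̄ is the mean of all n observations.

Total Sum of Squares, entry I (the total amount of variation in the pooled sample):

SS_{T} = Σ_{i} Σ_{j} (X_{ij} – X̄)^{2}

Treatment Sum of Squares, entry C (the variation between the group means):

SS_{Tr} = Σ_{i} n_{i} (X̄_{i} – X̄)^{2}

Error Sum of Squares, entry D (the variation within the groups):

SS_{Er} = Σ_{i} Σ_{j} (X_{ij} – X̄_{i})^{2}

The three are related by SS_{T} = SS_{Tr} + SS_{Er}.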

Both mean squares estimate σ^{2}. MS_{Er} is an unbiased estimator of σ^{2} whether or not the treatment means are equal, whereas MS_{Tr} is unbiased only when all of the means are equal; otherwise it tends to overestimate σ^{2}.
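The behaviour of the two mean squares can be illustrated with a small simulation. The sketch below is not a proof: it averages MS_{Tr} and MS_{Er} over many simulated data sets, once with equal treatment means and once with unequal ones. The helper names, the means, the group size, σ = 2 and the number of repetitions are all assumptions chosen for illustration.

```python
import random


def mean_squares(groups):
    """Return (MS_Tr, MS_Er) for a list of groups of responses."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_er = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return ss_tr / (k - 1), ss_er / (n - k)


def average_mean_squares(means, n_per_group, sigma, reps, seed):
    """Average MS_Tr and MS_Er over many simulated data sets."""
    rng = random.Random(seed)
    total_tr = total_er = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n_per_group)] for mu in means]
        ms_tr, ms_er = mean_squares(groups)
        total_tr += ms_tr
        total_er += ms_er
    return total_tr / reps, total_er / reps


sigma = 2.0  # assumed error standard deviation, so sigma^2 = 4
equal = average_mean_squares([5.0, 5.0, 5.0], 5, sigma, reps=5000, seed=1)
unequal = average_mean_squares([3.0, 5.0, 7.0], 5, sigma, reps=5000, seed=1)

print("equal means:   MS_Tr ~ %.2f, MS_Er ~ %.2f" % equal)
print("unequal means: MS_Tr ~ %.2f, MS_Er ~ %.2f" % unequal)
```

With equal means, both averages should land close to σ^{2} = 4. With unequal means, the MS_{Er} average should stay near 4 while the MS_{Tr} average is noticeably larger, which is what makes the ratio of the two a useful test statistic.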

These values culminate in the calculation of the F-statistic, which is compared against Fisher's F-distribution.

F = (MS_{Tr})/(MS_{Er})

When all of the treatment means are equal, this statistic follows an F-distribution with k – 1 and n – k degrees of freedom: F ~ F_{(k-1,n-k)}.
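The entries of the ANOVA table can be computed directly from the raw data. The following is a minimal sketch in plain Python; the function name `anova_table`, the dictionary keys and the small data set are made up for illustration.

```python
def anova_table(groups):
    """Compute the one-way ANOVA table entries for a list of groups."""
    k = len(groups)                  # number of treatments
    n = sum(len(g) for g in groups)  # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Sums of squares: between groups, within groups, and total.
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_er = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_t = sum((x - grand_mean) ** 2 for g in groups for x in g)

    df_tr, df_er = k - 1, n - k      # degrees of freedom
    ms_tr, ms_er = ss_tr / df_tr, ss_er / df_er
    return {
        "df_tr": df_tr, "df_er": df_er, "df_total": df_tr + df_er,
        "ss_tr": ss_tr, "ss_er": ss_er, "ss_total": ss_t,
        "ms_tr": ms_tr, "ms_er": ms_er,
        "f": ms_tr / ms_er,
    }


# Hypothetical data: three treatments with three responses each.
table = anova_table([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
for name, value in table.items():
    print(name, round(value, 4))
```

For this data set the group means are 2, 3 and 6, which gives SS_{Tr} = 26 and SS_{Er} = 6 on 2 and 6 degrees of freedom, so the F-statistic is 13 up to floating-point rounding.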

### Hypothesis Test

The null hypothesis (H_{0}) is that the treatment means are all equal.

H_{0}: μ_{1} = μ_{2} = ⋯ = μ_{k}

The alternative hypothesis is that at least two of the means differ.

H_{a}: Not all the means are equal.

To test whether the means are all equal, we check whether the observed F-statistic, F_{0}, is greater than the critical value read from the F-distribution table at significance level α.

F_{0} > f(df_{Tr},df_{Er};1-α)

This comparison is the end point of the ANOVA procedure. If the inequality holds, that is, if the F value computed from the observations exceeds the critical value from the table, we reject the hypothesis that all the means are equal and conclude that at least two of them differ. Otherwise we fail to reject it, which does not prove that the means are equal.
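The decision rule can be sketched in a few lines. The example below assumes the critical value is looked up by hand in an F table; the value 5.14 used here is the approximate tabulated value for 2 and 6 degrees of freedom at α = 0.05. The function names and the data are hypothetical.

```python
def f_statistic(groups):
    """Observed F-statistic, MS_Tr / MS_Er, for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_er = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_tr / (k - 1)) / (ss_er / (n - k))


def reject_equal_means(f_observed, f_critical):
    """Reject H0 (all means equal) when the observed F exceeds the critical value."""
    return f_observed > f_critical


# Hypothetical data: k = 3 treatments, n = 9 observations, so df = (2, 6).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f0 = f_statistic(groups)

f_crit = 5.14  # assumed table value f(2, 6; 0.95), read from a standard F table
decision = reject_equal_means(f0, f_crit)
print("F0 = %.2f, reject H0: %s" % (f0, decision))
```

Here the observed statistic is well above the tabulated value, so the hypothesis of equal means is rejected. If SciPy is available, `scipy.stats.f.ppf(0.95, 2, 6)` could supply the critical value instead of a printed table.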

## ANOVA assumptions

For the test to be valid, the central assumptions are:

- The data in each group are normally distributed
- The standard deviation is the same for all groups
- The samples are random and independent of one another

The assumptions of normality and constant variance can be checked by plotting the residuals: the points should look randomly scattered, with a similar spread in every group, rather than following a pattern.
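Before plotting, the residuals themselves have to be computed. The sketch below takes each residual to be a response minus its own group mean, and also reports the sample standard deviation of each group as a rough numerical check on constant variance. The function name and the data are illustrative assumptions, and the summary complements a residual plot rather than replacing it.

```python
import statistics


def residuals_by_group(groups):
    """Residual e_ij = X_ij - (mean of group i), kept grouped by treatment."""
    result = []
    for g in groups:
        group_mean = statistics.mean(g)
        result.append([x - group_mean for x in g])
    return result


# Hypothetical data: three treatments with three responses each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
residuals = residuals_by_group(groups)

# Sample standard deviation per group; similar values support constant variance.
spreads = [statistics.stdev(g) for g in groups]
spread_ratio = max(spreads) / min(spreads)

for i, (res, sd) in enumerate(zip(residuals, spreads), start=1):
    print("treatment %d: residuals=%s, sd=%.3f" % (i, res, sd))
print("largest / smallest sd: %.3f" % spread_ratio)
```

The grouped residuals can then be passed to any plotting tool, for example plotted against the group means to look for a pattern or a change in spread.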
