I. Introduction

The ANalysis Of VAriance (or ANOVA) is a powerful and common statistical procedure in the social sciences. It can handle a variety of situations. We will talk about the case of one between groups factor here and two between groups factors in the next section.

The example that follows is based on a study by Darley and Latané (1969). The authors were interested in whether the presence of other people has an influence on whether a person will help someone in distress. In this classic study, the experimenter (a female graduate student) had the subject wait in a room with either 0, 2, or 4 confederates. The experimenter announces that the study will begin shortly and walks into an adjacent room. In a few moments the person(s) in the waiting room hear her fall and complain of ankle pain. The dependent measure is the number of seconds it takes the subject to help the experimenter.

How do we analyze this data? We could do a bunch of between groups t tests. However, this is not a good idea for three reasons.

1. The amount of computational labor increases rapidly with the number of groups in the study.

Number of    Number of Pairs
Groups       of Means
   3               3
   4               6
   5              10
   6              15
   7              21
   8              28

2. We are interested in one thing -- is the number of people present related to helping behavior? -- thus it would be nice to be able to do one test that would answer this question.

3. The type I error rate rises with the number of tests we perform.
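To see reasons 1 and 3 concretely, here is a short stdlib-only Python sketch that counts the pairs of means and approximates the familywise Type I error rate. (The formula assumes the tests are independent, which pairwise t tests on the same data are not, so treat it as an approximation.)

```python
from math import comb

def familywise_error(alpha, n_tests):
    # P(at least one Type I error) across n_tests, assuming independent tests
    # (an approximation -- pairwise t tests on the same data are not independent).
    return 1 - (1 - alpha) ** n_tests

for groups in range(3, 9):
    pairs = comb(groups, 2)  # number of pairs of means, i.e., of possible t tests
    print(f"{groups} groups -> {pairs:2d} t tests, "
          f"familywise error rate ~ {familywise_error(0.05, pairs):.3f}")
```

With 3 groups the familywise rate is already about .14 rather than the nominal .05, and with 8 groups (28 tests) it climbs past .75.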

II. Logic

The reason this analysis is called ANOVA rather than multi-group means analysis (or something like that) is because it compares group means by analyzing comparisons of variance estimates. Consider:

Suppose we draw three samples and compute the mean of each. Why might these means differ? There are two reasons:

1. Group Membership (i.e., the treatment effect or IV).
2. Differences not due to group membership (i.e., chance or sampling error).

The ANOVA is based on the fact that two independent estimates of the population variance can be obtained from the sample data. A ratio is formed for the two estimates, where:

between groups estimate → sensitive to treatment effect + error
within groups estimate  → sensitive to error only

Given the null hypothesis (in this case H0: μ1 = μ2 = μ3), the two variance estimates should be equal. That is, since the null assumes no treatment effect, both variance estimates reflect only error and their ratio should be approximately 1. To the extent that this ratio is larger than 1, it suggests a treatment effect (i.e., differences between the groups).

It turns out that the ratio of these two variance estimates is distributed as F when the null hypothesis is true.

Note:
1. F is a family of distributions, which varies as a function of a pair of degrees of freedom (one for each variance estimate).
2. F is positively skewed.
3. F ratios, like the variance estimates from which they are derived, cannot have a value less than zero.

Using the F, we can compute the probability of the obtained result occurring due to chance. If this probability is low (p ≤ α), we will reject the null hypothesis.

III. Notation (Xij)

i = any score
n = the last score (or the number of scores)

What is new here is that:

j = any group
p = the last group (or the number of groups)

Thus:

                     Group
         1       2     ...     j     ...     p
        X11     X12           X1j           X1p
        X21     X22           X2j           X2p
        ...     ...           ...           ...
        Xi1     Xi2           Xij           Xip
        ...     ...           ...           ...
        Xn1     Xn2           Xnj           Xnp
       ----    ----          ----          ----
  T:    T1      T2            Tj            Tp
  n:    n1      n2            nj            np

And:

G = the grand total = T1 + T2 + ... + Tp = ΣTj
N = the total number of scores = n1 + n2 + ... + np = Σnj

IV. Terminology
Since we are talking about the analysis of variance, let's review what we know about it.

The variance is the mean of the squared deviations about the mean (MS), or the sum of the squared deviations about the mean (SS) divided by the degrees of freedom:

s² = MS = SS/df = Σ(X - X̄)² / (n - 1)
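As a quick check of this definition, a short Python sketch (using a small made-up sample) computes SS/df directly and compares it with the standard library's sample variance:

```python
import statistics

scores = [4, 8, 6, 2]  # a small made-up sample
mean = sum(scores) / len(scores)            # 5.0

ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations = 20.0
df = len(scores) - 1                        # degrees of freedom = 3
ms = ss / df                                # the variance estimate (mean square)

print(ss, df, ms)
```

The result matches `statistics.variance(scores)`, which uses the same n - 1 denominator.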

V. Partitioning the Variance
As noted above, two independent estimates of the population variance can be obtained. Expressed in terms of the Sum of Squares:

To make this more concrete, consider a data set with 3 groups and 4 subjects in each. Thus, the possible deviations for the score X13 are as follows:

As you can see, there are three deviations and:

 (Xij - X̄j)   +   (X̄j - X̄G)   =   (Xij - X̄G)
 within groups     between groups    total
 deviation #1      deviation #2      deviation #3

To obtain the Sum of the Squared Deviations about the Mean (the SS), we can square these deviations and sum them over all the scores.

Thus we have:

SSW = ΣΣ(Xij - X̄j)²
SSB = Σnj(X̄j - X̄G)²
SST = ΣΣ(Xij - X̄G)²   (and note that SST = SSB + SSW)

Note: the nj in the formula for the SSBetween means "do it once for each score in the group" -- the between groups deviation is the same for every score in group j, so its square gets counted nj times.
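A minimal sketch of this partition, using a made-up data set with 3 groups of 4 scores each, verifies that the between and within pieces add up to the total:

```python
# Made-up data: 3 groups, 4 scores per group.
groups = [
    [3, 5, 4, 4],
    [6, 7, 5, 6],
    [9, 8, 10, 9],
]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
# nj * (group mean - grand mean)**2: the between deviation counted once per score
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

print(ss_between, ss_within, ss_total)
```

Whatever data you substitute, ss_between + ss_within will equal ss_total (up to floating-point rounding).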

VI. The F Test

It is simply the ratio of the two variance estimates:

F = MSB / MSW = (SSB/dfB) / (SSW/dfW)

As usual, the critical values are given by a table. Going into the table, one needs to know the degrees of freedom for both the between and within groups variance estimates, as well as the alpha level.

For example, if we have 3 groups and 10 subjects in each (N = 30), then:

dfB = p - 1 = 3 - 1 = 2
dfW = p(n - 1) = 3(10 - 1) = 27   (or, with unequal n's, dfW = N - p)
dfT = N - 1 = 30 - 1 = 29

Note that the df add up to the total, and with α = .05, Fcrit = 3.35.

VII. Formal Example
1. Research Question
Does the presence of others influence helping behavior?

2. Hypotheses

In Symbols               In Words
H0: μ1 = μ2 = μ3         The presence of others does not influence helping.
HA: Not H0               The presence of others does influence helping.

3. Assumptions
1) The null hypothesis.
2) The subjects are sampled randomly.
3) The population distribution of the DV is normal in shape.
4) The groups are independent.
5) The population variances are homogeneous.

4. Decision rules
Given 3 groups with 4, 5, and 5 subjects, respectively, we have (3 - 1 =) 2 df for the between groups variance estimate and (3 + 4 + 4 =) 11 df for the within groups variance estimate. (Note that it is good to check that the df add up to the total: 2 + 11 = 13 = N - 1.) Now with an α level of .05, the table shows that the critical value of F is 3.98. If Fobs ≥ Fcrit, reject H0; otherwise, do not reject H0.
5. Computation - [Minitab]

Here is the data (i.e., the number of seconds it took for folks to help):

# people present
     0      2      4
    25     30     32
    30     33     39
    20     29     35
    32     40     41
           36     44
T: 107    168    191
n:   4      5      5
M: 26.8   33.6   38.2

When there are more than two groups, the means are harder to visualize and thus they should be plotted.

For the analysis, we will use a grid as usual for most of the calculations:

          0       X²       2       X²       4       X²
         25      625      30      900      32     1024
         30      900      33     1089      39     1521
         20      400      29      841      35     1225
         32     1024      40     1600      41     1681
                          36     1296      44     1936
T       107              168              191          (G = 466)
n         4                5                5          (N = 14)
M      26.8             33.6             38.2
ΣX²            2949             5726            7387   (II = 16062)
T²/n        2862.25           5644.8          7296.2   (III = 15803.25)

Now we need the grand totals (G = 466, N = 14) and the three intermediate quantities:

I   = G²/N = 466²/14 = 15511.14
II  = ΣΣX² = 16062
III = Σ(Tj²/nj) = 15803.25

And now we can compute the SS's (remember to check that they add up):

SSB = III - I  = 15803.25 - 15511.14 = 292.11
SSW = II - III = 16062 - 15803.25 = 258.75
SST = II - I   = 16062 - 15511.14 = 550.86

Then we can create the ANOVA summary table:

Source     SS       df    MS        F      p
Between    292.11    2    146.056   6.21   <.05
Within     258.75   11     23.523
Total      550.86   13
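For those who prefer code to a calculator, the table's computations can be reproduced in Python (stdlib only). The data are the helping times from the example, and the variable names I, II, and III mirror the intermediate quantities above:

```python
# One-way ANOVA via the computational formulas I, II, III from the text.
groups = {
    0: [25, 30, 20, 32],
    2: [30, 33, 29, 40, 36],
    4: [32, 39, 35, 41, 44],
}

scores = [x for g in groups.values() for x in g]
G = sum(scores)                                   # grand total = 466
N = len(scores)                                   # total number of scores = 14
p = len(groups)                                   # number of groups = 3

I   = G ** 2 / N                                  # ~ 15511.14
II  = sum(x ** 2 for x in scores)                 # = 16062
III = sum(sum(g) ** 2 / len(g) for g in groups.values())  # = 15803.25

ss_between, ss_within = III - I, II - III
df_between, df_within = p - 1, N - p
ms_between, ms_within = ss_between / df_between, ss_within / df_within
F = ms_between / ms_within

print(f"SSB = {ss_between:.2f}, SSW = {ss_within:.2f}, F = {F:.2f}")
```

This reproduces the summary table: SSB = 292.11, SSW = 258.75, and Fobs = 6.21.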

6. Decision
Since Fobs (6.21) is > Fcrit (3.98), reject H0 and conclude that the presence of other people influences helping behavior. However, since this F ratio has more than 1 df in the numerator, it is called an omnibus (or overall) F ratio. If the omnibus F ratio is significant, it demands further analysis (see post hoc comparisons below). If, on the other hand, the omnibus F ratio is not significant, then the analysis is complete.

VIII. Comparisons Among Means

In the formal example presented above, we rejected the null and asserted that the groups were drawn from different populations. But which groups differ from which? A "comparison" contrasts two means, each of which may be based on one group or a combination of groups. There are two kinds of comparisons that we can perform: "preplanned" and "post hoc". These are outlined below. Which approach is used should be based on our goals. In reality, however, the post hoc approach is the one most often taken.

Preplanned:
• We have a theory (or some previous research) which suggests certain comparisons.
• In this case, we might not even compute the omnibus F (this approach is somewhat analogous to a one-tailed test).

Post Hoc:
• We must have a significant omnibus F and want to pin down exactly where the differences lie.
• These are more commonly used than preplanned comparisons.

In addition, there are "simple" (involving two means) and "complex" (involving more than two means) comparisons. With three groups (Groups 1, 2 & 3), the following 6 comparisons are possible. Note that as the number of groups increases, so does the number of comparisons that are possible. Some of these can tell us about trend (a description of the form of the relationship between the IV & DV).

Simple          Complex
1 vs. 2         (1 + 2) vs. 3
1 vs. 3         1 vs. (2 + 3)
2 vs. 3         (1 + 3) vs. 2

Let's consider a meaningful example of a complex comparison. While in graduate school I was involved in two studies (Riley, E. P., Plonsky, M., & Rosellini, R. A., 1982; Plonsky, M., & Riley, E. P., 1983) in which we looked at the effects of maternal consumption of ethanol on the behavior of the offspring of rats. We wanted to determine if doses that do not cause gross physical abnormalities would produce behavioral abnormalities. Since female rats will not drink alcohol voluntarily, we employed a liquid diet (a nutritional baby formula mixed with alcohol). There were 3 groups:

1. 35% EDC - The experimental group received a diet consisting of 35% Ethanol Derived Calories.
2. 0% EDC. A control group given an isocaloric liquid diet (identical to the 35% diet except with sugar substituted for alcohol so the diets would have the same number of calories). However, there were complications that needed to be controlled for:
• Since the 35% EDC animals often drank less than we might have desired, we created pairs of 0% and 35% EDC animals such that the 0% EDC animal was a day behind its 35% EDC partner in its pregnancy. This enabled us to give the 0% EDC animal of the pair the amount of liquid diet that its 35% EDC counterpart had consumed the day before.
• While the 35% animals spread their consumption over the course of the day, the 0% animals consumed all of theirs relatively quickly. So, we split the diet given to the 0% animals into 2 feedings (early and later in the day) to more closely mimic the pattern of consumption of the 35% animals.
• In summary, the 0% and 35% animals were pair fed to control for the amount of nutrition received. This also controlled for the fact that the animals received a liquid diet and did not receive as much of it as they may have preferred.
3. LC. A control group given the standard Laboratory Chow.

In this case, we might do a simple comparison of the LC vs. 0% EDC groups on the dependent measure to see if the liquid diet influenced the DV. If there is no difference, we could then do a complex comparison that combines the two control groups and compares it to the alcohol group. That is, (LC + 0% EDC) vs. 35% EDC.

A problem with post hoc tests is that the type I error rate increases the more comparisons we perform. How to deal with this is somewhat controversial and there are a number of methods currently in use. We will consider a very simple method below.

The protected t test - [Minitab] [Spreadsheet]

It is protected for two reasons:

1. It is performed only when the omnibus F is significant. Since a significant omnibus F tells us that at least one comparison between means is significant, we are not just shooting in the dark.
2. It uses a more stable estimate of the population variance than the ordinary t test (i.e., MSW, which is based on all of the groups, instead of a variance estimate based on just the two groups being compared). The error term is thus based on a larger number of subjects than just those being compared.

The formula is:

F = (X̄1 - X̄2)² / [MSW(1/n1 + 1/n2)]

where the df are 1 for the numerator and dfW for the denominator. Thus, the Fcrit for a comparison differs from the one for the omnibus F ratio (which had 2 and 11 df, Fcrit = 3.98), since we are now comparing a pair of means and there is 2 - 1 = 1 df for the numerator.

So, for our present example the critical value of F (1, 11 df) is 4.84 (from the table) and we need to run the three comparisons:

Thus, the only comparison that is significant is the one between the first and third groups. So, while having 4 other people present slowed helping behavior significantly, having 2 other people did not.
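As a sketch, the three comparisons can be run in Python using the F form of the protected test described above. (Exact group means are used here, e.g., 26.75 rather than the rounded 26.8, so the obtained F values may differ slightly from hand calculations with rounded means.)

```python
# Protected comparisons as F ratios with (1, df_within) df; Fcrit(1, 11) = 4.84.
groups = {
    0: [25, 30, 20, 32],
    2: [30, 33, 29, 40, 36],
    4: [32, 39, 35, 41, 44],
}

N = sum(len(g) for g in groups.values())
p = len(groups)
means = {k: sum(g) / len(g) for k, g in groups.items()}
ss_within = sum((x - means[k]) ** 2 for k, g in groups.items() for x in g)
ms_within = ss_within / (N - p)       # error term based on ALL the groups

def protected_F(a, b):
    na, nb = len(groups[a]), len(groups[b])
    return (means[a] - means[b]) ** 2 / (ms_within * (1 / na + 1 / nb))

for a, b in [(0, 2), (0, 4), (2, 4)]:
    F = protected_F(a, b)
    verdict = "significant" if F >= 4.84 else "not significant"
    print(f"{a} vs {b}: F = {F:.2f} ({verdict})")
```

Only the 0 vs 4 comparison exceeds the critical value of 4.84, matching the conclusion above.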

IX. Relation of F to t

Since the F test is just an extension of the t test to more than two groups, they should be related and they are.

F = t² (and this applies to both the critical and observed values).

For example, let's say we have an experiment with two groups (8 subjects in the first and 9 in the second); thus the critical values for df = (1, 15) with α = .05:

Fcrit(1, 15) = tcrit(15)²

Obtaining the values from the tables, we can see that this is true:

4.54 = 2.131²
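The identity also holds for the observed statistics, and it can be checked numerically. The sketch below runs a pooled-variance t test and a one-way ANOVA on the same two groups of made-up data (the group sizes are arbitrary):

```python
# Checking F = t**2: a two-sample t (pooled variance) and a one-way ANOVA
# computed on the same made-up two-group data set.
g1 = [4, 6, 5, 7, 5]
g2 = [8, 7, 9, 6, 8, 9]

n1, n2 = len(g1), len(g2)
m1, m2 = sum(g1) / n1, sum(g2) / n2
ss1 = sum((x - m1) ** 2 for x in g1)
ss2 = sum((x - m2) ** 2 for x in g2)

# Two-sample t test with the pooled variance estimate, df = n1 + n2 - 2.
sp2 = (ss1 + ss2) / (n1 + n2 - 2)
t = (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# One-way ANOVA on the same two groups, df = (1, n1 + n2 - 2).
grand = (sum(g1) + sum(g2)) / (n1 + n2)
ms_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2  # df_between = 1
ms_within = sp2                                               # same error term
F = ms_between / ms_within

print(F, t ** 2)  # equal up to floating-point rounding
```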