I. Introduction

The use of designs that involve two samples far exceeds that of the one-sample designs previously discussed, for two reasons:

1. It is rare that μ or σ is known. When using two samples, neither of these parameters is required.
2. Since two groups (or measurements) are included, one will serve as a concurrent control. In other words, the two groups (or measurements) occur closely together in time and space. Thus, the treatment and testing circumstances (which introduce many potential extraneous variables) can be better controlled. For example, in terms of our IQ/"Bad Kids" example, perhaps the IQ of the population was measured 2 years previously and the IQ in that area was increasing at the rate of 2-4 points per year (for whatever reason; you can be creative here).

Recall our first example of the experimental method, from the very beginning of the semester, involving the effects of marijuana on memory. The ability to analyze such an experiment has been one of the major goals of this course. In this experiment there were two groups, and we need to be able to compare the means and see if the difference is worth paying attention to (i.e., did marijuana have an effect on memory performance?).

II. Brief Review & Discussion of Logic

Let's take a step back and review what we have covered thus far about inferential statistics. Actually, it goes back a little further than that, to where we learned about standard scores and the normal distribution. The key point was that area under the curve implies probability. To determine these probabilities, we computed the standard or z scores. That is:

z = (X - μ)/σ for the population,
z = (X - X̄)/s for samples, and
z = (score - mean)/(standard deviation) in the general case.

We also saw that the sampling distribution of the mean was a normal distribution with:

μX̄ = μ and σX̄ = σ/√N, respectively.

So we were able to use z scores to determine the probability that a particular sample mean was drawn from a given population. That is:

z = (X̄ - μ)/σX̄, where σX̄ = σ/√N (1 sample z, μ & σ known)

Then we went on to a more realistic situation in which the population standard deviation was not known. We estimated it from the sample standard deviation. This complicated things a bit in that the shape of the sampling distribution, while still bell shaped and symmetric, differed in its kurtosis as a function of the sample size (or more accurately, the df). This family of distributions was called Student's t and the formula became:

t = (X̄ - μ)/sX̄, where sX̄ = s/√N and df = N - 1 (1 sample t, σ unknown)
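As an aside, the one-sample t described above is easy to compute directly. Here is a minimal Python sketch (the function and variable names are my own, not from these notes):

```python
import math

def one_sample_t(sample, mu):
    """One-sample t: t = (mean - mu) / (s / sqrt(N)), df = N - 1."""
    n = len(sample)
    mean = sum(sample) / n
    # unbiased sample variance (divide by df = N - 1)
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    se = math.sqrt(var / n)          # estimated standard error of the mean
    return (mean - mu) / se, n - 1   # (observed t, degrees of freedom)
```

The observed t is then compared against the critical value from the t table for the appropriate df.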

However, as was noted earlier, rarely do we know any of the population parameters, and it is desirable to have a concurrent control. So we need another sampling distribution to help us compute the relevant probabilities.

The sampling distribution involves two means, so it is called the sampling distribution of the difference between means. Note that if the two means are the same (when there is no effect of the IV), the difference between them will be zero. So the value that we are interested in here (in terms of the general formula for z given above) is the difference between the means, that is: X̄1 - X̄2. The mean and standard deviation of the sampling distribution of the difference between means are given by:

μX̄1-X̄2 = μ1 - μ2 and σX̄1-X̄2 = √(σ²X̄1 + σ²X̄2), respectively.

The latter is called the standard error of the difference between means. Since the sample standard deviation is again used to estimate the population value, the sampling distribution of the difference between means will also be distributed as t (the family of bell-shaped distributions that differ in kurtosis as a function of the df). So the formula becomes:

t = [(X̄1 - X̄2) - (μ1 - μ2)] / sX̄1-X̄2

However, the null hypothesis says that:

HO: μ1 = μ2

That is, the two means come from the same population; there is no difference between them (i.e., μ1 - μ2 = 0). Thus, the formula reduces to:

t = (X̄1 - X̄2) / sX̄1-X̄2

All we need to do now is determine the formula for the standard error. However, this formula differs depending on whether we are dealing with independent or dependent groups. With the independent groups design, the subjects in each of the two groups are different and unrelated in any way. For a dependent groups design, the most common type is called a within subjects or repeated measures design, because the same subjects (thus actually only one group) are tested twice.

III. Independent Groups
1. Formula

The defining formula (when the sample sizes are equal) for the standard error of the difference between means is:

sX̄1-X̄2 = √(s²X̄1 + s²X̄2)

And thus the formula for the t value is:

t = (X̄1 - X̄2) / sX̄1-X̄2

The computational formulas (which will also handle unequal sample sizes) are given by:

sX̄1-X̄2 = √{[(SS1 + SS2)/(N1 + N2 - 2)] × (1/N1 + 1/N2)}

And:

SS = ∑X² - (∑X)²/N

Since two variances are used in estimating the standard error of the difference between means, the degrees of freedom will equal the sum of the degrees of freedom for each of the variance estimates, that is:

df = (N1 - 1) + (N2 - 1) = N1 + N2 - 2

2. Formal Example - [Minitab] [Spreadsheet]

Suppose you are a researcher interested in the factors influencing paper grading by professors. A hunch (and/or previous research) might lead you to predict that papers that are typed are rated higher than papers that are handwritten. Research to date, though, has only been correlational, and thus little can be said in terms of a cause and effect relationship.

So you have 10 freshman students, currently taking English as well as an introductory psychology course, each write one paper. Each student provides two copies of the paper (one typed and one handwritten). Next, we enlist the aid of 20 English instructors and randomly assign 10 instructors to each of two groups. Each instructor in one group (the control group) will grade the 10 handwritten papers, while each instructor in the second group (the experimental group) will grade the same papers typed.

1. Research Question
Does typing a paper influence the grade it receives?

2. Hypotheses

In Symbols      In Words
HO: μ1 = μ2     Typing does not influence the grade a paper receives (as compared to a handwritten paper).
HA: μ1 ≠ μ2     Typing influences the grade for better or worse.

1. Assumptions
1. The null hypothesis.
2. Our subjects were chosen randomly from the population.
3. Sampling distribution of the difference between means is normal in shape. In other words, the DV should be normally distributed in the population.
4. The groups are independent.
5. There is homogeneity of variance. That is, the amount of variability in the DV is about equal in each of the groups. When the sample sizes are reasonably large and the number of subjects in each group is about equal, we do not have to worry about this too much because the t test is robust. This means that it is strong and can tolerate some violations of its assumptions.

2. Decision Rules
Using alpha of .05 with a two-tailed test and df=N1+N2-2=10+9-2=17, we determine from the t table that the critical value is 2.110. Thus:

If tobs ≤ -2.110 or tobs ≥ 2.110, then reject HO.
If tobs > -2.110 and tobs < 2.110, then do not reject HO.
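The critical value need not come from a printed table. This sketch (not part of the original notes, and assuming SciPy is installed) computes it directly for a two-tailed test:

```python
from scipy.stats import t

alpha, df = 0.05, 17
# two-tailed test: put alpha/2 in each tail of the t distribution
t_crit = t.ppf(1 - alpha / 2, df)  # upper critical value, about 2.110
```

The lower critical value is just the negative of the upper one, since the t distribution is symmetric.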

3. Computation
Since we are not interested in the differences between the scores of the 10 papers graded by an instructor, we simply calculate the mean grade given by each instructor. Note that one of the instructors in the Written Group had to be excluded because their dog ate the papers they were supposed to grade. Thus, we have 19 means. To describe the data, we need to compute the means and variances for each of the two groups, that is:
Subj.   Written (1)   W²       Typed (2)   T²
1       81            6561     84          7056
2       81            6561     89          7921
3       79            6241     89          7921
4       80            6400     81          6561
5       84            7056     87          7569
6       87            7569     82          6724
7       75            5625     87          7569
8       83            6889     85          7225
9       88            7744     89          7921
10      —             —        83          6889
∑       738           60,646   856         73,356
N       9                      10
X̄       82.0                   85.6

Now for the variances:

s1² = [∑W² - (∑W)²/N1] / (N1 - 1) = (60,646 - 738²/9) / 8 = 130/8 = 16.25

and

s2² = [∑T² - (∑T)²/N2] / (N2 - 1) = (73,356 - 856²/10) / 9 = 82.4/9 = 9.16

The inferential question is whether this difference between means is worth paying attention to. Thus, we will use a between groups t test to answer this question. Substituting the appropriate values gives:

t = (82.0 - 85.6) / √{[(130 + 82.4)/17] × (1/9 + 1/10)} = -3.6/1.62 = -2.222

1. Decision
Since -2.222 (tobs) < -2.110 (tcrit) we reject HO and assert the alternative. In other words, we conclude that typing a paper improves the grade it receives. Notice that we have actually gone beyond the alternative hypothesis by specifying that the effect has a direction (typing is good).
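As a check on the hand calculations, the computational formulas can be sketched in a few lines of Python (a sketch only; the function and variable names are mine, not from the notes):

```python
import math

def independent_t(g1, g2):
    """Between-groups t using the pooled computational formula."""
    n1, n2 = len(g1), len(g2)
    # sums of squares: SS = sum(X^2) - (sum X)^2 / N
    ss1 = sum(x * x for x in g1) - sum(g1) ** 2 / n1
    ss2 = sum(x * x for x in g2) - sum(g2) ** 2 / n2
    df = n1 + n2 - 2
    se = math.sqrt(((ss1 + ss2) / df) * (1 / n1 + 1 / n2))
    return (sum(g1) / n1 - sum(g2) / n2) / se, df

written = [81, 81, 79, 80, 84, 87, 75, 83, 88]
typed = [84, 89, 89, 81, 87, 82, 87, 85, 89, 83]
t_obs, df = independent_t(written, typed)  # t ≈ -2.22, df = 17
```

Carrying full precision rather than rounding the standard error gives t ≈ -2.22, in agreement with the hand computation.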

IV. Dependent Groups
1. Discussion
As noted earlier, the most common type of Dependent Groups Design is also called a Within Subjects or Repeated Measures Design, because the same subjects (thus, actually only one group) are tested twice. There is another situation, though, in which this analysis is sometimes used. It is called the Matched Groups Design. In this case, there are two groups, but they are matched on some variable that is highly and positively correlated with the DV. The procedures involved in matching will be presented more clearly below in the formal example.

2. Formula
In this case, the standard error of the difference between means is given by:

sX̄1-X̄2 = √(s²X̄1 + s²X̄2 - 2r sX̄1 sX̄2)

Notice that the formula requires the computation of the correlation between the two sets of scores. It is here that we see the potential advantage to this design. That is, the error term (the standard error of the difference between means) is decreased in direct proportion to the magnitude of this correlation, which results in a potentially more powerful or sensitive test. The disadvantage, though, is the loss of degrees of freedom. The N here refers to the number of pairs of scores (for an individual or matched pair of individuals). Thus, the degrees of freedom is half what we would have if we had used a between groups approach (i.e., N-1 is 1/2 of N1+N2-2). The trick is to make sure the correlation is large enough to offset the loss of df.

The formula above would be very cumbersome to use. Fortunately, there is another technique available for obtaining the t value, called the Direct Difference Method. If the difference between the X and Y scores is designated as D (i.e., D=X-Y), then we may restate the null and alternative hypotheses as:

In Symbols
HO: μD = 0
HA: μD ≠ 0

The formula, which looked like:

t = [(X̄1 - X̄2) - (μ1 - μ2)] / sX̄1-X̄2

and with:

D̄ = X̄1 - X̄2, μD = μ1 - μ2, and sD̄ = sD/√N

becomes:

t = (D̄ - μD) / sD̄

which, given the null (μ1 - μ2 = 0, thus μD = 0), reduces to:

t = D̄ / sD̄

Working this through yields the computational formula:

t = ∑D / √{[N∑D² - (∑D)²] / (N - 1)}

where df = N - 1 and N refers to the number of pairs of scores.
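The direct difference method is straightforward to express in code. Here is a minimal Python sketch of the computational formula (names are mine, not from the notes):

```python
import math

def dependent_t(x, y):
    """Within-groups t via the direct difference method (D = X - Y)."""
    n = len(x)                        # number of pairs of scores
    d = [a - b for a, b in zip(x, y)]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1                   # df = N - 1 (N = number of pairs)
```

Note that only the difference scores matter; the correlation between X and Y never has to be computed explicitly.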

3. Formal Example - [Minitab]

Suppose you are interested in reaction times to different colored lights (especially green and red). We could use either:

• Repeated measures design - test each subject for a number of trials, such as GGRRGRRG, etc. Then compute the average speed to each color light for each subject.
• Matched groups design - test all subjects' reaction times to white light for a given number of trials. Using this data, create two matched groups, that is, take the two quickest subjects and randomly assign one to each of the groups. Then take the next two quickest subjects and randomly assign one of them to each of the groups, etc. Ex:

Ranked Data   Red   Green
1, 2          2     1
3, 4          3     4
5, 6          6     5
7, 8          8     7
. . .         . . . . . .

Note that the number of subjects must be divisible by the number of groups.
1. Research Question
Does reaction to red and green lights differ?

2. Hypotheses

In Symbols                   In Words
HO: μ1 = μ2 or μr = μg       There is no difference in reaction times between red and green lights.
HA: μ1 ≠ μ2 or μr ≠ μg       There is a difference in reaction times between red and green lights.

1. Assumptions
1. The null hypothesis.
2. Sample was randomly selected from the population.
3. The sampling distribution of the difference between means is normal in shape. In other words, the DV should be normally distributed in the population.
4. The scores of the two conditions are correlated (i.e., the groups are dependent).

2. Decision Rules
We will test 10 (or 20 if matched) subjects. Using alpha of .05 with a two-tailed test and df=N-1=9 (where N = the number of pairs of scores), we determine from the t table that the critical value is 2.262.

Thus:

If tobs ≤ -2.262 or tobs ≥ 2.262, then reject HO.
If tobs > -2.262 and tobs < 2.262, then do not reject HO.

3. Computation
First we describe the data by computing the means for each condition/group. I should emphasize that you must remember to compute these means (even though the formula for calculating the observed t does not require it) because it is the difference between these means that we are interested in. Once we have a handle on the means, we might as well compute the difference scores and their squares (since we will need them for the analysis).
Subject
(or pair)   X (red)   Y (green)   D     D²
1           18        22          -4    16
2           16        20          -4    16
3           23        29          -6    36
4           30        35          -5    25
5           32        27          5     25
6           30        29          1     1
7           31        33          -2    4
8           25        29          -4    16
9           27        31          -4    16
10          21        24          -3    9
∑           253       279         -26   164
X̄           25.3      27.9        -2.6

Then, for the inferential test, we will use a within groups t test (the direct difference method) and thus we have the formula:

t = ∑D / √{[N∑D² - (∑D)²] / (N - 1)}

And substituting the appropriate values gives:

t = -26 / √{[10(164) - (-26)²] / 9} = -26/10.35 = -2.512

1. Decision
Since -2.512 (tobs) < -2.262 (tcrit) we reject HO and assert the alternative. In other words, we conclude that reaction time is quicker to red as compared to green light.
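The arithmetic above is easy to verify with a few lines of Python (a sketch only; variable names are mine):

```python
import math

# difference scores (D = red - green) from the table above
d = [-4, -4, -6, -5, 5, 1, -2, -4, -4, -3]
n = len(d)                                      # 10 pairs, so df = 9
sum_d = sum(d)                                  # sum of D
sum_d2 = sum(v * v for v in d)                  # sum of D squared
t_obs = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
```

This reproduces ∑D = -26, ∑D² = 164, and t ≈ -2.512, which falls beyond the critical value of ±2.262.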

V. Calculation of Effect Size

One of the problems with hypothesis testing is that it is often a bit too black and white. To say that there is an effect does not tell us much about the size of the effect (how much of an effect there is). A common statistic used for this purpose is omega squared (ω²). It provides an estimate of the proportion of variance in the DV accounted for by membership in the two independent groups:

ω² = (t² - 1) / (t² + N1 + N2 - 1)

Let's use it to see what it can tell us about the paper grading example above. The t observed was -2.222, so substituting that in gives:

ω² = [(-2.222)² - 1] / [(-2.222)² + 9 + 10 - 1] = 3.94/22.94 = .17

Thus, about 17% of the variation in paper grading by professors in this study was due to whether the paper was typed or handwritten.

Copyright © 1997-2017 M. Plonsky, Ph.D.
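The ω² computation is a one-liner; this Python sketch (function name mine) reproduces the figure above:

```python
def omega_squared(t_obs, n1, n2):
    """Estimated proportion of DV variance accounted for by group membership
    in a two independent groups design: (t^2 - 1) / (t^2 + N1 + N2 - 1)."""
    return (t_obs ** 2 - 1) / (t_obs ** 2 + n1 + n2 - 1)

effect = omega_squared(-2.222, 9, 10)  # about .17
```

Note that the sign of t does not matter here, since it enters the formula only as t².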