

Hypothesis Testing:
Continuous Variables (2 Sample)
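The use of designs that involve two samples far exceeds that of those previously discussed for two reasons: the population parameters are rarely known, and it is desirable to have a concurrent control group.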
Recall our first example of the experimental method at the very beginning of the semester, involving the effects of marijuana on memory. The ability to analyze such an experiment has been one of the major goals of this course. In this experiment there were two groups, and we need to be able to compare the means and see if the difference is worth paying attention to (i.e., did marijuana have an effect on memory performance?).
Let's take a step back and review what we have covered thus far about inferential statistics. Actually it goes back a little further than that to where we learned about standard scores and the normal distribution. The key point was that area under the curve implies probability. To determine these probabilities, we computed the standard or z scores. That is:
z = (X − μ) / σ   (for the population)
z = (X − X̄) / s   (for samples)
z = (score − mean) / (standard deviation)   (the general case)
We also saw that the sampling distribution of the mean was a normal distribution with mean and standard deviation given by:
μ_{X̄} = μ and σ_{X̄} = σ / √N, respectively.
So we were able to use z scores to determine the probability that a particular sample mean was drawn from a given population. That is:
z = (X̄ − μ) / σ_{X̄}   (1 sample Z, μ & σ known)
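As a minimal sketch of the one-sample z computation described above, the snippet below uses only the Python standard library. The function name `one_sample_z` and all of the numbers are hypothetical, chosen purely for illustration; they do not come from the course materials.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu, sigma, n):
    """z for a sample mean when the population mu and sigma are known."""
    standard_error = sigma / sqrt(n)  # SD of the sampling distribution of the mean
    return (sample_mean - mu) / standard_error


# Hypothetical values, invented for illustration only.
z = one_sample_z(sample_mean=105, mu=100, sigma=15, n=36)

# Area under the standard normal curve in both tails beyond |z|.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(p_two_tailed, 4))
```

The tail area is what "area under the curve implies probability" refers to: the smaller it is, the less plausible it is that the sample mean came from the stated population.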
Then we went on to a more realistic situation in which the population standard deviation was not known. We estimated it from the sample standard deviation. This complicated things a bit in that the shape of the sampling distribution, while still symmetric and bell-shaped, differed in its kurtosis as a function of the sample size (or more accurately, the df). This family of distributions was called Student's t and the formula became:
t = (X̄ − μ) / s_{X̄}, where s_{X̄} = s / √N and df = N − 1   (1 sample t, σ unknown)
However, as was noted earlier, rarely do we know any of the population parameters, and it is desirable to have a concurrent control. So we need another sampling distribution to help us compute the relevant probabilities.
The sampling distribution involves two means, so it is called the sampling distribution of the difference between means. Note that if the two means are the same (when there is no effect of the IV), the difference between them will be zero. So the value that we are interested in here (in terms of the general formula for z given above) is the difference between the means, that is:
X̄_{1} − X̄_{2}
The mean and standard deviation of the sampling distribution of the difference between means are given by:
μ_{X̄_{1}−X̄_{2}} = μ_{1} − μ_{2} and σ_{X̄_{1}−X̄_{2}}, respectively.
The latter is called the standard error of the difference between means. Since the sample standard deviation is again used to estimate the population value, the sampling distribution of the difference between means will also be distributed as t (the family of bell-shaped distributions that differ in kurtosis as a function of the df). So the formula becomes:
t = [(X̄_{1} − X̄_{2}) − (μ_{1} − μ_{2})] / s_{X̄_{1}−X̄_{2}}
However, the null hypothesis says that:
H_{O}: μ_{1}=μ_{2}
That is, the two means come from the same population; there is no difference between them (i.e., μ_{1} − μ_{2} = 0). Thus, the formula reduces to:
t = (X̄_{1} − X̄_{2}) / s_{X̄_{1}−X̄_{2}}
All we need to do now is determine the formula for the standard error. However, this formula differs depending on whether we are dealing with independent or dependent groups. With the independent groups design, the subjects in each of the two groups are different and unrelated in any way. The most common type of dependent groups design is called a within subjects or repeated measures design, because the same subjects (thus actually only one group) are tested twice.
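The defining formula (when the sample sizes are equal) for the standard error of the difference between means is:
s_{X̄_{1}−X̄_{2}} = √( s_{X̄_{1}}^{2} + s_{X̄_{2}}^{2} ) = √( s_{1}^{2} / N_{1} + s_{2}^{2} / N_{2} )
And thus the formula for the t value is:
t = (X̄_{1} − X̄_{2}) / √( s_{X̄_{1}}^{2} + s_{X̄_{2}}^{2} )
The computational formulas (which will also handle unequal sample sizes) are given by:
s_{X̄_{1}−X̄_{2}} = √( [ (∑X_{1}^{2} − (∑X_{1})^{2} / N_{1} + ∑X_{2}^{2} − (∑X_{2})^{2} / N_{2}) / (N_{1} + N_{2} − 2) ] (1 / N_{1} + 1 / N_{2}) )
And:
t = (X̄_{1} − X̄_{2}) / s_{X̄_{1}−X̄_{2}}
Since two variances are used in estimating the standard error of the difference between means, the degrees of freedom will equal the sum of the degrees of freedom for each of the variance estimates, that is:
df = (N_{1} − 1) + (N_{2} − 1) = N_{1} + N_{2} − 2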
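The relationship between the defining and computational forms can be checked numerically. The sketch below assumes the usual pooled-variance form of the computational formula, √([(SS_{1} + SS_{2}) / (N_{1} + N_{2} − 2)] (1/N_{1} + 1/N_{2})), where SS = ∑X^{2} − (∑X)^{2}/N. The function names and the two small score lists are hypothetical and exist only to exercise the arithmetic.

```python
from math import isclose, sqrt


def sum_of_squares(scores):
    """Sum of squared deviations via the computational form."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


def se_defining(group1, group2):
    """sqrt(s1^2/N1 + s2^2/N2): the defining form, intended for equal N."""
    var1 = sum_of_squares(group1) / (len(group1) - 1)
    var2 = sum_of_squares(group2) / (len(group2) - 1)
    return sqrt(var1 / len(group1) + var2 / len(group2))


def se_computational(group1, group2):
    """Pooled-variance form, which also handles unequal sample sizes."""
    n1, n2 = len(group1), len(group2)
    pooled_variance = (sum_of_squares(group1) + sum_of_squares(group2)) / (n1 + n2 - 2)
    return sqrt(pooled_variance * (1 / n1 + 1 / n2))


def independent_t(group1, group2):
    """Return (t, df) for two independent groups."""
    mean_diff = sum(group1) / len(group1) - sum(group2) / len(group2)
    df = len(group1) + len(group2) - 2
    return mean_diff / se_computational(group1, group2), df


# Hypothetical scores, invented only to exercise the formulas.
group_a = [4, 6, 5, 7, 8]
group_b = [7, 9, 6, 10, 8]

t_value, df = independent_t(group_a, group_b)
print(t_value, df)
# With equal sample sizes the two standard-error forms agree.
print(isclose(se_defining(group_a, group_b), se_computational(group_a, group_b)))
```

When the sample sizes differ, the two forms generally give different values, which is why the computational (pooled) form is the one to use in that case.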
2. Formal Example
Suppose you are a researcher interested in the factors influencing paper grading by professors. A hunch (and/or previous research) might lead you to predict that papers that are typed are rated higher than papers that are handwritten. Research to date, though, has been only correlational, and thus little can be said in terms of a cause and effect relationship.
So you have 10 freshman students, currently taking English as well as an introductory psychology course, each write one paper. Each student provides two copies of the paper (one typed and one handwritten). Next, you enlist the aid of 20 English instructors and randomly assign 10 instructors to each of two groups. Each instructor in one group (the control group) will grade each of the 10 handwritten papers, while the second group (the experimental group) will grade the same papers in typed form.
        In Symbols       In Words
H_{O}   μ_{1} = μ_{2}    Typing has no effect on the grade a paper receives (as compared to a handwritten paper).
H_{A}   μ_{1} ≠ μ_{2}    Typing influences the grade for better or worse.
With df = N_{1} + N_{2} − 2 = 17 and a two-tailed α = .05, the critical value of t is 2.110:
If t_{obs} ≤ −2.110 or t_{obs} ≥ 2.110, then reject H_{O}.
If t_{obs} > −2.110 and t_{obs} < 2.110, then do not reject H_{O}.
Subj.   Written (1)   W^{2}     Typed (2)   T^{2}
1       81            6561      84          7056
2       81            6561      89          7921
3       79            6241      89          7921
4       80            6400      81          6561
5       84            7056      87          7569
6       87            7569      82          6724
7       75            5625      87          7569
8       83            6889      85          7225
9       88            7744      89          7921
10                              83          6889
∑       738           60,646    856         73,356
N       9                       10
Mean    82.0                    85.6
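Now for the variances:
s_{1}^{2} = [∑X_{1}^{2} − (∑X_{1})^{2} / N_{1}] / (N_{1} − 1) = (60,646 − 738^{2} / 9) / 8 = 130 / 8 = 16.25
and
s_{2}^{2} = [∑X_{2}^{2} − (∑X_{2})^{2} / N_{2}] / (N_{2} − 1) = (73,356 − 856^{2} / 10) / 9 = 82.4 / 9 ≈ 9.16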
The inferential question is whether this difference between means is worth paying attention to. Thus, we will use a between groups t test to answer this question.
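Substituting the appropriate values gives:
t = (82.0 − 85.6) / √( [ (60,646 − 738^{2} / 9 + 73,356 − 856^{2} / 10) / (9 + 10 − 2) ] (1/9 + 1/10) )
  = −3.6 / √( (212.4 / 17)(0.2111) ) = −3.6 / 1.62 = −2.222
Since −2.222 ≤ −2.110, we reject H_{O}.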
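The arithmetic for the paper grading example can be reproduced directly from the scores in the data table. This is a sketch under the assumption that the pooled-variance computational formula is the one intended; the variable names are illustrative. Carried at full precision, the result is about −2.217. The 2.222 quoted in these notes appears to come from rounding the standard error to 1.62 before dividing. Either way the decision is the same.

```python
from math import sqrt

# Scores from the data table (instructor 10 has no handwritten score).
written = [81, 81, 79, 80, 84, 87, 75, 83, 88]
typed = [84, 89, 89, 81, 87, 82, 87, 85, 89, 83]


def sum_of_squares(scores):
    """Computational form: sum(X^2) - (sum(X))^2 / N."""
    return sum(x * x for x in scores) - sum(scores) ** 2 / len(scores)


n1, n2 = len(written), len(typed)
mean_written = sum(written) / n1
mean_typed = sum(typed) / n2

# Pooled variance estimate and standard error of the difference between means.
pooled_variance = (sum_of_squares(written) + sum_of_squares(typed)) / (n1 + n2 - 2)
se_diff = sqrt(pooled_variance * (1 / n1 + 1 / n2))

t_obs = (mean_written - mean_typed) / se_diff
df = n1 + n2 - 2

critical_t = 2.110  # two-tailed critical value quoted in the decision rule
reject_null = abs(t_obs) >= critical_t

print(f"means: {mean_written:.1f} vs {mean_typed:.1f}")
print(f"SE = {se_diff:.3f}, t = {t_obs:.3f}, df = {df}, reject H0: {reject_null}")
```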
For dependent groups, the standard error of the difference between means is:
s_{X̄_{1}−X̄_{2}} = √( s_{X̄_{1}}^{2} + s_{X̄_{2}}^{2} − 2 r s_{X̄_{1}} s_{X̄_{2}} )
Notice that the formula requires the computation of the correlation between the two sets of scores. It is here that we see the potential advantage of this design. That is, the error term (the standard error of the difference between means) shrinks as this correlation grows, which results in a potentially more powerful or sensitive test. The disadvantage, though, is the loss of degrees of freedom. The N here refers to the number of pairs of scores (for an individual or matched pair of individuals). Thus, the degrees of freedom is half what we would have if we had used a between groups approach (i.e., N − 1 is half of N_{1} + N_{2} − 2). The trick is to make sure the correlation is large enough to offset the loss of df.
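To see how the correlation affects the error term, the sketch below assumes the standard error for dependent groups takes the usual form √(s_{X̄_{1}}^{2} + s_{X̄_{2}}^{2} − 2 r s_{X̄_{1}} s_{X̄_{2}}). The function name and the standard deviations, sample size, and correlations are hypothetical values picked for illustration.

```python
from math import sqrt


def se_diff_dependent(sd1, sd2, r, n_pairs):
    """Standard error of the difference between means for paired scores.

    sd1, sd2: sample standard deviations of the two sets of scores
    r: correlation between the two sets of scores
    n_pairs: number of pairs of scores
    """
    se1 = sd1 / sqrt(n_pairs)  # standard error of the first mean
    se2 = sd2 / sqrt(n_pairs)  # standard error of the second mean
    return sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


# Hypothetical values: both SDs equal 5, with 10 pairs of scores.
for r in (0.0, 0.3, 0.6, 0.9):
    print(r, round(se_diff_dependent(5.0, 5.0, r, 10), 3))
```

With r = 0 the result matches the independent groups error term; as r rises the error term falls, so the same difference between means yields a larger t. Whether that gain outweighs the smaller df depends on how large the correlation actually is.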
The formula above would be very cumbersome to use. Fortunately, there is another technique available for obtaining the t value, called the Direct Difference Method. If the difference between the X and Y scores is designated as D (i.e., D = X − Y), then we may restate the null and alternative hypotheses as:
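In Symbols
H_{O}: μ_{D} = 0
H_{A}: μ_{D} ≠ 0
The formula, which looked like:
t = [(X̄_{1} − X̄_{2}) − (μ_{1} − μ_{2})] / s_{X̄_{1}−X̄_{2}}
and with D̄ = X̄_{1} − X̄_{2} and μ_{D} = μ_{1} − μ_{2}, becomes:
t = (D̄ − μ_{D}) / s_{D̄}
which, given the null (μ_{1} − μ_{2} = 0, thus μ_{D} = 0), reduces to:
t = D̄ / s_{D̄}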
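Below is the derivation of the computational formula:
s_{D̄} = s_{D} / √N, with s_{D} = √( [∑D^{2} − (∑D)^{2} / N] / (N − 1) )
so that
t = D̄ / √( [∑D^{2} − (∑D)^{2} / N] / [N(N − 1)] )
where df = N − 1 and N refers to the number of pairs of scores.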
3. Formal Example
Suppose you are interested in reaction times to different colored lights (especially green and red). We could use either:
 Repeated measures design: test each subject for a number of trials, such as GGRRGRRG, etc. Then compute the average speed to each color light for each subject.
 Matched groups design: test all subjects' reaction times to white light for a given number of trials. Using these data, create two matched groups; that is, take the two quickest subjects and randomly assign one to each of the groups. Then take the next two quickest subjects and randomly assign one of them to each of the groups, etc. For example:
Ranked data: 1, 2, 3, 4, 5, 6, 7, 8, . . .
Red    Green
2      1
3      4
6      5
8      7
. . .
Note that the number of subjects must be divisible by the number of groups.
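The matched groups assignment described above can be sketched in a few lines. The function name `matched_groups`, the subject labels, and the baseline reaction times are all hypothetical. The sketch assumes two groups, so subjects are taken off the ranked list two at a time and one member of each pair is sent to each group at random.

```python
import random


def matched_groups(baseline_times, seed=None):
    """Split subjects into two matched groups.

    baseline_times: dict mapping a subject label to a baseline reaction time
    Returns (red_group, green_group), where position i in each list holds
    the two members of the i-th matched pair.
    """
    rng = random.Random(seed)
    ranked = sorted(baseline_times, key=baseline_times.get)  # quickest first
    if len(ranked) % 2 != 0:
        raise ValueError("number of subjects must be divisible by the number of groups")

    red_group, green_group = [], []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]  # two adjacent ranks form a pair
        rng.shuffle(pair)                  # random assignment within the pair
        red_group.append(pair[0])
        green_group.append(pair[1])
    return red_group, green_group


# Hypothetical baseline reaction times (ms) to white light.
baseline = {"s1": 310, "s2": 295, "s3": 350, "s4": 305,
            "s5": 400, "s6": 330, "s7": 290, "s8": 360}
red, green = matched_groups(baseline, seed=1)
print(red)
print(green)
```

Passing a seed only makes the illustration repeatable; in an actual study the assignment would be left unseeded.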
        In Symbols       In Words
H_{O}   μ_{r} = μ_{g}    There is no difference in reaction times between red and green lights.
H_{A}   μ_{r} ≠ μ_{g}    There is a difference in reaction times between red and green lights.
With df = N − 1 = 9 and a two-tailed α = .05, the critical value of t is 2.262:
If t_{obs} ≤ −2.262 or t_{obs} ≥ 2.262, then reject H_{O}.
If t_{obs} > −2.262 and t_{obs} < 2.262, then do not reject H_{O}.
Subject (or pair)   X (red)   Y (green)   D     D^{2}
1                   18        22          −4    16
2                   16        20          −4    16
3                   23        29          −6    36
4                   30        35          −5    25
5                   32        27          5     25
6                   30        29          1     1
7                   31        33          −2    4
8                   25        29          −4    16
9                   27        31          −4    16
10                  21        24          −3    9
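Then, for the inferential test, we will use a within groups t test (the direct difference method) and thus we have the formula:
t = D̄ / √( [∑D^{2} − (∑D)^{2} / N] / [N(N − 1)] )
And substituting the appropriate values (∑D = −26, ∑D^{2} = 164, N = 10, D̄ = −2.6) gives:
t = −2.6 / √( [164 − (−26)^{2} / 10] / [10(9)] ) = −2.6 / √(96.4 / 90) ≈ −2.6 / 1.035 ≈ −2.51
Since −2.51 ≤ −2.262, we reject H_{O}.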
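The direct difference computation for the reaction time data can be checked with a short script. It is a sketch that assumes the computational form t = D̄ / √([∑D^{2} − (∑D)^{2}/N] / [N(N − 1)]); the variable names are illustrative, and the scores are the ones in the data table.

```python
from math import sqrt

# Scores from the data table: X is red, Y is green.
red = [18, 16, 23, 30, 32, 30, 31, 25, 27, 21]
green = [22, 20, 29, 35, 27, 29, 33, 29, 31, 24]

# D = X - Y for each subject (or matched pair).
diffs = [x - y for x, y in zip(red, green)]

n = len(diffs)                       # number of pairs of scores
sum_d = sum(diffs)
sum_d2 = sum(d * d for d in diffs)
mean_d = sum_d / n

# Standard error of the mean difference, computational form.
se_mean_d = sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))

t_obs = mean_d / se_mean_d
df = n - 1

critical_t = 2.262  # two-tailed critical value quoted in the decision rule
reject_null = abs(t_obs) >= critical_t

print(f"sum D = {sum_d}, sum D^2 = {sum_d2}, mean D = {mean_d:.1f}")
print(f"t = {t_obs:.3f}, df = {df}, reject H0: {reject_null}")
```

The negative sign simply reflects that D was defined as red minus green and most subjects had the smaller value for red.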
One of the problems with hypothesis testing is that it is often a bit too black and white. To say that there is an effect does not tell us much about the size of the effect (how much of an effect there is). A common statistic used for this purpose is omega squared (ω^{2}). It provides an estimate of the proportion of variance in the dependent variable that is associated with membership in two independent groups:
ω^{2} = (t^{2} − 1) / (t^{2} + N_{1} + N_{2} − 1)
Let's use it to see what it can tell us about the paper grading example above. The observed t was 2.222 in absolute value, so substituting that in gives:
ω^{2} = (2.222^{2} − 1) / (2.222^{2} + 9 + 10 − 1) = 3.937 / 22.937 ≈ 0.17
Thus, about 17% of the variation of paper grading by professors in this study was due to whether the paper was typed or handwritten.
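The 17% figure can be reproduced with a small helper. The sketch assumes the common two-group form ω^{2} = (t^{2} − 1) / (t^{2} + N_{1} + N_{2} − 1), which is consistent with the value reported here. The function name `omega_squared` is hypothetical.

```python
def omega_squared(t_obs, n1, n2):
    """Estimated proportion of variance associated with group membership.

    Assumes the two-independent-groups form
    (t^2 - 1) / (t^2 + N1 + N2 - 1). The sign of t does not matter
    because t is squared.
    """
    t_squared = t_obs ** 2
    return (t_squared - 1) / (t_squared + n1 + n2 - 1)


# Paper grading example: t = 2.222 with 9 and 10 scores in the two groups.
effect_size = omega_squared(2.222, 9, 10)
print(round(effect_size, 3))
```

Note that when |t| is less than 1 this estimate comes out negative, in which case it is conventionally treated as zero.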