I. Important Concepts
1. Correlation
Earlier in the semester we noted that scientists are interested in relationships between variables. When two variables vary together (a change in one is accompanied by a change in the other), we say they are correlated.

2. Correlation Coefficient
Expresses quantitatively the extent to which two variables are related. There are several correlation coefficients; we will learn about two.

3. Scatterplot
A graph of a collection of pairs of scores. Note that in scatterplots the X and Y axes are equal in length, so this type of graph does not obey the 3/4-high rule.

II. Range of a Correlation Coefficient

It is best illustrated with examples:

1. Perfect positive (all points fall on a straight line). As the number of hours studied increased, so did the grade. This is also called a "direct" relationship.

2. Perfect negative. As the number of beers drunk increased, the grade decreased. This is also called an "inverse" or "indirect" relationship.

3. No correlation. So, basically, there is no relationship between toe size and grade.

Summary

r = ± (0 ↔ 1)

The sign (±) gives the direction of the relationship; the magnitude (0 to 1) gives its strength.
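This sign/magnitude summary can be checked numerically. Below is a minimal Python sketch (the toy data sets are invented to mirror the three examples above) that computes r from raw scores using the standard computational formula:

```python
import math

def pearson_r(x, y):
    """Pearson's r via the computational formula:
    r = (N*SumXY - SumX*SumY) / sqrt((N*SumX2 - (SumX)^2)(N*SumY2 - (SumY)^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

hours      = [1, 2, 3, 4, 5]        # hours studied (invented data)
grade_up   = [60, 70, 80, 90, 100]  # grade rises with hours
beers      = [1, 2, 3, 4, 5]        # beers drunk
grade_down = [100, 90, 80, 70, 60]  # grade falls with beers
toe        = [1, 2, 3, 4, 5]        # toe size
grade_flat = [80, 60, 100, 60, 80]  # no relationship to toe size

print(pearson_r(hours, grade_up))    # 1.0  (perfect positive / direct)
print(pearson_r(beers, grade_down))  # -1.0 (perfect negative / inverse)
print(pearson_r(toe, grade_flat))    # 0.0  (no correlation)
```

The sign flips with the direction of the relationship, while the magnitude reflects its strength.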

III. Pearson's r
• Termed Pearson's Product Moment Correlation Coefficient.
• Good with metric data.
• Probably the most popular correlation coefficient.
• Requires that both variables involved be normally distributed.
• Represents quantitatively the extent to which scores on two variables occupy the same relative position.
1. Rationale for Computation
We have seen that z scores provide information about the relative position of a score compared to other scores in the distribution. Pearson's r uses this fact:

r = ∑(ZXZY) / N

That is, r is the mean of the products of the z scores for the two variables. What follows is a demonstration of why this works in the case of a perfect positive relationship (variables X & Y) and in the case of a perfect negative relationship (variables X & W).

First, the perfect positive relationship between X & Y.

X    ZX     ZX²     Y    ZY     ZXZY
3    -1.42  2.02    1    -1.42  2.02
5    -.71   .50     2    -.71   .50
7    0      0       3    0      0
9    .71    .50     4    .71    .50
11   1.42   2.02    5    1.42   2.02

μX = 7, σX = 2.82; μY = 3, σY = 1.41; N = 5
∑ZX² = 5 = N; ∑ZXZY = ∑ZX² = 5

If the relative position of the scores on the two variables is the same (as in the present case), then the z scores of each of the variables will be the same and ∑(ZXZY) will equal ∑ZX². As we saw above, ∑ZX² is equal to N, and thus r equals N/N, or 1.

Now for the perfect negative relationship between X & W.

X    ZX     ZX²     W    ZW     ZXZW
3    -1.42  2.02    5    1.42   -2.02
5    -.71   .50     4    .71    -.50
7    0      0       3    0      0
9    .71    .50     2    -.71   -.50
11   1.42   2.02    1    -1.42  -2.02

μX = 7, σX = 2.82; μW = 3, σW = 1.41; N = 5
∑ZX² = 5 = N; ∑ZXZW = -5

The scores again have the same relative position, but this time the relationship is indirect. In this case, ∑(ZXZW) is equal to -N, and r equals -N/N, or -1.
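Both demonstrations can be reproduced directly from the definition r = ∑(ZXZY)/N. A minimal Python sketch, using the population standard deviation as in the tables above:

```python
import math

def z_scores(scores):
    """z scores using the population formulas: mu = SumX/N, sigma = sqrt(Sum(X-mu)^2/N)."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((s - mu) ** 2 for s in scores) / n)
    return [(s - mu) / sigma for s in scores]

X = [3, 5, 7, 9, 11]
Y = [1, 2, 3, 4, 5]   # same relative positions as X
W = [5, 4, 3, 2, 1]   # reversed relative positions

zx, zy, zw = z_scores(X), z_scores(Y), z_scores(W)
N = len(X)

r_xy = sum(a * b for a, b in zip(zx, zy)) / N  # mean of the z-score products
r_xw = sum(a * b for a, b in zip(zx, zw)) / N

print(round(r_xy, 10), round(r_xw, 10))  # 1.0 -1.0
```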

2. Computational Formula & Example
Since the standard score formula is cumbersome, a computational formula was developed which doesn't require the calculation of z scores for all of the scores:

r = (N∑XY - ∑X∑Y) / sqrt([N∑X² - (∑X)²][N∑Y² - (∑Y)²])

Example: Scores on 20-point math and science quizzes. [Minitab]

Person Math (X) Science (Y)
A 11 11
B 13 10
C 18 17
D 12 13
E 16 14
N=5

The first step is to create a scatterplot. Since the scatterplot looks promising (it suggests a strong positive relationship), create the necessary grid for the computations.

Person  Math (X)  Science (Y)  XY      X²       Y²
A       11        11           121     121      121
B       13        10           130     169      100
C       18        17           306     324      289
D       12        13           156     144      169
E       16        14           224     256      196
N=5     ∑X=70     ∑Y=65        ∑XY=937 ∑X²=1014 ∑Y²=875

Then perform the computations:

r = (N∑XY - ∑X∑Y) / sqrt([N∑X² - (∑X)²][N∑Y² - (∑Y)²])
  = (5(937) - (70)(65)) / sqrt([5(1014) - 70²][5(875) - 65²])
  = (4685 - 4550) / sqrt([170][150])
  = 135 / 159.69
  = .85

As was suggested by the scatterplot, there is indeed a strong positive correlation between the math and science scores.
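The arithmetic can be double-checked with a few lines of Python that recompute the sums from the raw quiz scores:

```python
import math

math_scores    = [11, 13, 18, 12, 16]  # Math (X)
science_scores = [11, 10, 17, 13, 14]  # Science (Y)
n = len(math_scores)

sx  = sum(math_scores)                                          # SumX  = 70
sy  = sum(science_scores)                                       # SumY  = 65
sxy = sum(x * y for x, y in zip(math_scores, science_scores))   # SumXY = 937
sxx = sum(x * x for x in math_scores)                           # SumX2 = 1014
syy = sum(y * y for y in science_scores)                        # SumY2 = 875

r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(r, 2))  # 0.85
```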

IV. Spearmans Rho

A variant of Pearsons r which is used with rank data is called Spearmans Rho (rs). This correlation coefficient is appropriate when either of the following two conditions are met:

1. One variable is an ordinal scale and the other is an ordinal scale or higher.
2. One of the distributions is markedly skewed.

In either case, both scales must be converted to ranks. If we then computed Pearson's r on the ranked data, it would give Spearman's Rho. However, for computations by hand, there is a simpler formula:

rs = 1 - (6∑D²) / (N(N² - 1))

where D = Rank of X - Rank of Y (i.e., a difference score).

1. Example 1. Beauty & Sociability. [Minitab]

Person   Beauty     Sociability
A 3 3
B 1=most 2
C 2 1=most
D 5 4
E 4 5
N=5

The first step is to create a scatterplot. Since the scatterplot looks promising (it suggests a strong positive relationship), create the necessary grid for the computations.

Person  Beauty  Sociability  D     D²
A       3       3            0     0
B       1=most  2            -1    1
C       2       1=most       1     1
D       5       4            1     1
E       4       5            -1    1
N=5                          ∑D=0  ∑D²=4

Then perform the computations:

rs = 1 - (6∑D²) / (N(N² - 1))
   = 1 - (6)(4) / (5)(25 - 1)
   = 1 - 24/120
   = .8

2. Example 2. Beauty & Science scores. [Minitab]

Since the science score is a ratio variable, it makes sense to rank it from low to high, that is, where low ranks represent low scores. If we are going to correlate beauty with this score, it makes sense to rerank the beauty scores so that they go from low to high as well.

Person  Beauty  Beauty (reranked)  Science  Science (ranked)
A       3       3                  11       2
B       1=most  5=most             10       1
C       2       4                  17       5=most
D       5       1                  13       3
E       4       2                  14       4
N=5

Then we would create a scatterplot of the ranked scores. The data do not look very promising, but let's prepare the grid for the computations anyway.

Person  Beauty (reranked)  Science (ranked)  D     D²
A       3                  2                 1     1
B       5=most             1                 4     16
C       4                  5=most            -1    1
D       1                  3                 -2    4
E       2                  4                 -2    4
N=5                                          ∑D=0  ∑D²=26

Then perform the computations:

rs = 1 - (6∑D²) / (N(N² - 1))
   = 1 - (6)(26) / (5)(25 - 1)
   = 1 - 156/120
   = -.3

So, as the scatterplot indicated, there wasn't much of a correlation.
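Both Spearman examples can be verified with a short Python function implementing rs = 1 - 6∑D²/(N(N² - 1)), using the rank data from the tables above:

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's Rho from ranks: rs = 1 - 6*SumD2 / (N(N^2 - 1))."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

# Example 1: beauty vs. sociability ranks (1 = most)
beauty      = [3, 1, 2, 5, 4]
sociability = [3, 2, 1, 4, 5]

# Example 2: reranked beauty vs. ranked science scores
beauty_reranked = [3, 5, 4, 1, 2]
science_ranked  = [2, 1, 5, 3, 4]

print(round(spearman_rho(beauty, sociability), 2))              # 0.8
print(round(spearman_rho(beauty_reranked, science_ranked), 2))  # -0.3
```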

Note: Tied ranks would get the average of the tie(s). Examples:

Pair of tied scores:
Person   X     Y   Y (rank)
A 3 11 4.5
B 1 11 4.5
C 2 17 1
D 5 13 3
E 4 14 2
N=5

Three scores tied:
Person   X     Y   Y (rank)
A 3 11 4
B 1 11 4
C 2 11 4
D 5 13 2
E 4 14 1
N=5
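The tie-averaging rule can be written as a small helper: each tied score receives the mean of the rank positions it would jointly occupy. A Python sketch (ranking high scores as rank 1, as in the tables above; the function name is for illustration):

```python
def rank_with_ties(scores):
    """Assign ranks (1 = highest score); tied scores receive the mean
    of the rank positions they would jointly occupy."""
    order = sorted(scores, reverse=True)
    # rank of a score = mean of the 1-based positions of all its occurrences
    return [sum(i + 1 for i, s in enumerate(order) if s == v) / order.count(v)
            for v in scores]

print(rank_with_ties([11, 11, 17, 13, 14]))  # [4.5, 4.5, 1.0, 3.0, 2.0]
print(rank_with_ties([11, 11, 11, 13, 14]))  # [4.0, 4.0, 4.0, 2.0, 1.0]
```

These match the two tables above: the pair of tied 11s each get (4+5)/2 = 4.5, and the three tied 11s each get (3+4+5)/3 = 4.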

V. Important Issues With Correlation
1. Factors Influencing the Correlation

These are the reasons why it is important to create a scatterplot.

1. Curvilinearity
Pearson's r assumes a linear relationship (one best characterized by a straight line); Spearman's rs assumes a monotonic one. In general, curvilinearity in a relationship will result in an r that underestimates the true relationship.
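A worked illustration of the curvilinearity problem: for a perfectly U-shaped relationship (y = x² over a symmetric range), Pearson's r comes out at exactly zero even though the relationship is perfect. The data are invented; the helper recomputes r from the computational formula:

```python
import math

def pearson_r(x, y):
    """Pearson's r via the computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sy
    den = math.sqrt((n * sum(a * a for a in x) - sx ** 2) *
                    (n * sum(b * b for b in y) - sy ** 2))
    return num / den

x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # perfect curvilinear (U-shaped) relationship: y = x squared
print(pearson_r(x, y))  # 0.0 -- r misses the relationship entirely
```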

2. Limited (Restricted & Truncated) Ranges
Refer to situations in which the sample is somehow limited. In both cases, the result is an underestimated r. Examples: a restricted range (foot size and age in 6-year-olds) and a truncated range (ACT scores and GPA in college students).

3. Extreme Groups
Results in an overestimated r. Consider looking at the relationship of reading ability and IQ, but only in poor and excellent readers.

4. An Extreme Score
Also results in an overestimated r, and is more of a problem when using small sample sizes.

2. Relation to Causality

Possible causal relationships between X (television violence) and Y (violence in the real world) if they are correlated include:

Possibility  Symbols      Explanation          Meaning
a.           X → Y        X causes Y           watching TV violence causes real-world violence
b.           X ← Y        Y causes X           real-world violence is the reason there is violence on TV
c.           X ← A → Y    A causes both X & Y  stress (A) causes both real-world & TV violence
etc.         B → C → X    etc.                 there are still other complicating variables
             B → Y

The main point is that correlation doesn't tell us much about causality. It should be noted that inferring causality from a correlation is an extremely common error among students, journalists, and even scientists themselves.

3. Some Specific Uses of Correlation

1. Determining Reliabilities
Compare two raters' (interobserver) or the same rater's (intraobserver) observations of behavior to see if they agree. There is a problem like this in the homework for this section.
2. Determining Validities
If ACT scores are highly correlated with GPAs, then we can say that ACT scores are a valid predictor of GPA.
3. For Prediction
A set of procedures similar to correlation, called regression, is used for predicting one variable from one or more other variables.

Copyright © 1997-2016 M. Plonsky, Ph.D.