
Correlation

  1. Important Concepts
    1. Correlation
    2. Correlation Coefficient
    3. Scatterplot
  2. Range of a Correlation Coefficient
  3. Pearson's r
    1. Rationale for Computation
    2. Computational Formula & Example - [Minitab]
  4. Spearman's rho
    1. Example 1 - [Minitab]
    2. Example 2 - [Minitab]
  5. Important Issues With Correlation
    1. Factors Influencing It
      1. Curvilinearity
      2. Limited (Restricted & Truncated) Ranges
      3. Extreme Groups
      4. An Extreme Score
    2. Relation to Causality
    3. Some Specific Uses of it

Practice Problems (Answers)
Homework


I. Important Concepts
  1. Correlation
    Earlier in the semester we noted that scientists are interested in relationships between variables. When two variables vary together (a change in one is accompanied by a change in the other), we say they are correlated.

  2. Correlation Coefficient
    A number that expresses quantitatively the extent to which two variables are related. There are several such coefficients; we will learn about two (Pearson's r and Spearman's rho).

  3. Scatterplot
    A graph of a collection of pairs of scores. Example:

    Note that in scatterplots, the X and Y axes are equal in length and thus this type of graph does not obey the 3/4 high rule.


II. Range of a Correlation Coefficient

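A correlation coefficient ranges from -1 (perfect negative) through 0 (no correlation) to +1 (perfect positive). This is best illustrated with examples: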

  1. Perfect positive (all points fall on a straight line)

    As the number of hours studied increased so did the grade. This is also called a "direct" relationship.

    More realistic example


  2. Perfect negative

    As the number of beers consumed increased, the grade decreased. This is also called an "inverse" or "indirect" relationship.

    More realistic example


  3. No correlation

    So, basically, there is no relationship between toe size and grade.

    More realistic example


III. Pearson's r
  1. Rationale for Computation
    We have seen that z scores provide information about the relative position of a score compared to other scores in the distribution. Pearson's r uses this: r = ∑(ZXZY) / N. Thus, r is the mean of the products of the z scores for the two variables. What follows is a demonstration of why this works in the case of a perfect positive relationship (variables X & Y) and in the case of a perfect negative relationship (variables X & W).

    First, the perfect positive relationship between X & Y.

    If the relative position of the scores on the two variables is the same (as in the present case), then the z scores of each of the variables will be the same and ∑(ZXZY) would be equal to ∑ZX². As we saw above, ∑ZX² is equal to N and thus r would equal N/N or 1.

    Now for the perfect negative relationship between X & W.

    The scores again have the same relative position, but this time the relationship is indirect. In this case, ∑(ZXZW) would be equal to -N and r would be equal to -N/N or -1.
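The argument above can be checked numerically. The following is a minimal sketch, assuming made-up scores for X, Y, and W (the original demonstration tables are not reproduced here) and z scores computed with the population standard deviation (dividing by N), which is what makes ∑ZX² equal to N. The function names are illustrative only.

```python
from math import sqrt

def z_scores(scores):
    """Convert raw scores to z scores using the population SD (divide by N)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return [(s - mean) / sd for s in scores]

def r_from_z(a, b):
    """Pearson's r as the mean of the products of paired z scores."""
    za, zb = z_scores(a), z_scores(b)
    return sum(p * q for p, q in zip(za, zb)) / len(a)

# Hypothetical scores: Y rises with X, W falls as X rises.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]
W = [10, 8, 6, 4, 2]

sum_zx_squared = sum(z ** 2 for z in z_scores(X))  # should equal N
r_xy = r_from_z(X, Y)  # perfect positive relationship
r_xw = r_from_z(X, W)  # perfect negative relationship

print(round(sum_zx_squared, 6), round(r_xy, 6), round(r_xw, 6))
```

Because X and Y have identical z scores, the sum of products equals ∑ZX² = N and r comes out as 1; for X and W the z scores have opposite signs, so the sum is -N and r is -1.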

  2. Computational Formula & Example
    Since the standard score formula is cumbersome, a computational formula was developed which doesn't require the calculation of z scores for all of the scores.
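The formula itself is not reproduced in this text. As a sketch, the widely used raw-score form is r = (N∑XY - ∑X∑Y) / sqrt((N∑X² - (∑X)²)(N∑Y² - (∑Y)²)); this is assumed to be the formula intended here. The quiz scores below are made up for illustration and are not the data from the example.

```python
from math import sqrt

def pearson_r_raw(x, y):
    """Pearson's r from raw-score sums, with no z scores required."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical scores on 20 point quizzes (not the original data).
math_quiz = [12, 15, 9, 18, 14]
science_quiz = [11, 16, 10, 17, 13]

r = pearson_r_raw(math_quiz, science_quiz)
print(round(r, 3))
```

The five sums (∑X, ∑Y, ∑XY, ∑X², ∑Y²) correspond to the column totals of the computation grid, which is why the grid is built before the formula is applied.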

    Example: Scores on 20 point math and science quizzes. [Minitab]

    First step would be to create a scatterplot:



    Since the scatterplot looks promising (suggests a strong positive relationship), create the necessary grid for the computations.


    Then perform the computations:


As was suggested by the scatterplot, there is indeed a strong positive correlation between the math and science scores.


IV. Spearman's Rho

A variant of Pearson's r which is used with rank data is called Spearman's Rho (rs). This correlation coefficient is appropriate when either of the following two conditions is met:

  1. Example 1. Beauty & Sociability. [Minitab]

    Person   Beauty   Sociability
    A        3        3
    B        1=most   2
    C        2        1=most
    D        5        4
    E        4        5
    N=5

    First step would be to create a scatterplot.



    Since the scatterplot looks promising (suggests a strong positive relationship), create the necessary grid for the computations.

    Then perform the computations:
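The computation grid is not reproduced in this text. As a sketch, assuming the usual rank-difference formula rs = 1 - 6∑D² / (N(N² - 1)), where D is the difference between a person's two ranks (a formula that applies when there are no tied ranks), the ranks from the table above give:

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of untied ranks (rank-difference formula)."""
    n = len(rank_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Ranks for persons A to E, taken from the table above (1 = most).
beauty = [3, 1, 2, 5, 4]
sociability = [3, 2, 1, 4, 5]

rs = spearman_rho(beauty, sociability)
print(round(rs, 2))  # 0.8
```

Here the differences are 0, -1, 1, 1, -1, so ∑D² = 4 and rs = 1 - 24/120 = 0.8, which agrees with the strong positive relationship suggested by the scatterplot.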


  2. Example 2. Beauty & Science scores. [Minitab]

    Since the science score is a ratio variable, it makes sense to rank it from low to high, that is, where low ranks represent low scores. If we are going to correlate beauty with this score, it makes sense to rerank the beauty scores so that they go from low to high as well.

    Person   Beauty   Beauty (reranked)   Science   Science (ranked)
    A        3        3                   11        2
    B        1=most   5=most              10        1
    C        2        4                   17        5=most
    D        5        1                   13        3
    E        4        2                   14        4
    N=5

    Then we would create a scatterplot of the ranked scores.


    The data do not look very promising, but let's prepare the grid for the computations anyway.

    Then perform the computations:


    So, as the scatterplot indicated, there wasn't much of a correlation.
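The reranking and the computation for this example can be sketched as follows. This assumes the rank-difference formula rs = 1 - 6∑D² / (N(N² - 1)) and untied scores; the helper names are illustrative only.

```python
def rank_low_to_high(scores):
    """Rank scores so that the lowest score gets rank 1 (assumes no ties)."""
    ordered = sorted(scores)
    return [ordered.index(s) + 1 for s in scores]

def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of untied ranks (rank-difference formula)."""
    n = len(rank_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Data for persons A to E, taken from the table above.
beauty = [3, 1, 2, 5, 4]          # original ranks, 1 = most
science = [11, 10, 17, 13, 14]    # raw science scores

n = len(beauty)
beauty_reranked = [n + 1 - r for r in beauty]   # flip so that 5 = most
science_ranked = rank_low_to_high(science)      # 1 = lowest score

rs = spearman_rho(beauty_reranked, science_ranked)
print(beauty_reranked, science_ranked, round(rs, 2))
```

The reranked and ranked columns match the table (3, 5, 4, 1, 2 and 2, 1, 5, 3, 4). The squared differences sum to 26, so rs = 1 - 156/120 = -0.3, a weak negative value.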

    Note: Tied scores each receive the average of the ranks they would otherwise occupy. Examples:

    Pair of tied scores:
    Person   X   Y    Y (rank)
    A        3   11   4.5
    B        1   11   4.5
    C        2   17   1
    D        5   13   3
    E        4   14   2
    N=5
                  
    Three scores tied:
    Person   X   Y    Y (rank)
    A        3   11   4
    B        1   11   4
    C        2   11   4
    D        5   13   2
    E        4   14   1
    N=5
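The averaging rule for ties can be sketched in a few lines. This minimal version ranks from high to low (1 = highest score), as in the two tables above; the function name is illustrative only.

```python
def average_ranks_high_first(scores):
    """Rank scores with 1 = highest; tied scores share the mean of their ranks."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1   # first rank position the tied group occupies
        count = ordered.count(s)       # number of scores in the tied group
        ranks.append(first + (count - 1) / 2)  # mean of first .. first + count - 1
    return ranks

pair_tied = average_ranks_high_first([11, 11, 17, 13, 14])
three_tied = average_ranks_high_first([11, 11, 11, 13, 14])

print(pair_tied)   # [4.5, 4.5, 1.0, 3.0, 2.0]
print(three_tied)  # [4.0, 4.0, 4.0, 2.0, 1.0]
```

In the first case the two scores of 11 would occupy ranks 4 and 5, so each gets 4.5. In the second, the three scores of 11 would occupy ranks 3, 4, and 5, so each gets 4.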
     

V. Important Issues With Correlation
    1. Factors Influencing the Correlation

      These are the reasons why it is important to create a scatterplot.

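      1. Curvilinearity
        A linear relationship is best characterized by a straight line; a monotonic relationship is one that consistently rises or consistently falls. r assumes a linear relationship, and rs assumes a monotonic one.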
        Example linear relationship:

        Example of a curvilinear (or nonmonotonic) relationship:

        In general, curvilinearity in a relationship will result in an r that underestimates the true relationship.
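An extreme case makes the point. In this sketch, with made-up data, Y is completely determined by X (Y = X²), yet r is 0 because the relationship is a symmetric U rather than a line. The helper pearson_r is an illustrative deviation-score implementation, not code from the course.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical U-shaped data: Y depends perfectly on X, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curved = pearson_r(x, y)
print(round(r_curved, 6))
```

The falling left half and the rising right half cancel exactly, so r reports no relationship even though the scatterplot would show an obvious one.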

      2. Limited (Restricted & Truncated) Ranges
        Both terms refer to situations in which the range of scores in the sample is somehow limited. In both cases, the result is an underestimated r.
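The effect of a limited range can be sketched with made-up data (these are not the foot size or ACT data from the examples). Y rises with X plus a fixed wobble of 3 points up or down; r is computed once for the full range of X and once for a narrow slice of it.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data: X runs from 1 to 20; Y follows X with a +/-3 wobble.
x_full = list(range(1, 21))
y_full = [x + (3 if x % 2 == 0 else -3) for x in x_full]

# Limited sample: keep only the cases with X between 9 and 12.
limited = [(x, y) for x, y in zip(x_full, y_full) if 9 <= x <= 12]
x_limited = [pair[0] for pair in limited]
y_limited = [pair[1] for pair in limited]

r_full = pearson_r(x_full, y_full)
r_limited = pearson_r(x_limited, y_limited)
print(round(r_full, 2), round(r_limited, 2))
```

The underlying relationship is the same in both samples, but within the narrow slice the wobble is large relative to the spread of X, so r comes out smaller.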

        Example of a Restricted Range - Foot size and age in 6 year olds:

        Example of a Truncated Range - ACT scores and GPA in college students:

      3. Extreme Groups
        Results in an overestimated r. Consider looking at the relationship of reading ability and IQ, but only in poor and excellent readers:
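The inflation can be sketched with made-up numbers (not the reading ability and IQ data from the example). Y rises with X plus a fixed wobble of 3 points up or down; r is computed for everyone and then for only the lowest and highest scorers on X.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data: X runs from 1 to 20; Y follows X with a +/-3 wobble.
x_all = list(range(1, 21))
y_all = [x + (3 if x % 2 == 0 else -3) for x in x_all]

# Extreme groups: keep only the four lowest and four highest X scores.
extremes = [(x, y) for x, y in zip(x_all, y_all) if x <= 4 or x >= 17]
x_extreme = [pair[0] for pair in extremes]
y_extreme = [pair[1] for pair in extremes]

r_all = pearson_r(x_all, y_all)
r_extreme = pearson_r(x_extreme, y_extreme)
print(round(r_all, 2), round(r_extreme, 2))
```

Dropping the middle of the distribution exaggerates the spread of X relative to the wobble, so r for the extreme groups is larger than r for the full sample.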


      4. An Extreme Score
        Can also result in an overestimated r (or, depending on where the score falls, an underestimated one). This is more of a problem with small sample sizes. Example:
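A minimal sketch with made-up scores (not the data from the figure): four pairs with no linear relationship at all, followed by the same four pairs plus one extreme score.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical small sample with no linear relationship.
x = [1, 2, 3, 4]
y = [2, 1, 1, 2]
r_without = pearson_r(x, y)

# Same sample plus a single extreme pair (10, 10).
r_with = pearson_r(x + [10], y + [10])

print(round(r_without, 2), round(r_with, 2))
```

With only five cases, the single extreme pair dominates the sums and produces a large positive r, which is why the scatterplot should be inspected before trusting the coefficient.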


    2. Relation to Causality

    3. Some Specific Uses of Correlation

      1. Determining Reliabilities
        Compare two raters' (interobserver) or the same rater's (intraobserver) observations of behavior to see whether they agree. There is a problem like this in the homework for this section.
      2. Determining Validities
        If ACT scores are highly correlated with GPAs, then we can say that ACT scores are a valid predictor of GPAs.
      3. For Prediction
        A set of procedures similar to correlation, called regression, is used for predicting one variable from one or more other variables.

Copyright © 1997-2016 M. Plonsky, Ph.D.
Comments? mplonsky@uwsp.edu.