
  Central Tendency & Variability

  1. Measures of Central Tendency
    1. Mean (including weighted & trimmed)
    2. Median
    3. Mode
  2. Measures of Variability
    1. Range (including IQR & SIQR)
    2. Mean Deviation
    3. Variance
    4. Standard Deviation
  3. Estimation
  4. Overall Example - [Minitab]

Practice Problems (Answers)
Homework


In addition to describing the form or shape of a distribution, it is also necessary to describe central tendency and variability (or spread).

I. Measures of Central Tendency (or Averages)

Here, we are interested in the typical, most representative score. There are three measures of central tendency that you should be familiar with. Note that when reporting these values, one additional decimal of accuracy is given compared to what is available in the raw data (even if the additional decimal is a zero, e.g., 43.0).

  1. Mean
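    It is simply the arithmetic average, that is, the sum of the scores divided by the number of them.
    It is symbolized as $\bar{X}$ (read "X bar").

    Computation: $\bar{X} = \frac{\sum X}{N}$
    Example: for the scores 2, 3, 5, and 10, $\bar{X} = (2 + 3 + 5 + 10)/4 = 20/4 = 5.0$.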

    Properties

    1. It is sensitive to all of the scores. In other words, if one score in the distribution is changed, the mean will change too. Example:

        Xs           Mean
        1, 2, 3      2
        1, 2, 30     11
        1, 2, 300    101

    2. The sum of the deviations about the mean equals zero. A deviation is symbolized as x (lowercase x) and refers to the difference between a score and its mean. That is:
      $x = X - \bar{X}$
      Thus, this second property of the mean states that:
      $\sum x = 0$
      What follows is some sample data demonstrating this property.

        X     Mean   x
        2     5.0    -3
        3     5.0    -2
        5     5.0    0
        10    5.0    5
        ∑x = 0

    3. The sum of the squared deviations about the mean is less than the sum of the squared deviations about any other value. Example (with "4" as the arbitrary "other value"):

        X     x     x²    X-4   (X-4)²
        2     -3    9     -2    4
        3     -2    4     -1    1
        5     0     0     1     1
        10    5     25    6     36
        ∑x² = 38
        ∑(X-4)² = 42

      So, 38 is less than 42. This relationship would hold with any "other value."
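The three properties can be checked quickly in Python. This is only a sketch using the sample scores from the tables above; the helper name `sum_sq_about` is an illustrative choice, not part of any library.

```python
from statistics import mean

scores = [2, 3, 5, 10]
m = mean(scores)  # arithmetic mean of the sample scores

# Property 1: changing a single score changes the mean.
means = [mean(xs) for xs in ([1, 2, 3], [1, 2, 30], [1, 2, 300])]

# Property 2: the deviations about the mean sum to zero.
deviations = [x - m for x in scores]
sum_dev = sum(deviations)

# Property 3: the sum of squared deviations is smallest about the mean.
def sum_sq_about(value, xs):
    """Sum of squared deviations of xs about an arbitrary value."""
    return sum((x - value) ** 2 for x in xs)

ss_mean = sum_sq_about(m, scores)   # about the mean
ss_four = sum_sq_about(4, scores)   # about the arbitrary "other value" 4

print(means, sum_dev, ss_mean, ss_four)
```

Trying other values in place of 4 (for example 4.9 or 6) should always give a sum of squares larger than the one about the mean.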

    Variations
  2. Median or Md
    The score that cuts the distribution into two equal halves (or the middle score in the distribution).

    Computation - There are several situations possible:

    1. An odd number of scores and no duplication near the middle, then the median is the middle score.  
      Ex: 1, 2, 2, 4, 6, 7, 7.   N=7 & Md= 4.

    2. An even number of scores and no duplication near the middle, then the median is the average of the two middle scores.  
      Ex: 2, 2, 4, 6, 7, 7.   N=6 & Md = (6+4)/2 = 5.

    3. Duplication near the middle.  
      Ex: 4, 5, 5, 5, 6, 6.   N=6 & Md = ?



    Fortunately, there is a formula to take care of the more complicated situations, including computing the median for grouped frequency distributions.

    Where:

    Using our last example:
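As a rough Python sketch of such a calculation, assume the common interpolation formula Md = L + ((N/2 - cf) / f) * i, where L is the lower real limit of the interval containing the median, cf is the number of scores below that interval, f is the number of scores in it, and i is the interval width. These symbols and the function name `interpolated_median` are assumptions of this sketch and may not match the notation of the formula referred to above. The sketch also assumes whole-number scores, so each score value is treated as an interval of width 1.

```python
from collections import Counter

def interpolated_median(scores, width=1):
    """Median by linear interpolation inside the interval holding the middle case.

    Assumes each distinct score is the midpoint of an interval of the given width.
    """
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    counts = Counter(scores)
    cum_below = 0  # cf: number of scores below the current interval
    for value in sorted(counts):
        f = counts[value]  # frequency within the current interval
        if cum_below + f >= n / 2:
            lower_limit = value - width / 2  # L: lower real limit
            return lower_limit + (n / 2 - cum_below) / f * width
        cum_below += f

md = interpolated_median([4, 5, 5, 5, 6, 6])
print(round(md, 1))
```

For the scores 4, 5, 5, 5, 6, 6 the median interval is the one around 5 (real limits 4.5 to 5.5), one score lies below it, and three scores fall inside it, so the result is 4.5 + (3 - 1)/3, or about 5.2.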

    Properties

      1. Not sensitive to all scores.

        Xs            Mean   Md
        1, 2, 3       2      2
        1, 2, 30      11     2
        1, 2, 300     101    2
        1, 2, 3000    1001   2

      2. Most useful with skewed distributions.

  3. Mode
    It is the most frequently occurring score. Note:

Note that the presence and direction of skew in the distribution can be determined from the mean and median. The key to understanding this is to be aware that the mean is sensitive to all scores, while the median is not. There are three rules:

  1. If $\bar{X} - Md > 0$ then +skew

  2. If $\bar{X} - Md < 0$ then -skew

  3. If $\bar{X} - Md = 0$ then the distribution is symmetric (as a normal distribution is)
    and, if it is also unimodal, all three measures of central tendency coincide.
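The rules amount to checking the sign of the mean minus the median. A minimal Python sketch follows; the function name `skew_direction` and its return labels are illustrative assumptions.

```python
from statistics import mean, median

def skew_direction(scores):
    """Classify skew from the sign of (mean - median)."""
    diff = mean(scores) - median(scores)
    if diff > 0:
        return "positive"   # mean pulled above the median by high scores
    if diff < 0:
        return "negative"   # mean pulled below the median by low scores
    return "none"           # mean and median coincide

print(skew_direction([1, 2, 300]))    # one extreme high score
print(skew_direction([1, 299, 300]))  # one extreme low score
print(skew_direction([1, 2, 3]))      # symmetric scores
```

This is a rule of thumb rather than a formal test of skew, and with real data an exact difference of zero is rare.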


II. Measures of Variability

Variability refers to the extent to which the scores in a distribution differ from each other. An equivalent definition (that is easier to work with mathematically) says that variability refers to the extent to which the scores in a distribution differ from their mean. If a distribution is lacking in variability, we may say that it is homogeneous (note the opposite would be heterogeneous). Note that when reporting these values, two additional decimals of accuracy are given compared to what is available in the raw data (even if the last decimal is a zero, e.g., 4.30). The exception is the range, where no extra decimals are needed because it is a crude measure (as we will see in a moment).

We will discuss four measures of variability for now: the range, mean or average deviation, variance and standard deviation.

  1. Range
    As we noted when discussing the rules for creation of a grouped frequency distribution, the range is given by the highest score in the distribution minus the lowest score plus one.
    Example:

    Because only the two extreme scores are used in computing the range, however, it is a crude measure. For example:
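A short Python sketch of this point, borrowing the scores for Distributions A and B from the overall example in Section IV. The function name `inclusive_range` is an illustrative choice. Both sets of scores share the same two extremes, so the range cannot tell them apart even though the scores in between are spread differently.

```python
def inclusive_range(scores):
    """Range as defined here: highest score minus lowest score plus one."""
    return max(scores) - min(scores) + 1

dist_a = [150, 145, 100, 100, 55, 50]
dist_b = [150, 110, 100, 100, 90, 50]

range_a = inclusive_range(dist_a)
range_b = inclusive_range(dist_b)
print(range_a, range_b)  # identical, because only 150 and 50 are used
```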

    Variations
  2. Mean (or Average) Deviation
    If a deviation (x) is the difference of a score from its mean and variability is the extent to which the scores differ from their mean, then summing all the deviations and dividing by the number of them should give us a measure of variability. The problem though is that the deviations sum to zero. However, computing the absolute value of the deviations before summing them eliminates this problem. Thus, the formula for the MD is given by:
      $MD = \frac{\sum |x|}{N}$

    The problem with the MD is that due to the use of the absolute value, it is a terminal procedure. In other words, it cannot be used in further calculations (which is something that we would like to be able to do).
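For illustration, the MD can be computed in Python as the average absolute deviation. This sketch reuses the scores 2, 3, 5, 10 from the section on the mean; the function name `mean_deviation` is an illustrative choice.

```python
from statistics import mean

def mean_deviation(scores):
    """Average of the absolute deviations about the mean."""
    m = mean(scores)
    return sum(abs(x - m) for x in scores) / len(scores)

md = mean_deviation([2, 3, 5, 10])
print(md)  # (3 + 2 + 0 + 5) / 4
```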

  3. Variance
    Another solution to the problem of the deviations summing to zero is to square the deviations. That is:
      $\text{Variance} = MS = \frac{\sum x^2}{N}$
    Thus another name for the Variance is the Mean of the Squared Deviations About the Mean (or more simply, the Mean of Squares (MS)). The problem with the MS is that its units are squared and thus represent an area, rather than a distance on the X axis like the other measures of variability.

  4. Standard Deviation
    A simple solution to the problem of the MS representing an area is to compute its square root. That is:
      $\text{Standard Deviation} = \sqrt{MS} = \sqrt{\frac{\sum x^2}{N}}$

    Properties of the Variance & Standard Deviation:

    1. Are always positive (or zero).
    2. Equal zero when all scores are identical (i.e., there is no variability).
    3. Like the mean, they are sensitive to all scores.
    4. The standard deviation is the preferred measure of variability for normal distributions.
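A minimal Python sketch of the variance and standard deviation as described in this section, that is, with N in the denominator. The function names `variance_n` and `sd_n` are illustrative; the standard library function `statistics.pvariance` uses the same N denominator and is included as a cross-check.

```python
import math
from statistics import pvariance

def variance_n(scores):
    """Mean of the squared deviations about the mean (N in the denominator)."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / n

def sd_n(scores):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance_n(scores))

scores = [2, 3, 5, 10]
var = variance_n(scores)   # 38 / 4
sd = sd_n(scores)
print(var, round(sd, 2), pvariance(scores))

# Identical scores have no variability at all.
print(variance_n([7, 7, 7]), sd_n([7, 7, 7]))
```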


III. Estimation

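Estimation is the goal of inferential statistics. We use sample values to estimate population values. The symbols are as follows:

  Measure              Sample (statistic)   Population (parameter)
  Mean                 $\bar{X}$            μ
  Variance             s²                   σ²
  Standard deviation   s                    σ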

It is important that the sample values (estimators) be unbiased. An unbiased estimator of a parameter is one whose average over all possible random samples of a given size equals the value of the parameter.

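While $\bar{X}$ is an unbiased estimator of μ, s² computed with N in the denominator is not an unbiased estimator of σ² (on average it comes out too small).

In order to make it an unbiased estimator, we use N-1 in the denominator of the formula rather than just N. Thus:

$s^2 = \frac{\sum x^2}{N-1}$ and $s = \sqrt{\frac{\sum x^2}{N-1}}$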

Note that this is a defining formula and, as we will see below, is not the best choice when actually doing the calculations.
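The definition of an unbiased estimator can be illustrated by brute force in Python. The sketch below assumes a tiny made-up population (1, 2, 3) and enumerates every possible ordered sample of size 2 drawn with replacement; both the population and the sampling scheme are assumptions chosen to keep the enumeration small. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def variance(scores, ddof=0):
    """Sum of squared deviations divided by (N - ddof); ddof=1 gives N-1."""
    n = len(scores)
    m = Fraction(sum(scores), n)
    return sum((x - m) ** 2 for x in scores) / (n - ddof)

population = [1, 2, 3]                 # hypothetical population
mu = Fraction(sum(population), len(population))
sigma_sq = variance(population)        # population variance (N denominator)

samples = list(product(population, repeat=2))   # all 9 ordered samples of size 2

avg_mean = sum(Fraction(sum(s), len(s)) for s in samples) / len(samples)
avg_var_n = sum(variance(s, ddof=0) for s in samples) / len(samples)
avg_var_n1 = sum(variance(s, ddof=1) for s in samples) / len(samples)

print(mu, avg_mean)                     # sample mean averages to mu
print(sigma_sq, avg_var_n, avg_var_n1)  # only the N-1 version averages to sigma squared
```

In this small case the N version averages to half of the population variance, while the N-1 version matches it exactly.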


IV. Overall Example - [Minitab]

Let's reconsider an example from above of two distributions (A & B):

Consider a possibility for the scores that go with these distributions:

Distribution   A              B
Data           150            150
               145            110
               100            100
               100            100
               55             90
               50             50
∑X             600            600
N              6              6
Mean           100            100
Range          150-50+1=101   150-50+1=101

Notice that the central tendency and range of the two distributions are the same. That is, the mean, median, and mode all equal 100 for both distributions and the range is 101 for both distributions. However, while Distributions A and B have the same measures of central tendency and the same range, they differ in their variability. Distribution A has more of it. Let us prove this by computing the standard deviation in each case. First, for Distribution A:

  X (A)   Mean   x     x²
  150     100    50    2500
  145     100    45    2025
  100     100    0     0
  100     100    0     0
  55      100    -45   2025
  50      100    -50   2500
  ∑ 600          0     9050
  N = 6

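Plugging the appropriate values into the defining formula (with N-1 in the denominator) gives:

Measure                  A
Variance (s²)            ∑x²/(N-1) = 9050/5 = 1810.00
Standard deviation (s)   √1810.00 = 42.54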

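Note that calculating the variance and standard deviation in this manner requires computing the mean and subtracting it from each score. Since this is not very efficient and can be less accurate as a result of rounding error, computational formulas are typically used. They are given as follows:

$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N-1}$ and $s = \sqrt{s^2}$

Redoing the computations for Distribution A in this manner gives:

  X (A)   X²
  150     22500
  145     21025
  100     10000
  100     10000
  55      3025
  50      2500
  ∑ 600   69050
  N = 6

Then, plugging in the appropriate values into the computational formula gives:

$s^2 = \frac{69050 - \frac{600^2}{6}}{5} = \frac{69050 - 60000}{5} = \frac{9050}{5} = 1810.00$ and $s = \sqrt{1810.00} = 42.54$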

Note that the defining and computational formulas give the same result, but the computational formula is easier to work with (and potentially more accurate due to less rounding error).

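Doing the same calculations for Distribution B yields:

  X (B)   X²
  150     22500
  110     12100
  100     10000
  100     10000
  90      8100
  50      2500
  ∑ 600   65200
  N = 6

Then, plugging in the appropriate values into the computational formula gives:

$s^2 = \frac{65200 - \frac{600^2}{6}}{5} = \frac{65200 - 60000}{5} = \frac{5200}{5} = 1040.00$ and $s = \sqrt{1040.00} = 32.25$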

Thus, Distribution A clearly has more variability than Distribution B.
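The whole comparison can be reproduced with a short Python sketch. It assumes the N-1 (unbiased) versions of the defining and computational formulas, in line with the Estimation section; the function names `sd_defining` and `sd_computational` are illustrative choices.

```python
import math

def sd_defining(scores):
    """Defining formula: square root of sum of squared deviations over N-1."""
    n = len(scores)
    m = sum(scores) / n
    return math.sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))

def sd_computational(scores):
    """Computational formula: uses only the sum of X and the sum of X squared."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

dist_a = [150, 145, 100, 100, 55, 50]
dist_b = [150, 110, 100, 100, 90, 50]

for name, dist in (("A", dist_a), ("B", dist_b)):
    print(name, round(sd_defining(dist), 2), round(sd_computational(dist), 2))
```

Both formulas agree for each distribution, and the standard deviation for A comes out larger than the one for B even though the means and ranges are identical.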


Copyright © 1997-2016 M. Plonsky, Ph.D.
Comments? mplonsky@uwsp.edu.