

Central Tendency & Variability
I. Measures of Central Tendency (or Averages)
Here, we are interested in the typical, most representative score. There are three measures of central tendency that you should be familiar with. Note that when reporting these values, one additional decimal of accuracy is given compared to what is available in the raw data (even if the additional decimal is a zero, e.g., 43.0).
Computation

Example:

X
2
3
5
10
∑X = 20
N = 4

x̄ = ∑X/N = 20/4 = 5.0
Since means are typically reported with one more digit of accuracy than is present in the data, I reported the mean as 5.0 rather than just 5.
When working with grouped frequency distributions, we can use an approximation:

x̄ ≈ ∑(Midpoint × f)/N

For example:
Interval   Midpoint   f   Mid×f
95-99         97      1     97
90-94         92      3    276
85-89         87      5    435
80-84         82      6    492
75-79         77      4    308
70-74         72      3    216
65-69         67      1     67
60-64         62      2    124
           ∑f = 25 = N     ∑(Mid×f) = 2015

x̄ ≈ 2015/25 = 80.6

When computed on the raw data, the result agrees to one decimal place.
Thus the formula for computing the mean with grouped data gives us a good approximation of the actual mean. In fact, when we report the mean with one decimal more accuracy than what is in the data, the two techniques give the same result.
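As a quick check, the grouped-data approximation above can be reproduced in a few lines of Python (the midpoint/frequency pairs are taken from the table; the variable names are my own):

```python
# (midpoint, frequency) pairs for each class interval in the table above
groups = [(97, 1), (92, 3), (87, 5), (82, 6),
          (77, 4), (72, 3), (67, 1), (62, 2)]

n = sum(f for _, f in groups)              # ∑f = N = 25
total = sum(mid * f for mid, f in groups)  # ∑(Mid × f) = 2015
grouped_mean = total / n

print(n, total, round(grouped_mean, 1))    # 25 2015 80.6
```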
Properties
The mean is sensitive to every score, including extreme ones:

Xs            x̄
1, 2, 3         2
1, 2, 30       11
1, 2, 300     101
The deviations about the mean sum to zero:

X     x̄     x = X - x̄
2     5.0      -3
3     5.0      -2
5     5.0       0
10    5.0       5
             ∑x = 0
The sum of the squared deviations about the mean is smaller than the sum of squared deviations about any other value. For example, using 4 as the "other value":

X     x    x^{2}   X - 4   (X - 4)^{2}
2    -3      9       -2         4
3    -2      4       -1         1
5     0      0        1         1
10    5     25        6        36
     ∑x^{2} = 38          ∑(X - 4)^{2} = 42

So 38 is less than 42; this relationship would hold for any other value.
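This minimum property is easy to verify numerically; a small sketch using the scores 2, 3, 5, 10 from above:

```python
scores = [2, 3, 5, 10]
mean = sum(scores) / len(scores)   # 5.0

def ss(center):
    """Sum of squared deviations of the scores about an arbitrary center."""
    return sum((x - center) ** 2 for x in scores)

print(ss(mean))   # 38.0 (about the mean)
print(ss(4))      # 42 (about the "other value" 4)

# No other center gives a smaller sum than the mean:
assert all(ss(c) >= ss(mean) for c in (0, 3, 4, 6, 10))
```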
Computation

For the median, there are several situations possible (e.g., an odd or an even number of scores, or tied scores in the middle). Fortunately, there is a formula to take care of the more complicated situations, including computing the median for grouped frequency distributions:

M_{d} = L + ((N/2 - n_{b})/n_{w}) × i

L = lower exact limit of the interval containing M_{d}
n_{b} = number of scores below L
n_{w} = number of scores within the interval containing M_{d}
i = the width of the interval (for ungrouped data i = 1)
N = the number of scores
Using our last example:

L = 4.5
n_{b} = 1
n_{w} = 3
i = 1
N = 6

M_{d} = 4.5 + ((6/2 - 1)/3)(1) = 4.5 + 0.67 = 5.17
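The interpolation formula can be wrapped in a short function; here is a sketch in Python (the parameter names mirror the definitions above):

```python
def interpolated_median(L, n_below, n_within, i, N):
    """M_d = L + ((N/2 - n_below) / n_within) * i

    L        -- lower exact limit of the interval containing the median
    n_below  -- number of scores below L
    n_within -- number of scores within that interval
    i        -- interval width (1 for ungrouped data)
    N        -- total number of scores
    """
    return L + ((N / 2 - n_below) / n_within) * i

md = interpolated_median(L=4.5, n_below=1, n_within=3, i=1, N=6)
print(round(md, 2))   # 5.17
```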
Properties
Unlike the mean, the median is not sensitive to extreme scores:

Xs              x̄      M_{d}
1, 2, 3           2       2
1, 2, 30         11       2
1, 2, 300       101       2
1, 2, 3000     1001       2
Note that the presence and direction of skew in the distribution can be determined from the mean and median. The key to understanding this is to be aware that the mean is sensitive to all scores, while the median is not. There are three rules:

1. If the mean is greater than the median, the distribution is positively skewed.
2. If the mean is less than the median, the distribution is negatively skewed.
3. If the mean and median are equal (or nearly so), the distribution is symmetric.
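A quick numerical illustration of the mean-median relationship, using two small made-up data sets:

```python
import statistics

right_tail = [1, 2, 2, 3, 20]    # one extreme high score pulls the mean up
left_tail = [1, 18, 19, 19, 20]  # one extreme low score pulls the mean down

print(statistics.mean(right_tail), statistics.median(right_tail))  # mean > median: positive skew
print(statistics.mean(left_tail), statistics.median(left_tail))    # mean < median: negative skew
```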
II. Measures of Variability

Variability refers to the extent to which the scores in a distribution differ from each other. An equivalent definition (that is easier to work with mathematically) says that variability refers to the extent to which the scores in a distribution differ from their mean. If a distribution is lacking in variability, we may say that it is homogeneous (note the opposite would be heterogeneous). Note that when reporting these values, two additional decimals of accuracy are given compared to what is available in the raw data (even if the last decimal is a zero, e.g., 4.30). The exception is the range, where no extra decimals are needed because it is a crude measure (as we will see in a moment).
We will discuss four measures of variability for now: the range, mean or average deviation, variance and standard deviation.
The range is the distance between the largest and smallest scores: Range = X_{highest} - X_{lowest} + 1 (using the exact limits of the scores). Distribution A has a larger range (and more variability) than Distribution B. Because only the two extreme scores are used in computing the range, however, it is a crude measure.
The mean (or average) deviation (MD) is the average of the absolute deviations of the scores about their mean. The problem with the MD is that, due to the use of the absolute value, it is a terminal procedure. In other words, it cannot be used in further calculations (which is something that we would like to be able to do).
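Assuming the usual definition of the mean deviation (the average of the absolute deviations about the mean, MD = ∑|x|/N), a minimal sketch using the scores 2, 3, 5, 10 from above:

```python
scores = [2, 3, 5, 10]
mean = sum(scores) / len(scores)   # 5.0

# Mean deviation: the average of the absolute deviations about the mean.
md = sum(abs(x - mean) for x in scores) / len(scores)
print(md)   # 2.5  (|-3| + |-2| + |0| + |5| = 10, divided by 4)
```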
Since standard deviations can sometimes be very small, you may need more than the two additional decimals of accuracy (beyond what is available in the original data) that were suggested at the outset of this section.
Properties of the Variance & Standard Deviation:
Estimation is the goal of inferential statistics. We use sample values to estimate population values. The symbols are as follows:
Measure               Sample    Population
Mean                  x̄         μ
Variance              s^{2}     σ^{2}
Standard Deviation    s         σ
It is important that the sample values (estimators) be unbiased. An unbiased estimator of a parameter is one whose average over all possible random samples of a given size equals the value of the parameter.
While x̄ is an unbiased estimator of μ, s^{2} (computed with N in the denominator) is not an unbiased estimator of σ^{2}. In order to make it an unbiased estimator, we use N - 1 in the denominator of the formula rather than just N. Thus:

s^{2} = ∑x^{2}/(N - 1)
Note that this is a defining formula and, as we will see below, is not the best choice when actually doing the calculations.
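As a sketch of the N - 1 correction in code (note that Python's statistics.variance also divides by N - 1, while statistics.pvariance divides by N):

```python
import statistics

scores = [2, 3, 5, 10]
x_bar = sum(scores) / len(scores)             # 5.0
n = len(scores)

ss = sum((x - x_bar) ** 2 for x in scores)    # sum of squared deviations = 38.0

biased = ss / n           # divides by N: underestimates σ² on average
s_squared = ss / (n - 1)  # divides by N - 1: the unbiased estimator s²

print(biased, s_squared)
print(statistics.variance(scores), statistics.pvariance(scores))
```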
Let's reconsider an example from above of two distributions (A & B):
Consider a possibility for the scores that go with these distributions:
Distribution     A      B
Data           150    150
               145    110
               100    100
               100    100
                55     90
                50     50
∑              600    600
N                6      6
x̄              100    100
Range   150 - 50 + 1 = 101   150 - 50 + 1 = 101
Notice that the central tendency and range of the two distributions are the same. That is, the mean, median, and mode all equal 100 for both distributions and the range is 101 for both distributions. However, while Distributions A and B have the same measures of central tendency and the same range, they differ in their variability. Distribution A has more of it. Let us prove this by computing the standard deviation in each case. First, for Distribution A:
A (X)    x̄      x       x^{2}
150     100     50      2500
145     100     45      2025
100     100      0         0
100     100      0         0
 55     100    -45      2025
 50     100    -50      2500
∑ 600            0      9050
N = 6
Plugging the appropriate values into the defining formula gives:

s^{2} = ∑x^{2}/(N - 1) = 9050/5 = 1810.00

s = √1810.00 = 42.54
Note that calculating the variance and standard deviation in this manner requires computing the mean and subtracting it from each score. Since this is not very efficient and can be less accurate as a result of rounding error, computational formulas are typically used. They are given as follows:
s^{2} = (∑X^{2} - (∑X)^{2}/N)/(N - 1)   and   s = √s^{2}
Redoing the computations for Distribution A in this manner gives:
A (X)     X^{2}
150      22500
145      21025
100      10000
100      10000
 55       3025
 50       2500
∑ 600    69050
N = 6
Then, plugging the appropriate values into the computational formula gives:

s^{2} = (69050 - 600^{2}/6)/(6 - 1) = (69050 - 60000)/5 = 9050/5 = 1810.00

s = √1810.00 = 42.54
Note that the defining and computational formulas give the same result, but the computational formula is easier to work with (and potentially more accurate due to less rounding error).
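The equivalence of the two formulas is easy to verify in code; a sketch using Distribution A:

```python
import math

A = [150, 145, 100, 100, 55, 50]
N = len(A)
mean = sum(A) / N                                      # 100.0

# Defining formula: s² = ∑(X - mean)² / (N - 1)
defining = sum((x - mean) ** 2 for x in A) / (N - 1)

# Computational formula: s² = (∑X² - (∑X)²/N) / (N - 1)
computational = (sum(x * x for x in A) - sum(A) ** 2 / N) / (N - 1)

print(defining, computational)             # both give 1810.0
print(round(math.sqrt(computational), 2))  # 42.54
```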
Doing the same calculations for Distribution B yields:
B (X)     X^{2}
150      22500
110      12100
100      10000
100      10000
 90       8100
 50       2500
∑ 600    65200
N = 6
Then, plugging the appropriate values into the computational formula gives:

s^{2} = (65200 - 600^{2}/6)/(6 - 1) = (65200 - 60000)/5 = 5200/5 = 1040.00

s = √1040.00 = 32.25
Thus, Distribution A clearly has more variability than Distribution B.
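The whole comparison can be reproduced with the standard library (statistics.stdev uses the N - 1 denominator, matching the hand calculations):

```python
import statistics

A = [150, 145, 100, 100, 55, 50]
B = [150, 110, 100, 100, 90, 50]

for name, data in (("A", A), ("B", B)):
    print(name,
          statistics.mean(data),             # 100 for both
          max(data) - min(data) + 1,         # range: 101 for both
          round(statistics.stdev(data), 2))  # 42.54 vs 32.25
```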