1. Measures of Central Tendency
1. Mean (including weighted & trimmed)
2. Median
3. Mode
2. Measures of Variability
1. Range (including IQR & SIQR)
2. Mean Deviation
3. Variance
4. Standard Deviation
3. Estimation
4. Overall Example - [Minitab]

In addition to describing the form or shape of a distribution, it is also necessary to describe central tendency and variability (or spread).

I. Measures of Central Tendency (or Averages)

Here, we are interested in the typical, most representative score. There are three measures of central tendency that you should be familiar with. Note that when reporting these values, one additional decimal of accuracy is given compared to what is available in the raw data (even if the additional decimal is a zero, e.g., 43.0).

1. Mean
It is simply the arithmetic average or sum of the scores divided by the number of them.
It is symbolized as X̄ (read as "X-bar") when computed on a sample, and as μ (read as "mu") when computed on a population.

Computation - Example:

X
2
3
5
10
∑X = 20

X̄ = ∑X / N = 20 / 4 = 5.0

N = 4. Since means are typically reported with one more digit of accuracy than is present in the data, I reported the mean as 5.0 rather than just 5.

When working with grouped frequency distributions, we can use an approximation: X̄ ≈ ∑(Mid × f) / N. For example:

Interval Midpoint f Mid*f
95-99 97 1 97
90-94 92 3 276
85-89 87 5 435
80-84 82 6 492
75-79 77 4 308
70-74 72 3 216
65-69 67 1 67
60-64 62 2 124
∑f = 25 = N   ∑(Mid × f) = 2015

The grouped-data formula gives X̄ ≈ ∑(Mid × f) / N = 2015 / 25 = 80.6. When computed on the raw data, we get essentially the same value. Thus the formula for computing the mean with grouped data gives us a good approximation of the actual mean. In fact, when we report the mean with one decimal more accuracy than what is in the data, the two techniques give the same result.
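The grouped-data approximation above can be sketched in a few lines of Python (not part of the original notes; the midpoints and frequencies are taken from the table):

```python
# Approximate mean from a grouped frequency distribution: X-bar = sum(Mid * f) / N
midpoints = [97, 92, 87, 82, 77, 72, 67, 62]
freqs     = [1, 3, 5, 6, 4, 3, 1, 2]

N = sum(freqs)                                            # 25
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / N
print(round(grouped_mean, 1))                             # 80.6
```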

Properties

1. It is sensitive to all of the scores. In other words, if one score in the distribution is changed, the mean will change too. Example:

Xs          X̄
1, 2, 3     2
1, 2, 30    11
1, 2, 300   101

2. The sum of the deviations about the mean equals zero. A deviation is symbolized as x (little x) and refers to the difference between a score and its mean. That is: x = X - X̄. Thus, this second property of the mean states that: ∑x = ∑(X - X̄) = 0. What follows is some sample data demonstrating this property.

X     X̄     x
2     5.0   -3
3     5.0   -2
5     5.0   0
10    5.0   5
            ∑x = 0

3. The sum of the squared deviations about the mean is less than the sum of the squared deviations about any other value. Example (with "4" as the arbitrary "other value"):

X     x     x²    X-4   (X-4)²
2     -3    9     -2    4
3     -2    4     -1    1
5     0     0     1     1
10    5     25    6     36
            ∑x² = 38    ∑(X-4)² = 42

So, 38 is less than 42. This relationship would hold with any "other value."
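Properties 2 and 3 are easy to verify numerically. A minimal Python sketch (not part of the original notes), using the same data as the tables above:

```python
X = [2, 3, 5, 10]
mean = sum(X) / len(X)          # 5.0

# Property 2: the deviations about the mean sum to zero.
devs = [x - mean for x in X]
print(sum(devs))                # 0.0

# Property 3: the sum of squared deviations is smallest about the mean.
def ss_about(c):
    return sum((x - c) ** 2 for x in X)

print(ss_about(mean))           # 38.0
print(ss_about(4))              # 42.0  (any "other value" gives a larger sum)
```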

Variations
Weighted Mean
Each quantity to be averaged is assigned a weight. These weightings determine the relative importance of each quantity in the average. We will see an example in our grade postings, where the homework assignments are weighted at 20% and the exams are weighted at 80%.
Trimmed Mean
A mean that is computed on the middle 95% of the distribution. Can be a more stable estimate than the regular mean since it is less sensitive to outliers.
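Both variations can be sketched briefly in Python (not part of the original notes; the 20%/80% weights follow the grade-posting example, expressed here as the integer ratio 1:4, and the trimming fraction is the middle 95% mentioned above):

```python
# Weighted mean: each value counts in proportion to its weight.
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# e.g., homework average 90 (weight 1, i.e., 20%), exam average 75 (weight 4, i.e., 80%)
print(weighted_mean([90, 75], [1, 4]))   # 78.0

# Trimmed mean on the middle 95%: drop the lowest and highest 2.5% of scores.
def trimmed_mean(values, keep=0.95):
    s = sorted(values)
    cut = int(len(s) * (1 - keep) / 2)
    kept = s[cut:len(s) - cut] if cut else s
    return sum(kept) / len(kept)
```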

2. Median or Md
The score that cuts the distribution into two equal halves (or the middle score in the distribution).

Computation - There are several situations possible:

1. An odd number of scores and no duplication near the middle, then the median is the middle score.
Ex: 1, 2, 2, 4, 6, 7, 7.   N=7 & Md= 4.

2. An even number of scores and no duplication near the middle, then the median is the average of the two middle scores.
Ex: 2, 2, 4, 6, 7, 7.   N=6 & Md = (6+4)/2 = 5.

3. Duplication near the middle.
Ex: 4, 5, 5, 5, 6, 6.   N=6 & Md = ?
• The median must lie somewhere between 4.5 and 5.5.
• The lower exact limit of the score near the middle that is duplicated is 4.5, and we need 2 of the 3 scores in the interval with the duplication.
• Thus, Md = 4.5 + 2/3 = 4.5 + .67 = 5.2

Fortunately, there is a formula to take care of the more complicated situations, including computing the median for grouped frequency distributions:

Md = L + ((N/2 - nb) / nw) × i

Where:
L = Lower exact limit of the interval containing Md.
nb = number of scores below L.
nw = number of scores within the interval containing Md.
i = the width of the interval (for ungrouped data i = 1).
N = the Number of scores.

Using our last example:

L = 4.5   nb = 1   nw = 3   i = 1   N = 6

Md = 4.5 + ((6/2 - 1) / 3) × 1 = 4.5 + .67 = 5.2

Properties

1. Not sensitive to all scores.

Xs           X̄      Md
1, 2, 3      2      2
1, 2, 30     11     2
1, 2, 300    101    2
1, 2, 3000   1001   2

2. Most useful with skewed distributions.
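The median formula given earlier can be sketched directly in Python (not part of the original notes; the argument names mirror the symbols in the formula):

```python
# Md = L + ((N/2 - nb) / nw) * i
def median_formula(L, nb, nw, i, N):
    return L + ((N / 2 - nb) / nw) * i

# The duplication example: 4, 5, 5, 5, 6, 6
print(round(median_formula(L=4.5, nb=1, nw=3, i=1, N=6), 1))   # 5.2
```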

3. Mode
Is the most frequently occurring score. Note:
• There can be more than one. Can have bi- or tri-modal distributions and then speak of major and minor modes.
• It is symbolized as Mo.
• For grouped data, we have a Modal Interval and a Crude Mo.
Considering the example discussed in the section on Frequency Distributions, the Modal Interval is 80-84 and the Crude Mo is 82.

Note that the presence and direction of skew in the distribution can be determined from the mean and median. The key to understanding this is to be aware that the mean is sensitive to all scores, while the median is not. There are three rules:

1. If X̄ - Md > 0, then +skew.
2. If X̄ - Md < 0, then -skew.
3. If X̄ - Md = 0, then the distribution is symmetrical (as in the normal distribution) and all three measures of central tendency coincide.

II. Measures of Variability

Variability refers to the extent to which the scores in a distribution differ from each other. An equivalent definition (that is easier to work with mathematically) says that variability refers to the extent to which the scores in a distribution differ from their mean. If a distribution is lacking in variability, we may say that it is homogeneous (note the opposite would be heterogeneous). Note that when reporting these values, two additional decimals of accuracy are given compared to what is available in the raw data (even if the last decimal is a zero, e.g., 4.30). The exception is the range, where no extra decimals are needed because it is a crude measure (as we will see in a moment).

We will discuss four measures of variability for now: the range, mean or average deviation, variance and standard deviation.

1. Range
As we noted when discussing the rules for creation of a grouped frequency distribution, the range is given by the highest score in the distribution minus the lowest score plus one.
R = XH - XL+ 1
Example: Distribution A has a larger range (and more variability) than Distribution B.

Because only the two extreme scores are used in computing the range, however, it is a crude measure. For example: The range of Distribution A and B is the same, although Distribution A has more variability.

Variations
Inter Quartile Range (or IQR)
The range computed for the middle 50% of the distribution.
Semi Inter Quartile Range (or SIQR)
Is simply one half of the IQR to make the measure a distance from the mean. This will make more sense after we cover more measures of variability. It is the preferred measure of variability for skewed data.
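A brief Python sketch of the IQR and SIQR (not part of the original notes; it uses the standard-library `statistics.quantiles` function, whose default "exclusive" method is one of several common ways to define quartiles):

```python
import statistics

# IQR: the distance between the first and third quartiles (middle 50%).
def iqr(data):
    q1, _, q3 = statistics.quantiles(data, n=4)   # default: exclusive method
    return q3 - q1

# SIQR: half the IQR, so the measure reads as a distance rather than a width.
def siqr(data):
    return iqr(data) / 2

print(iqr([1, 2, 3, 4, 5, 6, 7]))    # 4.0
print(siqr([1, 2, 3, 4, 5, 6, 7]))   # 2.0
```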

2. Mean (or Average) Deviation
If a deviation (x) is the difference between a score and its mean, and variability is the extent to which the scores differ from their mean, then summing all the deviations and dividing by the number of them should give us a measure of variability. The problem, though, is that the deviations sum to zero. However, computing the absolute value of the deviations before summing them eliminates this problem. Thus, the formula for the MD is given by:

MD = ∑|x| / N = ∑|X - X̄| / N

The problem with the MD is that, due to the use of the absolute value, it is a terminal procedure. In other words, it cannot be used in further calculations (which is something that we would like to be able to do).
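A minimal Python sketch of the MD (not part of the original notes), using the same four scores as the mean example:

```python
# Mean deviation: MD = sum(|X - mean|) / N
def mean_deviation(X):
    m = sum(X) / len(X)
    return sum(abs(x - m) for x in X) / len(X)

print(mean_deviation([2, 3, 5, 10]))   # (3 + 2 + 0 + 5) / 4 = 2.5
```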

3. Variance
Another solution to the problem of the deviations summing to zero is to square the deviations. That is:

Variance = ∑x² / N = ∑(X - X̄)² / N

Thus another name for the Variance is the Mean of the Squared Deviations About the Mean (or more simply, the Mean of Squares (MS)). The problem with the MS is that its units are squared and thus represent space, rather than a distance on the X axis like the other measures of variability.

4. Standard Deviation
A simple solution to the problem of the MS representing a space is to compute its square root. That is:

SD = √(∑x² / N)

Since standard deviations can sometimes be very small, you may need more than the 2 additional decimals of accuracy (beyond what is available in the original data) that were suggested at the outset of this section.
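These two descriptive measures can be sketched together in Python (not part of the original notes; N is used in the denominator, matching the defining formulas above):

```python
# Descriptive variance (mean of squares) and standard deviation.
def variance(X):
    m = sum(X) / len(X)
    return sum((x - m) ** 2 for x in X) / len(X)

def std_dev(X):
    return variance(X) ** 0.5

print(variance([2, 3, 5, 10]))   # 38 / 4 = 9.5
```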

Properties of the Variance & Standard Deviation:

1. Are always positive (or zero).
2. Equal zero when all scores are identical (i.e., there is no variability).
3. Like the mean, they are sensitive to all scores.
4. The standard deviation is the preferred measure of variability for normal distributions.

III. Estimation

Estimation is the goal of inferential statistics. We use sample values to estimate population values. The symbols are as follows:

Measure              Sample   Population
Mean                 X̄        μ
Variance             s²       σ²
Standard Deviation   s        σ

It is important that the sample values (estimators) be unbiased. An unbiased estimator of a parameter is one whose average over all possible random samples of a given size equals the value of the parameter.

While X̄ is an unbiased estimator of μ, s² is not an unbiased estimator of σ².

In order to make it an unbiased estimator, we use N-1 in the denominator of the formula rather than just N. Thus:

s² = ∑x² / (N-1) = ∑(X - X̄)² / (N-1)

Note that this is a defining formula and, as we will see below, is not the best choice when actually doing the calculations.
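The N-1 version can be sketched in Python (not part of the original notes), again using the four-score example:

```python
# Sample variance with N-1 in the denominator (unbiased estimate of sigma^2).
def sample_variance(X):
    m = sum(X) / len(X)
    return sum((x - m) ** 2 for x in X) / (len(X) - 1)

print(sample_variance([2, 3, 5, 10]))   # 38 / 3, about 12.67
```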

IV. Overall Example - [Minitab]

Let's reconsider an example from above of two distributions (A & B): Consider a possibility for the scores that go with these distributions:

Distribution   A      B
Data           150    150
               145    110
               100    100
               100    100
               55     90
               50     50
∑X             600    600
N              6      6
X̄              100    100
Range          150-50+1=101   150-50+1=101

Notice that the central tendency and range of the two distributions are the same. That is, the mean, median, and mode all equal 100 for both distributions and the range is 101 for both distributions. However, while Distributions A and B have the same measures of central tendency and the same range, they differ in their variability. Distribution A has more of it. Let us prove this by computing the standard deviation in each case. First, for Distribution A:

X     X̄      x      x²
150   100    50     2500
145   100    45     2025
100   100    0      0
100   100    0      0
55    100    -45    2025
50    100    -50    2500
∑X = 600     ∑x = 0   ∑x² = 9050
N = 6

Plugging the appropriate values into the defining formula gives:

s² = ∑x² / (N-1) = 9050 / 5 = 1810.00 and s = √1810.00 = 42.54

Note that calculating the variance and standard deviation in this manner requires computing the mean and subtracting it from each score. Since this is not very efficient and can be less accurate as a result of rounding error, computational formulas are typically used. They are given as follows:

s² = (∑X² - (∑X)² / N) / (N-1) and s = √[(∑X² - (∑X)² / N) / (N-1)]

Redoing the computations for Distribution A in this manner gives:

X     X²
150   22500
145   21025
100   10000
100   10000
55    3025
50    2500
∑X = 600   ∑X² = 69050
N = 6

Then, plugging the appropriate values into the computational formula gives:

s² = (69050 - 600²/6) / (6-1) = (69050 - 60000) / 5 = 9050 / 5 = 1810.00 and s = 42.54

Note that the defining and computational formulas give the same result, but the computational formula is easier to work with (and potentially more accurate due to less rounding error).
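As a quick check on this arithmetic, a short Python sketch (not part of the original notes; the data are Distribution A from the table) comparing the defining and computational formulas:

```python
# Defining vs. computational formula for the sample variance (both use N-1).
def s2_defining(X):
    m = sum(X) / len(X)
    return sum((x - m) ** 2 for x in X) / (len(X) - 1)

def s2_computational(X):
    n = len(X)
    return (sum(x * x for x in X) - sum(X) ** 2 / n) / (n - 1)

A = [150, 145, 100, 100, 55, 50]
print(s2_defining(A), s2_computational(A))   # 1810.0 1810.0
```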

Doing the same calculations for Distribution B yields:

X     X²
150   22500
110   12100
100   10000
100   10000
90    8100
50    2500
∑X = 600   ∑X² = 65200
N = 6

Then, plugging the appropriate values into the computational formula gives:

s² = (65200 - 600²/6) / (6-1) = (65200 - 60000) / 5 = 5200 / 5 = 1040.00 and s = √1040.00 = 32.25

Thus, Distribution A (s = 42.54) clearly has more variability than Distribution B (s = 32.25).

Copyright © 1997-2016 M. Plonsky, Ph.D.