Frequency Distributions
 Ungrouped
 Example
 [Minitab]
 Rules
 Review of the Mathematics Involved
 Grouped
 Example
 [Minitab]
 Rules
Practice Problems (includes Graphing from the next section) (Answers)
Homework
A frequency distribution is a procedure for describing a set of data. There are two types: ungrouped and grouped.
I. Ungrouped (also called a "Tally")
 Example
 [Minitab]
The following data represent the number of hours of TV viewing per week (X) for 20 people.
7  5  4  7  4  6  6  6  5  4
6  7  2  7  5  5  6  2  2  7
What follows is an ungrouped frequency distribution for the above data.
X                f         f_{r}=p        %        Cf      Cp       C%
7                5         (5/20=) .25    25       N=20    p=1.00   %=100
6                5         .25            25       15      .75      75
5                4         .20            20       10      .50      50
4                3         .15            15       6       .30      30
3                0         .00            0        3       .15      15
2 (or 1.5-2.5)   3         .15            15       3       .15      15
                 ∑f=20=N   ∑p=1.00        ∑%=100
The table clearly shows how the scores are distributed. For example, you can quickly see that half of the people watched 6 or more hours of TV. The frequencies are expressed in three different ways (f, p, and %). Notice also the cumulative columns (Cf, Cp, and C%). Cumulative frequency refers to the frequency of scores falling at or below the upper exact limit of a score. Lastly, note that decimal usage within each column is consistent.
 Rules for creating an ungrouped frequency distribution
 Locate the extreme values (the lowest score, X_{L}, and the highest score, X_{H}) and make a column that lists all the values from X_{H} to X_{L}. (Note that X_{H} is at the top of the column and X_{L} is at the bottom. This is a convention that will make some things easier later.)
 Make an adjacent column labeled "frequency" (f). Count the frequency of each X and put it in the f column. This is called a tally of the scores.
It is a good idea when doing this to put a slash through each number so you know it has been counted. (If you make a mistake, you can go back and use a slash in the opposite direction.) Check that ∑f = N and put this in a row at the bottom of the column.
 Create another column called "relative frequency" or "proportion" (f_{r} = p = f/N).
As a general rule, proportions should be expressed in hundredths.
Check that the ∑f_{r} ≈ 1 and show it at the bottom of the column.
 Create another column called "percent" (% = f_{r} * 100).
As a general rule, percents should be expressed without decimal places (or sometimes with one). Show that ∑% ≈ 100 at the bottom of the column.
 Create another column called "cumulative frequency." Remember that this column gives the frequency of scores falling at or below the upper exact limit of a score.
When creating this column, start at X_{L} and work your way up by adding all frequencies for the scores at or below the score you are interested in. Thus:
For X_{L}, the Cf = f
For X_{L+1}, the Cf = f of X_{L} + f of X_{L+1}, etc.
The Cf of X_{H} must = N.
 Create another column called "cumulative proportion" (Cp = Cf/N).
 Optionally, create another column called "cumulative percent" (C% = Cp * 100 = Cf/N * 100).
Note that the last two columns are not created in the same manner as the Cf column. They are computed from the values in the Cf column.
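The rules above can be sketched in Python. This is a minimal illustration, not part of the original notes: the function name ungrouped_distribution is hypothetical, the scores are assumed to be whole numbers, and Python's round() rounds exact halves to the nearest even digit, which can differ from hand rounding.

```python
from collections import Counter

def ungrouped_distribution(scores):
    """Return rows (X, f, p, %, Cf, Cp, C%) ordered from X_H down to X_L.

    Assumes integer scores (an assumption of this sketch).
    """
    n = len(scores)
    counts = Counter(scores)              # the tally of each score
    x_low, x_high = min(scores), max(scores)
    rows = []
    cf = 0
    # Work upward from X_L so Cf counts the scores at or below each X.
    for x in range(x_low, x_high + 1):
        f = counts.get(x, 0)              # scores that never occur get f = 0
        cf += f
        rows.append((
            x,
            f,
            round(f / n, 2),              # p = f / N, in hundredths
            round(100 * f / n),           # % without decimal places
            cf,
            round(cf / n, 2),             # Cp = Cf / N
            round(100 * cf / n),          # C% = Cf / N * 100
        ))
    rows.reverse()                        # convention: X_H at the top
    return rows

# TV viewing data from the example above.
tv_hours = [7, 5, 4, 7, 4, 6, 6, 6, 5, 4, 6, 7, 2, 7, 5, 5, 6, 2, 2, 7]
for row in ungrouped_distribution(tv_hours):
    print(row)
```

Each printed row corresponds to one row of the example table, including the row for X = 3, which has a frequency of 0 but still carries a cumulative frequency of 3.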
 Review of the Mathematics Involved
 f_{r} = p = f/N.
 % = p * 100.
 ∑p ≈ 1.00 and ∑% ≈ 100. "≈" rather than "=" is used because of rounding error.
However, if the proportions are off by more than .02 or the percents are off by more than 1, you have too much rounding error. The solution is to increase the number of decimals used in the intermediate calculations.
 Cp = Cf/N (and the Cp for X_{H} must = 1.00; notice the "=" rather than "≈").
 C% = Cp * 100 (and the C% for X_{H} must = 100; notice the "=" rather than "≈").
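The rounding-error check described above can be illustrated with a short Python sketch. The data here are hypothetical (12 different scores, each occurring once), and the function name proportion_sum_error is an assumption made for illustration only.

```python
def proportion_sum_error(freqs, decimals=2):
    """Return how far the rounded proportions are from summing to 1."""
    n = sum(freqs)
    rounded = [round(f / n, decimals) for f in freqs]
    return abs(1 - sum(rounded))

# Hypothetical data: 12 different scores, each occurring once (N = 12).
freqs = [1] * 12

# With two decimals each p = 1/12 rounds to .08, so the sum is about .96.
err2 = proportion_sum_error(freqs, decimals=2)
# With three decimals each p rounds to .083, so the sum is about .996.
err3 = proportion_sum_error(freqs, decimals=3)

print(err2 > 0.02)   # too much rounding error
print(err3 > 0.02)   # acceptable after adding a decimal place

# Cp for X_H is computed from Cf, not by adding rounded proportions,
# so it is exact: Cf of X_H = N, and N / N = 1.
cp_top = sum(freqs) / sum(freqs)
print(cp_top)
```

The sketch shows why the cumulative columns are computed from Cf rather than by adding the rounded p values: the rounding error does not accumulate.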
II. Grouped Frequency Distribution
A grouped frequency distribution is useful for large data sets and because it makes the form or shape of the distribution more obvious. A disadvantage, though, is that the scores lose their individual identity.
 Example
 [Minitab]
The following data are the expected scores on Exam 1 (N=25).
95  88  81  79  73
92  88  81  79  72
92  86  81  77  67
91  85  80  77  62
89  84  80  74  61
A simple tally or ungrouped frequency distribution would not be very helpful in this case. What follows is a grouped frequency distribution for these data.
Interval           Real or
(Stated or         Exact
Apparent limits)   limits      Midpoint   f       p      %        Cf     Cp     C%
95-99              94.5-99.5   97         1       .04    4        N=25   p=1    %=100
90-94              89.5-94.5   92         3       .12    12       24     .96    96
85-89              84.5-89.5   87         5       .20    20       21     .84    84
80-84              79.5-84.5   82         6       .24    24       16     .64    64
75-79              74.5-79.5   77         4       .16    16       10     .40    40
70-74              69.5-74.5   72         3       .12    12       6      .24    24
65-69              64.5-69.5   67         1       .04    4        3      .12    12
60-64              59.5-64.5   62         2       .08    8        2      .08    8
                                          ∑f=25   ∑p=1   ∑%=100
Note how easy it is to see the form or shape of the distribution. For example,
it is easy to derive:
Number Grade   Letter Grade   % of Class
90s            A              16
80s            B              44
70s            C              28
60s            D              12
                              100
However, using just the grouped frequency distribution we wouldn't be
able to tell exactly what the highest score is (i.e., the individual scores
have lost their identity).
 Rules for creating a grouped frequency distribution
 Compute the Range (R) using the formula R = X_{H} - X_{L} + 1.
(Consider the scores 4, 5, & 6. R = 6 - 4 + 1 = 3 scores.)
 The width of a group or interval (i) must be an odd
number.
If i=1, we have an ungrouped frequency distribution. Thus the smallest reasonable value for i is 3.
 Use about 10 to 20 groups or intervals.
 Make the lowest apparent limit divisible by i. This interval should contain
X_{L}.
 Use the formula # of groups = R / i to help you determine an appropriate number of groups or intervals to group the data into. Note that the formula is a guide and just gives an estimate (as we will see below). To work with the formula, start by using the lowest possible value of i (i.e., 3) and divide it into R.
So, in our example R = X_{H} - X_{L} + 1 = 95 - 61 + 1 = 35.
We start with the lowest possible value of i:
If i=3, then the # of groups = 11.67 or 12.
If i=5, then the # of groups = 7.
If i=7, then the # of groups = 5.
Note that in this case we ALWAYS round up. Thus, if R = 31 and we use i = 3, the # of groups is 10.3 and we would have to use 11 groups. We do this because there has to be a group to put each and every score into.
Since we have limited space on the screen, an i of 5 was chosen (and thus we would have 7 groups). Note, however, that since 61 is not divisible by i, the lowest stated limit has to be 60, which explains why the # of groups actually used was 8 rather than 7. So here we see that the formula is just a guide.
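The estimate and the number of groups actually needed can be compared with a short Python sketch. The function names estimate_groups and actual_groups are hypothetical, and integer scores are assumed.

```python
import math

def estimate_groups(x_low, x_high, width):
    """Estimated # of groups = R / i, always rounded up."""
    score_range = x_high - x_low + 1            # R = X_H - X_L + 1
    return math.ceil(score_range / width)

def actual_groups(x_low, x_high, width):
    """# of groups once the lowest apparent limit is made divisible by i."""
    lowest_limit = (x_low // width) * width     # e.g., 61 becomes 60 when i = 5
    return math.ceil((x_high - lowest_limit + 1) / width)

# Exam 1 example: X_L = 61, X_H = 95.
for i in (3, 5, 7):
    print(i, estimate_groups(61, 95, i), actual_groups(61, 95, i))
```

For i = 5 the estimate is 7 groups, but starting the lowest interval at 60 requires 8, which matches the table in the example.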
Once we decide on the lowest apparent limit and the number of groups, we can create the leftmost column of the grouped frequency distribution, which is the scores or, more precisely, the intervals of scores. The next column should give the exact limits for these intervals. This column helps with interpreting and understanding the cumulative frequency columns. The next column should give the midpoints (which will come in handy, as we will see later). The rest of the columns are created in the same manner as for the ungrouped frequency distribution.
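As a rough sketch of the column-building steps just described, the following Python builds the intervals, exact limits, midpoints, and frequency columns for the Exam 1 data. The function name grouped_distribution and the dictionary keys are hypothetical choices for illustration, and integer scores are assumed.

```python
def grouped_distribution(scores, width):
    """Return one dict per interval, ordered from the highest interval down.

    Assumes integer scores (an assumption of this sketch).
    """
    n = len(scores)
    lower = (min(scores) // width) * width      # lowest apparent limit, divisible by i
    rows = []
    cf = 0
    while lower <= max(scores):
        upper = lower + width - 1               # upper apparent limit
        f = sum(lower <= x <= upper for x in scores)
        cf += f                                 # scores at or below the upper exact limit
        rows.append({
            "interval": (lower, upper),
            "exact": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "f": f,
            "p": round(f / n, 2),
            "Cf": cf,
            "Cp": round(cf / n, 2),
        })
        lower += width
    rows.reverse()                              # highest interval at the top
    return rows

# Expected Exam 1 scores from the example above (N = 25).
exam = [95, 88, 81, 79, 73, 92, 88, 81, 79, 72, 92, 86, 81, 77, 67,
        91, 85, 80, 77, 62, 89, 84, 80, 74, 61]
for row in grouped_distribution(exam, width=5):
    print(row)
```

The % and C% columns are omitted to keep the sketch short; they are simply p * 100 and Cp * 100.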
Copyright © 1997-2016 M. Plonsky, Ph.D.
Comments? mplonsky@uwsp.edu.