I. Percentiles & Percentile Rank
1. Percentile Rank

1. Rationale
2. Consider an example where you receive the same grade on a test in two different classes. In which class did you do better?

This example should make clear that getting the same grade in two classes doesn't necessarily mean that you did equally well in both. We need a precise way to measure relative standing. One commonly used measure is the Percentile Rank (PR). Note that the cumulative percent (C%) gives the PR, that is, the percent of scores falling at or below a given score. However, the C% is given only for the upper exact limit of each interval, and we need to be able to compute the PR for any score.

The percentile rank is obtained through a procedure called linear interpolation. This is the same procedure used to generate the formula for the median (discussed earlier). The rationale goes something like this. (Don't worry about it too much, though, because we are given a formula.)

First, we assume a rectangular distribution of the scores within an interval.  For example:

Quantity          Value
i                 5
Apparent Limits   30 - 34
Exact Limits      29.5 - 34.5
f                 6

So we assume that each of the 5 possible scores in the interval has a frequency of 6/5 = 1.2. Given this assumption, the logic for PR calculations is: if score X is halfway through the interval, then half of the scores in that interval must fall below X. For example, in the Exam 1 expected-scores data set, we might want to know the percentile rank of a score of 82 (worked out in the example below).

1. Formula for Percentile Rank

$$PR = \frac{n_b + \left(\frac{X - L}{i}\right) n_w}{N} \times 100$$

Where:

• PR = Percentile Rank.
• X = the score we are interested in.
• L = the lower exact limit of the interval containing X.
• nb = number (frequency) of scores below the lower exact limit of the interval containing X.
• nw = number (frequency) of scores within the interval containing X.
• i = the interval width.
• N = the total number of scores.

Note that your book uses a slightly different formula: it computes the proportion (p), which then needs to be multiplied by 100 to get the percentile rank. The formula given above combines these two steps.

1. Example

Consider the Exam 1 expected-scores data set. By looking at the data, we can see that the score of 82 will have a PR between 40 and 64 (the C% at the lower and upper exact limits of its interval: 10/25 and 16/25). In this case:

PR = ?
X = 82
Estimate = 40 - 64
L = 79.5
nb = 10
nw = 6
i = 5
N = 25

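Substituting into the formula:

$$PR = \frac{10 + \left(\frac{82 - 79.5}{5}\right) 6}{25} \times 100 = \frac{10 + 3}{25} \times 100 = 52$$

So 52% of folks scored at or below a score of 82. (Note that 52 is between 40 and 64, as predicted, so this is a helpful check on our work.)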
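The same interpolation can be sketched in a few lines of Python. This is a minimal illustration under the rectangular-distribution assumption above, not the book's procedure; the function name `percentile_rank` and its parameter names are hypothetical. It reproduces the Exam 1 example and the 40 to 64 sanity check.

```python
def percentile_rank(x, lower_limit, n_below, n_within, width, n_total):
    """Percentile rank of score x computed from grouped (interval) data.

    Assumes the n_within scores are spread evenly (a rectangular
    distribution) across the interval that contains x.
    """
    # Fraction of the interval that lies below x (0.5 if x is halfway through)
    fraction_below = (x - lower_limit) / width
    # Scores below the interval plus the interpolated share of scores within it
    n_at_or_below = n_below + fraction_below * n_within
    # Convert the proportion to a percent
    return n_at_or_below / n_total * 100


# Exam 1 example: X = 82 lies in the interval with exact limits 79.5 - 84.5
pr = percentile_rank(x=82, lower_limit=79.5, n_below=10, n_within=6, width=5, n_total=25)

# Check from the text: the PR must fall between the C% at the two exact limits
lowest = 10 / 25 * 100          # C% at 79.5
highest = (10 + 6) / 25 * 100   # C% at 84.5
print(round(pr, 2), lowest <= pr <= highest)
```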

3. Percentile Points

Percentile points are like pennies, deciles are like dimes, and quartiles are like quarters.  They are essentially a way of dividing up the 100% that comprises the entire group. Note that the median equals P50, D5, or Q2.

In dealing with percentile rank, we asked the question, "What percent of the group scored at or below a particular score?" We can also ask the reverse, for example, "What score did 90% of the group fall at or below?" Actually, we have already dealt with this question: the median is the score that has 50% falling at or below it. The formula for the score at a given percentile point and the formula for the median use slightly different notation. The median formula uses N/2, while the formula for the score at a given percentile point uses P(N), where P is the percentile expressed as a proportion. Note that if P = .5, then P(N) = N/2.

Formula for the score at a given percentile point:

$$X = L + \left(\frac{P(N) - n_b}{n_w}\right) i$$

where the symbols for the quantities involved are the same as for the PR formula. Let's do two examples: we will compute the scores at the first and third quartile points. Our first step is to determine the relevant interval in each case.

Quantity   Q1 = P25   Q3 = P75
P          .25        .75
Interval   75-79      85-89
L          74.5       84.5
N          25         25
nb         6          16
nw         4          5
i          5          5
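Before plugging in by hand, here is a minimal Python sketch of the percentile point formula, assuming the same rectangular within-interval distribution as the PR formula. The function `score_at_percentile` and its parameter names are hypothetical; the inputs are the values from the table.

```python
def score_at_percentile(p, lower_limit, n_below, n_within, width, n_total):
    """Score at percentile point p (a proportion, e.g. 0.25 for P25).

    Uses the same rectangular-distribution assumption as the PR formula.
    """
    # Number of scores still to be counted once we reach the interval's lower limit
    cases_into_interval = p * n_total - n_below
    # Convert that count into a distance along the interval and add it to L
    return lower_limit + (cases_into_interval / n_within) * width


# Values from the table: Q1 = P25 and Q3 = P75
q1 = score_at_percentile(0.25, lower_limit=74.5, n_below=6, n_within=4, width=5, n_total=25)
q3 = score_at_percentile(0.75, lower_limit=84.5, n_below=16, n_within=5, width=5, n_total=25)
print(round(q1, 2), round(q3, 2))
```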

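Plugging the values into the formula gives:

For P25: $$X = 74.5 + \left(\frac{.25(25) - 6}{4}\right) 5 = 74.5 + 0.3125 = 74.81$$

For P75: $$X = 84.5 + \left(\frac{.75(25) - 16}{5}\right) 5 = 84.5 + 2.75 = 87.25$$

With Q1 and Q3, we can compute a new measure of variability called the Inter Quartile Range, or IQR. Thus: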

IQR = (X at Q3) - (X at Q1)

Using the example above:

IQR = 87.25 - 74.81 = 12.44

In order to turn the IQR into a distance (on the x axis) from the mean, the Semi Inter Quartile Range is sometimes computed. This makes it more similar to the standard deviation. It takes the form:

SIQR = IQR/2

Using the example above:

SIQR = (87.25-74.81)/2 = 12.44/2 = 6.22
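As a small arithmetic sketch of the two definitions (the function names `iqr` and `siqr` are hypothetical), using the quartile scores from the worked example:

```python
def iqr(q1_score, q3_score):
    """Inter Quartile Range: (X at Q3) - (X at Q1)."""
    return q3_score - q1_score


def siqr(q1_score, q3_score):
    """Semi Inter Quartile Range: half of the IQR."""
    return iqr(q1_score, q3_score) / 2


# Quartile scores from the worked example (rounded to two decimals)
print(round(iqr(74.81, 87.25), 2), round(siqr(74.81, 87.25), 2))
```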

These measures of variability are the preferred measures with skewed distributions.

II. Changing the Properties of Scales

1. Introduction

The ideas of percentile rank and percentile points are helpful; however, they don't always answer the questions we have. For example, consider the following distributions of test scores.

Class A = 60, 70, & 86
Class B = 84, 85, & 86

You received an 86 on both tests.  I bet that you would be happier with your performance in class A.  Your PR, however, would be the same in both cases.  Thus, we need additional ways of getting at the issue of relative standing.

Earlier we noted that parametric scales have a zero point or origin and, by definition, are measured in units. For example, Fahrenheit (°F) and Celsius (°C) have different origins and units. However, since they measure the same thing, one can convert back and forth with a formula. For example, to get Fahrenheit from Celsius (centigrade), we can use the following:

°F = °C(1.8) + 32°

To go in the other direction, we solve for Co, that is:

°C = °F(.56) - 17.78°

Note:

• The formula involves two constants, that is, °F = °C(C1) + C2
• C1 (1.8 in this case) changes the unit.  It is also called the conversion factor.
• C2 (32 in this case) changes the origin (or zero point).

Consider where 1.8 & 32 come from:

Point on Scale   °F    °C
Boiling          212   100
Freezing         32    0
Difference       180   100

So 180 Fahrenheit units span the same range as 100 Celsius units, which means 1.8 units on the Fahrenheit scale equal 1 unit on the Celsius scale. Looking at it the other way, about .56 of a unit on the Celsius scale (100/180) equals 1 unit on the Fahrenheit scale.
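The constants can be recovered from any two reference points that both scales agree on. The sketch below, with a hypothetical helper `conversion_constants`, solves new = old * C1 + C2 from the boiling and freezing points; swapping the roles of the two scales gives the reverse conversion.

```python
def conversion_constants(point_a, point_b):
    """Find C1 and C2 in  new = old * C1 + C2  from two (old, new) reference points."""
    (old_a, new_a), (old_b, new_b) = point_a, point_b
    c1 = (new_a - new_b) / (old_a - old_b)  # ratio of unit sizes (the conversion factor)
    c2 = new_a - old_a * c1                  # shift of the origin (zero point)
    return c1, c2


# (Celsius, Fahrenheit) pairs at the boiling and freezing points of water
c1, c2 = conversion_constants((100, 212), (0, 32))

# The reverse direction, Celsius from Fahrenheit, just swaps the roles of the scales
r1, r2 = conversion_constants((212, 100), (32, 0))

print(c1, c2)                       # constants for F from C
print(round(r1, 2), round(r2, 2))   # constants for C from F
```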

The moral of the story is that if we have a set of scores and we want to change the unit, we need to multiply all scores by a constant.  (Note that division can be viewed as multiplication by a reciprocal.)  In addition, if we want to change the origin, we need to add a constant to all the scores.  (Note that subtraction can be viewed as addition of a negative.)

Now let us look at what happens to the mean and standard deviation of a distribution when we change the properties of a scale.

If a constant (c) is added to all scores, then the new mean ($\bar{X}'$, or "x-bar prime") will be equal to the old mean plus the constant. That is:

$$\bar{X}' = \bar{X} + c$$

Furthermore, the variability remains unchanged. That is:

$$s'^2 = s^2 \quad \text{and} \quad s' = s$$

Thus, adding a constant to all the scores simply shifts the score values. Assume that $\bar{X} = 5$ and c = 3. Then the new mean would be 8 (5 + 3), and the standard deviation would remain unchanged.
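A quick numeric check of the addition rule, using Python's standard `statistics` module. The five scores are hypothetical, chosen so that the mean is 5 and c = 3 as in the text; `pstdev` divides by N, but the rule holds just as well with N - 1.

```python
import statistics

scores = [3, 4, 5, 6, 7]            # hypothetical scores with a mean of 5
c = 3                               # constant added to every score
shifted = [x + c for x in scores]

# The mean moves by exactly c ...
print(statistics.mean(scores), statistics.mean(shifted))
# ... while the standard deviation (population form, N in the denominator) is unchanged
print(statistics.pstdev(scores), statistics.pstdev(shifted))
```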

If all scores are multiplied by a constant, then the new mean will equal the old mean times the constant. That is:

$$\bar{X}' = c\bar{X}$$

Furthermore, in this case the variability is changed. That is:

$$s'^2 = c^2 s^2 \quad \text{and} \quad s' = |c|\,s$$

Thus, multiplying all scores by a constant will shift the score values as well as change their variability.

[Figure: a distribution before and after multiplying all scores by c = 3]

1. Application

Now suppose you took two tests: Biology and Chemistry.

Test        X
Biology     60
Chemistry   80

In which class did you do better?  From what we have discussed thus far, it should be apparent that:

• We do not have enough info to answer the question.
• It is possible that you did better (relative to the rest of your class) on the biology test, even though your score on the chemistry test was higher.

[Figure: your biology score lies above the biology class mean, while your chemistry score lies below the chemistry class mean]

So let's see what we can make of this when we are given the means and standard deviations.

Test        X    Mean   s
Biology     60   55     5
Chemistry   80   85     10

Note that this data supports the figure above.  That is, you scored above the mean on the biology test and below the mean on the chemistry test.

Based on what we just covered about changing the properties of scales, one strategy for solving this problem would be to transform the biology distribution to make its mean and standard deviation equal to that of the chemistry distribution. In other words:

Test                               X    Mean   s
Biology                            60   55     5
Chemistry                          80   85     10
Transformed Biology distribution   ?    85     10

So we need to determine the constants in the conversion formula and then transform the biology score. Our goal is to come up with a biology score that uses a scale with the same mean and standard deviation as the chemistry score.

Remember that C1 changes the unit and C2 changes the origin. When using the formula, be concerned first with C1 and then worry about C2. Applying the conversion formula to the means:

$$\bar{X}' = C_1 \bar{X} + C_2$$

Substituting the values:

$$85 = C_1(55) + C_2$$

Therefore, let:

• C1 = 2. Do this first, to fix the unit (to make the SDs equal: 10/5 = 2).
• C2 = -25. Do this second, to fix the origin (to make the means equal: 85 - 2(55) = -25).

Thus:

$$85 = 2(55) + (-25)$$

which shows the constants work, and now we can use them:

$$X' = 2X + (-25)$$

Thus:

$$X' = 2(60) - 25 = 120 - 25 = 95$$

So the biology score was equivalent to a 95 on the chemistry scale. Thus, relative to the rest of the class, you did much better on the biology than on the chemistry test.
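The two-step recipe (fix the unit, then the origin) can be sketched as a small Python function. The name `rescale` and its parameters are hypothetical; the inputs are the biology and chemistry values from the table.

```python
def rescale(x, from_mean, from_sd, to_mean, to_sd):
    """Express score x on a scale with a different mean and standard deviation."""
    c1 = to_sd / from_sd             # step 1: fix the unit so the SDs match
    c2 = to_mean - c1 * from_mean    # step 2: fix the origin so the means match
    return c1 * x + c2, c1, c2


# Biology score of 60 (mean 55, s 5) re-expressed on the chemistry scale (mean 85, s 10)
bio_on_chem, c1, c2 = rescale(60, from_mean=55, from_sd=5, to_mean=85, to_sd=10)
print(c1, c2, bio_on_chem)
```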

III. Standard Scores & the Normal Distribution

As we have now seen, if we want to compare the scores from two distributions, we can do it in either of two ways:

1. Compute the percentile ranks (but we saw that this can have its problems).
2. Transform the scores of one distribution such that the means and standard deviations are equal in the two distributions.

A logical extension of the second procedure allows us to change all the scores to a standard scale.

1. Standard Scores

A. Theory

In this standard distribution (or one that employs a standard scale), it would be useful to have:

• A mean equal to zero. This would allow us to tell if a score is greater or less than the mean by its sign.
• A standard deviation of one. This would allow us to tell how much a score deviates from its mean by its magnitude.

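Let z equal the standard score. It will tell us how many standard deviations a score differs from its mean. Then the formula for z would be:

$$z = \frac{X - \bar{X}}{s}$$

So a z score is a score minus its mean, divided by the standard deviation. If we do this for all the scores (letting $\bar{z}$ equal the new mean), then:

$$\bar{z} = \frac{\sum z}{N} = \frac{\sum (X - \bar{X})}{sN}$$

and, since deviations from the mean always sum to zero, $\sum (X - \bar{X}) = 0$, and so

$$\bar{z} = 0$$

and we have the mean of zero that we wanted.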

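If we let $s_z$ equal the new standard deviation (with s defined using N in the denominator), then:

$$s_z = \sqrt{\frac{\sum (z - \bar{z})^2}{N}} = \sqrt{\frac{\sum (X - \bar{X})^2}{s^2 N}}$$

and, since $\sum (X - \bar{X})^2 / N = s^2$,

$$s_z = \sqrt{\frac{s^2}{s^2}} = 1$$

So z scores are in SD units (i.e., a distance on the x axis). They tell us how much a score deviates from its mean.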
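A numeric check of the derivation: converting any set of scores to z scores yields a mean of 0 and a standard deviation of 1. The raw scores and the helper `z_scores` are hypothetical; `statistics.pstdev` uses N in the denominator.

```python
import statistics


def z_scores(scores):
    """Convert raw scores to z scores using the scores' own mean and SD."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)   # N in the denominator (population form)
    return [(x - mean) / sd for x in scores]


raw = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical raw scores (mean 5, SD 2)
z = z_scores(raw)
print(z)
print(statistics.mean(z), statistics.pstdev(z))
```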

B. Application

Lets redo the earlier chemistry versus biology test example to see how much easier it is using this strategy.

Class   X    Mean   SD
Bio     60   55     5
Chem    80   85     10

$$z_{Bio} = \frac{60 - 55}{5} = 1.0 \qquad z_{Chem} = \frac{80 - 85}{10} = -0.5$$

Now we can quickly see that while we scored one standard deviation above the mean in the biology course, we scored half a standard deviation below the mean in the chemistry class.

Note that this technique is a lot easier to do than manually transforming the scores of one distribution to those of another.
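In code the comparison reduces to one subtraction and one division per test. A minimal sketch (the function name `z_score` is hypothetical):

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) its mean."""
    return (x - mean) / sd


bio_z = z_score(60, mean=55, sd=5)     # biology test
chem_z = z_score(80, mean=85, sd=10)   # chemistry test
print(bio_z, chem_z)
```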

1. The Normal Distribution

Since standard scores are most often used with normal distributions, we need to learn a little bit more about these distributions.

As we have already discussed, the normal distribution is a theoretical distribution (meaning it doesn't really exist). A number of human behavioral characteristics approximately fit this distribution (e.g., IQ, anxiety level, drug responsiveness). There is actually a whole family of normal distributions, differing in their means and standard deviations, so some look more peaked and others flatter. Finally, all normal distributions have three properties in common.

1. The three measures of central tendency (mean, median, & mode) all coincide.
2. They are bilaterally symmetrical.
3. The tails are asymptotic to the x axis, meaning they come closer and closer to it but never actually touch it. More formally, letting ∞ denote infinity, the tails extend from -∞ to +∞.

IV. The Standard Normal Distribution

If we take a normal distribution and transform all of the scores to z scores, we have a standard normal distribution. In this case, we are dealing with theoretical (population) values, so the formula is:

$$z = \frac{X - \mu}{\sigma}$$

This distribution has a very special characteristic: we can compute the proportion of area under different portions of the curve.

[Figure: proportions of area under the standard normal curve]

Note that the values are obtained using integral calculus, but they are provided in most statistics books in the form of a table to save folks from having to perform these complex calculations. Essentially the same information in a tabular view:

Distance from mean (in SDs)   % of cases between the mean and that distance
1                             34.13
2                             47.72
3                             49.87

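Applying this to IQ scores (μ = 100, σ = 15) and counting both sides of the mean gives about 68.26% of people between 85 and 115 (2 × 34.13), about 95.44% between 70 and 130, and about 99.74% between 55 and 145. Thus, in a normal distribution, virtually all of the scores fall within three standard deviations of the mean. More detailed figures can be obtained from the z table in the back of your book. Note that only positive z values are given, since the curve is bilaterally symmetrical.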

What follows is a very small portion of a typical z table. (Note: An online version of the z table is available.)

z       Area between the mean and z   Area beyond z
. . .
1.00    .3413                         .1587
1.01    .3438                         .1562
. . .
1.48    .4306                         .0694
1.49    .4319                         .0681
1.50    .4332                         .0668
1.51    .4345                         .0655
1.52    .4357                         .0643
. . .

(z values are listed to two decimal places.)
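The table entries can be reproduced without a table. This sketch swaps in the error function from Python's `math` module for the integral calculus mentioned above (the normal CDF equals 0.5 * (1 + erf(z / sqrt(2)))); the helper names are hypothetical. Note that each "beyond" entry is simply .5 minus the "between" entry.

```python
import math


def area_mean_to_z(z):
    """Area under the standard normal curve between the mean (0) and z."""
    # The normal CDF is 0.5 * (1 + erf(z / sqrt(2))), so the part above the mean is:
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def area_beyond_z(z):
    """Area in the tail beyond z; the same for +z and -z by symmetry."""
    return 0.5 - area_mean_to_z(z)


for z in (1.00, 1.01, 1.48, 1.49, 1.50, 1.51, 1.52):
    print(f"{z:.2f}  {area_mean_to_z(z):.4f}  {area_beyond_z(z):.4f}")
```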
1. Simple Applications

The z table can help us answer a lot of questions fairly quickly.  However, it is best to draw diagrams when trying to answer these questions.  Consider three examples:

1. What percent of the distribution scored between the mean and a z of 1.5? (From the table: .4332, or 43.32%.)
2. What percent scored above a z of 1.5? (.0668, or 6.68%.)
3. What percent scored below a z of -1.5? (By symmetry, also .0668, or 6.68%.)

1. Parameters & the Standard Normal Distribution

Before showing additional examples of the application of the standard normal curve, we need to talk a little bit more about parameters and statistics. As we noted earlier, since we are dealing with theoretical (population) values, the z score formula is:

$$z = \frac{X - \mu}{\sigma}$$

However, sample estimates can be used if both of the following conditions are met:

1. The population from which the sample was drawn is normal in shape.
2. The sample must be reasonably large.

Thus, we are back to:

$$z = \frac{X - \bar{X}}{s}$$

1. More Complex Applications

Lets look at four examples that are representative of the different types of applications.

1. What percent of scores fall between a z of -1.5 and a z of 2?

First make a diagram of what is being asked. Then use it to develop a strategy to answer the question. In the diagram, C is what the question is asking for, A is the area between the mean and a z of 2, and B is the area between a z of -1.5 and the mean; we can obtain A and B from the z table. Because A and B lie on opposite sides of the mean, C = A + B.

  A   0.4772
+ B   0.4332
= C   0.9104

Thus, the answer is that 91% of the scores fall between a z of -1.5 and a z of 2.
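Python's standard library can stand in for the z table here: `statistics.NormalDist.cdf` gives the area below a z value, so the area between two z values is a difference of two such areas. This is a sketch of the same calculation, not a replacement for drawing the diagram.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area between z = -1.5 and z = 2 is the difference of two cumulative (below-z) areas
between = std_normal.cdf(2.0) - std_normal.cdf(-1.5)
print(round(between, 4))
```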

1. What percent of the population has an IQ between 110 and 120? (Note that IQ is normally distributed with μ = 100 and σ = 15.) In a group of 50 people chosen at random, how many can we expect to have an IQ between 110 and 120?

Again, make a diagram of what is being asked. Then use it to develop a strategy to answer the question. In the diagram, C is what the question is asking for, A is the area between the mean and an IQ of 120, and B is the area between the mean and an IQ of 110; we can obtain A and B from the z table. C = A - B.

This time, though, we need to compute the z scores before we can look up the appropriate values in the z table:

$$z_{110} = \frac{110 - 100}{15} = 0.67 \qquad z_{120} = \frac{120 - 100}{15} = 1.33$$

Now we can obtain the proportions under the curve from the z table.

  A   0.4082
- B   0.2486
= C   0.1596

Thus, the answer is that about 16% of people would be expected to have an IQ between 110 and 120. Furthermore, about 8 people (.1596 × 50 = 7.98) out of a randomly selected 50 would be expected to have an IQ in that range.
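A sketch of the same steps in Python, assuming we want to mimic the printed table: z is rounded to two decimals and each area to four, which reproduces .1596. The helper names `table_z` and `table_area` are hypothetical, and `statistics.NormalDist` stands in for the table.

```python
from statistics import NormalDist

MU, SIGMA = 100, 15          # IQ parameters given in the problem
GROUP_SIZE = 50
std_normal = NormalDist()    # mean 0, SD 1


def table_z(x):
    """z for score x, rounded to two decimals as a printed z table requires."""
    return round((x - MU) / SIGMA, 2)


def table_area(z):
    """Area between the mean and z, rounded to four decimals like the table."""
    return round(std_normal.cdf(z) - 0.5, 4)


z_110, z_120 = table_z(110), table_z(120)
proportion = table_area(z_120) - table_area(z_110)   # C = A - B
expected = proportion * GROUP_SIZE
print(z_110, z_120, round(proportion, 4), round(expected, 2))
```

Skipping the rounding of z gives a slightly larger proportion (about .161); the table-style rounding is kept here only so the result matches the hand calculation.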

1. What is the Percentile Rank of a score of 155 on the Authoritarian scale (assume μ = 150 & σ = 20)?

Again, make a diagram of what is being asked. Then use it to develop a strategy to answer the question. In the diagram, C is what the question is asking for, A is the area below the mean, and B is the area between the mean and a score of 155; we can obtain B from the z table. C = A + B.

We will need to compute the z of 155:

$$z = \frac{155 - 150}{20} = 0.25$$

Note that the z of the mean/median (150) is 0, and thus half of the scores fall below it. Now we can obtain the proportions under the curve from the z table.

  A   0.5000
+ B   0.0987
= C   0.5987

Therefore, the PR of 155 is 60: sixty percent of the distribution falls at or below a score of 155.
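With `statistics.NormalDist`, the percentile rank is just the cumulative area below the score, times 100. This sketch computes it both via the z score and directly on the original scale, to show the two routes agree.

```python
from statistics import NormalDist

MU, SIGMA = 150, 20                                # Authoritarian scale parameters from the problem
z = (155 - MU) / SIGMA                             # 0.25
pr_from_z = NormalDist().cdf(z) * 100              # area below z, as a percent (C = A + B)
pr_direct = NormalDist(MU, SIGMA).cdf(155) * 100   # same thing without converting to z
print(z, round(pr_from_z, 2), round(pr_direct, 2))
```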

1. In the distribution described in the problem above, what is the score at P90 (i.e., the score that has 90% of the distribution falling below it)?

Again, make a diagram of what is being asked. Then use it to develop a strategy to answer the question. In this case we are given the proportion (.90), and we need to find the z value associated with it in the table and then use that z to compute the score. Since 50% of the distribution falls below the mean, we need the z that has .40 of the distribution between it and the mean. From the table, we see that a z value of 1.28 has 40% (.3997) of the distribution between it and the mean. Solving 1.28 = (X - 150)/20 for X gives X = 150 + 1.28(20) = 150 + 25.6 = 175.6. So the score that has 90% of the distribution falling below it is 175.6. To check this, you can solve it in reverse, that is, find the PR of 175.6.
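Going from a proportion back to a score needs the inverse of the cumulative area. This sketch uses `statistics.NormalDist.inv_cdf` in place of searching the table, applies X = μ + zσ with both the table's rounded z and the unrounded z, and then runs the reverse check suggested above.

```python
from statistics import NormalDist

MU, SIGMA = 150, 20

z_90 = NormalDist().inv_cdf(0.90)     # z with 90% below it (about 1.2816)
score_table = MU + 1.28 * SIGMA       # using the two-decimal z from the table
score_exact = MU + z_90 * SIGMA       # using the unrounded z

# Check in reverse: the PR of the score should come out at 90
pr_check = NormalDist(MU, SIGMA).cdf(score_exact) * 100
print(round(z_90, 4), round(score_table, 1), round(score_exact, 2), round(pr_check, 2))
```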

If you are going to do this type of problem many times, it is easiest to just derive the formula. That is:

$$X = \mu + z\sigma$$

Copyright © 1997-2016 M. Plonsky, Ph.D.