I. Sampling Distribution of the Mean

As might be expected, inference with continuous variables is more complicated than with dichotomous variables. Fortunately, however, the general principles are the same. Again, we will use a sampling distribution to index the probability that the observed outcome is due to chance.

1. Empirical (capable of being verified or disproved by observation or experiment)
The sampling distribution of the mean is a probability distribution of the possible values of the mean that would occur if we were to draw all possible samples of a fixed size from a given population.

To get a better feel for this notion, let's consider an empirical example (i.e., one that could actually be performed). Let us choose 10 samples of size 4 from a population of size 20.

Population      10 Observed Sample      Empirical Sampling
Distribution    Distributions (Ns=4)    Distribution of the Mean
6, 2, 9, 5,     1, 5, 9, 0              3.75 (& sX=4.11)
0, 1, 3, 2,     0, 3, 1, 5              2.25
1, 1, 5, 2,     5, 8, 3, 0              4.00
7, 7, 7, 8,     1, 5, 0, 7              3.25
1, 1, 3, 7      7, 6, 1, 3              4.25
                3, 2, 1, 7              3.25
                2, 0, 3, 5              2.50
                1, 2, 1, 1              1.25
                2, 7, 1, 7              4.25
                9, 7, 6, 2              6.00

Notes:

1. $\mu_{\bar{X}} \approx \mu_X$: the mean of the empirical sampling distribution of the mean is about equal to the mean of the population from which the samples were drawn.
2. $\sigma_{\bar{X}} < \sigma_X$: the standard deviation of the empirical sampling distribution of the mean is smaller than the standard deviation of the population from which the samples were drawn. So the sampling distribution has less variability than the population distribution.
3. There are three types of distributions illustrated above: population, sample, and sampling.
4. Empirical sampling distributions are only used to help students understand the concept. They are not true sampling distributions, since not all possible samples are drawn.
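The first two notes can be checked directly from the numbers in the table. The following is a minimal Python sketch, not part of the original example; the variable names are illustrative, and it assumes the population standard deviation formula (dividing by N) for both the population and the set of 10 sample means.

```python
from statistics import mean, pstdev, stdev

# The 20 population scores and the 10 observed samples from the table above.
population = [6, 2, 9, 5, 0, 1, 3, 2, 1, 1, 5, 2, 7, 7, 7, 8, 1, 1, 3, 7]
samples = [
    [1, 5, 9, 0], [0, 3, 1, 5], [5, 8, 3, 0], [1, 5, 0, 7], [7, 6, 1, 3],
    [3, 2, 1, 7], [2, 0, 3, 5], [1, 2, 1, 1], [2, 7, 1, 7], [9, 7, 6, 2],
]

pop_mean = mean(population)   # mean of the population distribution
pop_sd = pstdev(population)   # SD of the population (divides by N; an assumption)

# Empirical sampling distribution of the mean: one mean per sample.
sample_means = [mean(s) for s in samples]
mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)  # divides by the number of samples (an assumption)

print(f"population:   mean = {pop_mean:.2f}, SD = {pop_sd:.2f}")
print(f"sample means: mean = {mean_of_means:.2f}, SD = {sd_of_means:.2f}")
print(f"first sample: mean = {sample_means[0]:.2f}, sX = {stdev(samples[0]):.2f}")
```

With only 10 samples the agreement between the two means is rough, which is consistent with note 4.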

2. Theoretical
These distributions are called theoretical because all possible samples (an infinite number) would have to be drawn. Since this is impossible, the characteristics (i.e., the mean & standard deviation) of the distribution are determined mathematically.

It turns out that $\mu_{\bar{X}} = \mu_X$ (note that $\mu_X$ and $\mu$ are synonyms).

The standard deviation of the distribution of sample means is called the standard error of the mean (or more simply, the standard error). It measures variability in the distribution of sample means or, in other words, sampling error (the amount of error we can expect due to using a sample mean to estimate a population mean). Perhaps it is easier to think of sampling error as "chance" like we did at the beginning of the semester.

One would expect the size of the standard error to be related to the sample size, and it is.

When population values are known:

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{N}}$$

Thus, as the sample size gets bigger, sampling error gets smaller.

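When population values are estimated from sample values:

$$s_{\bar{X}} = \frac{s_X}{\sqrt{N}}$$

This formula requires $s_X$ to be the estimate of $\sigma_X$ computed with N-1 in the denominator (so that $s_X^2$ is an unbiased estimator of $\sigma_X^2$).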

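Computational formula for the standard error estimated from sample values:

$$s_{\bar{X}} = \sqrt{\frac{\sum X^2 - (\sum X)^2/N}{N(N-1)}}$$

Example (using the population distribution from the empirical sampling distribution above, for which $\sigma_X = 2.81$ and each sample has N=4):

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{N}} = \frac{2.81}{\sqrt{4}} = 1.40$$

If we didn't know the population values, we could use the sX from the first sample:

$$s_{\bar{X}} = \frac{s_X}{\sqrt{N}} = \frac{4.11}{\sqrt{4}} = 2.06$$

As you can see, $s_{\bar{X}}$ only estimates $\sigma_{\bar{X}}$, and it does so poorly in this case (because of the small sample size).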
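As a rough check on the two versions of the standard error, here is a short Python sketch. The function names are illustrative (not from the text), and the population standard deviation is assumed to be computed by dividing by N.

```python
from math import sqrt
from statistics import pstdev

def standard_error_known(sigma, n):
    """Standard error of the mean when the population SD (sigma) is known."""
    return sigma / sqrt(n)

def standard_error_estimated(scores):
    """Computational formula: standard error estimated from raw sample scores."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return sqrt((sum_x2 - sum_x ** 2 / n) / (n * (n - 1)))

# Population and first sample from the empirical example earlier in these notes.
population = [6, 2, 9, 5, 0, 1, 3, 2, 1, 1, 5, 2, 7, 7, 7, 8, 1, 1, 3, 7]
first_sample = [1, 5, 9, 0]

se_known = standard_error_known(pstdev(population), len(first_sample))
se_estimated = standard_error_estimated(first_sample)

print(f"standard error, sigma known:     {se_known:.2f}")
print(f"standard error, estimated (N=4): {se_estimated:.2f}")
```

Calling `standard_error_known` with a larger `n` gives a smaller value, which is the sense in which sampling error shrinks as the sample size grows.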

3. Sampling Distributions & Normality
The techniques that we are discussing require that the sampling distribution (in this case the distribution of sample means) be normal in shape. This will be the case if either of the following two conditions is met.
1. The population distribution of raw scores is normal.
It is difficult to actually know this, but fortunately, many variables are approximately normally distributed.
2. The sample size is large.
The sampling distribution of the mean approaches normality as the sample size increases (the Central Limit Theorem). This occurs even though the population distribution may not be normal in shape. Note, though, that the more skewed the population distribution, the larger the N (sample size) needed for the sampling distribution of the mean to be approximately normal.
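The second condition can be illustrated with a small simulation. This is only a sketch under stated assumptions: the skewed population is an exponential distribution chosen for illustration (it does not appear in these notes), skewness is used as a crude index of departure from normality, and the number of samples and the seed are arbitrary.

```python
import random
from statistics import mean, pstdev

def skewness(values):
    """Skewness of a set of values; 0 for a perfectly symmetric distribution."""
    m = mean(values)
    sd = pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def simulate_sample_means(sample_size, n_samples=5000, seed=1):
    """Draw n_samples samples from a skewed population and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]

# The skewness of the sample means should move toward 0 as N grows.
for sample_size in (2, 5, 30):
    means = simulate_sample_means(sample_size)
    print(f"N={sample_size:2d}  skewness of sample means = {skewness(means):.2f}")
```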

II. 1 Sample Z - Parameters Known
1. Rationale
Now that we have an understanding of sampling distributions of a continuous variable, we can go on to test a hypothesis. Recall that any normally distributed variable can be transformed into a standard normal distribution (i.e., z scores). We also saw that area under the curve implies probability. Thus, if the sampling distribution of the mean is normal we can establish the probability of obtaining a particular sample mean.

2. Formal Example - [Minitab]
Let us look at an example from your book. Animal studies suggest that the cholinesterase inhibitor physostigmine improves memory. This could have some clinical applications in humans (e.g., senility, Alzheimer's disease). Studies with humans typically report that we remember an average of seven of 15 words given an 80-minute retention interval. These studies also suggest a standard deviation for the population of two.
1. Research Question
Does physostigmine improve memory in humans?

2. Hypotheses
     In Symbols    In Words
HO   μ=7           Physostigmine has no effect on memory.
HA   μ≠7           Physostigmine has an effect on memory.
3. Assumptions
1. Population of non-drugged folks has μ=7 and σ=2 (i.e., the null).
2. Sample is randomly selected.
3. Population of non-drugged folks is normal.
The reason for this assumption is so that the sampling distribution of the mean will be normal. Although a large sample size would also produce a normally shaped sampling distribution, we will rarely use large samples.

4. Decision Rules
We will use the standard normal curve (Z scores) to obtain the probabilities. Our alpha level is .05 with a two-tailed test. When we look in a Z table, we see that the critical value of Z is 1.96 (Zcrit). Thus, the critical region consists of the two tails of the standard normal curve beyond ±1.96 (2.5% of the area in each tail). If our observed z value falls into this region, we will reject the null hypothesis. More formally:

If Zobs ≤ -1.96 or Zobs ≥ 1.96, then reject HO.
If Zobs > -1.96 and Zobs < 1.96, then do not reject HO.
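For readers without a Z table at hand, the critical value and the decision rule can be sketched in Python with the standard library's `statistics.NormalDist`. The function names below are illustrative, not part of the original notes.

```python
from statistics import NormalDist

def z_critical(alpha=0.05, two_tailed=True):
    """Critical value of Z that cuts off alpha in the tail(s) of the standard normal."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

def reject_null(z_obs, alpha=0.05):
    """Two-tailed decision rule: reject HO if Zobs falls in either critical region."""
    z_crit = z_critical(alpha, two_tailed=True)
    return z_obs <= -z_crit or z_obs >= z_crit

print(f"Zcrit (alpha=.05, two-tailed) = {z_critical():.2f}")
print("Reject HO for Zobs = 1.00?", reject_null(1.00))
print("Reject HO for Zobs = -2.50?", reject_null(-2.50))
```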

5. Computation
The computations have two goals corresponding to the descriptive and inferential statistics. Suppose we obtain the following scores for a sample of 20 subjects:

 9 8 8 9 9 7 7 8 8 10 8 10 8 10 7 9 8 8 7 9

The first step is to describe the data. The most important descriptive statistic in this case is the mean or average number of words remembered by the 20 subjects receiving the drug. The calculation reveals a mean of (∑X/N=167/20=) 8.35, which is greater than the population mean of 7.

The second step of the computation is to perform an inferential test to determine whether this difference between means is worth paying attention to (in other words, is the improvement in memory due to sampling error or to the drug?).

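Remember the z score for a single raw score:

$$z = \frac{X - \mu}{\sigma}$$

More generally, a z score is a statistic minus the corresponding parameter, divided by the standard error of that statistic:

$$z = \frac{\text{statistic} - \text{parameter}}{\text{standard error of the statistic}}$$

Thus, the appropriate formula would be:

$$Z_{obs} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma_X/\sqrt{N}}$$

And substituting the values in for the standard error gives:

$$Z_{obs} = \frac{8.35 - 7}{2/\sqrt{20}} = \frac{1.35}{0.447} = 3.02$$

6. Decision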
Since 3.02 (Zobs) > 1.96 (Zcrit) we reject HO and assert the alternative. Now we must go beyond this simple decision of rejecting the null or not to what it all means. In other words, we need to make a conclusion based on our decision and the particular results observed. In this case, we would conclude that the physostigmine improves memory. Notice that we have actually gone beyond the alternative hypothesis by specifying that the effect has a direction (memory was improved). We do this because the mean words remembered for the drugged group was higher than for the population.
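The whole 1 sample Z computation for this example can be sketched in a few lines of Python. The function name is illustrative, and the two-tailed probability printed at the end is an addition that the notes themselves do not report.

```python
from math import sqrt
from statistics import NormalDist, mean

# Words remembered by the 20 subjects who received the drug.
scores = [9, 8, 8, 9, 9, 7, 7, 8, 8, 10, 8, 10, 8, 10, 7, 9, 8, 8, 7, 9]
mu0, sigma = 7, 2  # population mean and SD assumed under the null

def one_sample_z(scores, mu0, sigma):
    """Observed Z: (sample mean - population mean) / standard error of the mean."""
    standard_error = sigma / sqrt(len(scores))
    return (mean(scores) - mu0) / standard_error

z_obs = one_sample_z(scores, mu0, sigma)
# Two-tailed probability of a Z at least this extreme if the null is true.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print(f"mean = {mean(scores):.2f}, Zobs = {z_obs:.2f}, two-tailed p = {p_two_tailed:.4f}")
print("Reject HO" if abs(z_obs) >= 1.96 else "Do not reject HO")
```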

III. Errors & the Power of a Test

As can be seen, hypothesis testing is just educated guessing. Moreover, guesses (educated or not) are sometimes wrong. Consider the possible decisions we can make:

                    Actual Situation
Decision            HO is True            HO is False
Reject HO           Type I Error          Correct Decision II
Do not reject HO    Correct Decision I    Type II Error

Let us now consider each decision in more detail.

A Type I Error is the false rejection of a true null. It has a probability of alpha (α). In other words, this error occurs as a result of the fact that we have to somehow separate probable from improbable.

Correct Decision I occurs when we fail to reject a true null. It has a probability of 1-α. From a scientist's perspective this is a "boring" result.

A Type II Error is the false retention of a false null. It has a probability equal to beta (β).

Correct Decision II occurs when we reject a false null. The whole purpose of the experiment is to provide the occasion for this type of decision. In other words, we performed the statistical test because we expect the sample to differ from the population. This decision has a probability of 1-β. This probability is also known as the power of the statistical test. In other words, power is the ability of a test to find a difference when there really is one.

Factors Influencing Power:

1. Alpha (α). Alpha and beta are inversely related. In other words, all other things being equal, as one increases, the other decreases (although the two are not tied together by a simple formula). Thus, using an alpha of .05 will result in a more powerful test than using an alpha of .01.

2. Sample Size (N). The bigger the sample (i.e., the more work we do), the more powerful the test.

3. Type of Test. Parametric tests (as compared to nonparametric tests that we discuss later in the semester) are generally more powerful when their more restrictive assumptions are met.

4. Variability. Generally speaking, greater variability in the sample and/or population results in a less powerful test.

5. Test Directionality. One-tailed tests have the potential to be more powerful than two-tailed tests.

6. Robustness of the Effect. The larger the true effect, the more powerful the test. Six beers are more likely to influence reaction time than one beer.
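Several of these factors can be explored numerically for the 1 sample Z test. The sketch below is illustrative only: the function name is made up, the "true" mean of 8 words is a hypothetical value chosen for the demonstration (it is not from the physostigmine example), and the one-tailed version assumes the effect is predicted in the upper tail.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05, two_tailed=True):
    """Power (1 - beta) of a 1 sample Z test when the true mean is mu1."""
    std_normal = NormalDist()
    # Distance between the true and null means, in standard error units.
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        upper = 1 - std_normal.cdf(z_crit - shift)  # P(Zobs >= +Zcrit)
        lower = std_normal.cdf(-z_crit - shift)     # P(Zobs <= -Zcrit)
        return upper + lower
    z_crit = std_normal.inv_cdf(1 - alpha)          # upper-tail test (assumption)
    return 1 - std_normal.cdf(z_crit - shift)

# Null values from the memory example; mu1 = 8 is a hypothetical true mean.
base = dict(mu0=7, mu1=8, sigma=2, n=20)
print(f"alpha=.05: {z_test_power(**base):.2f}")
print(f"alpha=.01: {z_test_power(**base, alpha=0.01):.2f}")
print(f"N=40:      {z_test_power(7, 8, 2, 40):.2f}")
print(f"sigma=4:   {z_test_power(7, 8, 4, 20):.2f}")
print(f"1-tailed:  {z_test_power(**base, two_tailed=False):.2f}")
print(f"mu1=9:     {z_test_power(7, 9, 2, 20):.2f}")
```

When the true mean equals the null mean, the function returns alpha, which is the Type I error rate described above.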

IV. 1 Sample t - Sigma Unknown

In the 1 Sample Z example, both the mean (μ) and standard deviation (σ) of the population were given. However, these parameters are rarely known. In this section, we will consider how the test is performed when σ is unknown.

1. Rationale

As we noted earlier, $s_{\bar{X}}$ can be used to estimate $\sigma_{\bar{X}}$. One complication of doing this is that the shape of the theoretical distribution of sample means will depend on the sample size. Thus, this sampling distribution is actually a family of distributions and is called Student's t. To better understand the t distributions, we need to consider a new way of thinking of sample size.

The Degrees of Freedom (df) for a statistic refer to the number of values in its computation that are free to vary. For example, the df for the variance of a sample ($s_X^2$) is N-1. In other words, since the sum of the deviations from the mean equals zero, N-1 of the deviations are free to vary. That is, given N-1 of the deviations, we can easily determine the final deviation because it is not free to vary. In the example below where N=5, the unknown value must be 2.

  x
 -2
 -1
  0
  1
  ?
∑x=0

With the 1 sample t test, the df for t equals the df for sX, which is N-1. Student's t is a family of distributions differing in their kurtosis (or peakedness) according to the df. Note that as the df approach infinity (i.e., the sample size is very large), the t distribution approaches the z distribution.

As for the formula, remember the z test:

$$Z_{obs} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

The formula for the t is similar:

$$t_{obs} = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$

Like the z test, the critical values of t are obtained from a table. To determine the critical value of t from the table, you will need to know α, the df, and whether you are using a one- or two-tailed test. You must be conservative when using these tables. For example, if your df=45 and the table only gives values for a df of 40 and 60, then you must use the critical value given for the df of 40 (or find yourself a better table).
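The conservative table lookup can be sketched as a small Python helper. The function name and the dictionary are illustrative; the entries are two-tailed critical values at an alpha of .05 as printed in standard t tables, and a real table would have many more rows.

```python
# Two-tailed critical values of t at alpha = .05 for a few df (abbreviated table).
T_TABLE_05_TWO_TAILED = {
    19: 2.093, 24: 2.064, 30: 2.042, 40: 2.021, 60: 2.000, 120: 1.980,
}

def conservative_t_crit(df, table=T_TABLE_05_TWO_TAILED):
    """Return the critical t for the largest tabled df that does not exceed df."""
    usable = [tabled_df for tabled_df in table if tabled_df <= df]
    if not usable:
        raise ValueError(f"table has no entry at or below df={df}")
    return table[max(usable)]

print(conservative_t_crit(45))  # falls back to the row for df=40
print(conservative_t_crit(19))  # exact row is available
```

Rounding the df down yields a slightly larger critical value, so the test errs on the side of not rejecting the null.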

2. Formal Example - [Minitab]

You are interested in whether the average IQ for a group of "bad kids" (the ones that put a tack on your seat before you sit down) in a school is different from the rest of the kids in the school. The average IQ for the school as a whole is 102 with the standard deviation unavailable.

1. Research Question
Do "bad kids" have normal intelligence?

2. Hypotheses
     In Symbols    In Words
HO   μ=102         Bad kids have normal IQs.
HA   μ≠102         Bad kids do not have normal IQs.

3. Assumptions
1. Population of IQ has μ=102 (i.e., the Null).
2. Sample is randomly selected.
3. Population of IQ is normal.
The reason for this assumption is so that the sampling distribution of the mean will be normal. Although a large sample size would also produce a normally shaped sampling distribution, we will rarely use large samples.

4. Decision Rules
Using alpha of .05 with a two-tailed test and N=20 (df=N-1=19), we determine from the t table that the critical value is 2.093. Thus:
If tobs ≤ -2.093 or tobs ≥ 2.093, then reject HO.
If tobs > -2.093 and tobs < 2.093, then do not reject HO.

5. Computation
The IQs for the 20 bad kids are as follows:

Subj.   X       X²
1       106     11236
2       120     14400
3       118     13924
4       124     15376
5       111     12321
6       123     15129
7       88      7744
8       116     13456
9       120     14400
10      127     16129
11      97      9409
12      118     13924
13      88      7744
14      91      8281
15      110     12100
16      114     12996
17      109     11881
18      130     16900
19      92      8464
20      108     11664
∑       2,210   247,478
N       20
Mean    110.5

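Describing the data, we see that the average IQ is 110.5, which is higher than that of the "normal kids". We also need the standard deviation to be able to estimate the standard error when performing the inferential test. Thus,

$$s_X = \sqrt{\frac{\sum X^2 - (\sum X)^2/N}{N-1}} = \sqrt{\frac{247{,}478 - (2{,}210)^2/20}{19}} = \sqrt{\frac{3{,}273}{19}} = 13.12$$

Now we can compute the t test:

$$t_{obs} = \frac{\bar{X} - \mu}{s_X/\sqrt{N}} = \frac{110.5 - 102}{13.12/\sqrt{20}} = \frac{8.5}{2.93} = 2.90$$

6. Decision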
Since 2.90 (tobs) > 2.093 (tcrit) we reject HO and assert the alternative. In other words, we conclude that the "bad kids" are smarter than average. Notice that we have actually gone beyond the alternative hypothesis by specifying that the effect has a direction (bad kids are smarter).
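The computation for this example can be reproduced with a short Python sketch. The variable names are illustrative, the standard deviation uses the usual computational formula with N-1 in the denominator, and the critical value is simply the one read from the t table above.

```python
from math import sqrt

# IQ scores for the 20 "bad kids".
iqs = [106, 120, 118, 124, 111, 123, 88, 116, 120, 127,
       97, 118, 88, 91, 110, 114, 109, 130, 92, 108]
mu0 = 102       # population mean under the null
t_crit = 2.093  # from the t table: alpha=.05, two-tailed, df=19

n = len(iqs)
sum_x = sum(iqs)
sum_x2 = sum(x * x for x in iqs)

sample_mean = sum_x / n
# Sample standard deviation via the computational formula (N-1 in the denominator).
s_x = sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
standard_error = s_x / sqrt(n)
t_obs = (sample_mean - mu0) / standard_error

print(f"mean = {sample_mean:.1f}, sX = {s_x:.2f}, standard error = {standard_error:.2f}")
print(f"tobs = {t_obs:.2f}, df = {n - 1}")
print("Reject HO" if abs(t_obs) >= t_crit else "Do not reject HO")
```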

V. Interval Estimation

Sometimes the t test does not give us enough information. Simply knowing that a sample mean differs from a population mean may not be enough. Suppose you are a researcher interested in self-destructiveness. You develop a scale to measure this trait. Example questions might include:

1. I like to listen to loud music.
2. I use (or have used) drugs.
3. I like to drive fast.

Next you obtain a random sample of 25 people and give them the scale. (The difficulty in obtaining a random sample might be noted.) The mean for this sample is 120 and the standard deviation is 10.

One of the things that we may want to know is the range of values within which the population mean is likely to fall. If we knew this, we would have a benchmark against which to compare an individual's score (possibly for a case study).

Lets say we wanted to know the expected range of scores for 95% of the population. This is termed the 95% Confidence Interval (CI) and is given by: with df=N-1,
and remembering Thus, the current example has df=N-1=25-1=24, alpha of .05 (for 95% CI), two-tailed test. From the t table, we determine that the tcrit is 2.064.
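A minimal Python sketch of the interval computation, assuming the usual form (sample mean plus or minus tcrit times the estimated standard error of the mean). The function name is illustrative, and tcrit is taken from the t table rather than computed.

```python
from math import sqrt

def confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Return (lower, upper) limits: sample mean +/- t_crit * standard error."""
    standard_error = sample_sd / sqrt(n)
    margin = t_crit * standard_error
    return sample_mean - margin, sample_mean + margin

# Self-destructiveness scale: N=25, mean=120, SD=10, tcrit=2.064 (df=24).
lower, upper = confidence_interval(120, 10, 25, 2.064)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```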

Therefore, the upper limit will be $120 + 2.064(2) = 124.13$, and the lower limit will be $120 - 2.064(2) = 115.87$. We can now be 95% confident that the population mean lies between 115.87 and 124.13.

Copyright © 1997-2017 M. Plonsky, Ph.D.