Nonparametric tests are sometimes called distribution-free statistics because they do not require that the data fit a normal distribution. More generally, nonparametric tests require less restrictive assumptions about the data. Another important reason for using these tests is that they allow for the analysis of categorical as well as rank data.
Note that even with metric data, nonparametric tests are often employed when the assumptions of the corresponding parametric test are badly violated.
The chi square statistic is used to test expected versus observed frequencies. There are two situations in which it is used: testing whether a single observed distribution fits an expected one, and testing whether two variables in a contingency table are related.
| In words: | |
|---|---|
| HO | The observed distribution fits the expected or, in other words, there is no preference. |
| HA | The observed distribution does not fit that expected (there is a preference). |
Notice that there is no mention made of parameters.
| Ej | = the Expected frequency in the j-th column. |
|---|---|
| Oj | = the Observed frequency in the j-th column. |
| In our example, j indexes the types of movies. | |
Then:
$$\chi^2 = \sum_{j} \frac{(O_j - E_j)^2}{E_j}$$
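The statistic sums, over the categories, the squared difference between observed and expected frequencies divided by the expected frequency. A minimal Python sketch of that calculation follows. The function name `chi_square_gof` and the counts are hypothetical and chosen only for illustration; they are not the values from the movie example.

```python
def chi_square_gof(observed, expected):
    """Sum of (O_j - E_j)^2 / E_j over all categories j."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for four categories with n = 100 (illustration only,
# NOT the data from the movie example).
observed = [30, 20, 30, 20]
expected = [25, 25, 25, 25]  # "no preference": equal expected frequencies

gof_stat = chi_square_gof(observed, expected)
print(gof_stat)
```

With equal expected frequencies, as in this sketch, the null hypothesis corresponds to "no preference" among the categories.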
Now let's consider the following data:
| | Comedy | Horror | Drama | Sci fi | |
|---|---|---|---|---|---|
| Expected | | | | | as %s |
| Observed | | | | | so n=100 |
| % | | | | | |
Substituting the numbers in the formula gives:
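| Contingency Table | Frequency of Marijuana Use: < 3 times/week | Frequency of Marijuana Use: ≥ 3 times/week | Total |
|---|---|---|---|
| Categories of Other Drugs Tried: 1-3 | 26 | 6 | 32 |
| Categories of Other Drugs Tried: 4-6 | 17 | 25 | 42 |
| Total | 43 | 31 | 74 |
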
| In words: | |
|---|---|
| HO | There is no relationship (or contingency) between the two variables, that is, they are independent. |
| HA | The two variables are related. |
Again, notice that there is no mention made of parameters.
| Ejk | = the expected frequency of the cell defined by the j-th column and the k-th row. |
|---|---|
| Ojk | = the observed frequency of the cell defined by the j-th column and the k-th row. |
| Where j indexes the columns and k indexes the rows. | |
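And:
$$E_{jk} = \frac{R_k \, C_j}{N}$$
where R_k is the total of the k-th row, C_j is the total of the j-th column, and N is the overall total.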
Note, a helpful check is that the sum of the expected cell frequencies is equal to N, that is:
$$\sum_{j} \sum_{k} E_{jk} = N$$
Then:
$$\chi^2 = \sum_{j} \sum_{k} \frac{(O_{jk} - E_{jk})^2}{E_{jk}}$$
So, let's compute the Ejks for the data above.
| Contingency Table | Frequency of Marijuana Use: < 3 times/week | Frequency of Marijuana Use: ≥ 3 times/week | Total |
|---|---|---|---|
| Categories of Other Drugs Tried: 1-3 | 26 (18.59) | 6 (13.40) | 32 |
| Categories of Other Drugs Tried: 4-6 | 17 (24.41) | 25 (17.59) | 42 |
| Total | 43 | 31 | 74 |

Expected frequencies are shown in parentheses. To be clear, E11 = (32*43)/74 = 18.59 and, checking our work, 18.59 + 13.40 + 24.41 + 17.59 = 73.99 ≈ 74.

So 6/32, or about 19%, of folks who had tried 1-3 other drugs smoked marijuana frequently, whereas 25/42, or about 60%, of folks who had tried 4-6 other drugs smoked frequently. These percentages are the relevant descriptive statistics that give us the reason for performing the chi square test.
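The expected frequencies, the check on their sum, and the two descriptive percentages can be reproduced with a short Python sketch. The helper name `expected_frequencies` is an illustrative assumption. The code keeps unrounded values, so its expected frequencies sum to exactly 74 rather than 73.99.

```python
# Observed frequencies: rows are "1-3" and "4-6" other drugs tried,
# columns are "< 3 times/week" and ">= 3 times/week" marijuana use.
observed = [[26, 6], [17, 25]]


def expected_frequencies(table):
    """Expected cell frequency = (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)

# Helpful check: the expected frequencies should sum to N.
total_expected = sum(sum(row) for row in expected)

# Share of each row that smoked frequently (>= 3 times/week).
frequent_share = [row[1] / sum(row) for row in observed]

for row in expected:
    print([round(e, 2) for e in row])
print(total_expected, [round(p, 2) for p in frequent_share])
```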
Substituting the values in the formula gives:
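$$\chi^2 = \frac{(26-18.59)^2}{18.59} + \frac{(6-13.40)^2}{13.40} + \frac{(17-24.41)^2}{24.41} + \frac{(25-17.59)^2}{17.59} \approx 12.41$$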
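As a cross-check on the hand calculation, the substitution can be sketched in Python. The function name `chi_square_independence` is an illustrative assumption. The code uses unrounded expected frequencies, so its result (about 12.40) may differ in the second decimal place from a calculation that uses expected frequencies rounded to two places.

```python
def chi_square_independence(table):
    """Sum of (O_jk - E_jk)^2 / E_jk over all cells of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for k, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[k] * col_totals[j] / n  # expected frequency of the cell
            stat += (o - e) ** 2 / e
    return stat


# Observed frequencies from the marijuana-use contingency table.
independence_stat = chi_square_independence([[26, 6], [17, 25]])
print(round(independence_stat, 2))
```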