Expected Counts
With a goodness-of-fit test, the expected counts are often simply the total sample size divided by the number of categories. For example, if there were five categories and 200 respondents, we would expect 200/5 = 40 respondents in each category. With a test of independence this is a little less obvious.
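The goodness-of-fit arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that assumes every category is expected to be equally likely; the function name `equal_expected_counts` is made up for this example.

```python
def equal_expected_counts(total, n_categories):
    """Expected count per category, assuming all categories are equally likely."""
    return [total / n_categories] * n_categories


# 200 respondents spread over five categories
expected = equal_expected_counts(200, 5)
print(expected)  # [40.0, 40.0, 40.0, 40.0, 40.0]
```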
Let's take a look at the following example in Table 1.
Brand | Red | Blue | Total |
---|---|---|---|
Nike | .. | .. | 100 |
Adidas | .. | .. | 100 |
Total | 100 | 100 | 200 |
If the brand has no influence on the color (and the other way around), we would expect each cell to be the same, since the totals are the same as well, i.e. 50 in each cell:
Brand | Red | Blue | Total |
---|---|---|---|
Nike | 50 | 50 | 100 |
Adidas | 50 | 50 | 100 |
Total | 100 | 100 | 200 |
But what if twice as many people went for Red as for Blue?
Brand | Red | Blue | Total |
---|---|---|---|
Nike | .. | .. | 75 |
Adidas | .. | .. | 75 |
Total | 100 | 50 | 150 |
Note that in Table 3 twice as many people opted for Red as for Blue, while Nike and Adidas still have equal totals. If the two variables are independent, we would then expect:
Brand | Red | Blue | Total |
---|---|---|---|
Nike | 50 | 25 | 75 |
Adidas | 50 | 25 | 75 |
Total | 100 | 50 | 150 |
Note that the ratio of Red:Blue is the same in each row, including the total row, and the same goes for Nike:Adidas in the columns. The tricky part is when the row totals are not evenly split either, i.e. when the totals for Nike and Adidas are also not the same.
Brand | Red | Blue | Total |
---|---|---|---|
Nike | ... | ... | 50 |
Adidas | ... | ... | 100 |
Total | 50 | 100 | 150 |
So the chance of a respondent having chosen 'Red' is 50/150. The chance of having chosen 'Nike' is also 50/150. If the two choices are independent, the chance of having chosen both 'Red' and 'Nike' is simply those two chances multiplied together: 50/150 × 50/150. We then multiply this chance by the total number of respondents to obtain how many people we'd expect to choose Nike and Red based on the given totals, i.e. 50/150 × 50/150 × 150 = 16.67.
We can repeat the steps from the previous paragraph for each cell. For Nike-Blue we get 50/150 × 100/150 × 150 = 33.33. For Adidas-Red we get 100/150 × 50/150 × 150 = 33.33, and for Adidas-Blue we get 100/150 × 100/150 × 150 = 66.67.
Brand | Red | Blue | Total |
---|---|---|---|
Nike | 16.67 | 33.33 | 50 |
Adidas | 33.33 | 66.67 | 100 |
Total | 50 | 100 | 150 |
These are the expected counts. We would expect these counts if the color has no influence on the brand, and the brand no influence on the color (i.e. the two are independent, there is no association).
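The cell-by-cell calculation above can be reproduced with a short Python sketch. It uses the row and column totals from the example; the variable names are chosen here for illustration only.

```python
# Marginal totals from the example (Nike/Adidas rows, Red/Blue columns)
row_totals = {"Nike": 50, "Adidas": 100}
column_totals = {"Red": 50, "Blue": 100}
grand_total = 150

expected = {}
for brand, row_total in row_totals.items():
    for color, column_total in column_totals.items():
        p_brand = row_total / grand_total     # chance of choosing this brand
        p_color = column_total / grand_total  # chance of choosing this color
        # Under independence the joint chance is the product of the two chances;
        # multiplying by the grand total turns it into an expected count.
        expected[(brand, color)] = p_brand * p_color * grand_total

for cell, count in expected.items():
    print(cell, round(count, 2))
```

Rounded to two decimals, this gives 16.67, 33.33, 33.33 and 66.67, matching the worked values.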
If you look at the last calculation we did:
\(\frac{\text{Row Total}}{\text{Grand Total}}\times\frac{\text{Column Total}}{\text{Grand Total}}\times{\text{Grand Total}}\)
We can make this a little easier by writing:
\(\frac{\text{Row Total}\times\text{Column Total}}{\text{Grand Total}}\)
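As a rough sketch, the simplified formula translates directly into a small Python function that works for a table of any size. The name `expected_counts` and its list-based interface are assumptions made for this example, not part of any particular library.

```python
def expected_counts(row_totals, column_totals):
    """Expected count per cell: row total * column total / grand total."""
    grand_total = sum(row_totals)
    if grand_total != sum(column_totals):
        raise ValueError("row and column totals must add up to the same grand total")
    return [
        [row_total * column_total / grand_total for column_total in column_totals]
        for row_total in row_totals
    ]


# Equal brand totals, twice as many Red as Blue
print(expected_counts([75, 75], [100, 50]))  # [[50.0, 25.0], [50.0, 25.0]]

# Unequal brand totals and unequal color totals
uneven = expected_counts([50, 100], [50, 100])
print([[round(value, 2) for value in row] for row in uneven])
# [[16.67, 33.33], [33.33, 66.67]]
```

Taking only the totals as input mirrors the formula: under independence the expected counts depend on the margins alone, not on the observed cell values.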
Yes, we could have just used the totals and subtracted an already calculated cell value, but this might not work if you have a table larger than two by two.