Qualitative Variation Measures
There are quite a few measures of qualitative variation (the variation of the frequencies of the categories). A nice starting point to understand some of the differences is from Kader and Perry (2007).
The specific name of the type of measure for this qualitative variation can vary quite a lot. Some talk about dominance, differentiation, evenness, entropy, equitability, diversity, and apportionment.
I've tried to categorise the measures a bit, based on the calculations. Table 1 shows the measures discussed on this site.
nr. | group | measure | source | original type |
---|---|---|---|---|
1 | mode | Freeman Variation Ratio | (Freeman, 1965) | |
2 | mode | Berger-Parker Index | (Berger & Parker, 1970, p. 1345) | dominance |
3 | mode | Wilcox MODVR | (Wilcox, 1973, p. 7) | |
4 | mode | Wilcox RANVR | (Wilcox, 1973, p. 8) | |
5 | mean | Wilcox AVDEV | (Wilcox, 1973, p. 9) | |
6 | mean | Gibbs-Poston M4 | (Gibbs & Poston, 1975, p. 473) | differentiation |
7 | mean | Gibbs-Poston M5 | (Gibbs & Poston, 1975, p. 474) | differentiation |
8 | mean | Gibbs-Poston M6 | (Gibbs & Poston, 1975, p. 474) | differentiation |
9 | mean | Wilcox VARNC = | (Wilcox, 1973, p. 11) | |
9 | mean | Gibbs-Poston M2 = | (Gibbs & Poston, 1975, p. 472) | differentiation |
9 | mean | Smith-Wilson Evenness Index 1 | (Smith & Wilson, 1996, p. 71) | evenness |
10 | mean | Wilcox STDEV | (Wilcox, 1973, p. 14) | |
11 | entropy | Shannon-Weaver Entropy | (Shannon & Weaver, 1949, p. 20) | entropy |
12 | entropy | Rényi Entropy | (Rényi, 1961, p. 549) | entropy |
13 | entropy | Wilcox HREL = | (Wilcox, 1973, p. 16) | |
13 | entropy | Pielou J | (Pielou, 1966, p. 141) | diversity |
14 | entropy | Sheldon Index | (Sheldon, 1969, p. 467) | equitability = relative diversity |
15 | entropy | Heip Evenness | (Heip, 1974, p. 555) | evenness |
16 | evenness | Hill Diversity | (Hill, 1973, p. 428) | diversity |
17 | evenness | Hill Evenness | (Hill, 1973, p. 429) | evenness |
18 | evenness | Bulla E | (Bulla, 1994, pp. 168-169) | evenness |
19 | evenness | Bulla D | (Bulla, 1994, p. 169) | diversity |
20a | evenness | Simpson D | (Simpson, 1949, p. 688) | diversity |
20b | evenness | Simpson D biased | (Smith & Wilson, 1996, p. 71) | |
20c | evenness | Simpson D as diversity | (Wikipedia, n.d.) | |
20d | evenness | Simpson D as diversity biased = | (Berger & Parker, 1970, p. 1345) | |
20d | evenness | Gibbs-Poston M1 | (Gibbs & Poston, 1975, p. 471) | differentiation |
21 | evenness | Gibbs-Poston M3 | (Gibbs & Poston, 1975, p. 472) | differentiation |
22 | evenness | Smith-Wilson Evenness Index 2 | (Smith & Wilson, 1996, p. 71) | evenness |
23 | evenness | Smith-Wilson Evenness Index 3 | (Smith & Wilson, 1996, p. 71) | evenness |
24 | other | Wilcox MNDIF | (Wilcox, 1973, p. 9) | |
25 | other | Kaiser b | (Kaiser, 1968, p. 211) | apportionment |
How to determine each using software:
with Excel
Excel file: DI - Qualtitative Variation.xlsm.
with Python
Jupyter Notebook: DI - Qualitative Variation Measures.ipynb.
with R
Jupyter Notebook: DI - Qualitative Variation (R).ipynb.
In the sections below, the above-mentioned measures are briefly described and/or their formulas are shown.
Mode Based Methods
Dispersion can be seen as how much variation there is, using the center as a norm. For nominal data the measure of central tendency is the mode, and therefore some measures of qualitative variation use the mode as the starting point.
The frequency of the modal category is then useful. This is simply the maximum of the frequencies.
\(F_{\text{mode}} = \max\left(F_1, F_2, \dots, F_k\right)\)
Freeman Variation Ratio
This is simply the proportion that does not belong to the modal category (Zedeck, 2014, p. 406).
The formula used is (Freeman, 1965, p. 41):
\(v = 1 - \frac{F_{mode}}{n}\)
A value of 0 (0%) means that all cases are in the modal category and none in any other category. A value of 1 (100%) would indicate that no cases are in the modal category, but this can never occur: the modal category has the highest frequency, which cannot be 0 unless there are no cases at all.
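A minimal Python sketch of the variation ratio, assuming the data are already summarised as a list of category frequencies. The function name `freeman_vr` and the example counts are made up for illustration.

```python
def freeman_vr(freqs):
    """Freeman variation ratio: share of cases outside the modal category."""
    n = sum(freqs)        # total number of cases
    f_mode = max(freqs)   # frequency of the modal category
    return 1 - f_mode / n


# Hypothetical counts for three categories
print(freeman_vr([5, 3, 2]))   # 0.5
print(freeman_vr([10, 0, 0]))  # 0.0
```

With k equally filled categories the ratio equals 1 - 1/k, so it gets closer to 1 as k grows but never reaches it.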
Berger–Parker index
A very simple measure that reports the proportion of cases in the modal category.
The formula used is (Berger & Parker, 1970, p. 1345):
\(BPI = \frac{F_{mode}}{n}\)
Berger and Parker refer to this as a dominance measure, to indicate how "dominant" the modal category is.
A value of 1 (100%) means that all cases are in the modal category. A value of 0 (0%) would indicate that no cases are in the modal category, but this can never occur: the modal category has the highest frequency, which cannot be 0 unless there are no cases at all.
Wilcox MODVR
This looks at the difference between the modal frequency and the frequency of each category. The sum of these differences is then divided by \(n\times \left(k -1\right)\) to standardize the results to a range of 0 to 1.
It is a modification of the Freeman Variation Ratio, hence the name MODVR. Wilcox noted that the Freeman VR can never reach the maximum value of 1.
The formula used is (Wilcox, 1973, p. 7):
\(\text{MODVR} = 1 - \frac{\sum_{i=1}^k \left(F_{mode} - F_i\right)}{n\times \left(k - 1\right)} = 1 - \frac{k\times F_{mode}-n}{n\times \left(k - 1\right)}\)
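The standardization can be checked with a short Python sketch. It assumes MODVR is reported as one minus the standardized sum of differences, so that it equals k/(k-1) times Freeman's ratio and reaches 1 when all frequencies are equal. The function name and counts are illustrative only.

```python
def wilcox_modvr(freqs):
    """Wilcox MODVR, assumed here to be 1 minus the standardized sum of differences."""
    n = sum(freqs)
    k = len(freqs)
    f_mode = max(freqs)
    # Sum of the differences with the modal frequency; this equals k * f_mode - n
    total_diff = sum(f_mode - f for f in freqs)
    return 1 - total_diff / (n * (k - 1))


freqs = [5, 3, 2]  # hypothetical counts
print(wilcox_modvr(freqs))
```

Under this assumption equal frequencies give 1 and a single occupied category gives 0.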
Wilcox RANVR
Short for 'range variation ratio', this measure is very similar to Freeman's VR. Instead of looking only at the mode, it looks at the range of the frequencies.
The formula used is (Wilcox, 1973, p. 8):
\(\text{RANVR} = 1 - \frac{F_{mode} - F_{min}}{F_{mode}}\)
with:
\(F_{min} = \min\left(F_1, F_2, \dots, F_k\right)\)
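As a rough illustration, the formula can be written in Python as follows. The function name is hypothetical and the input is assumed to be a list of category frequencies with at least one case.

```python
def wilcox_ranvr(freqs):
    """Wilcox RANVR: 1 minus the range of the frequencies relative to the mode."""
    f_mode = max(freqs)
    f_min = min(freqs)
    return 1 - (f_mode - f_min) / f_mode


print(wilcox_ranvr([5, 3, 2]))  # hypothetical counts
```

Algebraically the expression simplifies to \(F_{min} / F_{mode}\), so any empty category gives a value of 0.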
Mean Based Methods
The following measures use the average count to determine the variation, i.e. \(\bar{F} = \frac{\sum_{i=1}^k F_i}{k} = \frac{n}{k}\)
Wilcox AVDEV
This is the analogue of the mean absolute deviation, applied to the frequencies. Again, the result is standardized.
The formula used is (Wilcox, 1973, p. 9):
\(\text{AVDEV} = 1-\frac{\sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times \frac{n}{k}\times \left(k-1\right)}= 1-\frac{k\times \sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times n \times \left(k-1\right)}\)
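A small Python sketch of the first form of the formula, assuming a list of category frequencies as input. The names are illustrative.

```python
def wilcox_avdev(freqs):
    """Wilcox AVDEV: 1 minus the standardized sum of absolute deviations."""
    n = sum(freqs)
    k = len(freqs)
    mean_f = n / k  # average frequency per category
    abs_dev = sum(abs(f - mean_f) for f in freqs)
    return 1 - abs_dev / (2 * mean_f * (k - 1))


print(wilcox_avdev([5, 3, 2]))  # hypothetical counts
```

The denominator is the largest possible sum of absolute deviations, which occurs when all cases fall in one category.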
Gibbs-Poston M4
The formula used (Gibbs & Poston, 1975, p. 473):
\(\text{M4} = 1-\frac{\sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times n}\)
Gibbs-Poston M5
The problem with M4 is that it can never be 0. M5 adjusts for this, but is computationally more difficult.
The formula used (Gibbs & Poston, 1975, p. 474):
\(\text{M5} = 1-\frac{\sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times\left(n-k+1-\bar{F}\right)}\)
Gibbs-Poston M6
The formula used (Gibbs & Poston, 1975, p. 474):
\(\text{M6} = k\times\left(1-\frac{\sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times n}\right) = k\times\text{M4}\)
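The three Gibbs-Poston measures above share the same sum of absolute deviations, which the following Python sketch makes explicit. The function name and the example counts are assumptions made for illustration.

```python
def gibbs_poston_m4_m5_m6(freqs):
    """Return (M4, M5, M6) computed from a list of category frequencies."""
    n = sum(freqs)
    k = len(freqs)
    mean_f = n / k
    abs_dev = sum(abs(f - mean_f) for f in freqs)
    m4 = 1 - abs_dev / (2 * n)
    m5 = 1 - abs_dev / (2 * (n - k + 1 - mean_f))
    m6 = k * m4
    return m4, m5, m6


print(gibbs_poston_m4_m5_m6([5, 3, 2]))   # hypothetical counts
print(gibbs_poston_m4_m5_m6([10, 0, 0]))  # M4 stays at 1/k, not 0
print(gibbs_poston_m4_m5_m6([8, 1, 1]))   # M5 is (up to rounding) 0 here
```

Working through the formulas, M4 bottoms out at 1/k when all cases are in one category, while M5 reaches 0 when one category holds n - k + 1 cases and every other category holds exactly one.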
Wilcox VARNC, Gibbs-Poston M2, and Smith & Wilson E1
This is similar to the variance for scale variables.
The formula used is (Wilcox, 1973, p. 11):
\(\text{VARNC} = 1-\frac{\sum_{i=1}^{k}\left(F_i-\bar{F}\right)^2}{\frac{n^2\times\left(k-1\right)}{k}} = \frac{k\times\left(n^2-\sum_{i=1}^k F_i^2\right)}{n^2\times\left(k-1\right)}\)
This is the same as Gibbs and Poston's M2. Their formula looks different but gives the same result (Gibbs & Poston, 1975, p. 472):
\(\text{M2} = \frac{1-\sum_{i=1}^k p_i^2}{1-\frac{1}{k}} = \frac{\text{M1}}{1-\frac{1}{k}} = \frac{k}{k-1}\times\text{M1}\)
It is also the same as Smith and Wilson's first evenness measure.
The formula used (Smith & Wilson, 1996, p. 71):
\(E_1 = \frac{1 - D_s}{1 - \frac{1}{k}}\)
With \(D_s\) being Simpson's D, but defined as:
\(D_s = \sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)
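That the formulas in this section agree can be checked numerically. The sketch below is a Python illustration with made-up function names: one function follows the VARNC form based on squared deviations, the other follows the proportion-based form shared by M2 and \(E_1\).

```python
def wilcox_varnc(freqs):
    """VARNC via squared deviations from the mean frequency."""
    n = sum(freqs)
    k = len(freqs)
    mean_f = n / k
    sum_sq_dev = sum((f - mean_f) ** 2 for f in freqs)
    return 1 - sum_sq_dev / (n ** 2 * (k - 1) / k)


def gibbs_poston_m2(freqs):
    """M2 (and Smith-Wilson E1) via the sum of squared proportions."""
    n = sum(freqs)
    k = len(freqs)
    d_s = sum((f / n) ** 2 for f in freqs)  # sum of squared proportions
    return (1 - d_s) / (1 - 1 / k)


freqs = [5, 3, 2]  # hypothetical counts
print(wilcox_varnc(freqs), gibbs_poston_m2(freqs))
```

Both functions return the same value up to floating-point rounding.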
Wilcox STDEV
As with the variance for scale variables, we can take the square root to obtain the standard deviation.
The formula used can be from the VARNC or the MNDIF (Wilcox, 1973, p. 14):
\(\text{STDEV} = 1-\sqrt{\frac{\sum_{i=1}^k \left(F_i-\bar{F}\right)^2}{\left(n-\bar{F}\right)^2+\left(k-1\right)\bar{F}^2}}= 1-\sqrt{\frac{\sum_{i=1}^{k-1}\sum_{j=i+1}^k \left(F_i-F_j\right)^2}{n^2\times\left(k-1\right)}}\)
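Both forms of the STDEV formula can be compared with a short Python sketch. The function names are hypothetical, and `itertools.combinations` is used to generate each pair i < j exactly once.

```python
import math
from itertools import combinations


def wilcox_stdev(freqs):
    """STDEV from squared deviations around the mean frequency."""
    n = sum(freqs)
    k = len(freqs)
    mean_f = n / k
    sum_sq_dev = sum((f - mean_f) ** 2 for f in freqs)
    max_sq_dev = (n - mean_f) ** 2 + (k - 1) * mean_f ** 2
    return 1 - math.sqrt(sum_sq_dev / max_sq_dev)


def wilcox_stdev_pairwise(freqs):
    """STDEV from squared differences between all pairs of categories."""
    n = sum(freqs)
    k = len(freqs)
    pair_sq = sum((a - b) ** 2 for a, b in combinations(freqs, 2))
    return 1 - math.sqrt(pair_sq / (n ** 2 * (k - 1)))


freqs = [5, 3, 2]  # hypothetical counts
print(wilcox_stdev(freqs), wilcox_stdev_pairwise(freqs))
```

Comparing the denominators with the VARNC formula also shows that STDEV equals \(1-\sqrt{1-\text{VARNC}}\).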
Entropy Based
Shannon-Weaver Entropy
The formula used (Shannon & Weaver, 1949, p. 20):
\(H_{sw}=-\sum_{i=1}^k p_i\times\ln\left(p_i\right)\)
Rényi entropy
This is a generalisation of the Shannon entropy.
The formula used is (Rényi, 1961, p. 549):
\(H_q = \frac{1}{1 - q}\times\ln\left(\sum_{i=1}^k p_i^q\right)\)
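A possible Python sketch of both entropy measures follows. Two choices in it are assumptions rather than part of the formulas above: empty categories are skipped (the usual convention that 0 × ln 0 counts as 0), and for q = 1, where the Rényi formula is undefined, the function returns the Shannon entropy, which is its limit.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy with natural logarithms, skipping empty categories."""
    n = sum(freqs)
    return -sum((f / n) * math.log(f / n) for f in freqs if f > 0)


def renyi_entropy(freqs, q):
    """Renyi entropy of order q; falls back to Shannon entropy at q = 1."""
    if q == 1:
        return shannon_entropy(freqs)
    n = sum(freqs)
    return math.log(sum((f / n) ** q for f in freqs if f > 0)) / (1 - q)


freqs = [5, 3, 2]  # hypothetical counts
print(shannon_entropy(freqs))
print(renyi_entropy(freqs, 2))
```

For orders close to 1 the Rényi value approaches the Shannon value, which is why it is regarded as a generalisation.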
Wilcox HREL and Pielou J
This uses Shannon's entropy but divides it by the maximum possible uncertainty.
The formula used (Wilcox, 1973, p. 16):
\(\text{HREL} = \frac{-\sum_{i=1}^k p_i \times \log_2\left(p_i\right)}{\log_2\left(k\right)}\)
This is the same as Pielou J.
The formula used (Pielou, 1966, p. 141):
\(J=\frac{H_{sw}}{\ln\left(k\right)}\)
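The two formulas use different logarithm bases, but the base cancels in the ratio. A quick Python check, with illustrative function names and empty categories skipped by assumption:

```python
import math


def wilcox_hrel(freqs):
    """HREL: base-2 entropy divided by its maximum, log2(k)."""
    n = sum(freqs)
    k = len(freqs)
    h2 = -sum((f / n) * math.log2(f / n) for f in freqs if f > 0)
    return h2 / math.log2(k)


def pielou_j(freqs):
    """Pielou J: natural-log entropy divided by its maximum, ln(k)."""
    n = sum(freqs)
    k = len(freqs)
    h = -sum((f / n) * math.log(f / n) for f in freqs if f > 0)
    return h / math.log(k)


freqs = [5, 3, 2]  # hypothetical counts
print(wilcox_hrel(freqs), pielou_j(freqs))
```

Both functions return the same value up to floating-point rounding, and equal frequencies give 1.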
Evenness and Diversity Based
Hill Diversity
The formula used is (Hill, 1973, p. 428):
\(N_a = \begin{cases}\left(\sum_{i=1}^k p_i^a\right)^{\frac{1}{1-a}} & \text{ if } a\neq 1 \\ e^{H_{sw}} & \text{ if } a = 1 \end{cases}\)
Hill Evenness
The formula used is (Hill, 1973, p. 429):
\(E_{a,b} = \frac{N_a}{N_b}\)
Where \(N_a\) and \(N_b\) are Hill's diversity values for a and b.
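Hill's diversity and evenness could be sketched in Python as below. The function names are made up, and skipping empty categories is an assumption (it also avoids Python evaluating 0 to the power 0 as 1 when a = 0).

```python
import math


def hill_diversity(freqs, a):
    """Hill diversity number N_a from a list of category frequencies."""
    n = sum(freqs)
    props = [f / n for f in freqs if f > 0]  # proportions of non-empty categories
    if a == 1:
        h_sw = -sum(p * math.log(p) for p in props)
        return math.exp(h_sw)
    return sum(p ** a for p in props) ** (1 / (1 - a))


def hill_evenness(freqs, a, b):
    """Hill evenness E_{a,b}: ratio of two Hill diversity numbers."""
    return hill_diversity(freqs, a) / hill_diversity(freqs, b)


freqs = [5, 3, 2]  # hypothetical counts
print(hill_diversity(freqs, 0), hill_diversity(freqs, 1), hill_diversity(freqs, 2))
print(hill_evenness(freqs, 2, 0))
```

In this sketch \(N_0\) is the number of non-empty categories and \(N_2\) is the reciprocal of the sum of squared proportions. With equal frequencies every \(N_a\) equals k.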
Bulla E
Bulla's evenness measure.
The formula used is (Bulla, 1994, pp. 168-169):
\(E_b = \frac{O - \frac{1}{k} - \frac{k - 1}{n}}{1 - \frac{1}{k} - \frac{k - 1}{n}}\)
With:
\(O = \sum_{i=1}^k \min\left(p_i, \frac{1}{k}\right)\)
Bulla D
Bulla's Evenness measure converted to a diversity measure.
The formula used is (Bulla, 1994, p. 169):
\(D_b = E_b\times k\)
Where \(E_b\) is the Bulla E value.
With:
\(O = \sum_{i=1}^k \min\left(p_i, \frac{1}{k}\right)\)
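Both Bulla measures can be sketched in Python as follows, assuming a list of category frequencies as input. The function names are illustrative.

```python
def bulla_e(freqs):
    """Bulla evenness E_b from a list of category frequencies."""
    n = sum(freqs)
    k = len(freqs)
    # O in the formula: sum of the smaller of each proportion and 1/k
    o = sum(min(f / n, 1 / k) for f in freqs)
    correction = 1 / k + (k - 1) / n
    return (o - correction) / (1 - correction)


def bulla_d(freqs):
    """Bulla diversity D_b: the evenness multiplied by the number of categories."""
    return bulla_e(freqs) * len(freqs)


freqs = [5, 3, 2]  # hypothetical counts
print(bulla_e(freqs), bulla_d(freqs))
```

Equal frequencies give an evenness of 1. Working through the formula, counts such as 8, 1, 1 (one category with n - k + 1 cases, one case in each other category) give 0.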
Simpson D (and Gibbs-Poston M1)
The formula used is based on Simpson (1949, p. 688):
\(D_1 = \frac{\sum_{i=1}^k F_i\times\left(F_i-1\right)}{n\times\left(n-1\right)}\)
Another alternative is:
\(D_2 = \sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)
Often the result is subtracted from 1 to reverse the scale.
\(D_3 = 1-\frac{\sum_{i=1}^k F_i\times\left(F_i-1\right)}{n\times\left(n-1\right)}\)
and
\(D_4 = 1-\sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)
This also makes \(D_4\) the same as Gibbs and Poston's M1.
The formula used (Gibbs & Poston, 1975, p. 471):
\(\text{M1} = 1 - \sum_{i=1}^k p_i^2\)
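The four Simpson variants can be computed side by side with a small Python sketch. It takes \(D_3\) and \(D_4\) as one minus \(D_1\) and \(D_2\) respectively, following the remark that the result is subtracted from 1 and that \(D_4\) equals M1. The function name and dictionary keys are illustrative.

```python
def simpson_variants(freqs):
    """Return the four Simpson D variants for a list of category frequencies."""
    n = sum(freqs)
    d1 = sum(f * (f - 1) for f in freqs) / (n * (n - 1))
    d2 = sum((f / n) ** 2 for f in freqs)
    return {
        "D1": d1,
        "D2": d2,      # biased version
        "D3": 1 - d1,  # D1 with the scale reversed
        "D4": 1 - d2,  # D2 with the scale reversed; same as Gibbs-Poston M1
    }


print(simpson_variants([5, 3, 2]))  # hypothetical counts
```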
Gibbs-Poston M3
The formula used (Gibbs & Poston, 1975, p. 472):
\(\text{M3} = \frac{1-\sum_{i=1}^k p_i^2-p_{min}}{1-\frac{1}{k}-p_{min}}\)
With \(p_{min}\) the lowest proportion
Smith & Wilson E2
The formula used (Smith & Wilson, 1996, p. 71):
\(E_2 = \frac{-\ln\left(D_s\right)}{\ln\left(k\right)}\)
With \(D_s\) being Simpson's D, but defined as:
\(D_s = \sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)
Smith & Wilson E3
The formula used (Smith & Wilson, 1996, p. 71):
\(E_3 = \frac{1}{D_s \times k}\)
With \(D_s\) being Simpson's D, but defined as:
\(D_s = \sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)
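A brief Python sketch of \(E_3\), with a hypothetical function name and a list of category frequencies as input:

```python
def smith_wilson_e3(freqs):
    """Smith-Wilson E3: reciprocal of (sum of squared proportions times k)."""
    n = sum(freqs)
    k = len(freqs)
    d_s = sum((f / n) ** 2 for f in freqs)  # Simpson's D as defined above
    return 1 / (d_s * k)


print(smith_wilson_e3([5, 3, 2]))  # hypothetical counts
```

Equal frequencies give 1, and a single occupied category gives 1/k.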
Other Methods
Wilcox MNDIF
Analog of the mean difference measure for scale variables.
The formula used is (Wilcox, 1973, p. 9):
\(\text{MNDIF} = 1-\frac{\sum_{i=1}^{k-1}\sum_{j=i+1}^k \left|F_i-F_j\right|}{n\times\left(k-1\right)}\)
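The double sum runs over every pair of categories once. In Python this might be sketched with `itertools.combinations`; the function name is made up for illustration.

```python
from itertools import combinations


def wilcox_mndif(freqs):
    """Wilcox MNDIF: 1 minus the standardized sum of pairwise absolute differences."""
    n = sum(freqs)
    k = len(freqs)
    # combinations(freqs, 2) yields each pair (F_i, F_j) with i < j exactly once
    pair_abs = sum(abs(a - b) for a, b in combinations(freqs, 2))
    return 1 - pair_abs / (n * (k - 1))


print(wilcox_mndif([5, 3, 2]))  # hypothetical counts
```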
Kaiser b
The formula used (Kaiser, 1968, p. 211):
\(B = 1 - \sqrt{1 - \left(\sqrt[k]{\prod_{i=1}^k\frac{f_i\times k}{n}}\right)^2}\)
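A minimal Python sketch of the formula is shown below. It assumes Python 3.8 or later for `math.prod`, and the clamp to 0 inside the square root is an added safeguard against floating-point rounding, not part of the formula. The function name is hypothetical.

```python
import math


def kaiser_b(freqs):
    """Kaiser b from a list of category frequencies."""
    n = sum(freqs)
    k = len(freqs)
    # Product of the frequencies, each rescaled so that equal frequencies give 1
    product = math.prod(f * k / n for f in freqs)
    geo_mean = product ** (1 / k)  # k-th root of the product
    # Clamp at 0 so rounding errors cannot produce a negative square root
    return 1 - math.sqrt(max(0.0, 1 - geo_mean ** 2))


print(kaiser_b([5, 3, 2]))  # hypothetical counts
```

Equal frequencies give 1, and any empty category makes the product, and therefore the measure, equal to 0.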