Qualitative Variation Measures

There are quite a few measures of qualitative variation (the variation of the frequencies of the categories). A good starting point for understanding some of the differences is Kader and Perry (2007).

The name given to this type of measure varies quite a lot. Authors speak of dominance, differentiation, evenness, entropy, equitability, diversity, and apportionment.

I've tried to categorise the measures a bit, based on the calculations. Table 1 shows the measures discussed on this site.

Table 1
Measures of Qualitative Variation
nr.	group	measure	source	original type
1	mode	Freeman Variation Ratio	(Freeman, 1965)
2	mode	Berger-Parker Index	(Berger & Parker, 1970, p. 1345)	dominance
3	mode	Wilcox MODVR	(Wilcox, 1973, p. 7)
4	mode	Wilcox RANVR	(Wilcox, 1973, p. 8)
5	mean	Wilcox AVDEV	(Wilcox, 1973, p. 9)
6	mean	Gibbs-Poston M4	(Gibbs & Poston, 1975, p. 473)	differentiation
7	mean	Gibbs-Poston M5	(Gibbs & Poston, 1975, p. 474)	differentiation
8	mean	Gibbs-Poston M6	(Gibbs & Poston, 1975, p. 474)	differentiation
9	mean	Wilcox VARNC =	(Wilcox, 1973, p. 11)
9	mean	Gibbs-Poston M2 =	(Gibbs & Poston, 1975, p. 472)	differentiation
9	mean	Smith-Wilson Evenness Index 1	(Smith & Wilson, 1996, p. 71)	evenness
10	mean	Wilcox STDEV	(Wilcox, 1973, p. 14)
11	entropy	Shannon-Weaver Entropy	(Shannon & Weaver, 1949, p. 20)	entropy
12	entropy	Rényi Entropy	(Rényi, 1961, p. 549)	entropy
13	entropy	Wilcox HREL =	(Wilcox, 1973, p. 16)
13	entropy	Pielou J	(Pielou, 1966, p. 141)	diversity
14	entropy	Sheldon Index	(Sheldon, 1969, p. 467)	equitability = relative diversity
15	entropy	Heip Evenness	(Heip, 1974, p. 555)	evenness
16	evenness	Hill Diversity	(Hill, 1973, p. 428)	diversity
17	evenness	Hill Evenness	(Hill, 1973, p. 429)	evenness
18	evenness	Bulla E	(Bulla, 1994, pp. 168-169)	evenness
19	evenness	Bulla D	(Bulla, 1994, p. 169)	diversity
20a	evenness	Simpson D	(Simpson, 1949, p. 688)	diversity
20b	evenness	Simpson D biased	(Smith & Wilson, 1996, p. 71)
20c	evenness	Simpson D as diversity	(Wikipedia, n.d.)
20d	evenness	Simpson D as diversity biased =	(Berger & Parker, 1970, p. 1345)
20d	evenness	Gibbs-Poston M1	(Gibbs & Poston, 1975, p. 471)	differentiation
21	evenness	Gibbs-Poston M3	(Gibbs & Poston, 1975, p. 472)	differentiation
22	evenness	Smith-Wilson Evenness Index 2	(Smith & Wilson, 1996, p. 71)	evenness
23	evenness	Smith-Wilson Evenness Index 3	(Smith & Wilson, 1996, p. 71)	evenness
24	other	Wilcox MNDIF	(Wilcox, 1973, p. 9)
25	other	Kaiser b	(Kaiser, 1968, p. 211)	apportionment

How to determine each measure using software:

with Excel

Excel file: DI - Qualtitative Variation.xlsm.

with Python

Jupyter Notebook: DI - Qualitative Variation Measures.ipynb.

with R

Jupyter Notebook: DI - Qualitative Variation (R).ipynb.

The sections below briefly describe the measures listed above and/or show their formulas.

Mode Based Methods

Dispersion can be seen as how much variation there is, using the center as the norm. For nominal data the measure of central tendency is the mode, and therefore some measures of qualitative variation use the mode as the starting point.

The frequency of the modal category is then useful. This is simply the maximum of the frequencies \(F_i\) of the \(k\) categories, with \(n\) the total number of cases.

\(F_{mode} = \max\left(F_1, F_2, \dots, F_k\right)\)

Freeman Variation Ratio

This is simply the proportion that does not belong to the modal category (Zedeck, 2014, p. 406).

The formula used is (Freeman, 1965, p. 41):

\(v = 1 - \frac{F_{mode}}{n}\)

A 0 (0%) means that all cases are in the modal category and none of the other categories has any cases. A 1 (100%) would indicate that no cases are in the modal category. This can never occur, since the modal category is the category with the highest frequency, which cannot be 0 unless there are no cases at all. The largest possible value is \(1 - \frac{1}{k}\), reached when all categories have the same frequency.

Berger–Parker index

A very simple measure: the proportion of cases that falls in the modal category.

The formula used is (Berger & Parker, 1970, p. 1345):

\(BPI = \frac{F_{mode}}{n}\)

Berger and Parker refer to this as a dominance measure, to indicate how "dominant" the modal category is.

A 1 (100%) means that all cases are in the modal category. A 0 (0%) would indicate that no cases are in the modal category. This can never occur, since the modal category is the category with the highest frequency, which cannot be 0 unless there are no cases at all. The smallest possible value is \(\frac{1}{k}\), reached when all categories have the same frequency.
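As a small illustration of the two measures so far, the sketch below computes both from a list of category frequencies. The function names and the example frequencies are my own choices for illustration and are not taken from the files mentioned earlier.

```python
def freeman_vr(freqs):
    """Freeman's variation ratio: proportion of cases outside the modal category."""
    n = sum(freqs)
    return 1 - max(freqs) / n


def berger_parker(freqs):
    """Berger-Parker index: proportion of cases inside the modal category."""
    n = sum(freqs)
    return max(freqs) / n


# Hypothetical example: three categories with 12, 5 and 3 cases (n = 20).
freqs = [12, 5, 3]
print(round(freeman_vr(freqs), 4))     # 0.4
print(round(berger_parker(freqs), 4))  # 0.6
```

The two values always add up to 1, so they carry the same information on a reversed scale.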

Wilcox MODVR

This looks at the difference between the modal frequency and the frequency of each category. The sum of these differences is divided by \(n\times \left(k -1\right)\) to standardize the result to a range of 0 to 1.

It is a modification of the Freeman Variation Ratio, hence the name MODVR. Wilcox noted that the Freeman VR can never reach the maximum value of 1.

The formula used is (Wilcox, 1973, p. 7):

\(\text{MODVR} = 1 - \frac{\sum_{i=1}^k \left(F_{mode} - F_i\right)}{n\times \left(k - 1\right)} = 1 - \frac{k\times F_{mode}-n}{n\times \left(k - 1\right)} = \frac{k}{k-1}\times v\)
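A minimal sketch of MODVR, assuming the form in which the standardized sum of differences is subtracted from 1, so that the result equals \(\frac{k}{k-1}\) times Freeman's variation ratio and reaches 1 when all categories are equally frequent. The function name and example data are hypothetical.

```python
def wilcox_modvr(freqs):
    """Wilcox MODVR, assumed here as 1 - (k * F_mode - n) / (n * (k - 1))."""
    n = sum(freqs)
    k = len(freqs)
    f_mode = max(freqs)
    return 1 - (k * f_mode - n) / (n * (k - 1))


print(round(wilcox_modvr([12, 5, 3]), 4))  # 0.6, i.e. 3/2 times Freeman's 0.4
print(wilcox_modvr([5, 5, 5, 5]))          # 1.0: all categories equally frequent
print(wilcox_modvr([20, 0, 0, 0]))         # 0.0: all cases in one category
```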

Wilcox RANVR

Short for 'range variation ratio', this measure is very similar to Freeman's VR. Instead of looking only at the mode, it looks at the range of the frequencies.

The formula used is (Wilcox, 1973, p. 8):

\(\text{RANVR} = 1 - \frac{F_{mode} - F_{min}}{F_{mode}}\)

with:

\(F_{min} = \min\left(F_1, F_2, \dots, F_k\right)\)
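The RANVR formula simplifies to \(\frac{F_{min}}{F_{mode}}\), which the following sketch makes easy to check. The function name and the example frequencies are made up for illustration.

```python
def wilcox_ranvr(freqs):
    """Wilcox RANVR: 1 - (F_mode - F_min) / F_mode."""
    f_mode = max(freqs)
    f_min = min(freqs)
    return 1 - (f_mode - f_min) / f_mode


print(wilcox_ranvr([12, 5, 3]))  # 0.25, the same as 3 / 12
```

Note that a single empty category is enough to bring RANVR down to 0, regardless of how the other cases are spread.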

Mean Based Methods

The following measures use the average count to determine the variation, i.e. \(\bar{F} = \frac{\sum_{i=1}^k F_i}{k} = \frac{n}{k}\)

Wilcox AVDEV

This is the analogue of the mean absolute deviation, but using frequencies. Again, the result is then standardized.

The formula used is (Wilcox, 1973, p. 9):

\(\text{AVDEV} = 1-\frac{\sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times \frac{n}{k}\times \left(k-1\right)}= 1-\frac{k\times \sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times n \times \left(k-1\right)}\)
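A short sketch of the AVDEV calculation, using the first form of the formula. The function name and data are my own and only serve as an example.

```python
def wilcox_avdev(freqs):
    """Wilcox AVDEV: 1 minus the standardized sum of absolute deviations from n / k."""
    n = sum(freqs)
    k = len(freqs)
    mean = n / k
    abs_dev = sum(abs(f - mean) for f in freqs)
    return 1 - abs_dev / (2 * mean * (k - 1))


# Hypothetical example: four categories, n = 20, so the mean frequency is 5.
print(round(wilcox_avdev([10, 5, 3, 2]), 4))  # 0.6667
```

With these frequencies the absolute deviations are 5, 0, 2 and 3, so the result is \(1 - \frac{10}{30}\).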

Gibbs-Poston M4

The formula used (Gibbs & Poston, 1975, p. 473):

\(\text{M4} = 1-\frac{\sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times n}\)

Gibbs-Poston M5

The problem with M4 is that it can never be 0. M5 adjusts for this, but is computationally more difficult.

The formula used (Gibbs & Poston, 1975, p. 474):

\(\text{M5} = 1-\frac{\sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times\left(n-k+1-\bar{F}\right)}\)

Gibbs-Poston M6

The formula used (Gibbs & Poston, 1975, p. 474):

\(\text{M6} = k\times\left(1-\frac{\sum_{i=1}^k \left|F_i-\bar{F}\right|}{2\times n}\right) = k\times\text{M4}\)
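The three Gibbs-Poston measures M4, M5 and M6 share the same sum of absolute deviations, so a single sketch can return all three. The function name and the example frequencies are assumptions of mine. Note that the M5 denominator becomes 0 when \(n = k\), which this sketch does not guard against.

```python
def gibbs_poston_m4_m5_m6(freqs):
    """Return (M4, M5, M6) following the formulas in this section."""
    n = sum(freqs)
    k = len(freqs)
    mean = n / k
    abs_dev = sum(abs(f - mean) for f in freqs)
    m4 = 1 - abs_dev / (2 * n)
    m5 = 1 - abs_dev / (2 * (n - k + 1 - mean))  # undefined when n == k
    m6 = k * m4
    return m4, m5, m6


print(gibbs_poston_m4_m5_m6([10, 5, 3, 2]))  # M4 = 0.75, M6 = 3.0
print(gibbs_poston_m4_m5_m6([17, 1, 1, 1]))  # M5 is 0 here, M4 is not
print(gibbs_poston_m4_m5_m6([20, 0, 0, 0]))  # M4 bottoms out at 1 / k = 0.25
```

The last two calls illustrate the difference: M4 cannot go below \(\frac{1}{k}\), while M5 reaches 0 when every category except one contains a single case.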

Wilcox VARNC, Gibbs-Poston M2, and Smith & Wilson E1

This is analogous to the variance for scale variables.

The formula used is (Wilcox, 1973, p. 11):

\(\text{VARNC} = 1-\frac{\sum_{i=1}^{k}\left(F_i-\bar{F}\right)^2}{\frac{n^2\times\left(k-1\right)}{k}} = \frac{k\times\left(n^2-\sum_{i=1}^k F_i^2\right)}{n^2\times\left(k-1\right)}\)

This is the same as Gibbs and Poston's M2. Their formula looks different but gives the same result (Gibbs & Poston, 1975, p. 472). It uses the proportions \(p_i = \frac{F_i}{n}\):

\(\text{M2} = \frac{1-\sum_{i=1}^k p_i^2}{1-\frac{1}{k}} = \frac{\text{M1}}{1-\frac{1}{k}} = \frac{k}{k-1}\times\text{M1}\)

It is also the same as Smith and Wilson's first evenness measure.

The formula used (Smith & Wilson, 1996, p. 71):

\(E_1 = \frac{1 - D_s}{1 - \frac{1}{k}}\)

With \(D_s\) being Simpson's D, but defined as:

\(D_s = \sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)
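To check that the frequency-based and proportion-based formulas agree, here is a small sketch that computes VARNC from the deviations and M2 (which has the same form as \(E_1\)) from the proportions. Function names and data are hypothetical.

```python
def wilcox_varnc(freqs):
    """VARNC from the squared deviations of the frequencies around n / k."""
    n = sum(freqs)
    k = len(freqs)
    mean = n / k
    ss = sum((f - mean) ** 2 for f in freqs)
    return 1 - ss / (n ** 2 * (k - 1) / k)


def gibbs_poston_m2(freqs):
    """M2 (same expression as Smith-Wilson E1) from the proportions."""
    n = sum(freqs)
    k = len(freqs)
    sum_p2 = sum((f / n) ** 2 for f in freqs)  # Simpson's D_s as defined above
    return (1 - sum_p2) / (1 - 1 / k)


freqs = [10, 5, 3, 2]
print(round(wilcox_varnc(freqs), 4))     # 0.8733
print(round(gibbs_poston_m2(freqs), 4))  # 0.8733
```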

Wilcox STDEV

As with the variance for scale variables, we can take the square root to obtain the standard deviation.

The formula used can be from the VARNC or the MNDIF (Wilcox, 1973, p. 14):

\(\text{STDEV} = 1-\sqrt{\frac{\sum_{i=1}^k \left(F_i-\bar{F}\right)^2}{\left(n-\bar{F}\right)^2+\left(k-1\right)\bar{F}^2}}= 1-\sqrt{\frac{\sum_{i=1}^{k-1}\sum_{j=i+1}^k \left(F_i-F_j\right)^2}{n^2\times\left(k-1\right)}}\)
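Both forms of the STDEV formula can be checked against each other with a few lines of code. This is only a sketch with names I made up; `itertools.combinations` is used to run over all pairs \(i < j\).

```python
from itertools import combinations
from math import sqrt


def wilcox_stdev(freqs):
    """STDEV from the squared deviations around the mean frequency."""
    n = sum(freqs)
    k = len(freqs)
    mean = n / k
    ss = sum((f - mean) ** 2 for f in freqs)
    return 1 - sqrt(ss / ((n - mean) ** 2 + (k - 1) * mean ** 2))


def wilcox_stdev_pairwise(freqs):
    """STDEV from the squared differences between all pairs of frequencies."""
    n = sum(freqs)
    k = len(freqs)
    ss_pairs = sum((a - b) ** 2 for a, b in combinations(freqs, 2))
    return 1 - sqrt(ss_pairs / (n ** 2 * (k - 1)))


freqs = [10, 5, 3, 2]
print(round(wilcox_stdev(freqs), 4))           # 0.6441
print(round(wilcox_stdev_pairwise(freqs), 4))  # 0.6441
```

Since the denominator of the first form equals \(\frac{n^2\times\left(k-1\right)}{k}\), the result is also \(1-\sqrt{1-\text{VARNC}}\).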

Entropy Based

Shannon-Weaver Entropy

The formula used (Shannon & Weaver, 1949, p. 20):

\(H_{sw}=-\sum_{i=1}^k p_i\times\ln\left(p_i\right)\)

Rényi entropy

This is a generalisation of Shannon entropy.

The formula used is (Rényi, 1961, p. 549):

\(H_q = \frac{1}{1 - q}\times\ln\left(\sum_{i=1}^k p_i^q\right)\)
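A sketch of both entropies. Two conventions here are my own assumptions: empty categories are skipped (treating \(0\times\ln 0\) as 0), and for \(q = 1\), where the Rényi formula would divide by zero, the function falls back to the Shannon entropy, which is its limit as \(q\) approaches 1.

```python
from math import log


def shannon_entropy(freqs):
    """Shannon entropy using the natural logarithm; empty categories are skipped."""
    n = sum(freqs)
    return -sum((f / n) * log(f / n) for f in freqs if f > 0)


def renyi_entropy(freqs, q):
    """Renyi entropy of order q; q = 1 is handled as the Shannon limit."""
    if q == 1:
        return shannon_entropy(freqs)
    n = sum(freqs)
    return log(sum((f / n) ** q for f in freqs if f > 0)) / (1 - q)


freqs = [10, 5, 3, 2]
print(shannon_entropy(freqs))
print(renyi_entropy(freqs, 2))       # order 2: -ln of the sum of squared proportions
print(renyi_entropy(freqs, 1.0001))  # close to the Shannon value
```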

Wilcox HREL and Pielou J

This uses Shannon's entropy but divides it by the maximum possible uncertainty.

The formula used (Wilcox, 1973, p. 16):

\(\text{HREL} = \frac{-\sum_{i=1}^k p_i \times \log_2 p_i}{\log_2 k}\)

This is the same as Pielou J, since the base of the logarithm cancels out in the ratio.

The formula used (Pielou, 1966, p. 141):

\(J=\frac{H_{sw}}{\ln\left(k\right)}\)
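That HREL and J give the same value can be verified numerically. The sketch below uses base 2 for one and the natural logarithm for the other; the function names are hypothetical, and \(k\) is taken as the number of frequencies supplied, including any empty categories.

```python
from math import log, log2


def wilcox_hrel(freqs):
    """HREL: Shannon entropy in base 2 divided by log2(k)."""
    n = sum(freqs)
    k = len(freqs)
    h2 = -sum((f / n) * log2(f / n) for f in freqs if f > 0)
    return h2 / log2(k)


def pielou_j(freqs):
    """Pielou J: Shannon entropy in base e divided by ln(k)."""
    n = sum(freqs)
    k = len(freqs)
    h = -sum((f / n) * log(f / n) for f in freqs if f > 0)
    return h / log(k)


freqs = [10, 5, 3, 2]
print(wilcox_hrel(freqs))
print(pielou_j(freqs))  # the same value, up to floating-point rounding
```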

Sheldon Index

The formula used (Sheldon, 1969, p. 467):

\(E = \frac{e^{H_{sw}}}{k}\)

Heip Index

The formula used is (Heip, 1974, p. 555):

\(E_h = \frac{e^{H_{sw}} - 1}{k - 1}\)
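The Sheldon and Heip indices differ only in the subtraction of 1 in numerator and denominator, which the following sketch shows side by side. Names and data are my own; \(k\) is again the number of frequencies supplied.

```python
from math import exp, log


def shannon_entropy(freqs):
    """Shannon entropy (natural log); empty categories are skipped."""
    n = sum(freqs)
    return -sum((f / n) * log(f / n) for f in freqs if f > 0)


def sheldon_index(freqs):
    """Sheldon index: exp(H) / k."""
    return exp(shannon_entropy(freqs)) / len(freqs)


def heip_evenness(freqs):
    """Heip index: (exp(H) - 1) / (k - 1)."""
    return (exp(shannon_entropy(freqs)) - 1) / (len(freqs) - 1)


# With all cases in one category, Sheldon gives 1 / k while Heip gives 0.
print(sheldon_index([20, 0, 0, 0]))  # 0.25
print(heip_evenness([20, 0, 0, 0]))  # 0.0
```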

Evenness and Diversity Based

Hill Diversity

The formula used is (Hill, 1973, p. 428):

\(N_a = \begin{cases}\left(\sum_{i=1}^k p_i^a\right)^{\frac{1}{1-a}} & \text{ if } a\neq 1 \\ e^{H_{sw}} & \text{ if } a = 1 \end{cases}\)

Hill Evenness

The formula used is (Hill, 1973, p. 429):

\(E_{a,b} = \frac{N_a}{N_b}\)

Where \(N_a\) and \(N_b\) are Hill's diversity values for a and b.
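A possible sketch of Hill's diversity and evenness. Skipping empty categories is my own assumption (otherwise \(0^0\) would count them for \(a = 0\)), and the function names are hypothetical.

```python
from math import exp, log


def hill_diversity(freqs, a):
    """Hill diversity of order a, computed over the non-empty categories."""
    n = sum(freqs)
    props = [f / n for f in freqs if f > 0]
    if a == 1:
        # Order 1 is defined as the exponential of the Shannon entropy.
        return exp(-sum(p * log(p) for p in props))
    return sum(p ** a for p in props) ** (1 / (1 - a))


def hill_evenness(freqs, a, b):
    """Hill evenness: ratio of the diversities of order a and b."""
    return hill_diversity(freqs, a) / hill_diversity(freqs, b)


freqs = [10, 5, 3, 2]
print(hill_diversity(freqs, 0))    # 4.0, the number of non-empty categories
print(hill_diversity(freqs, 2))    # 1 / sum of squared proportions
print(hill_evenness(freqs, 2, 0))
```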

Bulla E

Bulla's evenness measure.

The formula used is (Bulla, 1994, pp. 168-169):

\(E_b = \frac{O - \frac{1}{k} - \frac{k - 1}{n}}{1 - \frac{1}{k} - \frac{k - 1}{n}}\)

With:

\(O = \sum_{i=1}^k \min\left(p_i, \frac{1}{k}\right)\)

Bulla D

Bulla's Evenness measure converted to a diversity measure.

The formula used is (Bulla, 1994, p. 169):

\(D_b = E_b\times k\)

Where \(E_b\) is the Bulla E value.

With:

\(O = \sum_{i=1}^k \min\left(p_i, \frac{1}{k}\right)\)
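Bulla's two measures could be computed along the following lines. This is a sketch with names of my own choosing; `overlap` stands for \(O\).

```python
def bulla_e(freqs):
    """Bulla E: overlap with the uniform distribution, corrected and rescaled."""
    n = sum(freqs)
    k = len(freqs)
    overlap = sum(min(f / n, 1 / k) for f in freqs)  # O in the formula
    correction = 1 / k + (k - 1) / n
    return (overlap - correction) / (1 - correction)


def bulla_d(freqs):
    """Bulla D: Bulla E multiplied by the number of categories."""
    return bulla_e(freqs) * len(freqs)


freqs = [10, 5, 3, 2]
print(round(bulla_e(freqs), 4))  # 0.5833
print(round(bulla_d(freqs), 4))  # 2.3333
```

For these frequencies \(O = 0.25 + 0.25 + 0.15 + 0.10 = 0.75\), giving \(E_b = \frac{0.35}{0.60}\).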

Simpson D (and Gibbs-Poston M1)

The formula used is based on Simpson (1949, p. 688):

\(D_1 = \frac{\sum_{i=1}^k F_i\times\left(F_i-1\right)}{n\times\left(n-1\right)}\)

Another alternative is:

\(D_2 = \sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)

Often the result is subtracted from 1 to reverse the scale:

\(D_3 = 1-\frac{\sum_{i=1}^k F_i\times\left(F_i-1\right)}{n\times\left(n-1\right)}\)

and

\(D_4 = 1 - \sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)

This also makes \(D_4\) the same as Gibbs and Poston's M1.

The formula used (Gibbs & Poston, 1975, p. 471):

\(\text{M1} = 1 - \sum_{i=1}^k p_i^2\)
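The four Simpson variants can be sketched with two small functions, taking the reversed versions as 1 minus the corresponding value, in line with the definition of M1. Function names and example frequencies are hypothetical.

```python
def simpson_d(freqs):
    """Simpson D based on f * (f - 1), i.e. sampling without replacement."""
    n = sum(freqs)
    return sum(f * (f - 1) for f in freqs) / (n * (n - 1))


def simpson_d_biased(freqs):
    """Biased Simpson D: the sum of squared proportions."""
    n = sum(freqs)
    return sum((f / n) ** 2 for f in freqs)


freqs = [10, 5, 3, 2]
d1 = simpson_d(freqs)
d2 = simpson_d_biased(freqs)
d3 = 1 - d1  # reversed scale
d4 = 1 - d2  # reversed scale, equal to Gibbs-Poston M1
print(round(d1, 4), round(d2, 4), round(d3, 4), round(d4, 4))
# 0.3105 0.345 0.6895 0.655
```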

Gibbs-Poston M3

The formula used (Gibbs & Poston, 1975, p. 472):

\(\text{M3} = \frac{1-\sum_{i=1}^k p_i^2-p_{min}}{1-\frac{1}{k}-p_{min}}\)

With \(p_{min}\) the lowest proportion
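A direct translation of the M3 formula into code might look as follows. The function name is mine, and the sketch does not handle the case where the denominator is 0 (for example two equally frequent categories).

```python
def gibbs_poston_m3(freqs):
    """Gibbs-Poston M3, following the formula above."""
    n = sum(freqs)
    k = len(freqs)
    props = [f / n for f in freqs]
    p_min = min(props)
    sum_p2 = sum(p ** 2 for p in props)
    return (1 - sum_p2 - p_min) / (1 - 1 / k - p_min)


print(round(gibbs_poston_m3([10, 5, 3, 2]), 4))  # 0.8538
```

Here \(p_{min} = 0.1\) and \(\sum p_i^2 = 0.345\), so the result is \(\frac{0.555}{0.65}\).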

Smith & Wilson E2

The formula used (Smith & Wilson, 1996, p. 71):

\(E_2 = \frac{-\ln\left(D_s\right)}{\ln\left(k\right)}\)

With \(D_s\) being Simpson's D, but defined as:

\(D_s = \sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)

Smith & Wilson E3

The formula used (Smith & Wilson, 1996, p. 71):

\(E_3 = \frac{1}{D_s \times k}\)

With \(D_s\) being Simpson's D, but defined as:

\(D_s = \sum_{i=1}^k\left(\frac{F_i}{n}\right)^2\)
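A sketch of the second and third Smith-Wilson indices. One assumption to flag: \(E_2\) is computed here with a minus sign, \(-\ln\left(D_s\right)/\ln\left(k\right)\), because \(D_s \le 1\) makes \(\ln\left(D_s\right)\) non-positive and the index would otherwise not fall between 0 and 1. Function names are hypothetical.

```python
from math import log


def simpson_ds(freqs):
    """Simpson's D defined as the sum of squared proportions."""
    n = sum(freqs)
    return sum((f / n) ** 2 for f in freqs)


def smith_wilson_e2(freqs):
    """E2, assumed here as -ln(D_s) / ln(k)."""
    return -log(simpson_ds(freqs)) / log(len(freqs))


def smith_wilson_e3(freqs):
    """E3: 1 / (D_s * k)."""
    return 1 / (simpson_ds(freqs) * len(freqs))


freqs = [10, 5, 3, 2]
print(smith_wilson_e2(freqs))
print(smith_wilson_e3(freqs))
```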

Other Methods

Wilcox MNDIF

Analog of the mean difference measure for scale variables.

The formula used is (Wilcox, 1973, p. 9):

\(\text{MNDIF} = 1-\frac{\sum_{i=1}^{k-1}\sum_{j=i+1}^k \left|F_i-F_j\right|}{n\times\left(k-1\right)}\)
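The double sum over all pairs \(i < j\) is easy to get wrong by hand, so a short sketch may help. It uses `itertools.combinations` to generate each pair once; the function name and data are my own.

```python
from itertools import combinations


def wilcox_mndif(freqs):
    """Wilcox MNDIF: 1 minus the standardized sum of absolute pairwise differences."""
    n = sum(freqs)
    k = len(freqs)
    abs_diffs = sum(abs(a - b) for a, b in combinations(freqs, 2))
    return 1 - abs_diffs / (n * (k - 1))


# Pairwise differences for this example: 5, 7, 8, 2, 3, 1 (sum 26).
print(round(wilcox_mndif([10, 5, 3, 2]), 4))  # 0.5667
```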

Kaiser b

The formula used (Kaiser, 1968, p. 211):

\(B = 1 - \sqrt{1 - \left(\sqrt[k]{\prod_{i=1}^k\frac{F_i\times k}{n}}\right)^2}\)
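Kaiser's measure is built on the geometric mean of the terms \(\frac{F_i\times k}{n}\), which the sketch below computes with `math.prod` (Python 3.8 or later). The function name is hypothetical, and the `max(0.0, ...)` guard against tiny negative values from floating-point rounding is my own addition.

```python
from math import prod, sqrt


def kaiser_b(freqs):
    """Kaiser's measure based on the geometric mean of F_i * k / n."""
    n = sum(freqs)
    k = len(freqs)
    geo_mean = prod(f * k / n for f in freqs) ** (1 / k)
    # The geometric mean cannot exceed 1 here; the guard only absorbs rounding.
    return 1 - sqrt(max(0.0, 1 - geo_mean ** 2))


print(kaiser_b([5, 5, 5, 5]))   # 1.0: all categories equally frequent
print(kaiser_b([12, 8, 0, 0]))  # 0.0: any empty category makes the product 0
print(kaiser_b([10, 5, 3, 2]))
```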
