Quantitative Variation
Explanation
A measure of central tendency alone gives a very limited description of the data, which is why a measure of dispersion is often reported as well. To put it bluntly: if your head is in a burning oven and your feet are in a refrigerator, on average you are doing fine, but there is too much variation.
Various measures of variation/dispersion exist for scale variables. To measure variation we need some kind of 'norm' to deviate from. A measure of central tendency usually serves as this 'norm', either the mean or the median.
We also do not care whether a deviation is positive or negative; it deviates all the same. To get rid of negatives we can either take the absolute value or square the results. Squaring also gives more weight to larger differences.
The variance (Fisher, 1918, p. 399) is the average of the squared differences from the mean. This results in a relatively large value compared to the original scores, since everything got squared. To counter this, the square root is often taken, which gives the standard deviation (Pearson, 1894, p. 80).
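As a minimal sketch of these two steps, the snippet below squares the differences from the mean, averages them, and then takes the square root. The scores are made up for illustration, and the average divides by the number of scores (the population version).

```python
from math import sqrt

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example scores

mean = sum(scores) / len(scores)

# Squared differences from the mean: no negatives, large gaps weigh more.
squared_diffs = [(x - mean) ** 2 for x in scores]

variance = sum(squared_diffs) / len(scores)  # average squared difference
std_dev = sqrt(variance)                     # back on the scale of the scores

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```

The variance of 4 is in squared units, while the standard deviation of 2 is in the same units as the scores themselves.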
The standard deviation gives information about the diversity of the scores. It could indicate how well people agreed with each other, how much variation there was, or how stable something is. Chebyshev’s inequality (Tchébychef, 1867) states that at least 75% of all scores will fall within two standard deviations of the mean, and almost 89% within three standard deviations. If for example the mean age is 23 and the standard deviation is 3, then we can expect at least 75% of the respondents to have an age between (23 – 2 × 3 =) 17 and (23 + 2 × 3 =) 29, and almost 89% between (23 – 3 × 3 =) 14 and (23 + 3 × 3 =) 32.
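The percentages in the age example follow from the bound 1 − 1/k² for k standard deviations. A small sketch, with a hypothetical helper name `chebyshev_lower_bound`, reproduces them:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of scores within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

mean_age, sd_age = 23, 3  # values from the age example

for k in (2, 3):
    low, high = mean_age - k * sd_age, mean_age + k * sd_age
    print(f"at least {chebyshev_lower_bound(k):.1%} between {low} and {high}")
# at least 75.0% between 17 and 29
# at least 88.9% between 14 and 32
```

These are lower bounds that hold for any distribution; for a specific distribution the actual proportion can be considerably higher.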
Taking the absolute value of the differences from the mean, and determining the average of those, gives the Mean Absolute Deviation. We could also take the mean of the absolute deviations from the median, giving the Mean Absolute Deviation from the Median, or take the median of those absolute deviations to get the Median Absolute Deviation from the Median.
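The three variants differ only in which centre is used and how the absolute deviations are summarised. The sketch below uses Python's standard `statistics` module; the function names and the data (with one deliberate outlier) are made up for illustration.

```python
from statistics import mean, median

def mean_abs_dev(data):
    """Mean of the absolute deviations from the mean."""
    centre = mean(data)
    return mean([abs(x - centre) for x in data])

def mean_abs_dev_median(data):
    """Mean of the absolute deviations from the median."""
    centre = median(data)
    return mean([abs(x - centre) for x in data])

def median_abs_dev(data):
    """Median of the absolute deviations from the median."""
    centre = median(data)
    return median([abs(x - centre) for x in data])

scores = [1, 2, 3, 4, 100]  # hypothetical scores with one outlier

print(mean_abs_dev(scores))         # 31.2
print(mean_abs_dev_median(scores))  # 20.2
print(median_abs_dev(scores))       # 1
```

In this example the outlier pulls the two mean-based versions up sharply, while the median-based version barely reacts to it.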
Siraj-Ud-Doulah (2018) proposes using the decile mean instead of the arithmetic mean when determining the standard deviation, giving the Decile Standard Deviation.
To compare the variation in one set of scores with that in another, the standard deviation can be divided by the mean to give the Coefficient of Variation (Pearson, 1896, p. 277), or when using the decile mean the Coefficient of Deviation (Siraj-Ud-Doulah 2018, p. 310).
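A short sketch of why dividing by the mean helps when two sets are on different scales. The data and the function name are hypothetical, and the sample standard deviation is assumed here.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (assumes a non-zero mean)."""
    return stdev(data) / mean(data)

heights_cm = [160, 170, 180]  # hypothetical heights
weights_kg = [50, 70, 90]     # hypothetical weights

print(round(coefficient_of_variation(heights_cm), 3))  # 0.059
print(round(coefficient_of_variation(weights_kg), 3))  # 0.286
```

The raw standard deviations (10 cm and 20 kg) cannot be compared directly, but the unit-free ratios can: relative to their mean, the weights vary more than the heights.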
Note that for the variance and standard deviation both a sample and a population version exist. The sample version divides the sum of the squared differences by the sample size minus one, the population version by the size of the population. The sample variance is a so-called unbiased estimator of the population variance. The sample standard deviation is a better approximation than the one obtained by dividing by the sample size, but strictly speaking it is not an unbiased estimator of the population standard deviation. This is why some programs allow the offset subtracted from the sample size to be set to any value.
One could argue that the more scores we collect, the more variation we are likely to see, and since a sample is smaller than the population, the sample variance would be a little too low if we divided by the sample size. By reducing the sample size by one, we divide by a smaller number, making the variance itself larger and closer to the population variance.
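This argument can be checked with a small simulation. The sketch below repeatedly draws small samples (with replacement) from a made-up population of the numbers 1 to 100 and averages the two versions of the variance; the population, sample size, and number of repetitions are arbitrary choices.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

population = list(range(1, 101))  # hypothetical population: 1..100
pop_mean = sum(population) / len(population)
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

def sum_of_squares(sample):
    """Sum of squared differences from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

n = 5          # sample size
repeats = 20000

totals = [sum_of_squares(random.choices(population, k=n)) for _ in range(repeats)]

avg_div_n = sum(t / n for t in totals) / repeats                # divide by n
avg_div_n_minus_1 = sum(t / (n - 1) for t in totals) / repeats  # divide by n - 1

print("population variance:     ", pop_var)
print("average when dividing by n:    ", avg_div_n)
print("average when dividing by n - 1:", avg_div_n_minus_1)
```

Dividing by n gives an average that falls clearly below the population variance, while dividing by n − 1 gives an average that lands close to it.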
Obtaining the Measures
with Excel
Excel file from video: ME - Quantitative Variation (E).xlsm
using stikpetE
without using stikpetE
with Python
Notebook from video: ME - Quantitative Variation (P).ipynb
using stikpetP
without using stikpetP
with R
Notebook from video: ME - Quantitative Variation (R).ipynb
using stikpetR
without using stikpetR
with SPSS
Standard Deviation and Variance:
Mean or Median Absolute Deviation:
Coefficient of Variation or Deviation:
Formula
Standard Deviation (std)
The formula used is:
\(s = \sqrt{\frac{\sum_{i=1}^n \left(x_i - \bar{x}\right)^2}{n - d}}\)
Where \(d\) is the offset. If \(d = 1\) the sample version is obtained, if \(d = 0\) the population version.
Variance (var)
The formula used is:
\(s^2 = \frac{\sum_{i=1}^n \left(x_i - \bar{x}\right)^2}{n - d}\)
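The two formulas above translate directly into code with the offset \(d\) as a parameter. This is a plain-Python sketch with hypothetical function names, not the implementation used by any particular library.

```python
from math import sqrt

def variance(data, d=1):
    """Sum of squared differences from the mean, divided by n - d."""
    n = len(data)
    if n - d <= 0:
        raise ValueError("n - d must be positive")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - d)

def std(data, d=1):
    """Square root of the variance with the same offset d."""
    return sqrt(variance(data, d))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example scores

print(variance(scores, d=0), std(scores, d=0))  # population version: 4.0 2.0
print(variance(scores, d=1), std(scores, d=1))  # sample version (32/7 and its root)
```

Python's standard `statistics` module offers the same two versions as `pvariance`/`pstdev` (offset 0) and `variance`/`stdev` (offset 1).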
Mean Absolute Deviation (mad)
The formula used is:
\(MAD = \frac{\sum_{i=1}^n \left| x_i - \bar{x}\right|}{n}\)
Mean Absolute Deviation from the Median (madmed)
The formula used is:
\(MAD = \frac{\sum_{i=1}^n \left| x_i - \tilde{x}\right|}{n}\)
Where \(\tilde{x}\) is the median
Median Absolute Deviation (medad)
The formula used is:
\(MAD = MED\left(\left| x_i - \tilde{x}\right|\right)\)
Decile Standard Deviation
The formula used is (Siraj-Ud-Doulah 2018, p. 310):
\(s_{dm} = \sqrt{\frac{\sum_{i=1}^n \left(x_i - DM\right)^2}{n - d}}\)
Where DM is the decile mean.
Coefficient of Variation (cv)
The formula used is (Pearson, 1896, p. 277):
\(CV = \frac{s}{\bar{x}}\)
Coefficient of Diversity (cd)
The formula used is (Siraj-Ud-Doulah 2018, p. 310):
\(CD = \frac{s_{dm}}{DM}\)
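A sketch of the two decile-based measures. It assumes the decile mean is the average of the nine deciles D1 to D9, and it uses the default ('exclusive') method of `statistics.quantiles` to find them; other quantile methods will give slightly different deciles. The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import quantiles

def decile_mean(data):
    """Average of the nine deciles (assumed definition of DM)."""
    deciles = quantiles(data, n=10)  # nine cut points D1..D9
    return sum(deciles) / len(deciles)

def decile_std(data, d=1):
    """Standard deviation around the decile mean, with offset d."""
    dm = decile_mean(data)
    return sqrt(sum((x - dm) ** 2 for x in data) / (len(data) - d))

def decile_coefficient(data, d=1):
    """The CD ratio: decile standard deviation divided by the decile mean."""
    return decile_std(data, d) / decile_mean(data)

scores = list(range(1, 12))  # hypothetical symmetric scores 1..11

print(decile_mean(scores))
print(decile_std(scores))
print(decile_coefficient(scores))
```

For symmetric data such as this the decile mean coincides with the arithmetic mean, so the result matches the ordinary sample standard deviation; the two only diverge when the tails are uneven.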