Median

Measures of central tendency try to establish the ‘most typical’ value for the data. For nominal data the mode is the only measure of central tendency that can be used. It is the score (or scores) that occurs most often. We could also determine the mode for an ordinal variable, but then we do not take the order of the categories into account. Because of this, a more frequently used (and probably better) measure of central tendency for ordinal data is the so-called median.
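As a small illustration of the mode, Python's standard statistics module has a multimode function (Python 3.8 or later) that returns every score that occurs most often. The colour data below are made up purely for this sketch.

```python
from statistics import multimode

# Hypothetical nominal data: favourite colour of ten respondents
colours = ["red", "blue", "blue", "green", "red",
           "blue", "yellow", "green", "blue", "red"]
print(multimode(colours))  # ['blue']

# With a tie, every most frequent score is a mode
tied = ["red", "blue", "red", "blue", "green"]
print(multimode(tied))  # ['red', 'blue']
```

Note that nothing in this calculation uses an ordering of the categories, which is why the mode ignores the order of an ordinal variable.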

The median is the score in the middle of all sorted scores, or more formally “the middle value in a distribution, below and above which lie values with equal total frequencies or probabilities” (Porkess, 1991, p. 134). This means that at least 50% of the respondents scored equal to or higher than the median, and at least 50% scored equal to or lower than it. If, for example, the median of a school exam is 70 (out of 100, with 55 or more being a pass), then we know that at least 50% of the students passed. In a frequency table, the median can quickly be found from the cumulative percentages: it is the first category where the cumulative percentage is above 50%. If a category has a cumulative percentage of exactly 50%, the median falls exactly between that category and the next.
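The frequency-table rule above can be sketched in Python as follows. The function name median_from_frequencies and the example counts are hypothetical; the sketch assumes the categories are passed in their natural order (lowest first) and that every count is a non-negative whole number.

```python
def median_from_frequencies(categories, counts):
    """Find the median category of an ordered frequency table.

    `categories` must be listed in their natural order (lowest first) and
    `counts` holds the frequency of each category. Returns one category, or a
    (lower, upper) tuple when the median falls between two categories.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("the frequency table contains no observations")
    cumulative = 0
    for k, (category, count) in enumerate(zip(categories, counts)):
        cumulative += count
        # 2 * cumulative > total is the same as a cumulative percentage above 50%,
        # but uses whole numbers, so there is no floating-point rounding
        if 2 * cumulative > total:
            return category
        if 2 * cumulative == total:
            # exactly 50%: the median lies between this category and the next observed one
            upper = next(c for c, n in zip(categories[k + 1:], counts[k + 1:]) if n > 0)
            return (category, upper)
    raise ValueError("categories and counts do not form a valid frequency table")


# Hypothetical five-point scale with made-up counts
levels = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
print(median_from_frequencies(levels, [10, 25, 20, 30, 15]))  # neutral
print(median_from_frequencies(levels, [10, 40, 20, 20, 10]))  # ('disagree', 'neutral')
```

In the first table the cumulative percentages are 10%, 35%, 55%, ..., so 'neutral' is the first category above 50%. In the second, 'disagree' reaches exactly 50%, so the median lies between 'disagree' and 'neutral'.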

For scale data, the median is sometimes preferred over the arithmetic mean. This is often the case when there are outliers (extreme values), since the median is not influenced by how extreme these are. For example, the median of the values 1, 2, 5 is 2, and the median of 1, 2, 5000000 is also 2.
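The example with 1, 2, 5 and 1, 2, 5000000 can be checked with Python's built-in statistics module; this is a minimal sketch comparing the median with the mean on exactly those values.

```python
from statistics import mean, median

small = [1, 2, 5]
with_outlier = [1, 2, 5000000]

# The median only looks at the middle of the sorted scores, so the outlier has no effect
print(median(small), median(with_outlier))  # 2 2

# The mean uses every value, so one extreme score pulls it far upwards
print(round(mean(small), 2), round(mean(with_outlier), 2))  # 2.67 1666667.67
```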

The sections below show how to determine the median with Excel, Python, R, or SPSS, followed by the formula.

with Excel

Excel file from video: ME - Median.xlsm.

using stikpetE

without using stikpetE

with Python

Notebook used in video: ME - Median.ipynb.

using stikpetP

without using stikpetP

with R

R script used in video: ME - Median.R.

using stikpetR

without using stikpetR

Data file used in video and notebook: GSS2012-Adjusted.sav.

with SPSS

Three different methods are shown below; each gives the same result.

using Frequencies

The video below shows how to obtain the median using the Frequencies option.

Datafile used in video: GSS2012-Adjusted.sav

using Explore

The video below shows how to obtain the median using the Explore option.

Datafile used in video: GSS2012-Adjusted.sav

using a shortcut

The video below shows how to obtain the median using a shortcut.

Datafile used in video: GSS2012-Adjusted.sav

Formula

\(\tilde{x} = \begin{cases} x_i & i = \left \lfloor i \right \rfloor \\ \frac{x_{i-0.5} + x_{i+0.5}}{2} & i \neq \left \lfloor i \right \rfloor \end{cases} \)

with:

\(i = \frac{n + 1}{2}\)

Where \(n\) is the sample size, and the scores of \(x\) have been sorted from low to high (so \(x_1\) is the lowest score).
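The formula translates directly into code. The sketch below is a hypothetical helper (median_by_index) for numeric data that mirrors the formula step by step: it sorts the scores, computes \(i = \frac{n + 1}{2}\), and either takes \(x_i\) or averages \(x_{i-0.5}\) and \(x_{i+0.5}\). For example, with \(n = 4\) we get \(i = 2.5\), so the median is \(\frac{x_2 + x_3}{2}\). Since the formula counts positions from 1 and Python lists count from 0, each position is shifted by one.

```python
import math


def median_by_index(scores):
    """Median of numeric scores via the index i = (n + 1) / 2 (1-based positions)."""
    x = sorted(scores)
    n = len(x)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    i = (n + 1) / 2
    if i == math.floor(i):
        # i is a whole number: the median is the score at position i
        return x[int(i) - 1]  # position i is list index i - 1
    # i ends in .5: average the scores at positions i - 0.5 and i + 0.5
    lower = x[int(i - 0.5) - 1]
    upper = x[int(i + 0.5) - 1]
    return (lower + upper) / 2


print(median_by_index([7, 3, 9, 1]))  # 5.0  (sorted: 1, 3, 7, 9; i = 2.5)
print(median_by_index([5, 1, 3]))     # 3    (sorted: 1, 3, 5; i = 2)
```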

If the data are non-numeric, the number of scores is even, and the median index falls between two different categories, there are three options: use the lower value (i.e. \(x_{i-0.5}\)), use the upper value (i.e. \(x_{i+0.5}\)), or simply report ‘between \(x_{i-0.5}\) and \(x_{i+0.5}\)’. The lower and upper options are available in Python’s statistics library as the functions median_low and median_high (Python, n.d.).
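A minimal sketch of the three options using median_low and median_high, with a made-up ordinal scale and six made-up responses (so \(n = 6\) and \(i = 3.5\)). Because sorting the text labels themselves would give alphabetical rather than ordinal order, the sketch first converts each response to its rank on the scale; this encoding step is an assumption of the example, not something the functions do for you.

```python
from statistics import median_high, median_low

# Hypothetical ordinal scale, listed from lowest to highest
levels = ["very unhappy", "unhappy", "neutral", "happy", "very happy"]
responses = ["happy", "unhappy", "neutral", "very happy", "unhappy", "happy"]

# Sorting the text itself would be alphabetical, so work with each response's rank
ranks = [levels.index(r) for r in responses]

low = levels[median_low(ranks)]    # option 1: the lower value, x_(i-0.5)
high = levels[median_high(ranks)]  # option 2: the upper value, x_(i+0.5)
print(low, "|", high)              # neutral | happy
print(f"between {low} and {high}")  # option 3: between neutral and happy
```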

The origin of the median is unclear. One early source may be Pacioli (1523, in Italian), but the median can also be found in, for example, Galton (1881, p. 246) or Cournot (1843, p. 120).

