Hodges-Lehmann Estimate
Introduction
The Hodges-Lehmann estimate (Hodges & Lehmann, 1963) for a one-sample scenario is the median of the Walsh averages. The Walsh averages (Walsh, 1949a, 1949b) are the averages of every possible pair of scores, formed by taking one score and combining it with each of the other scores. Each pair is counted only once: taking the second and fifth score is the same as taking the fifth and the second, so only one of these is used. Self-pairs are also included, e.g. the third score paired with itself.
In the one-sample case it is therefore a measure of central tendency, and it is sometimes referred to as the pseudo-median.
In the independent samples case, the Hodges-Lehmann estimate is the median of all possible differences between the scores of the two sets of data. The authors (Hodges & Lehmann, 1963) describe it as the location shift that is needed to align two distributions (with similar distributions) as much as possible (p. 599).
It is sometimes incorrectly described as the difference between the two medians. It is not uncommon for the Hodges-Lehmann estimate to differ from the difference between the two medians.
This measure is sometimes mentioned as an effect size measure for a Mann-Whitney U / Wilcoxon rank-sum test (van Geloven, 2018). However, since it is a median of the possible differences, it is not standardized (i.e. it does not range between two fixed values, and therefore depends on the scale of the data).
Obtaining the Measure
one-sample version
with Excel
Excel file: ME - Hodges-Lehmann Estimate (One-Sample) (E).xlsm
with stikpetE add-in
without stikpetE add-in
with Python
Jupyter Notebook: ME - Hodges-Lehmann Estimate (One-Sample) (P).ipynb
with stikpetP library
without stikpetP library
with R (Studio)
Jupyter Notebook: ES - Common Language (One-Sample) (R).ipynb
with stikpetR library
without stikpetR library
with SPSS (not possible?)
Unfortunately, I'm not aware of a way to do this with SPSS using the GUI.
with Formulas
The formula for the Hodges-Lehmann estimator for one-sample (Hodges & Lehmann, 1963, p. 599):
\(HL = \text{median}\left(\frac{x_i + x_j}{2} \mid 1 \leq i \leq j \leq n\right)\)
Symbols used:
- \(x_i\), the i-th score
- \(x_j\), the j-th score
- \(n\), the number of scores
There might be a faster method to determine this: Algorithm 616 (Monahan, 1984). However, I could not translate the Fortran code to R or Python.
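The formula above can also be applied directly by brute force. The following is a minimal Python sketch using only the standard library; it is not Algorithm 616 and not the stikpetP implementation. The function names `walsh_averages` and `hodges_lehmann_one_sample` and the sample scores are made up for illustration.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(scores):
    """Return the n(n+1)/2 Walsh averages (x_i + x_j)/2 with i <= j."""
    # combinations_with_replacement yields each unordered pair once,
    # including the self-pairs (x_i, x_i).
    return [(a + b) / 2 for a, b in combinations_with_replacement(scores, 2)]


def hodges_lehmann_one_sample(scores):
    """Median of the Walsh averages (brute force, stores all pairs)."""
    return median(walsh_averages(scores))


# Hypothetical scores, chosen only for illustration.
sample = [1, 2, 6]
print(sorted(walsh_averages(sample)))     # [1.0, 1.5, 2.0, 3.5, 4.0, 6.0]
print(hodges_lehmann_one_sample(sample))  # 2.75
```

Since every pair is stored, memory use grows with the square of the number of scores, so this approach is only practical for small to moderate samples.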
Worked out Example
Let's say we have the scores 1, 3, 7, 8.
We then need to determine the Walsh averages for each possible pair:
| \(\frac{x_i + x_j}{2}\) | \(x_j\) | | | |
|---|---|---|---|---|
| \(x_i\) | 1 | 3 | 7 | 8 |
| 1 | (1 + 1)/2 = 1 | (1 + 3)/2 = 2 | (1 + 7)/2 = 4 | (1 + 8)/2 = 4.5 |
| 3 | | (3 + 3)/2 = 3 | (3 + 7)/2 = 5 | (3 + 8)/2 = 5.5 |
| 7 | | | (7 + 7)/2 = 7 | (7 + 8)/2 = 7.5 |
| 8 | | | | (8 + 8)/2 = 8 |
Each pair is used only once and self-pairs are included, so only the diagonal and the cells above it are filled in.
The list of all Walsh averages is then:
\(1,2,4,4.5,3,5,5.5,7,7.5,8\)
We can sort this list to get:
\(1,2,3,4,4.5,5,5.5,7,7.5,8\)
Next we need to determine the median of all these Walsh averages. There are 10 Walsh averages in total ( \(\frac{4\times\left(4+1\right)}{2}\) ), so the median will be the \(\frac{10+1}{2} = 5.5\)-th score.
The 5th score is 4.5 and the 6th is 5, so the 5.5-th is the average of the two: \(\frac{4.5 + 5}{2} = 4.75\).
The Hodges-Lehmann estimate is therefore 4.75.
independent samples version
with Python
Jupyter Notebook: ES - Hodges-Lehmann (ind sam) (P).ipynb
with stikpetP
without stikpetP
with SPSS
Formulas
The formula for the Hodges-Lehmann estimator with two samples is (Hodges & Lehmann, 1963, p. 602):
\(HL = \text{median}\left(y_j - x_i | 1 \leq i \leq n_x, 1 \leq j \leq n_y\right)\)
Symbols used:
- \(x_i\), the i-th score in category x
- \(y_j\), the j-th score in category y
- \(n_x\) and \(n_y\), the number of scores in category x and category y
There might be a faster method to determine this: Algorithm 616 (Monahan, 1984). However, I could not translate the Fortran code to R or Python.
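The two-sample formula can likewise be evaluated by brute force. The sketch below uses only the Python standard library; it is not Algorithm 616 and not the stikpetP implementation. The function names `pairwise_differences` and `hodges_lehmann_two_sample` and the scores are made up for illustration. It also prints the difference between the two medians, to show that the two quantities need not agree.

```python
from itertools import product
from statistics import median


def pairwise_differences(x, y):
    """Return all n_x * n_y differences y_j - x_i."""
    return [b - a for a, b in product(x, y)]


def hodges_lehmann_two_sample(x, y):
    """Median of all pairwise differences y_j - x_i (brute force)."""
    return median(pairwise_differences(x, y))


# Hypothetical scores, chosen only for illustration.
x = [0, 1, 10]
y = [2, 9, 11]
print(sorted(pairwise_differences(x, y)))  # [-8, -1, 1, 1, 2, 8, 9, 10, 11]
print(hodges_lehmann_two_sample(x, y))     # 2
print(median(y) - median(x))               # 8
```

With these scores the Hodges-Lehmann estimate is 2, while the difference between the medians is 8.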
Worked out Example
Let's say we have the scores 1, 3, 7, 8 in one category x, and the scores 2, 5, 8, 8, 9 in a second category y.
We then need to determine the difference for each possible pair:
| \(y-x\) | y | ||||
|---|---|---|---|---|---|
| x | 2 | 5 | 8 | 8 | 9 |
| 1 | 2 - 1 = 1 | 5 - 1 = 4 | 8 - 1 = 7 | 8 - 1 = 7 | 9 - 1 = 8 |
| 3 | 2 - 3 = -1 | 5 - 3 = 2 | 8 - 3 = 5 | 8 - 3 = 5 | 9 - 3 = 6 |
| 7 | 2 - 7 = -5 | 5 - 7 = -2 | 8 - 7 = 1 | 8 - 7 = 1 | 9 - 7 = 2 |
| 8 | 2 - 8 = -6 | 5 - 8 = -3 | 8 - 8 = 0 | 8 - 8 = 0 | 9 - 8 = 1 |
The list of all differences between every possible pair is then:
\(1,4,7,7,8,-1,2,5,5,6,-5,-2,1,1,2,-6,-3,0,0,1\)
We can sort this list to get:
\(-6,-5,-3,-2,-1,0,0,1,1,1,1,2,2,4,5,5,6,7,7,8\)
Next we need to determine the median of all these differences. There are 20 differences in total ( \(4\times 5\) ), so the median will be the \(\frac{20+1}{2} = 10.5\)-th score.
The 10th score is 1 and so is the 11th, so the 10.5-th will also be 1 (otherwise we would have averaged the two).
The Hodges-Lehmann estimate is therefore 1. Note that this differs from the difference between the two medians, which is \(8 - 5 = 3\).
Interpretation
For independent samples, the Hodges-Lehmann estimate is the median of all possible differences. It can therefore range between minus and plus infinity. It shows that if you take any pair of values (one from each of the two categories), the difference between the two will be the Hodges-Lehmann value or more in about 50% of the cases.
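This reading can be checked numerically. The sketch below, again with made-up scores and only the Python standard library, counts the share of pairwise differences that are at least, and at most, the Hodges-Lehmann value. Because the estimate is a median, both shares are at least one half.

```python
from itertools import product
from statistics import median

# Hypothetical scores, chosen only for illustration.
x = [0, 1, 10]
y = [2, 9, 11]

# All pairwise differences y_j - x_i and their median.
differences = [b - a for a, b in product(x, y)]
hl = median(differences)

# Share of pairs whose difference is at least / at most the estimate.
share_at_least = sum(d >= hl for d in differences) / len(differences)
share_at_most = sum(d <= hl for d in differences) / len(differences)

print(hl, share_at_least, share_at_most)
```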
Unfortunately I have not been able to find any rules-of-thumb for this measure.
